Perl


perl - Practical Extraction and Report Language - a great script/shell language ... with lots of power ;=)) See [http://www.perl.com/] for the source, or [http://www.activestate.com/Products/ActivePerl/?_x=1] for Windows binaries ... the Microsoft Installer (MSI) gets you set up with perl very shortly after you have downloaded and run the 'current' MSI file ... in my case it had a big, descriptive name -
12/03/2005 12:11 PM 13,195,084 ActivePerl-5.8.6.811-MSWin32-x86-122208.msi
which, when run, and after accepting the MS 'unsigned' warning, will install in a $root, say C:\Perl, and add $root\bin to the CONSOLE path ...

Once installed, in a CONSOLE, write a file, say test1.pl, containing just the line -
print("Hello World!\n");
then, [current-path>] perl test1.pl, will print out "Hello World!", plus a new line, on the CONSOLE window (screen) ... of course, perl can do a little more than that ;=))

Naturally, most *nix perl samples will start with a line like -
#!/usr/bin/perl -w
but, in WIN32 (XP), the path part of this line is not 'used' (perl itself still picks up switches like -w from it), unless you have installed an 'environment' that emulates, or simulates, a unix/linux (*nix) work place, or you could have chosen this as your 'root' install choice ...

An agreed power of perl, is its string handling ... particularly using 'regular expressions' to parse up a line ... understanding that 'regular expressions' is a 'language' in itself ... see http://www.regular-expressions.info/ ... and as stated there "... whether that code is written in Perl, PHP, Java, a .NET language or a multitude of other languages...." ... or, as implemented in perl - [http://search.cpan.org/dist/perl/pod/perlre.pod] ...

The examples, $root\eg contains some great, simple code, doing some 'powerful' actions ... how it joins with the Windows Scripting Host  to, for example, get the current ENVIRONMENT variables ... see showenv.wsf ... open notepad, and send it a sentence ... see notepad.wsf ... etc ...

Where can you find an 'Integrated Development Environment' (IDE) that assists with perl and regular expression syntax, and where you can fully, interactively, debug a script, stopping anywhere, and examining variables? ... try http://search.cpan.org/dist/perl/pod/perlre.pod ... but, be warned, this is quite a 'rudimentary' colour coded editor, with no 'context-sensitive' help on perl itself ... and the debugging is 'difficult' ... but these *nix tools are only VERY SLOWLY becoming available in the FREE, OPEN SOURCE community ... they represent lots of 'dedicated' hours of code work ... a big thank you to the maintainers ...

Of course, Perl can be used in the 'Common Gateway Interface' (CGI) context ... that is, since a CGI program can be written in any language that allows it to be executed on the HTTP (HTTPd) server system, such as C/C++, Fortran, TCL, Any Unix shell, Visual Basic, AppleScript, ... Perl is the choice of some ...

Perl handles command line arguments very easily, in a way ... for example, it can be put in a 'function', or sub-routine, or whatever you call it ... ;=))
parse_arguments(@ARGV);
would pass the commands to such a service ... here is a little program to do that -


#!perl -w
use strict;

my $program = 'test5';
my $verbose = 0;
my $static_lib = 0;
my $package = 'temptest5';
my @input_files = ();
my $file_lines = 0;

parse_arguments(@ARGV);

die "$program: no input files found or specified\n" if ! @input_files;

init_out_file($package);

my $in_file;

foreach $in_file (@input_files) {
    do_this_file($package, $in_file);
}

$in_file = $package . '.txt';
print "Done $in_file.\n";

# Walk the argument list, setting the globals above.
sub parse_arguments {
    my @av = @_; # take it off the passed stack
    while (@av) {
        if ($av[0] eq '--version') {
            print "Version 0.1\n"; # double quotes, so \n is a new line
        } elsif ($av[0] eq '--help' || $av[0] eq '--h' || $av[0] eq '-h' || $av[0] eq '-?') {
            die "No help available!\n";
        } elsif ($av[0] eq '--verbose' || $av[0] eq '-v') {
            print "Setting verbose.\n";
            $verbose = 1;
        } elsif ($av[0] eq '--package' || $av[0] eq '-p') {
            require_argument(@av);
            shift @av; # drop the option, leaving its value at the front
            $package = $av[0];
        } elsif ($av[0] eq '--lib') {
            print "Setting static.\n";
            $static_lib = 1;
        } elsif ($av[0] =~ /^-/) {
            die "$program: unrecognised option -- `$av[0]'\nTry $program --help for more information.\n";
        } else {
            print "Storing argument [$av[0]].\n";
            push(@input_files, $av[0]);
        }
        shift @av; # move to next argument
    } # while arguments
}

# Ensure argument exists, or die.
sub require_argument
{
    my ($arg, @arglist) = @_;
    die "$program: no argument given for option \`$arg'\n" if ! @arglist;
}

# Create (truncate) the output file and write the first line.
sub init_out_file
{
    my $name = shift;
    my $out_name = $name . '.txt';
    print "Creating $out_name\n" if $verbose;

    open(DSP, ">$out_name")
        || die "Can not create $out_name: $!\n";

    print "Writing to $out_name ...\n" if $verbose;

    $file_lines++;

    print DSP <<"EOF";
# line $file_lines out to File - "$out_name" \r
EOF

    print "Closing $out_name.\n" if $verbose;

    close(DSP);
}

# Append one line to the output file for each input file named.
sub do_this_file {
    my ($name,$mfile) = @_;
    my $out_name = $name . '.txt';

    print "Opening, for append $out_name\n" if $verbose;

    open(DSP, ">>$name" . '.txt')
        || die "Can't append to $out_name: $!\n";

    print "Writing to $out_name ...\n" if $verbose;

    $file_lines++;

    print DSP <<"EOF";
line $file_lines: Next file line with $mfile ...\r
EOF

    close(DSP);

    print "Closing $out_name.\n" if $verbose;

}
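
The hand-rolled argument loop above can also be written with the core Getopt::Long module ... the following is only a minimal sketch, assuming the same option names as the program above. The helper name parse_with_getopt and the returned hash layout are made up for this illustration, they are not part of the original program.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parse an argument list and return the settings as a hash.
sub parse_with_getopt {
    my @args = @_;
    my %opt = (verbose => 0, lib => 0, package => 'temptest5');
    GetOptionsFromArray(
        \@args,
        'verbose|v'   => \$opt{verbose},   # flag, long or short form
        'lib'         => \$opt{lib},       # flag
        'package|p=s' => \$opt{package},   # option that requires a string value
    ) or die "Try --help for more information.\n";
    $opt{files} = [@args];   # whatever is left over is treated as input files
    return %opt;
}

my %opt = parse_with_getopt('-v', '--package', 'demo', 'a.txt', 'b.txt');
print "verbose=$opt{verbose} package=$opt{package} files=@{$opt{files}}\n";
```

The module does the 'missing argument' and 'unknown option' checks itself, so the require_argument style helper is not needed in this form.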


A good perl reference is http://www.rexswain.com/perl5.html - it quickly, and succinctly, IMHO, describes just about everything you need to know ... and a link to (book) http://www.squirrel.nl/people/jvromans/perlref.html ... and
  http://www.perldoc.com/perl5.8.0/pod/perlfunc.html ... http://www.perl.com/pub/q/documentation ... etc, etc, etc ... Of course, the best teacher is writing some ... ;=))

A few 'samples' can be found at - http://www.sentex.net/~jmackay/code-samples.html#Perl -...- http://www.dwarfworks.com/demo/index2.html -...- http://sedition.com/perl/code-index.html -...- http://search.cpan.org/modlist/File_Handle_Input_Output -..-. http://search.cpan.org/ -...-

'Regular Expressions' play an important part in Perl ... they use a relatively small set of 'special' characters ... characters that have a special meaning, like '^', which matches the beginning of the target. If you want to find a literal '^', then it must be 'escaped' with the special 'escape' character, '\', like '\^' = '^', '\\' = '\', etc ... the set of 'specials' is -

+ ? . * ^ $ ( ) [ ] { } | \

The '\' escapes any special meaning of the above, but it also turns most alphanumeric characters into something special ... like '\r' is carriage return (CR), '\n' is line feed (LF), '\t' TAB, ... and other specials, like '\w' matches alphanumeric, including _, \W matches non-alphanumeric ... and there are some special '$' variables, like '$_', being the default input space ... '$/' input record separator ...
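
A small sketch of the escaping rule described above ... the sample text and variable names are invented for the illustration. It shows the same '^' unescaped, escaped by hand, and escaped with \Q...\E, which applies quotemeta to an interpolated variable.

```perl
use strict;
use warnings;

my $text = 'cost: 2^3 = 8';

# Unescaped, '^' means "start of string", so 2^3 cannot match anywhere.
my $unescaped = ($text =~ /2^3/)  ? 1 : 0;
# Escaped, '\^' matches a literal caret.
my $escaped   = ($text =~ /2\^3/) ? 1 : 0;
# \Q...\E escapes every special character held in a variable.
my $needle    = '2^3';
my $quoted    = ($text =~ /\Q$needle\E/) ? 1 : 0;

print "unescaped=$unescaped escaped=$escaped quoted=$quoted\n";
```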

The search and replace functions - 
[ EXPR =~ ] [ m ] /PATTERN/ [ g ] [ i ] [ m ] [ o ] [ s ] [ x ]
[ $VAR =~ ] s/PATTERN/REPLACEMENT/ [ e ] [ g ] [ i ] [ m ] [ o ] [ s ] [ x ]
and translate -
[ $VAR =~ ] tr/SEARCHLIST/REPLACEMENTLIST/ [ c ] [ d ] [ s ]
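
To make the three forms above concrete, here is a short sketch with one match, one substitute and one translate on the same string ... the string and the variable names are just for the example.

```perl
use strict;
use warnings;

my $line = "Hello World";

# m// : test for a match (the leading m is optional with / delimiters); i ignores case
my $has_world = ($line =~ /world/i) ? 1 : 0;

# s/// : work on a copy, replacing every 'o' with '0' (g = all instances)
(my $zeroed = $line) =~ s/o/0/g;

# tr/// : map characters one-to-one, here lower case to upper case
(my $upper = $line) =~ tr/a-z/A-Z/;

print "$has_world | $zeroed | $upper\n";
```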

To be able to 'read', and 'understand' perl code, you also need to be able to 'read' regular expressions ... so what does this say -
my $IGNORE_PATTERN = "^##([^#].*)?\$";
and here is its use -
while (<M_FILE>) { # process line by line file input
 if (/$IGNORE_PATTERN/o) { # note the [ m ] left out - o interpolates the variable only once
  # Merely ignore, delete comments beginning with two hashes.
 } elsif (/$WHITE_PATTERN/o) ...

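The '^' matches the beginning of the target ($_), then 2 hashes, then a group starts, '(', and runs until ')', so it is treated as a single element. Inside the group a class starts, '[', and runs until ']', listing the character(s) to match, remembering that '[^...]' negates the class, so in this case it says match any third character except a hash ... followed by any character, '.', quantified by zero or more, '*', then the group ends, ')' ... the whole group may match zero or one time, '?' ... up to the end of the line, '\$' (the '\' is only there to stop the double-quoted string interpolating the '$') ... so a line of exactly '##', or '##' followed by anything that does not start with a third hash, is ignored ...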
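
One way to check a reading of a pattern like this is to feed it a few sample lines ... a minimal sketch, where is_ignored is a hypothetical helper name and the sample lines are invented.

```perl
use strict;
use warnings;

my $IGNORE_PATTERN = "^##([^#].*)?\$";

# Hypothetical helper: 1 when the line is a "two hashes" comment, else 0.
sub is_ignored {
    my ($line) = @_;
    return $line =~ /$IGNORE_PATTERN/ ? 1 : 0;
}

for my $line ('##', '## a comment', '### three hashes', '# one hash', 'code ## x') {
    printf "%-20s => %d\n", "'$line'", is_ignored($line);
}
```

Only the first two samples are ignored ... three hashes fail because the third character may not be a hash, and the last sample fails because the hashes are not at the beginning of the line.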

Or this -
my $WHITE_PATTERN = "^[ \t]*\$";

Maybe, something like, match from beginning of line, '^', for the class, '[', space, ' ', and tab, '\t', end class, ']', quantified to zero or more times, '*', until end of line, '\$' ... used in the form - if ( /$WHITE_PATTERN/o ) - will be true if the line is ALL spacey ... ;=))
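
A quick sketch of the 'ALL spacey' test in use, dropping blank lines from a list ... the sample lines are invented for the illustration.

```perl
use strict;
use warnings;

my $WHITE_PATTERN = "^[ \t]*\$";

# Keep only the lines that carry something other than spaces and tabs.
my @lines = ("first", "", "  \t ", "second  ", "\t");
my @kept  = grep { !/$WHITE_PATTERN/ } @lines;

print scalar(@kept), " of ", scalar(@lines), " lines kept\n";
```

Note that an empty string also counts as 'spacey', since '*' allows zero characters.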

my $PATH_PATTERN='(\\w|/|\\.)+';
the style is ( ... | ... | ... ), which matches one of the alternatives, '\w', '/', or '\.' ... inside the single-quoted string each '\\' is just one '\' by the time the regex sees it ... so, start group, '(', an alphanumeric or '_', '\w', or, '|', a forward slash, '/', or, '|', a literal dot, '\.' (escaped, so it is NOT the 'any character' dot), then close group, ')' ... and this is quantified to match the preceding group one or more times, '+' ... simple huh, ;=))
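
A small sketch to try the path pattern against a few strings ... the anchors '^' and '$' are added here so the WHOLE string must be path-like, and is_path_like is a hypothetical helper name.

```perl
use strict;
use warnings;

my $PATH_PATTERN = '(\\w|/|\\.)+';

# Hypothetical helper: 1 when every character is a word character, '/' or '.'.
sub is_path_like {
    my ($candidate) = @_;
    return $candidate =~ /^$PATH_PATTERN$/ ? 1 : 0;
}

# The last sample is a\b (a single backslash), which the pattern does not allow.
for my $candidate ('./dir1/file.name', 'src/main.c', 'has space/file', 'a\\b') {
    print "$candidate: ", (is_path_like($candidate) ? 'path-like' : 'rejected'), "\n";
}
```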

An example using this type of 'path' pattern would be -
# Return directory name of file.
sub dirname { # passed a path, './dir1/dir2/file.name', returns './dir1/dir2'
   my ($file) = @_;
   my ($sub);
   ($sub = $file) =~ s,/+[^/]+$,,g;
   $sub = '.' if $sub eq $file;
    return $sub;
}

Using the $VAR =~ s/PATTERN/REPLACEMENT/g form, where a ',' character is used as the separator, s,PATTERN,<nul replacement>,g, so it says, search for the '/' character, matched one or more times, '+', then a class, '[', not, '^', the '/' character, end class, ']', one or more times, '+', until the end of line, '$' ... and delete that match from the copy held in $sub ... if there are no '/' characters in the string, $sub is still equal to $file, the whole thing ... so return the relative, '.', if only given a file name ...
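
The routine above can be tried side by side with the dirname that ships in the core File::Basename module ... a sketch only, with the regex version renamed to my_dirname (a name made up here) so the two cannot be confused, and with invented sample paths.

```perl
use strict;
use warnings;
use File::Basename ();   # core module; empty import list, so it is called by full name

# The substitution-based routine described above, renamed for this sketch.
sub my_dirname {
    my ($file) = @_;
    (my $sub = $file) =~ s,/+[^/]+$,,g;   # strip the last '/name' component
    $sub = '.' if $sub eq $file;           # nothing stripped: plain file name
    return $sub;
}

for my $path ('./dir1/dir2/file.name', 'file.name') {
    printf "%-24s regex: %-12s File::Basename: %s\n",
        $path, my_dirname($path), File::Basename::dirname($path);
}
```

The two are not guaranteed to agree on every edge case (a path ending in '/', or a file directly under the root), so treat the comparison as a learning aid rather than a proof of equivalence.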

To use this service in windows, you will need to make sure ALL DOS path separators, '\', are converted to the unix path separator, '/' ... like the following -
my $src_dir = $base_dir . $relative_dir . '/'; # use *nix forward slash ...
$src_dir =~ s/\\/\//g; # set *nix path separators ...
which says - search for '\', written as escape, '\\', substitute '/', written as, '\/', for all instances, 'g' ... 

or ...
$src_dir =~ s/\//\\/g; # set DOS path separators, for when unsure ...
search for '/', written as escape forward slash, '\/', and replace with '\', written as escape backslash, '\\' ... or use s,\/,\\,g, where g = for all instances ...
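
The two substitutions above, wrapped as a pair of helpers ... a minimal sketch, where to_unix_path and to_dos_path are hypothetical names, and the braces s{...}{...} are used as delimiters only to avoid the 'leaning toothpick' look of \/ and \\.

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the two substitutions described above.
sub to_unix_path { my ($p) = @_; $p =~ s{\\}{/}g; return $p; }   # '\' becomes '/'
sub to_dos_path  { my ($p) = @_; $p =~ s{/}{\\}g; return $p; }   # '/' becomes '\'

my $dos  = 'C:\\Perl\\eg\\showenv.wsf';   # single quotes: each \\ is one backslash
my $unix = to_unix_path($dos);

print "$unix\n";
print to_dos_path($unix), "\n";
```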

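Putting literal '$' text into a PATTERN string also makes the 'read' a little more difficult ... consider -
 if ($path =~ s/^\$\(top_srcdir\)\///)
note this is not a perl variable at all ... the makefile-style text $(top_srcdir) has to be suitably 'escaped' ... start of line, '^', find the literal $(top_srcdir), written '\$\(top_srcdir\)', then a '/', written '\/', and replace with nothing ...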
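
An alternative to escaping by hand is to keep the literal text in a variable and let \Q...\E do the escaping ... a sketch under that assumption, with $prefix and the sample path invented for the illustration.

```perl
use strict;
use warnings;

my $prefix = '$(top_srcdir)/';            # single quotes: the '$' is plain text here
my $path   = '$(top_srcdir)/src/main.c';

# \Q...\E escapes the '$', '(' and ')' so the prefix is matched literally.
if ($path =~ s/^\Q$prefix\E//) {
    print "relative path: $path\n";
}
```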

This site - http://www.regular-expressions.info/quickstart.html - has some help ... it lays it out in a colourful way ... like -


<quote>

Greedy and Lazy Repetition

The repetition operators or quantifiers are greedy. They will expand the match as far as they can, and only give back if they must to satisfy the remainder of the regex. The regex <.+> will match <EM>first</EM> in This is a <EM>first</EM> test.

Place a question mark after the quantifier to make it lazy. <.+?> will match <EM> in the above string.

A better solution is to follow my advice to use the dot sparingly. Use <[^<>]+> to quickly match an HTML tag without regard to attributes. The negated character class is more specific than the dot, which helps the regex engine find matches quickly.

</quote>
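
The three regexes from the quote can be tried directly in perl on the same sample string ... a short sketch, where the variable names are mine.

```perl
use strict;
use warnings;

my $text = 'This is a <EM>first</EM> test.';

my ($greedy)  = $text =~ /(<.+>)/;       # expands as far as it can
my ($lazy)    = $text =~ /(<.+?>)/;      # stops at the first '>'
my ($negated) = $text =~ /(<[^<>]+>)/;   # cannot run past a '<' or '>'

print "greedy:  $greedy\n";
print "lazy:    $lazy\n";
print "negated: $negated\n";
```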

The page contains one of my most useful perl applications ... of course, I wrap it in a batch file ... it uses the following 'regular expressions' ...

my $WHITE_PATTERN = "^[ \t]*\$"; # only spacey stuff, like if ( /$WHITE_PATTERN/o ) { ...}
$ff =~ s/\\/\//g; # sub *nix path separators
$sb =~ s/\//\\/g; # set DOS path separators
if ($a =~ /^-/) { ... } # if the line begins with a dash
[$_ =~] s/\t/$tab_stg /g; # substitute TAB characters
[$_ =~] s/"/&quot;/g; # sub double quotes with &quot;
[$_ =~] s/\</&lt;/g; # substitute less than tag beginning
[$_ =~] s/\>/&gt;/g; # and substitute html/xml tag ending
[$_ =~] s/ {$sps}/$nbs/; # replace (N) spaces with '&nbsp;' x N
($sub = $file) =~ s,/+[^/]+$,,g; # passed a path, './d1/d2/fn', returns './d1/d2'
and this application does VERY LITTLE ... ;=))
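
The HTML related substitutions in that list can be gathered into one routine ... a minimal sketch, where html_escape is a hypothetical name, and the extra '&' substitution is my assumption (it is not in the list above, but without it any '&' already in the text would pass through unescaped).

```perl
use strict;
use warnings;

# Hypothetical helper: escape a line of text for display inside HTML.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;    # assumption: done first, so later entities are not re-escaped
    $s =~ s/</&lt;/g;     # less than, tag beginning
    $s =~ s/>/&gt;/g;     # greater than, tag ending
    $s =~ s/"/&quot;/g;   # double quotes
    return $s;
}

print html_escape('if ($a < $b && $c > "x")'), "\n";
```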

Some others seen ...
$reserved = q(;/?:@&=+$,[]);
$mark = q(-_.!~*'()); #'; emacs
$unreserved = "A-Za-z0-9\Q$mark\E";
$uric = quotemeta($reserved) . $unreserved . "%";
$scheme_re = '[a-zA-Z][a-zA-Z0-9.+\-]*';

foreach (@hash{qw[key1 key2]}) {
s/^\s+//; # trim leading whitespace
s/\s+$//; # trim trailing whitespace
s/(\w+)/\u\L$1/g; # "titlecase" words
}
my $actdir = "C:/GTools";
sub subactdir {
my ($d) = @_; my ($nd);
($nd = $d) =~ s,^$actdir,,; # remove, at beginning, $actdir
if (length($nd) == 0) {
$nd = $actdir;
} else {
$nd =~ s,^/,,; # remove leading '/'
}
return $nd;
}
my (@mcols) = split( /\|/, $msg); # split message "one|two|..."
foreach my $col (@mcols) {

$oldExt = $ARGV[0];$newExt = $ARGV[1]; # prefix each extension with $extSep if it doesn't already start with it:
$extSep = "."; $oldExt = $extSep . $oldExt if (index($oldExt, $extSep) != 0);
$newExt = $extSep . $newExt if (index($newExt, $extSep) != 0);
# put all files in the current directory in @files:
opendir(THEDIR, ".") || die("Couldn't open current directory\n");
@files = readdir(THEDIR);
closedir(THEDIR);
# construct the $newName from the $oldName then rename $oldName to $newName:
foreach $oldName (@files) {
if ($oldName =~ /(.*)$oldExt$/) {
$newName = $1 . $newExt; system("mv $oldName $newName"); } }
my @tmp = split(/^/m, $list);
$extras = join '', grep /^[^0-9a-fA-F]/, @tmp; # items that do NOT begin with a hex digit, 0-9, a-f, or A-F ...
local($_) = @_; # make stack passed parameter local ...
s/\s+//g; # strip white space
if (s/^([+-]?)0*(\d+)$/$1$2/) { # test if number
substr($_,$[,0) = '+' unless $1; # Add missing sign
s/^-0/+0/;
$_; # return number
} else {'NaN';} # return NaN!
local(@dirs) = grep(!-l, grep(!/^\.\.?$/, grep(-d, readdir(THISDIR)))); # do all directories, except DOT and DOUBLE-DOT!
$newtext = <READFILE>;
eval "\$newtext =~ s/\$oldstring/\$newstring/$global$ignore;";



Further references: http://www.sentex.net/~jmackay/software.html#PerlUtil
Yahoo search : perl reference documentation package
http://www.rexswain.com/perl5.html - Simple list style help ... no sample code ...
http://search.cpan.org/src/LDS/CGI.pm-3.08/cgi_docs.html ... CGI = creating HTML forms ... and ...
http://search.cpan.org/src/LDS/CGI.pm-3.08/examples/ ... various HTML creation examples ...
http://aspn.activestate.com/ASPN/Perl ... ActivePerl documentation ...
http://aspn.activestate.com/ASPN/docs/VisualPerl/readme.html#installation ... ActivePerl add-in for Visual Studio ...
http://www.activestate.com/store/trial/register.plex?id=PerlDevKit ... Get ActivePerl DK (formerly ASPN)...
http://www.whitefire.com/programs/docs/MtmlParserDoc.html (broken!) ... A file parser ...
http://www.whitefire.com/programs/ (broken!) - MtmlParser-0.8.tar.gz and perltools.tar.gz This contains some useful perl tools ...
http://theoryx5.uwinnipeg.ca/modperl/docs/general/perl_reference/perl_reference.html  ...
http://www.cs.fsu.edu/general/perl/doc/PerlDoc.txt - Big text list ... perl 5.005, patch 02 19/Jul/98 ...
http://www.perl.org/ ... Current Release: 5.8.6, as of April, 2005!
http://blob.perl.org/books/impatient-perl/iperl.html ... says, said ... [Originally, "Pearl" shortened to "Perl" to gain status as a 4-letter word. Now considered an acronym for Practical Extraction and Report Language, as well as Petty Ecclectic Rubbish Lister. The name was invented first. The acronyms followed.]! ;=))
http://perl.apache.org/docs/2.0/user/porting/compat.html ... mod_perl 1 to 2 migration ...
http://www.perl.com/lpt/a/834 ... Book - Apocalypse 12 By Larry Wall April 16, 2004 ...
http://sedition.com/perl/perlview.cgi?file=Class-Prototype.pm&line=code ... A Fishy sample of a simple use base 'class::prototype' and a Fish object ... use Fish; my $fish_obj = Fish->new(); # set favorite; $fish_obj->favorite("Kuhli Loach"); ... etc... see testfish.pl, Fish.pm and class/prototype.pm for details ... in /tmp ...

http://homepages.wmich.edu/~l0lazaro/perld/fileio.html - File IO, and more ... or
http://www.developingwebs.net/perl/file_handling.php - more FILE IO ... and other things ... few samples or
http://cslibrary.stanford.edu/108/EssentialPerl.html#iofiles - more File IO ...
http://asp-pro.com/Perl/File_delete.pro - Delete or rename a file ...

http://www.datamystic.com/easypatterns_reference.html - this site offers another form of Regular Expression searching ... like it uses m/[digit]/ rather than m/[0-9]/ ... replaces the character-based learning curve with a word-based one ... should work for some people ;=)) ... some they call 'Real-world patterns', like [HTMLTag] vs /<[1+ not '>']>/, [HTMLStartTag] vs /<[not '/', 0+ not '>']>/ (i.e. any tag except an end tag), or [HTMLEndTag] vs /</[1+ not '>']>/ ...
http://aspn.activestate.com/ASPN/docs/ActivePython/2.4/python/lib/module-htmllib.html - Specific HTML stuff for ActivePython 2.4 ...
http://virtual.park.uga.edu/humcomp/perl/regex2a.html - A Perl Regular Expression Tutorial ...
http://search.cpan.org/~gaas/HTML-Parser-3.35/Parser.pm - parsing a HTML file ...

email:
"b.g@ms.com" =~ m/^(\w|\.)+@(\w|\.)+$/ ==> TRUE , and
"b.g_emp@ms.com" =~ m/^[\w._-]+@[\w._-]+$/ ==> TRUE, example -
$str = 'blah blah nick@cs.stanford.edu, blah blah balh billg@microsoft.com blah blah';
while ($str =~ /(([\w._-]+)\@([\w._-]+))/) { ## look for an email addr
print "user:$2 host:$3 all:$1\n"; ## parts of the addr
$str = $'; ## set the str to be the "rest" of the string
}
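
The same loop can be written with the g flag, so the match itself remembers where it stopped and there is no need to reassign $str from $' ... a sketch using the same pattern and sample string, with the results collected in an array (@found is a name made up here).

```perl
use strict;
use warnings;

my $str = 'blah blah nick@cs.stanford.edu, blah blah balh billg@microsoft.com blah blah';

my @found;
# With the g flag in scalar context, each pass resumes after the previous match.
while ($str =~ /(([\w._-]+)\@([\w._-]+))/g) {
    push @found, { all => $1, user => $2, host => $3 };
}

print "user:$_->{user} host:$_->{host} all:$_->{all}\n" for @found;
```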

http://www.cs.wcupa.edu/~rkline/perl2php/ - An excellent perl-2-php view ... a great comparison ... check it out ...
http://www.tek-tips.com/viewthread.cfm?qid=760114 - A perl forum ... with an 'interesting' join-us dialog ... and you have to JOIN to do a 'keyword-search' ;=((
http://www.ebb.org/PickingUpPerl/pickingUpPerl.html - So many 'reference' sites ...
http://www.cs.mcgill.ca/~abatko/computers/programming/perl/howto/hash/ - more ...
http://www.perlarchive.com/articles/perl/djm0001.shtml - Say output a hash ...
foreach my $key (keys %hash) { print "$key = $hash{$key}\n"; } or while (($key,$value) = each %hash){ print "$key = $value\n"; }
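
Both loops above return the keys in no particular order ... when a stable listing is wanted, the keys can be sorted first. A small sketch, with a sample hash invented for the illustration.

```perl
use strict;
use warnings;

my %hash = (book => 'livre', apple => 'pomme', cat => 'chat');

# keys gives no guaranteed order, so sort them for a repeatable listing.
my @lines = map { "$_ = $hash{$_}" } sort keys %hash;

print "$_\n" for @lines;
```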

Perl seems to have a MASSIVE presence when it comes to search engines, like, say, Yahoo! With a simple HOW TO thought, quite often, the answer will be almost immediate. For example, how to get an array length? Using, for example, a query of "perl array length" yields - http://www.unix.com/showthread.php?t=3149 - the answer ... like "Set a variable $a=@testarray, $a will be the number of elements in the array." ... and that was only the first 'find' ... a few down ...

http://michael.mathews.net/perlcircus/site/arrays.html - What a fantastic name 'Perl Circus' ... with lots, and lots of helpful samples ... even things you did not ask, but are good 'tutorial' examples - Create an array from a hash
$hash{"apple"} = "pomme"; $hash{"book"} = "livre"; @arr = %hash; print "@arr"; ... then ..
http://www.cs.mcgill.ca/~abatko/computers/programming/perl/howto/array/ - where it is spelled out, multiple times ;=)) ... Get the size of an array.
Solution 1: If you just want to print the size, this is the simplest way.
print "size of array: " . @array . ".\n";
Solution 2: If you want to retain the size in a variable, just evaluate the array in implicit scalar context.
$size = @array; print "size of array: $size.\n"; 
Explicit scalar context can be achieved by using the function scalar.
$size = scalar @array; print "size of array: $size.\n";
Since the results are the same, I recommend the implicit solution.
Solution 3: There are also a number of other methods of getting the array size.
$size = $#array + 1; ... and ...
http://www.sidhe.org/perldocs/lib/Pod/perlguts.html - has more details again ... and, of the original search, there are some half million hits to explore ;=))
http://www-2.cs.cmu.edu/People/rgs/pl-exp-arr.html - with simple code samples ... hash, as an array ...
http://www.cs.mcgill.ca/~abatko/computers/programming/perl/howto/hash/ - hash creation, manipulation ... returning a HASH reference, from a sub ...
http://www.cs.rpi.edu/~hollingd/eiw/notes/PerlArrays/PerlArrays.html  - arrays continue ... EIW Fall 2004 Lecture Notes ...
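
The array size idioms quoted above, gathered into one runnable sketch ... the sample array is invented, and the last line is added only to show what list context does by contrast.

```perl
use strict;
use warnings;

my @array = qw(red green blue);

my $implicit = @array;          # array assigned to a scalar gives the element count
my $explicit = scalar @array;   # same count, with the context spelled out
my $from_idx = $#array + 1;     # last index plus one

# In list context the array yields its elements, not its size.
my ($first) = @array;

print "size of array: $implicit, $explicit, $from_idx ... first element: $first\n";
```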


Perl and HTML

http://www.hypernews.org/HyperNews/get/www/html/converters.html - offer some 'converters' ...

http://www.cclabs.missouri.edu/things/instruction/perl/perlcourse.html (broken!) - A whole, quick course in perl ...

search : perl no strict refs
http://readlist.com/lists/perl.org/beginners/0/146.html - 'hard' and 'soft' function references ... if, say, a function list is auto-generated ...

http://www.perl.com/pub/a/2002/10/01/hashes.html?page=2 - looking up a hash ...
http://www.perl.com/lpt/a/2003/03/07/apocalypse6.html - the future ...

A developing MS Word HTML modifier ... uses the HTML::Parser to load, and modify, the HTML output ... stripms ... *WIP ;=))* or more references ... Some perl code samples - Socket Server - perl-to-html converter p2html8.htm (old, very broken version perl5.htm)
