Welcome to the first virtual edition of @internet! Those of you who have followed this column know it used to appear in the Hands On section of Oh, well... these things happen in the publishing industry. Still, it seemed a shame to let @internet bite the dust simply because it wasn't being distributed on dead trees anymore, so I decided to keep publishing it here, on the Internet, where it has always belonged. Besides my work as a consultant and speaker, I'll be dividing my attention among this column, my new Web Technologies column for Internetwork magazine and the feature articles I write on a freelance basis for various computer magazines. I can't yet predict how often @internet will appear, but I can promise that there will be more installments. A lot more.

Benchley's Law of Distinction states, "There are two kinds of people in the world: those who believe there are two kinds of people and those who don't." Likewise, there are two broad kinds of Web site on the Internet: static sites and dynamic ones. My site is static. It just sits here, loaded with content, and makes you read documents. Other sites are dynamic. They feature HTML forms, Java applets, browser-customized content and so on.

Java is a special case of inline content. You write an APPLET tag in much the same way you'd write the tag for an imagemap, and the Java code is downloaded directly to the browser, just as the image in an imagemap is. HTML forms (and their cousins, browser-customized and other dynamically created content) are another breed of cat. Unlike imagemaps and Java applets, they invoke a piece of secondary code that runs on the server itself, rather than in the browser or in a client-side "helper" application.
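That server-side code talks to the Web server through the Common Gateway Interface (CGI) convention. The server hands the program details of the request in environment variables such as REQUEST_METHOD and QUERY_STRING. Whatever the program writes to standard output (header lines, a blank line, then the document) goes back to the browser. The following is a minimal sketch of that contract. It is written in Python rather than Perl purely for illustration. The handle() function and the form field called name are hypothetical, and only GET requests are handled; a POST body would arrive on standard input, with its length in CONTENT_LENGTH. Python's old cgi module was deprecated and then removed in Python 3.13, so the sketch decodes the query string with urllib.parse.parse_qs instead.

```python
import html
import os
import sys
from urllib.parse import parse_qs


def handle(environ):
    """Build a complete CGI response from the environment variables the
    Web server sets before running the program."""
    method = environ.get("REQUEST_METHOD", "GET")
    # For a GET request, the form data arrives URL-encoded in QUERY_STRING.
    fields = parse_qs(environ.get("QUERY_STRING", ""))
    name = fields.get("name", ["stranger"])[0]  # hypothetical form field
    body = (
        "<HTML><BODY>\n"
        f"<P>Hello, {html.escape(name)}! (method: {html.escape(method)})</P>\n"
        "</BODY></HTML>\n"
    )
    # A CGI response is header lines, one blank line, then the document.
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # The Web server relays whatever the program writes to standard output.
    sys.stdout.write(handle(os.environ))
```

The html.escape() calls matter. Anything a visitor types into a form is echoed back into the page, and without escaping, that input could inject its own markup.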
HTML forms, for example, create a URL-encoded data stream and send it to a Common Gateway Interface (CGI) application, which then performs some operation or operations on it. URL encoding means that spaces become plus signs, and punctuation marks and other special characters are replaced by escape sequences (a percent sign followed by two hexadecimal digits). These escapes are incorporated into what is otherwise simply an ASCII text stream. What happens next can be anything from nothing at all (if the conditions for further processing aren't met), to passing the CGI's output on to a third, non-CGI application, to generating a new HTML document and sending it directly to the client's browser.

The point is that the output of HTML forms is essentially an unformatted text stream that includes escaped characters. Therefore, most CGI applications are designed primarily to translate the escaped characters back to their original form, format the resulting ASCII data, and test it for integrity and meaning. In many cases, they then pass the formatted, tested, validated data on to a database of some sort. Because CGIs must be capable of sophisticated text manipulation, Larry Wall's Practical Extraction and Report Language (Perl) has been widely adopted as the standard platform for creating them. Since Perl programs are generally referred to as "scripts", the art of producing CGIs is usually called "CGI scripting". (Note, though, that a CGI need not be a Perl script. It can be, and frequently is, a C or C++ program, a Unix shell script that invokes a program such as sed or awk, or even a BASIC program.)

So, what the heck is Perl, anyway? Larry describes it this way: "Perl is an interpreted language optimized for scanning arbitrary text files, extracting information from those text files, and printing reports based on that information. Expression syntax corresponds quite closely to C expression syntax. Although optimized for scanning text, Perl can also deal with binary data. Setuid Perl scripts are safer than C programs through a dataflow tracing mechanism which prevents many stupid security holes."

Because Perl is interpreted, it's somewhat slower than C. It's also easier to learn and more portable, since the same scripts run with few or no changes on different operating systems. And because there's no compile step, it's easier to maintain. It's also free. Although its native environment is Unix, there are versions of the interpreter for DOS, assorted flavors of Microsoft Windows, the Commodore Amiga, the Macintosh, MVS, NetWare, VMS and the AS/400. In short, there's one for pretty much any platform on which a Web server runs. The native (Unix) source code is available from http://www.perl.com/CPAN/src/latest.tar.gz. The non-native interpreters (everything else, including various x86-based Unixes) are available from ftp://ftp.epix.net/pub/languages/perl/ports/.

Malcolm Beattie (mbeattie@sable.ox.ac.uk) has also written a Perl compiler that outputs C code. The third alpha release is available from its home at ftp://ftp.ox.ac.uk/pub/perl/. There you'll also find Malcolm's Safe CGI Perl, which traps unsafe CGI operations and exits with a fatal error. Malcolm wrote Safe CGI Perl because it is entirely possible to do extremely unsafe things with Perl scripts on a Web server. The problem is less acute in a native environment, but it can be a giant, gaping security hole on a Microsoft Windows-based Web server. The risk is greatest when the Perl interpreter executable is installed in the CGI-BIN directory, because then anyone who can send the server a URL can hand the interpreter arbitrary code to run. Peter Prymmer (pvhp@lns62.lns.cornell.edu) has written a detailed description of the problem, its scope and what can be done about it at http://w4.lns.cornell.edu/~pvhp/perl/ntperl.html, on a page heavily hyperlinked to relevant resources.

To get started with Perl, you'll of course want to grab a copy of the interpreter for your preferred platform.
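The form-processing steps described above are: decode the escaped characters, test the data, and only then pass it to a database. Combined with the dataflow tracing Larry mentions, they can be sketched as follows. This is a minimal illustration, again in Python. It is not how Perl implements any of this. Perl's real mechanism, taint mode (switched on with the -T flag, and turned on automatically for setuid scripts), traces outside data through every expression. This sketch checks only at the boundaries. The Tainted class, the untaint() helper, the field names, the regular-expression patterns and the visitors table are all hypothetical.

```python
import re
import sqlite3
from urllib.parse import unquote_plus


class Tainted(str):
    """Marks a string that arrived from outside the program (form data).
    Hypothetical: Python has no built-in taint mode, and unlike Perl's,
    this mark is lost as soon as the string is sliced or concatenated."""


def decode_form(stream):
    """Split a URL-encoded form stream into a dict of Tainted values,
    turning '+' back into spaces and %XX escapes back into characters."""
    fields = {}
    for pair in stream.split("&"):
        if not pair:
            continue  # tolerate empty pairs such as "a=1&&b=2"
        key, _, value = pair.partition("=")
        fields[unquote_plus(key)] = Tainted(unquote_plus(value))
    return fields


def untaint(value, pattern):
    """Return a plain str only if the whole value matches a pattern of
    known-good characters; otherwise refuse it."""
    match = re.fullmatch(pattern, value)
    if match is None:
        raise ValueError(f"rejected input: {value!r}")
    return str(match.group(0))


def store(conn, name, age):
    """Insert validated data, refusing anything still marked Tainted."""
    for value in (name, age):
        if isinstance(value, Tainted):
            raise PermissionError("insecure dependency: tainted data")
    # Placeholders keep the data out of the SQL text itself.
    conn.execute("INSERT INTO visitors (name, age) VALUES (?, ?)", (name, age))
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visitors (name TEXT, age INTEGER)")
    form = decode_form("name=Thom+Stark&age=42&comment=Hi%21")
    name = untaint(form["name"], r"[A-Za-z .'-]{1,40}")
    age = int(untaint(form["age"], r"[0-9]{1,3}"))
    store(conn, name, age)
    print(conn.execute("SELECT name, age FROM visitors").fetchall())
```

Run as a script, it prints [('Thom Stark', 42)]. The untaint() patterns list what is allowed rather than trying to enumerate dangerous characters. A value like "x; rm -rf /" is therefore rejected without anyone having to anticipate the semicolon.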
Before you do, though, you may want to check out Tom Christiansen's (tchrist@perl.com) Perl FAQ at http://www.perl.com/perl/faq/index.html or the Perl for Win32 FAQ maintained by Evangelo Prodromou (evangelo@endcontsw.com) at http://www.endcontsw.com/people/evangelo/Perl_for_Win32_FAQ.html. Depending on your interests, you might also want to browse the Idiot's Guide to Solving Perl CGI Problems at http://www.perl.com/perl/faq/idiots-guide.html. Johan Vromans (JVromans@Squirrel.NL) has written a Perl Reference Guide in PostScript. You can retrieve it from the authors/Johan_Vromans directory of any member site of the Comprehensive Perl Archive Network (CPAN). The current version is in perlref-5.001.2.tar.gz and is bundled with a Perl script that reformats it for A4, US Letter and various other paper sizes. The Guide concisely describes all Perl 5 statements, functions, variables and so on. It's a quick reference, though, not a tutorial, so it won't be very useful until you've learned some Perl.

Then there are the manuals and books. Larry, Tom and Randal L. Schwartz have written an online manual, browsable free at ftp://ftp.epix.net/pub/languages/perl/doc/manual/html/perl.html. They're also the authors of Programming Perl ($39.95; 2nd edition, copyright 1996 by O'Reilly & Associates, ISBN 1-56592-149-6). It's the so-called "Camel book" (it has a dromedary on the cover), and its first chapter is available online at http://www.ora.com/catalog/pperl2/excerpt/ch01-01.htm. The Camel book is the standard reference work on Perl. Every single Perl hacker I know owns a copy of the first edition (still useful for its Perl 4 examples) and either owns or wants the 2nd edition as well. Randal is also the author of Learning Perl, another O'Reilly book (copyright 1993, ISBN 1-56592-042-2).
You can order it for $24.95 ($29.95 after February 1, 1997) from http://www.ora.com (have a credit card ready and make sure your browser is SSL- or S-HTTP-capable), or you can browse the first chapter at http://www.ora.com/info/perl/lperlch01.html. Jon Orwant (orwant@orwant.com) edits The Perl Journal (http://www.tpj.com/), which is published quarterly and is a steal at $18 US per year. It features articles for Perl programmers of all levels and emphasizes readability over code density. Its submission guide tells would-be contributors to "assume that readers will have 'minimal Perl fluency'".

There are, of course, Perl-oriented Usenet groups, among them comp.lang.perl.announce, comp.lang.perl.misc, comp.lang.perl.modules and comp.lang.perl.tk. The newsgroup comp.infosystems.www.authoring.cgi also carries a lot of Perl-related posts.

There are also a number of Perl mailing lists. For most of them, you subscribe by mailing the command shown (with your own address in place of yourname@youraddress.domain) to the address given:

perl5-porters@nicoh.com is, for all practical purposes, the Perl developers' list. Send "subscribe perl5-porters yourname@youraddress.domain" to majordomo@nicoh.com.

perl5-porters is cross-subscribed to ntperl@mail.hip.com, a list for Windows NT-related Perl discussions. Send "subscribe ntperl yourname@youraddress.domain" to Majordomo@mail.hip.com.

perl-packrats@metronet.com is the CPAN discussion list. Send "subscribe perl-packrats yourname@youraddress.domain" to perl-packrats-request@metronet.com.

ptk@lists.Stanford.EDU is for folks who work with Tk modules in Perl. Send "subscribe ptk yourname@youraddress.domain" to ptk-request@lists.Stanford.EDU.

CGI-perl@webstorm.com is for developers who write CGI modules for the Perl language, rather than for those writing CGI applications in Perl. Send "subscribe CGI-perl yourname@youraddress.domain" to CGI-perl-request@webstorm.com.

The DBperl announce, users and developers lists are best subscribed to by pointing your forms-capable Web browser at http://www.fugue.com/dbi/. To join the MacPerl mailing list, send a subscribe message to mac-perl-request@iis.ee.ethz.ch.
Note that if you need any of the mailing list subscription information explained further, you probably don't yet know enough about the Internet, Unix or Perl to make the effort worthwhile. Pick up the Camel book, Learning Perl and the appropriate interpreter, and learn at least the basics first.

(Copyright © 1997 by Thom Stark; all rights reserved)