NB: These pages were mostly written in 2001 or so. The résumé dates are accurate, but the code is aged and, unlike whiskey, 8-year-old code doesn't usually taste better. For a look at my current skills and to see my CPAN modules, sample code, and code discussions, please see these pages instead: Perl resources and sample code and PangyreSoft.
WWW::Spyder, for simple, easy web crawling

WWW::Spyder examples

WWW::Spyder is a general-purpose Perl module for creating simple spiders, or web robots. To view the POD in a nice format, please see this page.

This is the first module I have released through the standard Perl distribution channel, CPAN, the Comprehensive Perl Archive Network. It’s still in development but is already quite useful.

Discussion

Spiders and robots are programs that browse the web automatically, usually for gathering and indexing links or other information.

XML and its grandparent SGML are attempts to instill meaningful order into information. With them, single documents become leaves of databases. A collection of pages can easily be converted to HTML for display, indexed for searching, or even used to generate entirely new documents.

The Internet has always been full of data, just never with any real meta-organization. You can think of the Internet itself as the single most important database in existence, but as long as it isn't all in a formatted language like XML or some other rigid scheme, it's not a valuable database. Information without order, indices, and strong categorization reduces quickly to noise.

The real value of the Internet is found in its surfeit of plain text, no offense to the porn industry. The one arena where no one debates the supremacy of Perl is text parsing and manipulation. Therefore, it’s no real stretch to set some Perl loose on the Internet, with the right instructions, and find the value in that great unkeyed DB.

So let’s do something really valuable with the WWW! Let’s find a celebrity’s birthday. We’ll pick Jimmy Page to dull the irony somewhat. We are using simple regexes to check for birthdays. Much better ones could be crafted for serious applications.
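To give a feel for the kind of simple regex meant here, the following is a minimal sketch and not the code from the original script. The helper name find_birthday and the exact pattern are assumptions made for illustration: it looks for a spelled-out date shortly after the word "born" or "birthday" in a chunk of plain text, such as the text of a crawled page. It uses only core Perl and does not touch WWW::Spyder itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Month names, matched case-insensitively.
my $MONTH = qr/(?:January|February|March|April|May|June|July
                 |August|September|October|November|December)/xi;

# Hypothetical helper (not part of WWW::Spyder or the original script):
# return the first date found within 40 non-digit characters after
# "born" or "birthday", or nothing if no such date is present.
sub find_birthday {
    my ($text) = @_;
    if ( $text =~ /\b(?:born|birthday)\b \D{0,40}?
                   (   $MONTH \s+ \d{1,2} ,? \s+ \d{4}     # January 9, 1944
                     | \d{1,2} \s+ $MONTH ,? \s+ \d{4}     # 9 January 1944
                   )/xi ) {
        return $1;
    }
    return;
}

my $sample = "Jimmy Page was born on January 9, 1944 in Heston.";
my $found  = find_birthday($sample);
print "Birthday seems to be: $found\n" if defined $found;
```

A real spider would run something like this over the text of each page it visits and stop at the first hit. A pattern this loose will produce false positives (any date near the word "born"), which is why better ones would be needed for serious use.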

Code
Usage
jinx[96]>spyder-birthday Jimmy Page
Output
Check-->> http://www.google.com/search?q=%22Jimmy%20Page%22
Check-->> http://www.led-zeppelin.com/
Check-->>
   http://directory.google.com/Top/Arts/Music/Bands_and...
Check-->> http://home.earthlink.net/~juliannwh/

Jimmy Page's birthday seems to be: January 9, 1944

Text, original code, fonts, and graphics ©1990-2009 Ashley Pond V.