WWW::Spyder, spyder-mini-bio
Sample biographical data spyder

WWW::Spyder is a generalized Perl module to create simple spiders or web robots. Here is its POD. Here is a discussion and another sample.

This is a simple minded (I mean simplistic!) code sample to show how to use a spyder to get biographical data.

Code

#!/usr/bin/perl
use strict;
use warnings;
#---------------------------------------------------------------------
use WWW::Spyder;   # our crawler
use URI::Escape;   # to properly escape our query for the search engine
use Text::Wrap;    # to show our results
#---------------------------------------------------------------------
# we want a 'firstname' 'lastname' and an optional page count
@ARGV == 2 or @ARGV == 3 or usage();
$ARGV[2] =~ /^\d\d?\d?$/ or usage() if $ARGV[2];

my $spyder = WWW::Spyder->new
    (sleep_base => 21,
     exit_on    => { pages => $ARGV[2] ? $ARGV[2] + 1 : 10 }); # 1st page is search

$spyder->terms(@ARGV[0,1]); # to help hit the best pages first

my $name = join(' ',@ARGV[0,1]);
my $name_rx = qr/\b(?:$ARGV[0]\s+)?$ARGV[1]|$ARGV[0]\b/;

$spyder->seed( 'http://www.google.com/search?q=' .
               uri_escape(qq{"$name"}) );

my @info;
while ( my $page = $spyder->crawl ) {
    print STDERR "Checking-->> ", $page->url, "\n";
    # try to extract the info here ---------------------------
    push @info, grep defined, $page->text =~
        m`($name_rx\s+
           (?:did|made|sold|was|had|went|is|left|said|became
              |
              \w\w\w+ed)\b
           (?:[A-Z][bcdf-hj-np-tv-xz]{0,4}\.
              |
              [^.?!])+[.?!]+['"]?)\s*
         `xsg
}

if ( @info ) {
    s/[\n\r\t]+/ /g, y/ / /s for @info; # clean up spacing
    print "\n Here's what I found about $name:\n\n";
    for my $datum ( @info ) {
        print wrap(' * ', ' ', $datum), "\n";
    }
    print "\n";
}
else {
    print "\n Sorry, couldn't find much out about $name.\n\n";
}

exit 0;
#=====================================================================
sub usage {
    my ( $tool ) = $0 =~ m,([^\/]+)$,;
    die <<MoreFunThanAbarrelOfVikings;
----------------------------------------------------------------------
USAGE: $tool [Proper Name] [# pages to try 1-999]

 I can only do two word names right now. Give me the name of someone
 who is at least slightly well-known and I'll try to prepare a
 mini-bio for you. If you don't give a "# pages to try" I will
 default to 10.

 Tip: you can trick me into doing three word names this way,
   $tool ["First Middle" Last]
----------------------------------------------------------------------
MoreFunThanAbarrelOfVikings
}
#=====================================================================

Usage

jinx[96]>spyder-mini-bio Jimmy Page

Output

Checking-->> http://www.google.com/search?q=%22Jimmy%20Page%22
Checking-->> http://www.led-zeppelin.com/jimmypage.html
Checking-->> http://www.geocities.com/SunsetStrip/Limo/9801...
Checking-->> http://www.jimmypageonline.com/
Checking-->> http://directory.google.com/Top/Arts/Music/Bands_and_Arti...
Checking-->> http://www.j-onishi.com/
Checking-->> http://home.earthlink.net/~juliannwh/
Checking-->> http://www.auburn.edu/~speedhe/jimmy.html
Checking-->> http://images.google.com/images?q=%22Jimmy+Page%22&hl=en&...
Checking-->> http://www.rollingstone.com/artists/default.asp?oid=4503

 Here's what I found about Jimmy Page:

 * Page became entranced with rock and roll, inspired by Elvis Presley's "Baby, Let's Play House".
 * Page expressed his excitement: "I was lucky enough to play backing guitar (on "Diamonds"), the ex-Shadows Jet Harris & Tony Meehan.
 * Page became one of the hottest session guitarists in the U.K., turning down an offer to join the Yardbirds after Clapton left.
 * Page was this soundtrack album in 1982.
 * Jimmy Page became rock's premiere guitarist, while he created the heavy metal genre of music in the band Led Zeppelin.
 * Page formed his dream band, Led Zeppelin, he had quite a bit to offer in the way of guitar expertise.
 * Jimmy Page was born on January 9, 1944 in Heston, England.
 * Page joined the Yardbirds, originally playing bass, then doubling on lead guitar with Jeff Beck, and ending up as solo guitarist.
 * Page is a Crowley fanatic, it is not necessarily true that he is an atheist and it is rather unfair to say that his association with Crowley paraphernalia caused the deaths of Robert Plant's first son and John Bonham.
 * Page joined Plant in a band called the Honeydrippers.
 * Page teamed up with Bad Company's Paul Rodgers in The Firm.
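
The extraction step inside the crawl loop can be tried on its own, without a crawler or a network connection. What follows is only a minimal sketch under stated assumptions: extract_facts is a hypothetical helper (it is not part of WWW::Spyder), the sample text is made up, and the name pattern is grouped and quoted with \Q..\E on the assumption that both \b anchors were meant to cover either form of the name. The sentence pattern itself is the one from the sample, written with m{} delimiters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of WWW::Spyder: returns the sentences in
# $text that open with the name followed by a verb from the sample's list.
sub extract_facts {
    my ( $first, $last, $text ) = @_;

    # Assumption: the alternatives are grouped so \b applies to both forms
    # of the name; \Q..\E keeps the names literal inside the pattern.
    my $name_rx = qr/\b(?:(?:\Q$first\E\s+)?\Q$last\E|\Q$first\E)\b/;

    # Name, a listed verb or any word ending in "ed", then everything up
    # to the closing punctuation. A capital letter plus up to four
    # consonants and a period (an abbreviation) does not end the sentence.
    my @facts = $text =~ m{($name_rx\s+
        (?:did|made|sold|was|had|went|is|left|said|became
           |
           \w\w\w+ed)\b
        (?:[A-Z][bcdf-hj-np-tv-xz]{0,4}\.
           |
           [^.?!])+[.?!]+['"]?)\s*
    }xsg;

    s/\s+/ /g for @facts;    # collapse newlines and runs of spaces
    return @facts;
}

# Made-up sample text, so nothing here is a claim about a real person.
my $text = join ' ',
    'Jane Doe was born in Springfield.',
    'Nobody knows why.',
    'Doe worked with Dr. Smith for years!',
    'Later, Jane said nothing.';

print " * $_\n" for extract_facts( 'Jane', 'Doe', $text );
```

Run as-is, this should print three bullets: the sentences that open with "Jane Doe", "Doe" and "Jane". The "Dr." in the second one is treated as an abbreviation, so the sentence is not cut short there, and "Nobody knows why." is skipped because it never mentions the name.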
Text, original code, fonts, and graphics ©1990-2008 Ashley Pond V.