WWW-Mechanize-1.73/000755 000767 000024 00000000000 12206033602 014277 5ustar00etherstaff000000 000000 WWW-Mechanize-1.73/bin/000755 000767 000024 00000000000 12206033601 015046 5ustar00etherstaff000000 000000 WWW-Mechanize-1.73/Changes000644 000767 000024 00000140124 12206032511 015572 0ustar00etherstaff000000 000000 Revision history for WWW::Mechanize Mech now has its own mailing list at Google Groups: http://groups.google.com/group/www-mechanize-users 1.73 2013-08-24 ======================================== [TESTS] - Update t/local/back.t to use LocalServer for 404 checking to avoid fails on win32. Fix by Matt S Trout, patient diagnostics and testing provided by jayefuu of freenode #perl - Blow away more proxy env vars in LocalServer, and do it on load so that the LWP env checking doesn't happen before we've done it. [OTHER CHANGES] - Better error when passing only one parameter to follow_link 1.72 Thu Feb 2 18:37:28 EST 2012 ======================================== [DEPENDENCIES] Bumped the HTML::Form dependency to fix failures on CentOS 5 1.71 Tue Nov 14 13:50:41 EDT 2011 ======================================== [ENHANCEMENTS] Recognise application/xhtml+xml as HTML. [DOCUMENTATION] Improved docs about support of JavaScript Typo fixes. [TESTS] Updated tests as oops-music.com is in utf-8 now 1.70 Fri Aug 26 13:46:30 EDT 2011 ======================================== [ENHANCEMENTS] Mech now defaults to _not_ running live tests by default. You can still enable them by running "perl Makefile.PL --live" Thanks to RJBS for the suggestion 1.69_01 ======================================== [INTERNALS] The test suite for the local tests was updated 1.68 Fri Apr 22 01:10:40 EST 2011 ======================================== No changes from 1.67_01 1.67_01 ======================================== [ANNOUNCE] As of this release, Jesse Vincent has taken over maintenance of WWW-Mechanize. The project's repository can be found at: https://github.com/bestpractical/www-mechanize [FIXED] Added prereq for HTML::TreeBuilder. 1.66 Fri Sep 10 16:25:44 CDT 2010 ======================================== [FIXED] Fixed prerequisites on HTTP::Server::Simple on Windows. DNS checks in t/autocheck.t and t/local/failure.t improved. Thanks, Schwern. [ENHANCEMENTS] New $mech->text method returns the text from your HTML page. The exact rendering of this text is simply removing all the HTML tags, but this will change. It's pretty ugly. If anyone wants to work on a better-looking text dump, I'd love to see it. Added mech-dump --text. [DOCUMENTATION] Improvements to the docs explaining explicitly about the subclassed methods we inherit from LWP::UserAgent. Thanks, Lyle Hopkins! 1.64 Thu Jul 1 10:41:00 CDT 2010 ======================================== [THINGS THAT MAY BREAK YOUR CODE] If you've been accessing $mech->{forms} or $mech->{form} values directly, instead of going through the $mech->forms or $mech->current_form accessors, respectively, then this version of Mech will break your code. [ENHANCEMENTS] Parsing of forms has been delayed until they're actually needed. If don't use forms on a page, you'll no longer waste time and memory parsing them. $mech->title now caches the title of the page after parsing the page to find it. mech-dump now takes a --cookie-file parameter for keeping cookies between calls. Thanks, Damien Clark. [DOCUMENTATION] Typo fixes. 1.62 Sat Apr 10 23:10:07 CDT 2010 ======================================== [FIXED] Fixed a declaration in the Movable Type example in WWW::Mechanize::Examples. Quiet warnings if %ENV has undef values. $mech->follow_link() no longer dies with an inappropriate error if the link is not found. $mech->click_button() now checks to see if a form is selected. [INCOMPATIBILITIES] $mech->form_name() and $mech->form_number() no longer throw warnings if they can't find the form specified. They still return undef, though. [DOCUMENTATION] More additions to the FAQ. 1.60 Mon Aug 17 00:41:39 CDT 2009 ======================================== No new features. Exists only to skip tests that always fail on Windows. Fixed up some minor documentation problems. 1.58 Mon Jul 13 22:32:23 CDT 2009 ======================================== No new features. If you have 1.56 installed OK, you do NOT need to install 1.58. [FIXES] Removed prereq of HTTP::Response::Encoding, even though it was never used. Thanks for the catch, Gisle. 1.56 Thu Jul 9 00:36:54 CDT 2009 ======================================== [THINGS THAT MAY BREAK YOUR CODE] For a while, Mech used HTTP::Response::Encoding to try to suss out the proper encoding of the page it receives. Now, it lets LWP::UserAgent do the work, and no longer requires HTTP::Response::Encoding. [ENHANCEMENTS] Added a new dump_headers() method to dump the HTTP response headers. Added --headers flag to mech-dump to dump the HTTP response headers. [FIXES] Now requires LWP version 5.829 because HTTP::Response has memory cycle bugs. [DOCUMENTATION] Added a few notes to the FAQ, and fixed some incorrect docs. 1.55_01 Mon Jul 6 12:17:10 CDT 2009 ======================================== This is mostly a bug fix release. There will be a number of other bug fix releases in the next few days. [FIXED] New test server now randomizes the port it runs on. t/cookies.t should not hang on Windows any more. META.yml has been updated so the search.cpan.org links should be correct. Passing no_proxy would make LWP::UserAgent barf. Thanks to Mike Schilli for the fix. Cookies test would fail under Windows. Fixed, thanks to many people reporting it. [ENHANCEMENTS] $mech->submit_form() now can specify the form by ID using the form_id parameter. [DOCUMENTATION] The docs used to say that ->stack_depth(0) was an infinite stack size. This is wrong. Zero will tell Mech not to keep any history. 1.54 Mon Jan 12 00:36:08 CST 2009 ======================================== [FIXED] Removed the computers4sure test that was failing. 1.52 Tue Nov 25 09:52:30 CST 2008 ======================================== [FIXED] Improved some error messages in $mech->submit_form(). Thanks to Norbert Buchmuller. 1.51_03 Thu Nov 20 11:05:49 CST 2008 ======================================== [FIXED] The $mech->clone() method was not passing the cookie jar to its clone properly. Thanks to David Sainty. The $mech->back() can fail if there's nothing on the stack to go back to. Thanks to Dave Page. $mech->follow_link() did not complain if a link could not be found, even with autocheck on. Now it does. Thanks, Flavio Poletti. [ENHANCEMENTS] Added a $mech->form_id() method so you can look up forms by ID. Added $mech->content_type(), because $mech->ct() is too cryptic. 1.51_02 Tue Nov 18 01:30:54 CST 2008 ======================================== [STILL BROKEN] t/local/click_button.t is still failing its tests for calling ->click on an HTML::Form object. I suspect this is an LWP change, but I haven't dug into it enough yet. [FIXES] Fixed the bad credentials API that stomped on LWP::UserAgent's credentials() method. Thanks to Max Maschien and Matt Lawrence. The $mech->links method now finds links. Thanks to H.Merijn Brand. Makefile.PL explicitly requires Perl 5.8.0. URI.pm has to be version 1.36 or else URIs don't get encoded correctly. LWP has to be 5.819 or we have encoding problems. 1.51_01 Thu Nov 6 15:13:03 CST 2008 ======================================== [FIXES] Page history is now working much better. The $mech->back() method should behave more like a browser now. Most notably, it no longer restores the cookie state, just like your browser doesn't restore cookie state when you page back. It also should use much less memory. 1.50 Sun Sun Oct 26 22:42:46 CDT 2008 ======================================== [THINGS THAT MAY BREAK YOUR CODE] WWW::Mechanize now requires version 5.815 of LWP. This in itself may cause problems for you because of changes in how LWP does authentication. 1.49_01 Sat Sep 27 23:50:04 CDT 2008 ======================================== [THINGS THAT MAY BREAK YOUR CODE] The autocheck argument to the constructor is now ON by default, unless WWW::Mechanize is being subclassed. There are so many new programmers whose ->get() calls fail unchecked that I'm now putting on the seat belts for them. [FIXES] I do believe we are on the way to having all the encoding problems ironed out. This version incorporates a patch from here: http://code.google.com/p/www-mechanize/issues/detail?id=61 and tests from Miyagawa's WWW::Mechanize::DecodedContent http://search.cpan.org/dist/WWW-Mechanize-DecodedContent/ to finally fix this. [ENHANCEMENTS] You can now specify not to set up the proxy, if there is one. The proxy causes problems for Crypt::SSLeay. For details see: http://code.google.com/p/www-mechanize/issues/detail?id=39 [DOCUMENTATION] Fixed internal links. [INTERNALS] Lots of refactoring based on Schwern's "Skimmable Code" talk. http://use.perl.org/~schwern/journal/36704 http://schwern.org/~schwern/talks/Skimmable%20Code%20-%20YAPC-NA-2008.pdf 1.34 Mon Dec 10 00:30:39 CST 2007 ======================================== [FIXES] Many fixes to make the test suite more portable. 1.32 Tue Oct 30 12:02:17 CDT 2007 ======================================== [ENHANCEMENTS] Added dump methods to mirror mech-dump: * $mech->dump_images() * $mech->dump_links() * $mech->dump_forms() * $mech->dump_all() Sanity checks in the WWW::Mechanize::Image constructor. Every Image must have a "url" and "tag" field passed in to it. 1.31_02 Thu Oct 25 11:48:29 CDT 2007 ======================================== [ENHANCEMENTS] Added class, class_regex, id and id_regex limiters to find_link() and find_all_links(). Thanks to Adriano Ferreira. 1.31_01 Mon Sep 17 23:38:03 CDT 2007 ======================================== [FIXES] Mech tests now pass even if your DNS server gives A records for anything (like OpenDNS). Thanks, Miyagawa! Searching for the is now case-inensitive. A better solution would be to actually parse the HTML. [ENHANCEMENTS] mech-dump now handles --user and --password arguments for sites that require authentication. 1.30 Thu May 24 21:31:10 CDT 2007 ======================================== [DOCUMENTATION] Minor doc fixes. Thanks David Steinbrunner. 1.29_01 Tue May 22 14:02:55 CDT 2007 ======================================== Kevin Falcone and I ask for your assistance in figuring out how to handle the warnings thrown by the tests, other than hiding them. [FIXES] Overhauled how tainting was done. Stole code directly from Test::Taint. Have LWP only handle decoding of Content-Encoding, not charset. [DOCUMENTATION] Fixed the docs for $mech->submit_form()'s with_fields arg. Thanks, Peteris Krumins. 1.26 Wed May 16 14:21:29 CDT 2007 ======================================== [FIXES] Re-reversed the content decoding. This is critical for reading from sites with gzip on the fly, like Wikipedia. Content is now properly tainted. [ENHANCEMENTS] mech-dump can now pass --agent and --agent-alias flags so you can fetch from sites like Wikipedia that block LWP user agents. [INSTALLATION] The mech-dump program is now always installed. It no longer is presented as an option. 1.24 Fri May 11 15:57:56 CDT 2007 ======================================== NOTE: Version 1.24 will NOT automatically decode gzipped content for you any more. Consider it a "do not use" release. [FIXES] * Fixed failures in "make test" with some versions of HTTP::Server::Simple * RT #26593: Improved handling of charsets. Thanks Kevin Falcone. * RT #24354: find_link now handles http-equivs with quoted URLs. * Reverses the change in 1.21_01 where it decodes the content. [ENHANCEMENTS] * Added find_all_inputs() and find_all_submits() methods. Thanks, Mike O'Regan. * Test::LongString is no longer needed, so has been removed as a requirement. [TESTS] * Added a test for save_content() 1.22 Fri Mar 2 00:05:57 CST 2007 ======================================== [INTERNALS] Added new tests. Added Perl::Critic changes and a perlcriticrc file. 1.21_04 Sat Oct 7 21:35:42 CDT 2006 ======================================== [FIXES] * $mech->content( type => 'text' ) was not freeing memory. Thanks to Cat Okita for finding it. [INTERNALS] * Made the order of parms to $mech->content() not relevant. 1.21_03 Sat Oct 7 01:21:46 CDT 2006 ======================================== [THINGS THAT MAY BREAK YOUR CODE] * The methods $mech->form() and $mech->follow() have been removed. They've been deprecated since 1.10, which was released in Feb 2005. [ENHANCEMENTS] * I'm trying to nail down what seems to be a memory leak on long-running Mech programs. I'm stringifying URI::URL objects wherever I can. [INTERNALS] * No longer uses UNIVERSAL. 1.21_02 Wed Oct 4 13:14:30 CDT 2006 ======================================== [ENHANCEMENTS THAT MAY BREAK YOUR CODE] * The $mech->stack_depth() setting had no way to say "don't cache any pages at all". How silly! Now, if you set $mech->stack_depth(0), no history of pages will be kept. In the past, it would mean "Keep all pages." This means that if you want to set it to keep all pages, set it to some ridiculously large number. [DOCUMENTATION] * The docs previously refered to Compress::Gzip instead of Compress::Zlib. 1.21_01 Mon Sep 18 17:18:43 CDT 2006 ======================================== [ENHANCEMENTS] * If Compress::Zlib is installed, gzipped content is now accepted and transparently decoded. No additional syntax needed! This should save time and bandwidth in a number of cases. (Mark Stosberg) * Added a put() method. It also calls a subfunction called _SUPER_put that will be removed once LWP::UserAgent supports put(). 1.20 Sat Aug 19 09:09:08 EDT 2006 [ENHANCEMENTS] * Added new two-argument form of credentials() method. $mech->credentials($username, $password); That provides simpler visiting of password-protected resources in the vast majority of cases and still allows the other cases to be supported. (Peter Scott) [BUG FIXES] * autocheck no longer is triggered when informational responses are returned. (Mark Stosberg) [INTERNALS] * test suite no longer fails when Test::Warn is missing. (CPAN testers, Mark Stosberg) * Removed all the testing against live sites. The networking code is not actually in Mech anway, and they were prone to breaking, as the live sites changed. (Mark Stosberg) 1.19_02 Mon Aug 7 23:57:56 CDT 2006 [ENHANCEMENTS] * Add new Do-What-I-Mean submit_form() option. $mech->submit_form( with_fields => \%data ); That expresses that you want to select the first form contains all fields in \%data, and then submit the data to that form. See the docs for form_with_fields() and submit_form() for details. (Mark Stosberg, inspired by RT#6100) [BUG FIXES] * The behavior of clone() now copies over the cookie jar, which is probably what you expected it did in the first place. This fixes bug RT#13541 filed against Test::WWW::Mechanize, which was using clone() internally. (Mark Stosberg) * The correct URL is returned after redirecting. This a regression from 1.04 and was reported as RT#9059, RT#12882, and RT#12786. The documentation about this has also been clarified that we return a URI object, but that it stringifies to the URI itself. [DOCUMENTATION] * Fixed a misleading parm in the constructor. * Document the return value of set_visible (RT#6071, MJD, Mark Stosberg) * Document that form_name and form_number return an HTML::Form object (Mark Stosberg) [INTERNALS] * Made lots of little cleanups based on Perl::Critic * Fix Taint-mode warnings with Perl 5.6.1 (RT#16945) 1.18 Thu Feb 2 00:11:26 CST 2006 [TESTS] * Makefile.PL now takes four new parms: * --live/nolive turns on/off the live tests * --local/nolocal turns on/off the local tests * --mech-dump/nomech-dump installs/doesn't the mech-dump program * --all turns on all tests and installs mech-dump * Fixed some failures in tests. Non-existent URLs now have a "." postpended to them, so if someone's got a search domain with a wildcard (i.e. ignore.us) it'll ignore that. Also, Google's second link is now a https:// link, which some Mechs can't handle. Added a 'url_regex' which now makes it look at the second non-https link. Thanks to Pete Krawczyk. 1.16 Fri Oct 28 17:34:20 CDT 2005 [ENHANCEMENTS] * Sped up Mech significantly (~20% in some cases). Images and links are extracted from the HTML, and objects are created, only when they're actually needed. This will be a speedup for pages where you're only following links, or vice versa. [THINGS THAT MAY BREAK YOUR CODE] * If you've been relying on the $mech->{images} and $mech->{links} fields being populated so that you can bypass the $mech->images() and $mech->links() accessors, your code will break. That's OK, because you should have been using the accessors all along. 1.14 Tue Aug 30 17:17:40 CDT 2005 [DOCUMENTATION] * Added lots of new FAQs. Thanks to Peter Stevens. [INTERNALS] * Now requires Test::LongString. That's not too odious. [FIXES] * Tests now pass with the shuffling around that Google did. 1.13_01 Tue Apr 12 14:11:18 CDT 2005 [ENHANCEMENTS] * Now dies if you call submit_form() with a non-existing form_number or form_name. Before, it would just warn. [DOCUMENTATION] * Added an example of using credentials() in the cookbook. 1.12 Thu Feb 24 23:38:44 CST 2005 [FIXES] * Fixed RT #9026: hang in t/local/back.t under Windows XP. Thanks Andrew Savige. It also should no longer complain about being unable to clean up a temp file. 1.11_01 Mon Feb 14 00:12:48 CST 2005 [THINGS THAT MAY BREAK YOUR CODE] * Removed deprecated _parse_html() method. [FIXES] * Was incorrectly looking for INPUT tags TYPE="SUBMIT" as images. Thanks to Abe Timmerman. [ENHANCEMENTS] * Calling $mech->set_fields() with no current form now dies. Thanks to Julien Beasley. 1.10 Tue Jan 31 11:30pm-ish [FIXES] * Fixed bug where images inside of links would not be found. * Fixed test failures because of Google changes. Thanks to Offer Kaye and others who sent in patches. [DOCUMENTATION] * More samples in the FAQ. Thanks to Joshua Gatcomb. [INTERNALS] * Added explanation of running live tests against Google in Makefile.PL. 1.08 Fri Dec 24 01:01:06 CST 2004 [ENHANCEMENTS] * Added find_image() and find_all_images(). 1.06 Wed Dec 8 14:58:39 CST 2004 [INTERNALS] * Now uses the base pragma instead of setting @ISA. 1.05_04 Fri Nov 5 23:35:38 CST 2004 [ENHANCEMENTS] * Added WWW::Mechanize::Image object for representing images. * Improved the regex on the URL for META tags. * Added --images flag to mech-dump. [FIXES] * When parsing urls out of meta refresh tags, "url" may now be uppercase (RT#8230) * Behavior of back() fixed in a number of cases (RT#8109 reported by Josh Purinton, patched by Dominique Quatravaux) [INTERNALS] * Mark figured out to how to prevent his text editor from putting tabs into the code. Andy's blood pressure dropped slightly. 1.05_03 Sun Oct 31 20:54:33 CST 2004 [ENHANCEMENTS] * click_button() has a new input option for HTML::Form::SubmitInput objects (DOMQ) * content() has new options to return the page formatted as text, with a added. (RT#8087, patch by Dominique Quatravaux) * update_html() method has been added, which can be used to modify the HTML that Mech parses. It should be sub-classed instead of _parse_html(), which has been deprecated. (RT#8087, patch by Dominique Quatravaux) * select() has new option to select an option by number (RT#5789, Scott Lanning) * WWW::Mechanize::Link now has support providing all the attributes of the link through a new attrs() method, which returns them as a hashref. This is a replacement for the alt() method, added in 1.05_01. It's not backwards compatible with that, but, hey, that's what developer releases are for. (RT#8092, Rob Casey and Mark Stosberg) [FIXES] * Upload does not use the default value to prevent attacks, patch by Jan Pazdziora (RT #7843). [INTERNALS] * Improved tests and documentation for select() (RT#5789, Scott Lanning) * Improve taint-safeness on Perl 5.6.1 (RT#8042, patch by Dominique Quatravaux) * Added tests for click_button() (RT#8061, by Dominique Quatravaux) * Require URI 1.25, fixing bug which exposed itself in WWW::Mechanize (RT#3048) * Move select() to better location in docs. Document and test the return values. The return value is now "1" on success instead of the undocumented behavior of returning a form value. (RT#6138, spotted by MJD, patched by Mark Stosberg) * Possible matching tags for the find_link() 'tag_regex' attribute are now documented. (RT#2989, by Mark Stosberg) * refactored find_link() to avoid use of eval(). This should improve performance a bit and avoid potential security issues. (Mark Stosberg) 1.05_02 Sat Oct 2 16:55:59 CDT 2004 [ENHANCEMENTS] * Added the $mech->save_content( $filename ) function, so you can dump stuff to files easily. 1.05_01 Thu Sep 30 21:04:44 CDT 2004 [FIXES] * set_visible() doesn't stop setting values when it finds a zero. [ENHANCEMENTS] * WWW::Mechanize::Link has a new, easier to remember constructor interface. The old one is still supported. Support for including an 'alt' attribute was added, which is useful for links. (RT #3317). Thanks to Mark Stosberg. * When links are extracted from tags, the ALT attribute will be captured and become part of the WWW::Mechanize::Link object. (RT #3317). Patch by Mark Stosberg. [INTERNALS] * t/mech-dump.t is now more portable (RT #7690) * t/local/follow.t has new tests to confirm that 'follow*' functions work with characters like o-umlaut, even when the o-umlaut is encoded in the HTML, but not in the call to follow(). (RT #2416) By Mark Stosberg. 1.04 Wed Sep 15 23:27:53 CDT 2004 [ENHANCEMENTS] * $mech->get() now accepts a WWW::Mechanize::Link object. * $mech->stack_depth(n) lets you set the depth of the mech object's page stack. This way, if you have a Mech that does lots of stuff and never/rarely goes back(), you won't be eating up memory. Thanks to BooK and Chi-Fung. (RT #5362) [FIXES] * Fixed tests that fail under LWP >= 5.800. * Added a workaround for LWP::UserAgent->clone() when ->{proxy} is undef. (RT #6443) * The Referer was getting passed as a URI object sometimes, and that caused sadness. Eugene Haimov supplied a workaround. (RT #6372) [DOCUMENTATION] * Added Ian Langworth's listmod and John Beppu's photobucket uploader programs to WWW::Mechanize::Examples. * Minor doc tweak for find_link() * Finally added a value() func. Thanks to Spoon, who even now, months after his passing, is still contributing to Mechanize. 1.02 Tue Apr 13 22:45:10 CDT 2004 No reason to install if you have 1.00. Fixes are only in tests. [FIXES] * t/referer.t didn't cope with spaces in $FindBin::Bin. Plus, it now forces its URL to localhost. 1.00 Sat Apr 10 00:35:51 CDT 2004 I figure it's about time we hit 1.00, and this version seems like a good place to do it, because of the potential breakage described below... [THINGS THAT WILL BREAK YOUR CODE] * Header handling has changed. There is no more package variable %headers that holds all the headers to be added. They are now added on a per-object basis. If you were adding a header with add_header(), and the code relied on that header still being set later on in a later instance of the class, that code will now break, because the later instance won't have the header set. [ENHANCEMENTS] * You can now prevent a header from being sent by adding it with an undef value, as in: $mech->add_header( Referer => undef ); [FIXES] * Now correctly adds Accept-Encoding to all requests that need it. [INTERNALS] * Added new $mech->_modify_request($req) method to do all the HTTP header modification before the actual request gets sent off. Subclasses are able to override it if they want. * Removed the unused Compress::Zlib stuff. 0.76 Wed Apr 7 22:01:43 CDT 2004 [ENHANCEMENTS] * Added update_html() to let you update the HTML for the page you're on. [FIXES] * Test files account for new Google layout. [INTERNALS] * Rearranged the local tests into their own t/local/ directory. * Made the standalone tests show what server they're hitting. * Checked that it runs under LWP 5.78. 0.74 Mon Mar 22 23:36:46 CST 2004 [ENHANCEMENTS] * WWW::Mechanize now sends an Accept-Encoding header of "identity" to always enforce plaintext responses. Preliminary support for Compress::Zlib is also there, but is disabled by default. * Added click_button() and select() methods. The field() method can now take an arrayref of values, if appropriate. Thanks, Linda Lee Julien. * Added url_abs and url_abs_regex parms to find_all_links(). * URLs in META REFRESH tags are now treated as links. * t/taint.t makes sure that things that should be tainted are. [FIXES] * Still more fixes if the machine you're on doesn't have DNS pointing to it. * The local changes use localhost as the local host name, instead of whatever host name that might be on the box, but not in DNS. Thanks to David Wheeler for letting me play on his box. * The http_proxy and HTTP_PROXY environment variables get deleted during the tests that access the dummy local server. This should let your tests pass, and clear up a lot of RT tickets. 0.72 Mon Jan 26 21:07:20 CST 2004 [ENHANCEMENTS] * Added the set_visible() method, thanks to Peter Scott. [DOCUMENTATION] * Started the Cookbook at WWW::Mechanize::Cookbook.pod. [INTERNALS] * Made the globbing in Makefile.PL a little less command-line intensive. Also fixed the missing files in MANIFEST. * Added t/pod-coverage.t for testing POD coverage. 0.71_02 Mon Dec 22 14:29:13 CST 2003 [THINGS THAT MAY BREAK YOUR CODE] * Added a 5th, optional parameter to WWW::Mechanize::Link's constructor. In 0.71_01, it was at the beginning of the argument list and was required. Now it's at the end and is optional. If, in the 15 hours since 0.71_01 came out, you went and changed all your WWW::Mechanize::Link constructors, you'll have to change them around again. Otherwise, you can just ignore this change. 0.71_01 Sun Dec 21 23:48:12 CST 2003 [THINGS THAT MAY BREAK YOUR CODE] * WWW::Mechanize::Link's constructor has a new argument that needs to be passed in, at the start of the argument list. [ENHANCEMENTS] * WWW::Mechanize::Link object now takes a $base URL, and will return absolute URLs with the url_abs() method. Thanks to Ashley Pond. * Added another script to WWW::Mechanize::Examples. It's a script that didn't make it into Spidering Hacks. [INSTALL & TESTS] * Heavy use of the new Test::Memory::Cycle module. * Fixed Makefile.PL so that the tests are selected under Win32. * Changed t/mech-dump.t so that the test succeeds under Win32. * Updated t/referer.t and t/mech-dump.t so they run under VMS. Thanks to Peter Prymmer. 0.70 Sun Nov 30 23:45:27 CST 2003 [THINGS THAT MAY BREAK YOUR CODE] * Redirects are now handled better by LWP, so the code that changes POSTs to GETs on redirects has been removed. [FIXES] * Fixed redirect_ok(), which had its API changed out from under it in LWP 5.76. [ENHANCEMENTS] * New warnings in find_link() for strings that are space padded, and for text matches that are passed a regex. Thanks to Jim Cromie. [DOCUMENTATION] * Patches from Mark Stosberg and Jim Cromie. [INTERNALS] * Removed all the checking for Carp. I don't know why I was thinking that Carp wasn't core. RT #4523. Also, a big bump in requirements on LWP: We need 5.76. 0.66 Thu Nov 13 14:35:31 CST 2003 No new functionality. Fixed up some install bugs and made a few documentation tweaks, mostly to plug Spidering Hacks. 0.65 Mon Nov 10 00:11:06 CST 2003 [ENHANCEMENTS] * Made a _parse_html() method that you can override or call manually, per request from Gavin Estey. [FIXES] * Made some path naming use File::Spec->catfile so that they work correctly under Windows. * "make clean" cleans up temp flag files. [INTERNALS] * Uses the new Test::Pod 1.00 for simplicity. 0.64 October 23, 2003 11:15pm [ENHANCEMENTS] * Many new tests, based on the excellent coverage reporting created by Paul Johnson's Devel::Cover module. * The start of JavaScript support, sort of! If you have an tag that does an onClick that opens a window, Mech will find the URL from that and make that be the link for the tag. This is for things like Movable Type that pop little windows to rebuild indexes. This is subject to change in the future. I don't know if it will, but I'm not making promises. It might be so buggy I just yank the whole thing. * Big jump in requirements, since we'll soon be using Gisle's new HTML::Form stuff. Also, older versions of HTML::Form don't give output I'm expecting. [FIXES] * Fixed the t/mech-dump.t failure. 0.63 October 13, 2003 2:56pm [ENHANCEMENTS] * mech-dump defaults to dumping forms. * Added name, name_regex, tag and tag_regex options to find_link() and follow_link(). * Added tests from Jim Brandt. 0.62 October 7, 2003 8:46pm [THINGS THAT MIGHT BREAK YOUR CODE] * The parms for find_link()'s url_regex and text_regex must now be actual regex objects, as in qr// objects. They can't just be little text strings. If this is a big bummer, let me know. [ENHANCEMENTS] * Added autocheck parm, to tell your Mech object to die on any error. This saves you from having to check yourself. This closes RT #3056. * Renamed the internal _carp() method as warn(). * Added a die() method. * Can now override the warn() and die() handlers in the constructor. * find_link() now complains if it gets a *_regex parm that isn't actually a regex. See RT #3032. [FIXES] * mech-dump.t no longer runs if you're not installing mech-dump. See RT #3724. [DOCUMENTATION] * More FAQs. Thanks to Gavin Estey. 0.61 October 6, 2003 6:30pm No new functionality here. It's mostly to get the new tests into the pipeline so the CPAN testers can run 'em. [FIXES] * Missing dependency on File::Temp. Thanks, Ask. [ENHANCEMENTS] * Added the test case for the form processing problem as a .t file, since I spent so long getting it down to a simple case. * Internal code uses accessors instead of direct hash entries. Prepare for deprecation of existing hash entries! [DOCUMENTATION] * The FAQ is now its own document at WWW::Mechanize::FAQ. 0.60 September 22, 2003 10:00pm [FIXES] * Changed how t/failure.t tries to fail. It used to hit a bogus hostname in .com, but with Verisign doing its SiteFinder crap, even bogus addresses in .com succeed. [ENHANCEMENTS] * Added _make_request() to let WWW::Mechanize::Cached easily hook into the request chain. 0.59 September 3, 2003 11:56pm [FIXES] * Squelched a warning in follow() where it tries to do a regex match against an undef value. * The page stack functionality, including the back() button, was entirely broken. Now it works. Thanks to the mighty Iain Truskett for help. [ENHANCEMENTS] * Added the mech-dump script, which replaces mech-forms. It will dump forms and lists of links. Eventually it will do lists of images, too, but not yet. 0.58 August 14, 2003 11:30pm [THINGS THAT MIGHT BREAK YOUR CODE] * $mech->uri() now returns a plain string, not a URI object. The automatic stringification of the URI object was causing problems on Win32 and/or threaded Perls, and I didn't feel like figuring out why. If the non-objectness of the uri() method is a problem, let me know. * form(), form_name() and form_number() now return the HTML::Form object of the form that was chosen. They used to return a 1 or 0. This means that if you're explicitly checking for 1 or 0, instead of evaluating the return code in a boolean context, your code will break. [FIXES] * The -handling in extract_links() was incorrectly building the text. * uri() now returns a string, not a URI object. * form(), form_name() and form_number() now return the HTML::Form object of the form that was chosen. [INTERNALS] * Determination of live vs. local tests is now done in Makefile.PL, and we don't have to set those silly semaphore files any more. * Made other cleanups in Makefile.PL, like using ExtUtils::Command instead of rolling my own touch(). * Moved all the *-live.t tests into t/live/*.t, and renamed the *-local.t files to not be -local. * Added more tests for tags. 0.57 July 31, 2003 11:21pm [ENHANCEMENTS] * Added tags to those that are links per find_links(). 0.56 July 24, 2003 12:15pm [THINGS THAT MIGHT BREAK YOUR CODE] * Created agent_alias() method to do the browser string translation. Passing "Windows IE 6" to agent() will get you back exactly that string as the agent. You have to call $a->agent_alias( "Windows IE 6" ) to get the translation. Fortunately, unless you used the new functionality of agent() in the past two days since I released 0.55, it won't be a problem. [ENHANCEMENTS] * Removed the dependencies on Carp and Test::Builder. There still is a dependency on Test::Builder for Test::More, but it's no longer explicit in the Makefile.PL. Mech will use Carp if possible, but it's no longer a requirement. [INTERNALS] * Added _carp method for handling conditional warnings, rather than checking quiet() all the time. 0.55 July 22, 2003 12:10pm [ENHANCEMENTS] * Added WWW::Mechanize::Link object to encapsulate what used to be an array reference of stuff from find_link(). This replaces having to know that $link->[0] was URL and so on. However, since WWW::Mechanize::Link is a blessed arrayref, it's backwards compatible with existing code. * The WWW::Mechanize::Link object now tracks what tag the link came from (, or
Fake Signature Signature Fake Signature WWW-Mechanize-1.73/t/find_link_id.t000644 000767 000024 00000002424 12026436042 017347 0ustar00etherstaff000000 000000 #!perl -Tw use warnings; use strict; use Test::More 'no_plan'; use URI::file; BEGIN { delete @ENV{qw(PATH IFS CDPATH ENV BASH_ENV)}; # Placates taint-unsafe Cwd.pm in 5.6.1 use_ok( 'WWW::Mechanize' ); } my $mech = WWW::Mechanize->new( cookie_jar => undef ); isa_ok( $mech, 'WWW::Mechanize' ); my $uri = URI::file->new_abs( 't/find_link_id.html' )->as_string; $mech->get( $uri ); ok( $mech->success, "Fetched $uri" ) or die q{Can't get test page}; FIND_BY_ID: { my $x = $mech->find_link( id => 'signature' ); isa_ok( $x, 'WWW::Mechanize::Link' ); is( $x->url, 'signature.html', 'found link with given ID' ); } FIND_BY_CLASS: { my $x = $mech->find_link( tag => 'iframe', class => 'smart_iframe' ); isa_ok( $x, 'WWW::Mechanize::Link' ); is( $x->url, 'http://boo.xyz.com/boo_app', 'found link within "iframe" with given class' ); } FIND_ID_BY_REGEX: { my $x = $mech->find_link( id_regex => qr/^sig/ ); isa_ok( $x, 'WWW::Mechanize::Link' ); is( $x->url, 'signature.html', 'found link with ID matching a regex' ); } FIND_CLASS_BY_REGEX: { my $x = $mech->find_link( tag => 'iframe', class_regex => qr/IFRAME$/i ); isa_ok( $x, 'WWW::Mechanize::Link' ); is( $x->url, 'http://boo.xyz.com/boo_app', 'found link with class matching a regex' ); } WWW-Mechanize-1.73/t/form-parsing.t000644 000767 000024 00000000734 12026436042 017344 0ustar00etherstaff000000 000000 #!perl -Tw use strict; use warnings; use Test::More tests=>1; use HTML::Form; my $base = 'http://localhost/'; my $content = do { local $/ = undef; }; my $forms = [ HTML::Form->parse( $content, $base ) ]; is( scalar @{$forms}, 1, 'Find one form, please' ); __DATA__ WWW::Mechanize::Shell test page
WWW-Mechanize-1.73/t/form_with_fields.html000644 000767 000024 00000000774 12026436042 020771 0ustar00etherstaff000000 000000
Like in PHP!
WWW-Mechanize-1.73/t/form_with_fields.t000644 000767 000024 00000002110 12026436042 020252 0ustar00etherstaff000000 000000 #!perl -Tw use warnings; use strict; use Test::More 'no_plan'; use URI::file (); BEGIN { delete @ENV{qw(PATH IFS CDPATH ENV BASH_ENV)}; # Placates taint-unsafe Cwd.pm in 5.6.1 use_ok( 'WWW::Mechanize' ); } my $mech = WWW::Mechanize->new( cookie_jar => undef, autocheck => 0 ); isa_ok( $mech, 'WWW::Mechanize' ); my $uri = URI::file->new_abs( 't/form_with_fields.html' )->as_string; $mech->get( $uri ); ok( $mech->success, "Fetched $uri" ) or die q{Can't get test page}; { my $test = 'dies with no input'; eval{ my $form = $mech->form_with_fields(); }; ok($@,$test); } { my $form = $mech->form_with_fields(qw/1b/); isa_ok( $form, 'HTML::Form' ); is($form->attr('name'), '1st_form', 'first form matches'); } { my $form = $mech->form_with_fields('1b', 'opt[2]'); isa_ok( $form, 'HTML::Form' ); is($form->attr('name'), '2nd_form', 'second form matches'); } { $mech->get($uri); eval { $mech->submit_form( with_fields => { '1b' => '', 'opt[2]' => '' }, ); }; is($@,'', ' submit_form( with_fields => %data ) ' ); } WWW-Mechanize-1.73/t/frames.html000644 000767 000024 00000000607 12026436042 016715 0ustar00etherstaff000000 000000 WWW-Mechanize-1.73/t/frames.t000644 000767 000024 00000001450 12026436042 016211 0ustar00etherstaff000000 000000 #!perl -Tw use warnings; use strict; use Test::More tests => 7; use URI::file; BEGIN { delete @ENV{qw(PATH IFS CDPATH ENV BASH_ENV)}; # Placates taint-unsafe Cwd.pm in 5.6.1 use_ok( 'WWW::Mechanize' ); } my $mech = WWW::Mechanize->new( cookie_jar => undef ); isa_ok( $mech, 'WWW::Mechanize' ); my $uri = URI::file->new_abs( 't/frames.html' )->as_string; $mech->get( $uri ); ok( $mech->success, "Fetched $uri" ) or die q{Can't get test page}; my $link = $mech->find_link(); isa_ok( $link, 'WWW::Mechanize::Link' ); my @links = $mech->find_all_links(); is( scalar @links, 2, 'Only two links' ); is_deeply( [@{$links[0]}[0..3]], [ 'find_link.html', undef, 'top', 'frame' ], 'First frame OK' ); is_deeply( [@{$links[1]}[0..3]], [ 'google.html', undef, 'bottom', 'frame' ], 'Second frame OK' ); WWW-Mechanize-1.73/t/google.html000644 000767 000024 00000005415 12026436042 016716 0ustar00etherstaff000000 000000 Google
Google

 Web Images Groups Directory News 

 
 • Advanced Search
 • Preferences
 • Language Tools

Want more from Google? Try these expert search tips


Advertise with Us - Business Solutions - Services & Tools - Jobs, Press, & Help

©2003 Google - Searching 3,083,324,652 web pages

WWW-Mechanize-1.73/t/image-new.t000644 000767 000024 00000001422 12026436042 016604 0ustar00etherstaff000000 000000 #!perl -Tw use warnings; use strict; use Test::More tests=>10; BEGIN { use_ok( 'WWW::Mechanize::Image' ); } # test new style API my $link = WWW::Mechanize::Image->new( { url => 'url.html', base => 'http://base.example.com/', name => 'name', alt => 'alt', tag => 'a', height => 2112, width => 5150, } ); is( $link->url, 'url.html', 'url() works' ); is( $link->base, 'http://base.example.com/', 'base() works' ); is( $link->name, 'name', 'name() works' ); is( $link->alt, 'alt', 'alt() works' ); is( $link->tag, 'a', 'tag() works' ); is( $link->height, 2112, 'height works' ); is( $link->width, 5150, 'width works' ); is( $link->url_abs, 'http://base.example.com/url.html', 'url_abs works' ); isa_ok( $link->URI, 'URI::URL', 'Returns an object' ); WWW-Mechanize-1.73/t/image-parse.html000644 000767 000024 00000001154 12026436042 017630 0ustar00etherstaff000000 000000 Testing image extraction blargle
And now, the dreaded wango The world of the wango CNN
BBC Blongo! WWW-Mechanize-1.73/t/image-parse.t000644 000767 000024 00000002077 12026436042 017134 0ustar00etherstaff000000 000000 #!perl -Tw use warnings; use strict; use Test::More tests=>15; use URI::file; BEGIN { delete @ENV{qw(PATH IFS CDPATH ENV BASH_ENV)}; # Placates taint-unsafe Cwd.pm in 5.6.1 use_ok( 'WWW::Mechanize' ); } my $mech = WWW::Mechanize->new( cookie_jar => undef ); isa_ok( $mech, 'WWW::Mechanize' ); my $uri = URI::file->new_abs( 't/image-parse.html' )->as_string; $mech->get( $uri ); ok( $mech->success, "Fetched $uri" ) or die 'Can\'t get test page'; my @images = $mech->images; is( scalar @images, 3, 'Only two images' ); my $first = $images[0]; is( $first->tag, 'img', 'img tag' ); is( $first->url, 'wango.jpg', 'URL matches' ); is( $first->alt, 'The world of the wango', 'alt matches' ); my $second = $images[1]; is( $second->tag, 'input', 'input tag' ); is( $second->url, 'bongo.gif', 'URL matches' ); is( $second->alt, undef, 'alt matches' ); is( $second->height, 142, 'height' ); is( $second->width, 43, 'width' ); my $third = $images[2]; is( $third->url, 'linked.gif', 'Got the third image' ); is( $third->tag, 'img', 'input tag' ); is( $third->alt, undef, 'alt' ); WWW-Mechanize-1.73/t/link-base.t000644 000767 000024 00000000726 12026436042 016606 0ustar00etherstaff000000 000000 #!perl -Tw use warnings; use strict; use Test::More tests => 5; BEGIN { use_ok( 'WWW::Mechanize::Link' ); } NO_BASE: { my $link = WWW::Mechanize::Link->new( 'url.html', 'Click here', undef, undef ); isa_ok( $link, 'WWW::Mechanize::Link', 'constructor OK' ); my $URI = $link->URI; isa_ok( $URI, 'URI::URL', 'URI is proper type' ); is( $URI->rel, 'url.html', 'Short form of the url' ); is( $link->url_abs, 'url.html', 'url_abs works' ); } WWW-Mechanize-1.73/t/link-relative.t000644 000767 000024 00000001300 12026436042 017474 0ustar00etherstaff000000 000000 #!perl -Tw use warnings; use strict; use Test::More tests => 6; use URI::file; BEGIN { delete @ENV{qw(PATH IFS CDPATH ENV BASH_ENV)}; # Placates taint-unsafe Cwd.pm in 5.6.1 use_ok( 'WWW::Mechanize' ); } my $mech = WWW::Mechanize->new( cookie_jar => undef ); isa_ok( $mech, 'WWW::Mechanize' ); my $uri = URI::file->new_abs( 't/image-parse.html' )->as_string; $mech->get( $uri ); ok( $mech->success, "Fetched $uri" ) or die q{Can't get test page}; $mech->get( 'select.html' ); ok( $mech->success, 'Fetch select.html, no directory' ); $mech->get( './select.html' ); ok( $mech->success, 'Fetch select.html from ./' ); $mech->get( 'local/click.t' ); ok( $mech->success, 'Fetched click.t' ); WWW-Mechanize-1.73/t/link.t000644 000767 000024 00000003554 12026436042 015700 0ustar00etherstaff000000 000000 #!perl -Tw use warnings; use strict; use Test::More tests=>23; BEGIN { use_ok( 'WWW::Mechanize::Link' ); } OLD_API: { my $link = WWW::Mechanize::Link->new( 'url.html', 'text', 'name', 'frame', 'http://base.example.com/', { alt => 'alt text' } ); isa_ok( $link, 'WWW::Mechanize::Link' ); is( scalar @$link, 6, 'Should have five elements' ); # Test the new-style accessors is( $link->url, 'url.html', 'url() works' ); is( $link->text, 'text', 'text() works' ); is( $link->name, 'name', 'name() works' ); is( $link->tag, 'frame', 'tag() works' ); is( $link->base, 'http://base.example.com/', 'base() works' ); is( $link->attrs->{alt}, 'alt text', 'attrs() works' ); # Order of the parms in the blessed array is important for backwards compatibility. is( $link->[0], 'url.html', 'parm 0 is url' ); is( $link->[1], 'text', 'parm 1 is text' ); is( $link->[2], 'name', 'parm 2 is name' ); is( $link->[3], 'frame', 'parm 3 is tag' ); is( $link->[4], 'http://base.example.com/', 'parm 4 is base' ); my $URI = $link->URI; isa_ok( $URI, 'URI::URL', 'URI is proper type' ); is( $URI->rel, 'url.html', 'Short form of the url' ); is( $link->url_abs, 'http://base.example.com/url.html', 'url_abs works' ); } NEW_API: { # test new style API my $link = WWW::Mechanize::Link->new( { url => 'url.html', text => 'text', name => 'name', tag => 'frame', base => 'http://base.example.com/', attrs => { alt => 'alt text' }, } ); is( $link->url, 'url.html', 'url() works' ); is( $link->text, 'text', 'text() works' ); is( $link->name, 'name', 'name() works' ); is( $link->tag, 'frame', 'tag() works' ); is( $link->base, 'http://base.example.com/', 'base() works' ); is( $link->attrs->{alt}, 'alt text', 'attrs() works' ); } WWW-Mechanize-1.73/t/live/000755 000767 000024 00000000000 12206033601 015500 5ustar00etherstaff000000 000000 WWW-Mechanize-1.73/t/local/000755 000767 000024 00000000000 12206033601 015633 5ustar00etherstaff000000 000000 WWW-Mechanize-1.73/t/mech-dump/000755 000767 000024 00000000000 12206033601 016420 5ustar00etherstaff000000 000000 WWW-Mechanize-1.73/t/new.t000644 000767 000024 00000002171 12026436042 015526 0ustar00etherstaff000000 000000 #!perl -Tw use warnings; use strict; use Test::More tests => 14; BEGIN { use_ok( 'WWW::Mechanize' ); } RES_ON_NEW: { my $m = WWW::Mechanize->new; isa_ok( $m, 'WWW::Mechanize' ); ok( !$m->success, 'success() is false before any get' ); my $res = $m->res; ok( !defined $res, 'res() is undef' ); } NO_AGENT: { my $m = WWW::Mechanize->new; isa_ok( $m, 'WWW::Mechanize' ); can_ok( $m, 'request' ); like( $m->agent, qr/WWW-Mechanize/, 'Set user agent string' ); like( $m->agent, qr/$WWW::Mechanize::VERSION/, 'Set user agent version' ); $m->agent( 'foo/bar v1.23' ); is( $m->agent, 'foo/bar v1.23', 'Can set the agent' ); } USER_AGENT: { my $alias = 'Windows IE 6'; my $m = WWW::Mechanize->new( agent => $alias ); isa_ok( $m, 'WWW::Mechanize' ); can_ok( $m, 'request' ); is( $m->agent, $alias, q{Aliases don't get translated in the constructor} ); $m->agent_alias( $alias ); like( $m->agent, qr/^Mozilla.+compatible.+Windows/, 'Alias sets the agent' ); $m->agent( 'ratso/bongo v.43' ); is( $m->agent, 'ratso/bongo v.43', 'Can still set the agent' ); } WWW-Mechanize-1.73/t/pod-coverage.t000644 000767 000024 00000000307 12026436042 017307 0ustar00etherstaff000000 000000 #!perl -Tw use warnings; use strict; use Test::More; eval 'use Test::Pod::Coverage 1.04'; plan skip_all => 'Test::Pod::Coverage 1.04 required for testing POD coverage' if $@; all_pod_coverage_ok(); WWW-Mechanize-1.73/t/pod.t000644 000767 000024 00000000247 12026436042 015521 0ustar00etherstaff000000 000000 #!perl -Tw use warnings; use strict; use Test::More; eval 'use Test::Pod 1.14'; plan skip_all => 'Test::Pod 1.14 required for testing POD' if $@; all_pod_files_ok(); WWW-Mechanize-1.73/t/regex-error.t000644 000767 000024 00000001140 12026436042 017171 0ustar00etherstaff000000 000000 #!perl -Tw use warnings; use strict; use Test::More; BEGIN { eval 'use Test::Warn'; plan skip_all => "Test::Warn required to test $0" if $@; plan tests => 4; } BEGIN { use_ok( 'WWW::Mechanize' ); } my $m = WWW::Mechanize->new; isa_ok( $m, 'WWW::Mechanize' ); warning_like { $m->find_link( link_regex => 'foo' ); } qr[Unknown link-finding parameter "link_regex".+line \d+], 'Passes message, and includes the line number'; warning_like { $m->find_link( url_regex => 'foo' ); } qr[foo passed as url_regex is not a regex.+line \d+], 'Passes message, and includes the line number'; WWW-Mechanize-1.73/t/save_content.html000644 000767 000024 00000000217 12026436042 020125 0ustar00etherstaff000000 000000 Però poi si vedrà!!! WWW-Mechanize-1.73/t/save_content.t000644 000767 000024 00000002600 12026436042 017422 0ustar00etherstaff000000 000000 #!perl -Tw use warnings; use strict; use Test::More tests => 8; use URI::file; BEGIN { delete @ENV{qw(PATH IFS CDPATH ENV BASH_ENV)}; # Placates taint-unsafe Cwd.pm in 5.6.1 use_ok( 'WWW::Mechanize' ); } my $mech = WWW::Mechanize->new( cookie_jar => undef ); isa_ok( $mech, 'WWW::Mechanize' ); my $original = 't/find_inputs.html'; my $saved = 'saved1.test.txt'; my $uri = URI::file->new_abs( $original )->as_string; $mech->get( $uri ); ok( $mech->success, "Fetched $uri" ) or die q{Can't get test page}; #unlink $saved; ok( !-e $saved, "$saved does not exist" ); $mech->save_content( $saved ); my $old_text = slurp( $original ); my $new_text = slurp( $saved ); ok( $old_text eq $new_text, 'Saved copy matches the original' ) && unlink $saved; { my $original = 't/save_content.html'; my $saved = 'saved2.test.txt'; my $uri = URI::file->new_abs( $original )->as_string; $mech->get( $uri ); ok( $mech->success, "Fetched $uri" ) or die q{Can't get test page}; #unlink $saved; ok( !-e $saved, "$saved does not exist" ); $mech->save_content( $saved, binary => 1 ); my $old_text = slurp( $original ); my $new_text = slurp( $saved ); ok( $old_text eq $new_text, 'Saved copy matches the original' ) && unlink $saved; } sub slurp { my $name = shift; open( my $fh, '<', $name ) or die "Can't open $name: $!\n"; return join '', <$fh>; } WWW-Mechanize-1.73/t/select.html000644 000767 000024 00000000662 12026436042 016720 0ustar00etherstaff000000 000000 Like a hole
WWW-Mechanize-1.73/t/select.t000644 000767 000024 00000005005 12026436042 016213 0ustar00etherstaff000000 000000 #!perl -T use warnings; use strict; use Test::More tests => 14; use URI::file; BEGIN { delete @ENV{qw(PATH IFS CDPATH ENV BASH_ENV)}; # Placates taint-unsafe Cwd.pm in 5.6.1 use_ok( 'WWW::Mechanize' ); } my $mech = WWW::Mechanize->new( cookie_jar => undef ); isa_ok( $mech, 'WWW::Mechanize' ); my $uri = URI::file->new_abs( 't/select.html' )->as_string; my $response = $mech->get( $uri ); ok( $response->is_success, "Fetched $uri" ); my ($sendsingle, @sendmulti, %sendsingle, %sendmulti, $rv, $return, @return, @singlereturn, $form); # possible values are: aaa, bbb, ccc, ddd $sendsingle = 'aaa'; @sendmulti = qw(bbb ccc); @singlereturn = ($sendmulti[0]); %sendsingle = (n => 1); %sendmulti = (n => [2, 3]); ok($mech->form_number(1), 'set form to number 1'); $form = $mech->current_form(); # Multi-select # pass multiple values to a multi select $mech->select('multilist', \@sendmulti); @return = $form->param('multilist'); is_deeply(\@return, \@sendmulti, 'multi->multi value is ' . join(' ', @return)); $mech->select('multilist', \%sendmulti); @return = $form->param('multilist'); is_deeply(\@return, \@sendmulti, 'multi->multi value is ' . join(' ', @return)); # pass a single value to a multi select $mech->select('multilist', $sendsingle); $return = $form->param('multilist'); is($return, $sendsingle, "single->multi value is '$return'"); $mech->select('multilist', \%sendsingle); $return = $form->param('multilist'); is($return, $sendsingle, "single->multi value is '$return'"); # Single select # pass multiple values to a single select (only the _first_ should be set) $mech->select('singlelist', \@sendmulti); @return = $form->param('singlelist'); is_deeply(\@return, \@singlereturn, 'multi->single value is ' . join(' ', @return)); $mech->select('singlelist', \%sendmulti); @return = $form->param('singlelist'); is_deeply(\@return, \@singlereturn, 'multi->single value is ' . join(' ', @return)); # pass a single value to a single select $rv = $mech->select('singlelist', $sendsingle); $return = $form->param('singlelist'); is($return, $sendsingle, "single->single value is '$return'"); $rv = $mech->select('singlelist', \%sendsingle); $return = $form->param('singlelist'); is($return, $sendsingle, "single->single value is '$return'"); # test return value from $mech->select is($rv, 1, 'return 1 after successful select'); EAT_THE_WARNING: { # Mech complains about the non-existent field local $SIG{__WARN__} = sub {}; $rv = $mech->select('missing_list', 1); } is($rv, undef, 'return undef after failed select'); WWW-Mechanize-1.73/t/taint.t000644 000767 000024 00000001276 12026436042 016061 0ustar00etherstaff000000 000000 #!perl -T use warnings; use strict; use Test::More; BEGIN { eval 'use Test::Taint'; plan skip_all => 'Test::Taint required for checking taintedness' if $@; plan tests=>6; } BEGIN { delete @ENV{ qw( http_proxy HTTP_PROXY ) }; } BEGIN { use_ok( 'WWW::Mechanize' ); } my $mech = WWW::Mechanize->new( autocheck => 1 ); isa_ok( $mech, 'WWW::Mechanize', 'Created object' ); $mech->get( 'file:t/google.html' ); # Make sure taint checking is on correctly my @keys = keys %ENV; tainted_ok( $ENV{ $keys[0] }, 'ENV taints OK' ); is( $mech->title, 'Google', 'Correct title' ); untainted_ok( $mech->title, 'Title should not be tainted' ); tainted_ok( $mech->content, 'But content should' ); WWW-Mechanize-1.73/t/TestServer.pm000644 000767 000024 00000004456 12026436042 017224 0ustar00etherstaff000000 000000 package TestServer; use warnings; use strict; use Test::More; use HTTP::Server::Simple::CGI; use base qw( HTTP::Server::Simple::CGI ); my $dispatch_table = {}; =head1 OVERLOADED METHODS =cut our $pid; sub new { die 'An instance of TestServer has already been started.' if $pid; my $class = shift; my $port = shift; if ( !$port ) { $port = int(rand(20000)) + 20000; } my $self = $class->SUPER::new( $port ); my $root = $self->root; return $self; } sub run { my $self = shift; $pid = $self->SUPER::run(@_); $SIG{__DIE__} = \&stop; return $pid; } sub handle_request { my $self = shift; my $cgi = shift; my $path = $cgi->path_info(); my $handler = $dispatch_table->{$path}; if (ref($handler) eq "CODE") { print "HTTP/1.0 200 OK\r\n"; $handler->($cgi); } else { my $file = $path; if ( $file =~ m{/$} ) { $file .= 'index.html'; } $file =~ s/\s+//g; my $filename = "t/html/$file"; if ( -r $filename ) { if (my $response=do { local (@ARGV, $/) = $filename; <> }) { print "HTTP/1.0 200 OK\r\n"; print "Content-Type: text/html\r\nContent-Length: ", length($response), "\r\n\r\n", $response; return; } } else { print "HTTP/1.0 404 Not found\r\n"; print $cgi->header, $cgi->start_html('Not found'), $cgi->h1('Not found'), $cgi->end_html; } } } =head1 METHODS UNIQUE TO TestServer =cut sub set_dispatch { my $self = shift; $dispatch_table = shift; return; } sub background { my $self = shift; $pid = $self->SUPER::background() or Carp::confess( q{Can't start the test server} ); sleep 1; # background() may come back prematurely, so give it a second to fire up my $root = $self->root; diag( "Test server $root as PID $pid" ); return $pid; } sub hostname { my $self = shift; return '127.0.0.1'; } sub root { my $self = shift; my $port = $self->port; my $hostname = $self->hostname; return "http://$hostname:$port"; } sub stop { if ( $pid ) { kill( 9, $pid ) unless $^S; undef $pid; } return; } 1; WWW-Mechanize-1.73/t/tick.html000644 000767 000024 00000000712 12026436042 016367 0ustar00etherstaff000000 000000
Hello
Bye
Arse
Wibble
Foo
WWW-Mechanize-1.73/t/tick.t000644 000767 000024 00000001440 12026436042 015665 0ustar00etherstaff000000 000000 #!perl -Tw use warnings; use strict; use Test::More tests => 5; use URI::file; delete @ENV{qw(PATH IFS CDPATH ENV BASH_ENV)}; # Placates taint-unsafe Cwd.pm in 5.6.1 use_ok( 'WWW::Mechanize' ); my $mech = WWW::Mechanize->new( cookie_jar => undef ); isa_ok( $mech, 'WWW::Mechanize' ); my $uri = URI::file->new_abs( 't/tick.html' )->as_string; $mech->get( $uri ); ok( $mech->success, $uri ); $mech->form_number( 1 ); $mech->tick('foo','hello'); $mech->tick('foo','bye'); $mech->untick('foo','hello'); my $form = $mech->form_number(1); isa_ok( $form, 'HTML::Form' ); my $reqstring = $form->click->as_string; my $wanted = <<'EOT'; POST http://localhost/ Content-Length: 21 Content-Type: application/x-www-form-urlencoded foo=bye&submit=Submit EOT is( $reqstring, $wanted, 'Proper posting' ); WWW-Mechanize-1.73/t/Tools.pm000644 000767 000024 00000000476 12026436042 016214 0ustar00etherstaff000000 000000 package Tools; use base 'Exporter'; our @EXPORT_OK = qw( $canTMC memory_cycle_ok ); our @EXPORT = @EXPORT_OK; our $canTMC; sub import { delete @ENV{ qw( http_proxy HTTP_PROXY PATH IFS CDPATH ENV BASH_ENV) }; eval 'use Test::Memory::Cycle'; $canTMC = !$@; Tools->export_to_level(1, @_); } 1; WWW-Mechanize-1.73/t/untaint.t000644 000767 000024 00000000575 12026436042 016425 0ustar00etherstaff000000 000000 #!/usr/bin/perl use strict; use warnings; use URI::file; use Test::More; eval 'use Test::NoWarnings'; if ( $@ ) { plan( skip_all => 'Test::NoWarnings not installed' ); } plan( tests => 2 ); # the use_ok and then the warning check $ENV{test} = 14; use_ok( 'WWW::Mechanize' ); my $uri = URI::file->new_abs( 't/find_link_id.html' )->as_string; WWW::Mechanize->new->get($uri); WWW-Mechanize-1.73/t/upload.html000644 000767 000024 00000000350 12026436042 016717 0ustar00etherstaff000000 000000
WWW-Mechanize-1.73/t/upload.t000644 000767 000024 00000001660 12026436042 016223 0ustar00etherstaff000000 000000 #!perl -Tw use strict; use warnings; use Test::More tests => 5; use URI::file; BEGIN { delete @ENV{ qw( http_proxy HTTP_PROXY PATH IFS CDPATH ENV BASH_ENV) }; } use_ok( 'WWW::Mechanize' ); my $mech = WWW::Mechanize->new( cookie_jar => undef ); isa_ok( $mech, 'WWW::Mechanize' ); my $uri = URI::file->new_abs( 't/upload.html' )->as_string; $mech->get( $uri ); ok( $mech->success, $uri ); my $form = $mech->form_number(1); my $reqstring = $form->click->as_string; $reqstring =~ s/\r//g; # trim off possible extra newline $reqstring =~ s/^\Z\n//m; my $wanted = <<'EOT'; POST http://localhost/ Content-Length: 77 Content-Type: multipart/form-data; boundary=xYzZY --xYzZY Content-Disposition: form-data; name="submit" Submit --xYzZY-- EOT is( $reqstring, $wanted, 'Proper posting' ); $mech->field('upload', 'MANIFEST'); $reqstring = $form->click->as_string; like( $reqstring, qr/Cookbook/, 'The uploaded file should be in the request'); WWW-Mechanize-1.73/t/warn.t000644 000767 000024 00000001244 12026436042 015704 0ustar00etherstaff000000 000000 #!perl -T use warnings; use strict; use Test::More; BEGIN { eval 'use Test::Warn'; plan skip_all => 'Test::Warn required to test warn' if $@; plan tests => 6; } BEGIN { use_ok( 'WWW::Mechanize' ); } my $m = WWW::Mechanize->new; isa_ok( $m, 'WWW::Mechanize' ); warning_like { $m->warn( 'Something bad' ); } qr[Something bad.+line \d+], 'Passes the message, and includes the line number'; warning_like { $m->quiet(1); $m->warn( 'Something bad' ); } undef, 'Quiets correctly'; my $hushed = WWW::Mechanize->new( quiet => 1 ); isa_ok( $hushed, 'WWW::Mechanize' ); warning_like { $hushed->warn( 'Something bad' ); } undef, 'Quiets correctly'; WWW-Mechanize-1.73/t/warnings.t000644 000767 000024 00000000674 12026436042 016573 0ustar00etherstaff000000 000000 #!perl -T use warnings; use strict; use Test::More; BEGIN { eval 'use Test::Warn'; plan skip_all => 'Test::Warn required to test warnings' if $@; plan tests => 3; } BEGIN { use_ok( 'WWW::Mechanize' ); } UNKNOWN_ALIAS: { my $m = WWW::Mechanize->new; isa_ok( $m, 'WWW::Mechanize' ); warning_is { $m->agent_alias( 'Blongo' ); } 'Unknown agent alias "Blongo"', 'Unknown aliases squawk appropriately'; } WWW-Mechanize-1.73/t/mech-dump/mech-dump.t000644 000767 000024 00000003217 12026436042 020475 0ustar00etherstaff000000 000000 #!perl -Tw use warnings; use strict; use Test::More; use File::Spec; use LWP; BEGIN { delete @ENV{ qw( IFS CDPATH ENV BASH_ENV PATH ) }; } plan skip_all => 'Not installing mech-dump' if -e File::Spec->catfile( qw( t SKIP-MECH-DUMP ) ); plan tests => 4; my $exe = File::Spec->catfile( qw( blib script mech-dump ) ); if ( $^O eq 'VMS' ) { $exe = qq[mcr $^X "-mblib" $exe]; } # Simply use a file: uri instead of the filename to make this test # more independent of what URI::* thinks. my $source = 'file:t/google.html'; my $perl; $perl = $1 if $^X =~ /^(.+)$/; my $command = "$perl -Mblib $exe --forms $source"; my $actual = `$command`; my $expected; if ( $LWP::VERSION < 5.800 ) { $expected = <<'EOF'; GET file:/target-page [bob-the-form] hl=en (hidden) ie=ISO-8859-1 (hidden) q= btnG=Google Search (submit) btnI=I'm Feeling Lucky (submit) EOF } else { $expected = <<'EOF'; GET file:/target-page [bob-the-form] hl=en (hidden readonly) ie=ISO-8859-1 (hidden readonly) q= (text) btnG=Google Search (submit) btnI=I'm Feeling Lucky (submit) EOF } my @actual = split /\s*\n/, $actual; my @expected = split /\s*\n/, $expected; # First line is platform-dependent, so handle it accordingly. shift @expected; my $first = shift @actual; like( $first, qr/^GET file:.*\/target-page \[bob-the-form\]/, 'First line matches' ); cmp_ok( @expected, '>', 0, 'Still some expected' ); cmp_ok( @actual, '>', 0, 'Still some actual' ); is_deeply( \@actual, \@expected, 'Rest of the lines match' ); WWW-Mechanize-1.73/t/local/back.t000644 000767 000024 00000010160 12026436042 016724 0ustar00etherstaff000000 000000 #!perl use warnings; use strict; use Test::More tests => 47; use lib qw( t t/local ); use LocalServer; use HTTP::Daemon; use HTTP::Response; =head1 NAME =head1 SYNOPSIS This tests Mech's Back "button". Tests were converted from t/live/back.t, and subsequently enriched to deal with RT ticket #8109. =cut BEGIN { use Tools; } BEGIN { delete @ENV{ qw( IFS CDPATH ENV BASH_ENV ) }; use_ok( 'WWW::Mechanize' ); } my $mech = WWW::Mechanize->new(cookie_jar => {}); isa_ok( $mech, 'WWW::Mechanize' ); isa_ok( $mech->cookie_jar(), 'HTTP::Cookies', 'this $mech starts with a cookie jar' ); my $html = <<'HTML'; %s Whatever. Images Scripts Ports Modules
HTML my $server = LocalServer->spawn( html => $html ); isa_ok( $server, 'LocalServer' ); ok( !$mech->back(), 'With no stack, no going back' ); $mech->get($server->url); ok( $mech->success, 'Fetched OK' ); my $first_base = $mech->base; my $title = $mech->title; $mech->follow_link( n=>2 ); ok( $mech->success, 'Followed OK' ); ok( $mech->back(), 'Back should succeed' ); is( $mech->base, $first_base, 'Did the base get set back?' ); is( $mech->title, $title, 'Title set back?' ); $mech->follow_link( text => 'Images' ); ok( $mech->success, 'Followed OK' ); ok( $mech->back(), 'Back should succeed' ); is( $mech->base, $first_base, 'Did the base get set back?' ); is( $mech->title, $title, 'Title set back?' ); is( scalar @{$mech->{page_stack}}, 0, 'Pre-search check' ); $mech->submit_form( fields => { 'q' => 'perl' }, ); ok( $mech->success, 'Searched for Perl' ); like( $mech->title, qr/search.cgi/, 'Right page title' ); is( scalar @{$mech->{page_stack}}, 1, 'POST is in the stack' ); $mech->head( $server->url ); ok( $mech->success, 'HEAD succeeded' ); is( scalar @{$mech->{page_stack}}, 1, 'HEAD is not in the stack' ); ok( $mech->back(), 'Back should succeed' ); ok( $mech->success, 'Back' ); is( $mech->base, $first_base, 'Did the base get set back?' ); is( $mech->title, $title, 'Title set back?' ); is( scalar @{$mech->{page_stack}}, 0, 'Post-search check' ); =head2 Back and misc. internal fields RT ticket #8109 reported that back() is broken after reload(), and that the cookie_jar was also damaged by back(). We test for that: reload() should not alter the back() stack, and the cookie jar should not be versioned (once a cookie is set, hitting the back button in a browser does not cause it to go away). =cut $mech->follow_link( text => 'Images' ); $mech->reload(); ok( $mech->back(), 'Back should succeed' ); is($mech->title, $title, 'reload() does not push page to stack' ); ok(defined($mech->cookie_jar()), '$mech still has a cookie jar after a number of back()'); # Now some other weird stuff. Start with a fresh history by recreating # $mech. SKIP: { skip 'Test::Memory::Cycle not installed', 1 unless $canTMC; memory_cycle_ok( $mech, 'No memory cycles found' ); } $mech = WWW::Mechanize->new( autocheck => 0 ); isa_ok( $mech, 'WWW::Mechanize' ); $mech->get( $server->url ); ok( $mech->success, 'Got root URL' ); my @links = qw( /scripts /ports/ modules/ ); is( scalar @{$mech->{page_stack}}, 0, 'Pre-404 check' ); my $server404url = $server->error_notfound('404check'); $mech->get($server404url); is( $mech->status, 404 , '404 check') or diag( qq{\$server404url=$server404url\n\$mech->content="}, $mech->content, qq{"\n} ); is( scalar @{$mech->{page_stack}}, 1, 'Even 404s get on the stack' ); ok( $mech->back(), 'Back should succeed' ); is( $mech->uri, $server->url, 'Back from the 404' ); is( scalar @{$mech->{page_stack}}, 0, 'Post-404 check' ); for my $link ( @links ) { $mech->get( $link ); warn $mech->status() if (! $mech->success()); is( $mech->status, 200, "Get $link" ); ok( $mech->back(), 'Back should succeed' ); is( $mech->uri, $server->url, "Back from $link" ); } SKIP: { skip 'Test::Memory::Cycle not installed', 1 unless $canTMC; memory_cycle_ok( $mech, 'No memory cycles found' ); } WWW-Mechanize-1.73/t/local/click.t000644 000767 000024 00000001543 12026436042 017116 0ustar00etherstaff000000 000000 #!perl use warnings; use strict; use lib 't/local'; use LocalServer; use Test::More tests => 9; BEGIN { delete @ENV{ qw( IFS CDPATH ENV BASH_ENV ) }; use_ok( 'WWW::Mechanize' ); } my $mech = WWW::Mechanize->new(); isa_ok( $mech, 'WWW::Mechanize', 'Created the object' ); my $server = LocalServer->spawn(); isa_ok( $server, 'LocalServer' ); my $response = $mech->get( $server->url ); isa_ok( $response, 'HTTP::Response', 'Got back a response' ); ok( $response->is_success, 'Got URL' ) or die q{Can't even fetch local url}; ok( $mech->is_html, 'Local page is HTML' ); $mech->field(query => 'foo'); # Filled the 'q' field $response = $mech->click('submit'); isa_ok( $response, 'HTTP::Response', 'Got back a response' ); ok( $response->is_success, q{Can click 'Go' ('Google Search' button)} ); is( $mech->field('query'),'foo', 'Filled field correctly'); WWW-Mechanize-1.73/t/local/click_button.t000644 000767 000024 00000004143 12026436042 020510 0ustar00etherstaff000000 000000 #!perl use warnings; use strict; use lib 't/local'; use LocalServer; use Test::More tests => 19; BEGIN { delete @ENV{ qw( IFS CDPATH ENV BASH_ENV ) }; use_ok( 'WWW::Mechanize' ); } my $mech = WWW::Mechanize->new(); isa_ok( $mech, 'WWW::Mechanize', 'Created the object' ); my $server = LocalServer->spawn(); isa_ok( $server, 'LocalServer' ); my $response = $mech->get( $server->url ); isa_ok( $response, 'HTTP::Response', 'Got back a response' ); ok( $response->is_success, 'Got URL' ) or die q{Can't even fetch local url}; ok( $mech->is_html, 'Local page is HTML' ); my @forms = $mech->forms; my $form = $forms[0]; CLICK_BY_NUMBER: { $mech->click_button(number => 1); like( $mech->uri, qr/formsubmit/, 'Clicking on button by number' ); like( $mech->uri, qr/submit=Go/, 'Correct button was pressed' ); like( $mech->uri, qr/cat_foo/, 'Parameters got transmitted OK' ); $mech->back; ok(! eval { $mech->click_button(number => 2); 1 }, 'Button number out of range'); } CLICK_BY_NAME: { $mech->click_button(name => 'submit'); like( $mech->uri, qr/formsubmit/, 'Clicking on button by name' ); like( $mech->uri, qr/submit=Go/, 'Correct button was pressed' ); like( $mech->uri, qr/cat_foo/, 'Parameters got transmitted OK' ); $mech->back; ok(! eval { $mech->click_button(name => 'bogus'); 1 }, 'Button name unknown'); } CLICK_BY_OBJECT_REFERENCE: { local $TODO = q{It seems that calling ->click() on an object is broken in LWP. Need to investigate further.}; my $clicky_button = $form->find_input( undef, 'submit' ); isa_ok( $clicky_button, 'HTML::Form::Input', 'Found the submit button' ); is( $clicky_button->value, 'Go', 'Named the right thing, too' ); my $resp = $mech->click_button(input => $clicky_button); {use Data::Dumper; local $Data::Dumper::Sortkeys=1; diag Dumper( $resp->request )} like( $mech->uri, qr/formsubmit/, 'Clicking on button by object reference' ); like( $mech->uri, qr/submit=Go/, 'Correct button was pressed' ); like( $mech->uri, qr/cat_foo/, 'Parameters got transmitted OK' ); $mech->back; } WWW-Mechanize-1.73/t/local/content.t000644 000767 000024 00000002012 12026436042 017473 0ustar00etherstaff000000 000000 #!perl use warnings; use strict; use lib 't/local'; use LocalServer; use Test::More tests => 10; BEGIN { delete @ENV{ qw( IFS CDPATH ENV BASH_ENV ) }; use_ok( 'WWW::Mechanize' ); } my $mech = WWW::Mechanize->new(); isa_ok( $mech, 'WWW::Mechanize', 'Created the object' ); my $server = LocalServer->spawn(); isa_ok( $server, 'LocalServer' ); diag('Running tests against ' . $server->url . '?xml=1'); my $response = $mech->get( $server->url . '?xml=1' ); isa_ok( $response, 'HTTP::Response', 'Got back a response' ); ok( $response->is_success, 'Got URL' ) or die q{Can't even fetch local url}; is( $response->content_type, 'application/xhtml+xml', 'Content type is application/xhtml+xml' ); ok( $mech->is_html, 'Local page is HTML' ); $mech->field(query => 'foo'); # Filled the 'q' field $response = $mech->click('submit'); isa_ok( $response, 'HTTP::Response', 'Got back a response' ); ok( $response->is_success, q{Can click 'Go' ('Google Search' button)} ); is( $mech->field('query'),'foo', 'Filled field correctly'); WWW-Mechanize-1.73/t/local/encoding.t000644 000767 000024 00000000763 12026436042 017622 0ustar00etherstaff000000 000000 #!perl use warnings; use strict; use Test::More tests => 5; use lib qw( t/local ); use LocalServer; BEGIN { delete @ENV{qw( IFS CDPATH ENV BASH_ENV )}; use_ok('WWW::Mechanize'); } my $mech = WWW::Mechanize->new(); isa_ok( $mech, 'WWW::Mechanize' ); my $server = LocalServer->spawn(); isa_ok( $server, 'LocalServer' ); my $response = $mech->get( $server->url . 'encoding/euc-jp' ); ok( $mech->success, 'Fetched OK' ); is( $response->content_charset(), 'EUC-JP', 'got encoding enc-jp' ); WWW-Mechanize-1.73/t/local/failure.t000644 000767 000024 00000003053 12026436042 017456 0ustar00etherstaff000000 000000 #!perl use warnings; use strict; use Test::More; use lib 't/local'; use LocalServer; BEGIN { delete @ENV{ qw( IFS CDPATH ENV BASH_ENV ) }; } my $NONEXISTENT = 'blahblahblah.xx-only-testing.foo'; my @results = gethostbyname( $NONEXISTENT ); if ( @results ) { my ($name,$aliases,$addrtype,$length,@addrs) = @results; my $ip = join( '.', unpack('C4',$addrs[0]) ); plan skip_all => "Your ISP is overly helpful and returns $ip for non-existent domain $NONEXISTENT. This test cannot be run."; } my $bad_url = "http://$NONEXISTENT/"; plan tests => 15; require_ok( 'WWW::Mechanize' ); my $server = LocalServer->spawn; isa_ok( $server, 'LocalServer' ); my $mech = WWW::Mechanize->new( autocheck => 0 ); isa_ok( $mech, 'WWW::Mechanize', 'Created object' ); GOOD_PAGE: { my $response = $mech->get($server->url); isa_ok( $response, 'HTTP::Response' ); ok( $response->is_success, 'Success' ); ok( $mech->success, 'Get webpage' ); ok( $mech->is_html, 'It\'s HTML' ); is( $mech->title, 'WWW::Mechanize test page', 'Correct title' ); my @links = $mech->links; is( scalar @links, 10, '10 links, please' ); my @forms = $mech->forms; is( scalar @forms, 2, 'Two form' ); } BAD_PAGE: { my $bad_url = "http://$NONEXISTENT/"; $mech->get( $bad_url ); ok( !$mech->success, 'Failed the fetch' ); ok( !$mech->is_html, "Isn't HTML" ); ok( !defined $mech->title, "No title" ); my @links = $mech->links; is( scalar @links, 0, "No links" ); my @forms = $mech->forms; is( scalar @forms, 0, "No forms" ); } WWW-Mechanize-1.73/t/local/follow.t000644 000767 000024 00000004372 12206031123 017325 0ustar00etherstaff000000 000000 #!perl use warnings; use strict; use Test::More tests => 28; use lib 't/local'; use LocalServer; use encoding 'iso-8859-1'; BEGIN { delete @ENV{ qw( IFS CDPATH ENV BASH_ENV ) }; use_ok( 'WWW::Mechanize' ); } my $server = LocalServer->spawn; isa_ok( $server, 'LocalServer' ); my $agent = WWW::Mechanize->new( autocheck => 0 ); isa_ok( $agent, 'WWW::Mechanize', 'Created object' ); $agent->quiet(1); my $response; $agent->get( $server->url ); ok( $agent->success, 'Got some page' ); is( $agent->uri, $server->url, 'Got local server page' ); $response = $agent->follow_link( n => 99999 ); ok( !$response, q{Can't follow too-high-numbered link}); $response = $agent->follow_link( n => 1 ); isa_ok( $response, 'HTTP::Response', 'Gives a response' ); isnt( $agent->uri, $server->url, 'Need to be on a separate page' ); ok($agent->back(), 'Can go back'); is( $agent->uri, $server->url, 'Back at the first page' ); ok(! $agent->follow_link( text_regex => qr/asdfghjksdfghj/ ), "Can't follow unlikely named link"); ok($agent->follow_link( text => 'Link /foo' ), 'Can follow obvious named link'); isnt( $agent->uri, $server->url, 'Need to be on a separate page' ); ok($agent->back(), 'Can still go back'); ok($agent->follow_link( text_regex=>qr/Lschen/ ), 'Can follow link with o-umlaut'); isnt( $agent->uri, $server->url, 'Need to be on a separate page' ); ok($agent->back(), 'Can still go back'); ok($agent->follow_link( text_regex=>qr/Stsberg/ ), q{Can follow link with o-umlaut, when it's encoded in the HTML, but not in "follow"}); isnt( $agent->uri, $server->url, 'Need to be on a separate page' ); ok($agent->back(), 'Can still go back'); is( $agent->uri, $server->url, 'Back at the start page again' ); $response = $agent->follow_link( text_regex => qr/Snargle/ ); ok( !$response, q{Couldn't find it} ); ok($agent->follow_link( url => '/foo' ), 'can follow url'); isnt( $agent->uri, $server->url, 'Need to be on a separate page' ); ok($agent->back(), 'Can still go back'); ok(!$agent->follow_link( url => '/notfoo' ), "can't follow wrong url"); is( $agent->uri, $server->url, 'Needs to be on the same page' ); eval {$agent->follow_link( '/foo' )}; like($@, qr/Needs to get key-value pairs of parameters.*follow\.t/, "Invalid parameter passing gets better error message"); WWW-Mechanize-1.73/t/local/form.t000644 000767 000024 00000002211 12026436042 016765 0ustar00etherstaff000000 000000 #!perl use warnings; use strict; use Test::More tests => 13; use lib 't/local'; use LocalServer; BEGIN { delete @ENV{ qw( IFS CDPATH ENV BASH_ENV ) }; use_ok( 'WWW::Mechanize' ); } my $server = LocalServer->spawn; isa_ok( $server, 'LocalServer' ); my $mech = WWW::Mechanize->new(); isa_ok( $mech, 'WWW::Mechanize' ) or die; $mech->quiet(1); $mech->get($server->url); ok( $mech->success, 'Got a page' ) or die; is( $mech->uri, $server->url, 'Got page' ); my $form_number_1 = $mech->form_number(1); isa_ok( $form_number_1, 'HTML::Form', 'Can select the first form'); is( $mech->current_form(), $mech->{forms}->[0], 'Set the form attribute' ); ok( !$mech->form_number(99), 'cannot select the 99th form'); is( $mech->current_form(), $mech->{forms}->[0], 'Form is still set to 1' ); my $form_name_f = $mech->form_name('f'); isa_ok( $form_name_f, 'HTML::Form', 'Can select the form' ); ok( !$mech->form_name('bargle-snark'), 'cannot select non-existent form' ); my $form_id_pounder = $mech->form_id('pounder'); isa_ok( $form_id_pounder, 'HTML::Form', 'Can select the form' ); ok( !$mech->form_id('bargle-snark'), 'cannot select non-existent form' ); WWW-Mechanize-1.73/t/local/get.t000644 000767 000024 00000004553 12026436042 016614 0ustar00etherstaff000000 000000 #!perl use warnings; use strict; use Test::More tests => 32; use lib qw( t t/local ); use LocalServer; BEGIN { use Tools; } BEGIN { delete @ENV{ qw( IFS CDPATH ENV BASH_ENV ) }; use_ok( 'WWW::Mechanize' ); } my $server = LocalServer->spawn; isa_ok( $server, 'LocalServer' ); my $agent = WWW::Mechanize->new; isa_ok( $agent, 'WWW::Mechanize', 'Created object' ); my $response = $agent->get($server->url); isa_ok( $response, 'HTTP::Response' ); isa_ok( $agent->response, 'HTTP::Response' ); ok( $response->is_success, 'Page read OK' ); ok( $agent->success, "Get webpage" ); is( $agent->ct, "text/html", "Got the content-type..." ); ok( $agent->is_html, "... and the is_html wrapper" ); is( $agent->title, 'WWW::Mechanize test page', 'Titles match' ); $agent->get( '/foo/' ); ok( $agent->success, 'Got the /foo' ); is( $agent->uri, sprintf('%sfoo/',$server->url), 'Got relative OK' ); ok( $agent->is_html,'Got HTML back' ); is( $agent->title, 'WWW::Mechanize test page', 'Got the right page' ); $agent->get( '../bar/' ); ok( $agent->success, 'Got the /bar page' ); is( $agent->uri, sprintf('%sbar/',$server->url), 'Got relative OK' ); ok( $agent->is_html, 'is HTML' ); is( $agent->title, 'WWW::Mechanize test page', 'Got the right page' ); $agent->get( 'basics.html' ); ok( $agent->success, 'Got the basics page' ); is( $agent->uri, sprintf('%sbar/basics.html',$server->url), 'Got relative OK' ); ok( $agent->is_html, 'is HTML' ); is( $agent->title, 'WWW::Mechanize test page', 'Title matches' ); like( $agent->content, qr/WWW::Mechanize test page/, 'Got the right page' ); $agent->get( './refinesearch.html' ); ok( $agent->success, 'Got the "refine search" page' ); is( $agent->uri, sprintf('%sbar/refinesearch.html',$server->url), 'Got relative OK' ); ok( $agent->is_html, 'is HTML' ); is( $agent->title, 'WWW::Mechanize test page', 'Title matches' ); like( $agent->content, qr/WWW::Mechanize test page/, 'Got the right page' ); my $rslength = do {use bytes; length $agent->content}; my $tempfile = './temp'; unlink $tempfile; ok( !-e $tempfile, 'tempfile not there right now' ); $agent->get( './refinesearch.html', ':content_file'=>$tempfile ); ok( -e $tempfile, 'File exists' ); is( -s $tempfile, $rslength, 'Did all the bytes get saved?' ); unlink $tempfile; SKIP: { skip 'Test::Memory::Cycle not installed', 1 unless $canTMC; memory_cycle_ok( $agent, 'Mech: no cycles' ); } WWW-Mechanize-1.73/t/local/LocalServer.pm000644 000767 000024 00000014626 12026436042 020431 0ustar00etherstaff000000 000000 package LocalServer; # start a fake webserver, fork, and connect to ourselves use warnings; use strict; # this has to happen here because LWP::Simple creates a $ua # on load so any time after this is too late. BEGIN { delete @ENV{qw( HTTP_PROXY http_proxy CGI_HTTP_PROXY HTTPS_PROXY https_proxy HTTP_PROXY_ALL http_proxy_all )}; } use LWP::Simple; use FindBin; use File::Spec; use File::Temp; use URI::URL qw(); use Carp qw(carp croak); use vars qw($VERSION); $VERSION = '0.55'; =head1 SYNOPSIS use LWP::Simple qw(get); my $server = Test::HTTP::LocalServer->spawn; ok get $server->url, "Retrieve " . $server->url; $server->stop; =head1 METHODS =head2 Cspawn %ARGS> This spawns a new HTTP server. The server will stay running until C<< $server->stop >> is called. Valid arguments are: =over 4 =item * C<< html => >> scalar containing the page to be served =item * C<< file => >> filename containing the page to be served =item * C<< debug => 1 >> to make the spawned server output debug information =item * C<< eval => >> string that will get evaluated per request in the server Try to avoid characters that are special to the shell, especially quotes. A good idea for a slow server would be eval => sleep+10 =back All served HTML will have the first %s replaced by the current location. The following entries will be removed from C<%ENV>: HTTP_PROXY http_proxy CGI_HTTP_PROXY HTTPS_PROXY https_proxy HTTP_PROXY_ALL http_proxy_all =cut sub spawn { my ($class,%args) = @_; my $self = { %args }; bless $self,$class; local $ENV{TEST_HTTP_VERBOSE}; $ENV{TEST_HTTP_VERBOSE} = 1 if (delete $args{debug}); $self->{delete} = []; if (my $html = delete $args{html}) { # write the html to a temp file my ($fh,$tempfile) = File::Temp::tempfile(); binmode $fh; print $fh $html or die "Couldn't write tempfile $tempfile : $!"; close $fh; push @{$self->{delete}},$tempfile; $args{file} = $tempfile; }; my ($fh,$logfile) = File::Temp::tempfile(); close $fh; push @{$self->{delete}},$logfile; $self->{logfile} = $logfile; my $web_page = delete $args{file} || ""; my $server_file = File::Spec->catfile( $FindBin::Bin,'log-server' ); my @opts; push @opts, "-e" => qq{"} . delete($args{ eval }) . qq{"} if $args{ eval }; my $pid = open my $server, qq'$^X "$server_file" "$web_page" "$logfile" @opts|' or croak "Couldn't spawn local server $server_file : $!"; my $url = <$server>; chomp $url; die "Couldn't read back local server url" unless $url; # What is this code supposed to fix? my $lhurl = URI::URL->new( $url ); $lhurl->host( 'localhost' ); $self->{_server_url} = $lhurl; $self->{_fh} = $server; $self->{_pid} = $pid; $self; }; =head2 C<< $server->port >> This returns the port of the current server. As new instances will most likely run under a different port, this is convenient if you need to compare results from two runs. =cut sub port { carp __PACKAGE__ . '::port called without a server' unless $_[0]->{_server_url}; $_[0]->{_server_url}->port }; =head2 C<< $server->url >> This returns the url where you can contact the server. This url is valid until the C<$server> goes out of scope or you call C<< $server->stop >> or C<< $server->get_log >>. =cut sub url { $_[0]->{_server_url}->abs->as_string }; =head2 C<< $server->stop >> This stops the server process by requesting a special url. =cut sub stop { my ($self) = @_; get( $self->quit_server ); undef $self->{_server_url}; if ( $self->{_fh} ) { close $self->{_fh}; delete $self->{_fh}; } }; =head2 C<< $server->kill >> This kills the server process via C. The log cannot be retrieved then. =cut sub kill { CORE::kill( 9 => $_[0]->{ _pid } ); undef $_[0]->{_server_url}; undef $_[0]->{_pid}; }; =head2 C<< $server->get_log >> This stops the server by calling C and then returns the output of the server process. This output will be a list of all requests made to the server concatenated together as a string. =cut sub get_log { my ($self) = @_; my $log = get( $self->get_server_log ); $self->stop; return $log; }; sub DESTROY { $_[0]->stop if $_[0]->{_server_url}; for my $file (@{$_[0]->{delete}}) { unlink $file or warn "Couldn't remove tempfile $file : $!\n"; }; }; =head1 URLs implemented by the server =head2 302 redirect C<< $server->redirect($target) >> This URL will issue a redirect to C<$target>. No special care is taken towards URL-decoding C<$target> as not to complicate the server code. You need to be wary about issuing requests with escaped URL parameters. =head2 404 error C<< $server->error_notfound($target) >> This URL will response with status code 404. =head2 Timeout C<< $server->error_timeout($seconds) >> This URL will send a 599 error after C<$seconds> seconds. =head2 Timeout+close C<< $server->error_close($seconds) >> This URL will send nothing and close the connection after C<$seconds> seconds. =head2 Error in response content C<< $server->error_after_headers >> This URL will send headers for a successfull response but will close the socket with an error after 2 blocks of 16 spaces have been sent. =head2 Chunked response C<< $server->chunked >> This URL will return 5 blocks of 16 spaces at a rate of one block per second in a chunked response. =head2 Other URLs All other URLs will echo back the cookies and query parameters. =cut use vars qw(%urls); %urls = ( 'quit_server' => 'quit_server', 'get_server_log' => 'get_server_log', 'redirect' => 'redirect/%s', 'error_notfound' => 'error/notfound/%s', 'error_timeout' => 'error/timeout/%s', 'error_close' => 'error/close/%s', 'error_after_headers' => 'error/after_headers', 'chunked' => 'chunks', ); for (keys %urls) { no strict 'refs'; my $name = $_; *{ $name } = sub { my $self = shift; $self->url . sprintf $urls{ $name }, @_; }; }; =head1 EXPORT None by default. =head1 COPYRIGHT AND LICENSE This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. Copyright (C) 2003-2011 Max Maischein =head1 AUTHOR Max Maischein, Ecorion@cpan.orgE Please contact me if you find bugs or otherwise improve the module. More tests are also very welcome ! =head1 SEE ALSO L,L,L =cut 1; WWW-Mechanize-1.73/t/local/log-server000755 000767 000024 00000014421 12026436042 017656 0ustar00etherstaff000000 000000 # vi: ft=perl # Thanks to merlyn for nudging me and giving me this snippet! use strict; use HTTP::Daemon; use CGI; use encoding 'iso-8859-1'; use Getopt::Long; use vars qw($VERSION); $VERSION = '0.55'; $|++; GetOptions( 'e=s' => \my $expression, ); my $host = 'localhost'; my $d = HTTP::Daemon->new( LocalAddr => $host, ) or die; # HTTP::Deamon doesn't return http://localhost:.../ # for LocalAddr => 'localhost'. This causes the # tests to fail of many machines. ( my $url = URI->new($d->url) )->host($host); print "$url\n"; my ($filename,$logfile) = @ARGV[0,1]; if ($filename) { open DATA, "< $filename" or die "Couldn't read page '$filename' : $!\n"; }; #open LOG, ">", $logfile # or die "Couldn't create logfile '$logfile' : $!\n"; my $log; binmode DATA,':encoding(iso-8859-1)'; my $body = join "", ; sub debug($) { my $message = $_[0]; $message =~ s!\n!\n#SERVER:!g; warn "#SERVER: $message" if $ENV{TEST_HTTP_VERBOSE}; }; SERVERLOOP: { my $quitserver; while (my $c = $d->accept) { debug "New connection"; while (my $r = $c->get_request) { debug "Request:\n" . $r->as_string; my $location = ($r->uri->path || "/"); my ($link1,$link2) = ('',''); if ($location =~ m!^/link/([^/]+)/(.*)$!) { ($link1,$link2) = ($1,$2); }; my $res; if ($location eq '/get_server_log') { $res = HTTP::Response->new(200, "OK", undef, $log); $log = ''; } elsif ( $location eq '/quit_server') { debug "Quitting"; $res = HTTP::Response->new(200, "OK", [Connection => 'close'], "quit"); $quitserver = 1; } else { eval $expression if $expression; warn "eval: $@" if $@; $log .= "Request:\n" . $r->as_string . "\n"; if ($location =~ m!^/redirect/(.*)$!) { $res = HTTP::Response->new(302); $res->header('location', $d->url . $1); } elsif ($location =~ m!^/error/notfound/(.*)$!) { $res = HTTP::Response->new(404, "Not found", [Connection => 'close']); } elsif ($location =~ m!^/error/timeout/(\d+)$!) { sleep $1; $res = HTTP::Response->new(599, "Timeout reached", [Connection => 'close']); } elsif ($location =~ m!^/error/close/(\d+)$!) { sleep $1; $res = undef; } elsif ( $location =~ m!^/chunks!) { my $count = 5; $res = HTTP::Response->new(200, "OK", undef, sub { sleep 1; my $buf = 'x' x 16; return $buf if $count-- > 0; return undef; # done }); } elsif ($location =~ m!^/error/after_headers$!) { my $count = 2; $res = HTTP::Response->new(200, "OK", undef, sub { sleep 1; my $buf = 'x' x 16; return $buf if $count-- > 0; die "Planned error after headers"; }); } elsif ($location =~ m!^/encoding/(.*)!) { my $encoding = $1; $res = HTTP::Response->new( 200, "OK", [ 'Content-Type' => "text/html; charset=$encoding" ], "encoding $encoding" ); } else { my $q = CGI->new($r->uri->query); # Make sticky form fields my ($query,$session,%cat); $query = defined $q->param('query') ? $q->param('query') : "(empty)"; $session = defined $q->param('session') ? $q->param('session') : 1; %cat = map { $_ => 1 } (defined $q->param('cat') ? $q->param('cat') : qw( cat_foo cat_bar )); my @categories = map { $cat{$_} ? "checked" : "" } qw( cat_foo cat_bar cat_baz ); (my $h = $r->headers->{host}) =~ s/:\d+//; my $rbody = sprintf $body,$location,$session,$query,@categories; $res = HTTP::Response->new(200, "OK", [ "Set-Cookie" => $q->cookie(-name => 'log-server',-value=>'shazam2', -discard=>1,), 'Cache-Control' => 'no-cache', 'Pragma' => 'no-cache', 'Max-Age' => 0, 'Connection' => 'close', 'Content-Length' => length($rbody), ], $rbody); $res->content_type( $q->param('xml') ? 'application/xhtml+xml' : 'text/html' ); debug "Request " . ($r->uri->path || "/"); }; }; debug "Response:\n" . $res->as_string if $res; eval { $c->send_response($res) if $res; }; if (my $err = $@) { debug "Server raised error: $err"; if ($err !~ /^Planned error\b/) { warn $err; }; $c->close; }; if (! $res) { $c->close; }; last if $quitserver; } sleep 1; undef($c); last SERVERLOOP if $quitserver; } }; END { debug "Server stopped" }; __DATA__ WWW::Mechanize test page

Location: %s

Link /test Link /foo Link / /Link /Link in slashes/ Link foo1.save_log_server_test.tmp Link foo2.save_log_server_test.tmp Link foo3.save_log_server_test.tmp Lschen -- testing for o-umlaut. Stösberg -- testing for encoded o-umlaut.
Col1Col2Col3
A1A2A3
B1B2B3
C1C2C3

WWW-Mechanize-1.73/t/local/nonascii.html000644 000767 000024 00000000623 12026436042 020333 0ustar00etherstaff000000 000000 Query Builder
WWW-Mechanize-1.73/t/local/nonascii.t000644 000767 000024 00000001165 12026436042 017634 0ustar00etherstaff000000 000000 #!perl use utf8; use warnings; use strict; use Test::More tests => 5; use lib 't/local'; use LocalServer; BEGIN { delete @ENV{ qw( IFS CDPATH ENV BASH_ENV ) }; use_ok( 'WWW::Mechanize' ); } my $server = LocalServer->spawn( file => 't/local/nonascii.html' ); isa_ok( $server, 'LocalServer' ); my $agent = WWW::Mechanize->new; isa_ok( $agent, 'WWW::Mechanize', 'Created object' ); $agent->quiet(0); $agent->get( $server->url ); ok( $agent->success, 'Got some page' ); # \321\202 is \x{442} $agent->field("ValueOf'CF.{\321\202}'", "\321\201"); is($agent->value("ValueOf'CF.{\321\202}'"), "\321\201", 'set utf value'); WWW-Mechanize-1.73/t/local/overload.t000644 000767 000024 00000004571 12026436042 017650 0ustar00etherstaff000000 000000 #!/usr/bin/perl -w use Test::More skip_all => "Mysteriously stopped passing, and I don't know why."; use warnings; use strict; use lib 't/local'; use LocalServer; use Test::More tests => 11; =head1 NAME overload.t =head1 SYNOPSIS This tests for various ways, advertised in L, to create a subclass of the mech to alter it's behavior in a useful manner. (Of course free-style overloading is discouraged, as it breaks encapsulation big time.) This test first feeds some bad HTML to Mech to make sure that it throws an error. Then, it overloads update_html() to fix the HTML before processing it, and then we should not have an error. =head2 Overloading update_html() This is the recommended way to tidy up the received HTML in a generic way, and/or to install supplemental "surface tests" on the HTML (e.g. link checker). =cut BEGIN { delete @ENV{ qw( IFS CDPATH ENV BASH_ENV ) }; use_ok( 'WWW::Mechanize' ); } my $server = LocalServer->spawn(html => <<'BROKEN_HTML'); Broken document</head> <form> <table> <tr><select name="foo"> <option value="bar">Bar</option></td></tr> </form> </html> BROKEN_HTML isa_ok( $server, 'LocalServer' ); do { package MyMech; use base 'WWW::Mechanize'; sub update_html { my $self = shift; my $html = shift; $html =~ s[Broken][Fixed]isg or die "Couldn't fix the HTML for the test (#1)"; $html =~ s[</option>.{0,3}</td>][</option></select></td>]isg or die "Couldn't fix the HTML for the test (#2)"; $self->WWW::Mechanize::update_html( $html ); } }; my $carpmsg; local $^W = 1; no warnings 'redefine'; local *Carp::carp = sub {$carpmsg = shift}; my $mech = WWW::Mechanize->new(); isa_ok( $mech, 'WWW::Mechanize' ); $mech->get( $server->url ); like($carpmsg, qr{bad.*select}i, 'Standard mech chokes on bogus HTML'); # If at first you don't succeed, try with a shorter bungee... undef $carpmsg; $mech = MyMech->new(); isa_ok( $mech, 'WWW::Mechanize', 'Derived object' ); my $response = $mech->get( $server->url ); isa_ok( $response, 'HTTP::Response', 'Response I got back' ); ok( $response->is_success, 'Got URL' ) or die 'Can\'t even fetch local url'; ok( $mech->is_html, 'Local page is HTML' ); ok( !$carpmsg, 'No warnings this time' ); my @forms = $mech->forms; is( scalar @forms, 1, 'One form' ); like($mech->content(), qr{/select}, 'alteration visible in ->content() too'); ���������������������������������������������������������������������������������������������������������������������������������������WWW-Mechanize-1.73/t/local/page_stack.t�������������������������������������������������������������000644 �000767 �000024 �00000002772 12026436042 020137� 0����������������������������������������������������������������������������������������������������ustar�00ether���������������������������staff���������������������������000000 �000000 ������������������������������������������������������������������������������������������������������������������������������������������������������������������������#!perl use warnings; use strict; use Test::More tests => 16; use lib 't/local'; use LocalServer; BEGIN { delete @ENV{ qw( IFS CDPATH ENV BASH_ENV ) }; use_ok( 'WWW::Mechanize' ); } my $server = LocalServer->spawn; isa_ok( $server, 'LocalServer' ); STANDARD_STACK: { my $mech = WWW::Mechanize->new; isa_ok( $mech, 'WWW::Mechanize', 'Created object' ); is( scalar @{$mech->{page_stack}}, 0, 'Page stack starts empty' ); ok( $mech->get($server->url)->is_success, 'Got start page' ); is( scalar @{$mech->{page_stack}}, 0, 'Page stack starts empty' ); $mech->_push_page_stack(); is( scalar @{$mech->{page_stack}}, 1, 'Pushed item onto page stack' ); $mech->_push_page_stack(); is( scalar @{$mech->{page_stack}}, 2, 'Pushed item onto page stack' ); $mech->back(); is( scalar @{$mech->{page_stack}}, 1, 'Popped item from page stack' ); $mech->back(); is( scalar @{$mech->{page_stack}}, 0, 'Popped item from page stack' ); $mech->back(); is( scalar @{$mech->{page_stack}}, 0, 'Cannot pop beyond end of page stack' ); } NO_STACK: { my $mech = WWW::Mechanize->new; isa_ok( $mech, 'WWW::Mechanize', 'Created object' ); $mech->stack_depth(0); is( scalar @{$mech->{page_stack}}, 0, 'Page stack starts empty' ); ok( $mech->get($server->url)->is_success, 'Got start page' ); is( scalar @{$mech->{page_stack}}, 0, 'Page stack starts empty' ); $mech->_push_page_stack(); is( scalar @{$mech->{page_stack}}, 0, 'Pushing has no effect' ); } ������WWW-Mechanize-1.73/t/local/referer-server�����������������������������������������������������������000755 �000767 �000024 �00000000765 12026436042 020535� 0����������������������������������������������������������������������������������������������������ustar�00ether���������������������������staff���������������������������000000 �000000 ������������������������������������������������������������������������������������������������������������������������������������������������������������������������# Thanks to merlyn for nudging me and giving me this snippet! use HTTP::Daemon; use URI::URL; $|++; my $d = HTTP::Daemon->new or die; my $lhurl = URI::URL->new( $d->url ); $lhurl->host( "localhost" ); print $lhurl->as_string, "\n"; $counter = 5; while ($counter-- and my $c = $d->accept) { while (my $r = $c->get_request) { my $ref = $r->headers->referer || ""; $c->send_response(HTTP::Response->new(200, "OK", undef, "Referer: '$ref'")); } $c->close; undef($c); } �����������WWW-Mechanize-1.73/t/local/referer.t����������������������������������������������������������������000644 �000767 �000024 �00000004112 12026436042 017456� 0����������������������������������������������������������������������������������������������������ustar�00ether���������������������������staff���������������������������000000 �000000 ������������������������������������������������������������������������������������������������������������������������������������������������������������������������#!/usr/bin/perl -w use warnings; use strict; use FindBin; use Test::More tests => 13; BEGIN { use lib 't'; use Tools; } BEGIN { delete @ENV{ qw( IFS CDPATH ENV BASH_ENV ) }; use_ok( 'WWW::Mechanize' ); } our $server; my $agent = WWW::Mechanize->new(); isa_ok( $agent, 'WWW::Mechanize' ); SKIP: { eval { require HTTP::Daemon; }; skip 'HTTP::Daemon required to test the referrer header',10 if $@; # We want to be safe from non-resolving local host names delete $ENV{HTTP_PROXY}; # Now start a fake webserver, fork, and connect to ourselves my $command = qq'"$^X" "$FindBin::Bin/referer-server"'; if ($^O eq 'VMS') { $command = qq'mcr $^X t/referer-server'; } open $server, "$command |" or die "Couldn't spawn fake server: $!"; sleep 1; # give the child some time my $url = <$server>; chomp $url; $agent->get( $url ); is($agent->status, 200, 'Got first page') or diag $agent->res->message; is($agent->content, q{Referer: ''}, 'First page gets sent with empty referrer'); $agent->get( $url ); is($agent->status, 200, 'Got second page') or diag $agent->res->message; is($agent->content, "Referer: '$url'", 'Referer got sent for absolute url'); $agent->get( '.' ); is($agent->status, 200, 'Got third page') or diag $agent->res->message; is($agent->content, "Referer: '$url'", 'Referer got sent for relative url'); $agent->add_header( Referer => 'x' ); $agent->get( $url ); is($agent->status, 200, 'Got fourth page') or diag $agent->res->message; is($agent->content, q{Referer: 'x'}, 'Referer can be set to empty again'); my $ref = 'This is not the referer you are looking for *jedi gesture*'; $agent->add_header( Referer => $ref ); $agent->get( $url ); is($agent->status, 200, 'Got fourth page') or diag $agent->res->message; is($agent->content, "Referer: '$ref'", 'Custom referer can be set'); }; SKIP: { skip 'Test::Memory::Cycle not installed', 1 unless $canTMC; memory_cycle_ok( $agent, 'No memory cycles found' ); } END { close $server if $server; } ������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������WWW-Mechanize-1.73/t/local/reload.t�����������������������������������������������������������������000644 �000767 �000024 �00000002221 12026436042 017271� 0����������������������������������������������������������������������������������������������������ustar�00ether���������������������������staff���������������������������000000 �000000 ������������������������������������������������������������������������������������������������������������������������������������������������������������������������#!perl use warnings; use strict; use Test::More tests => 14; use lib qw( t t/local ); use LocalServer; BEGIN { use Tools; } BEGIN { delete @ENV{ qw( IFS CDPATH ENV BASH_ENV ) }; use_ok( 'WWW::Mechanize' ); } my $server = LocalServer->spawn; isa_ok( $server, 'LocalServer' ); my $agent = WWW::Mechanize->new; isa_ok( $agent, 'WWW::Mechanize', 'Created object' ); NO_GET: { my $r = $agent->reload; ok( !defined($r), 'Initial reload should fail' ); } FIRST_GET: { my $r = $agent->get($server->url); isa_ok( $r, 'HTTP::Response' ); ok( $r->is_success, 'Get google webpage'); ok( $agent->is_html, 'Valid HTML' ); is( $agent->title, 'WWW::Mechanize test page' ); } INVALIDATE: { undef $agent->{content}; undef $agent->{ct}; isnt( $agent->title, 'WWW::Mechanize test page' ); ok( !$agent->is_html, 'Not HTML' ); } RELOAD: { my $r = $agent->reload; isa_ok( $r, 'HTTP::Response' ); ok( $agent->is_html, 'Valid HTML' ); ok( $agent->title, 'WWW::Mechanize test page' ); } SKIP: { skip 'Test::Memory::Cycle not installed', 1 unless $canTMC; memory_cycle_ok( $agent, 'Mech: no cycles' ); } �������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������WWW-Mechanize-1.73/t/local/submit.t�����������������������������������������������������������������000644 �000767 �000024 �00000002246 12026436042 017335� 0����������������������������������������������������������������������������������������������������ustar�00ether���������������������������staff���������������������������000000 �000000 ������������������������������������������������������������������������������������������������������������������������������������������������������������������������#!perl -w use warnings; use strict; use lib qw( t t/local ); use Test::More tests => 13; use LocalServer; BEGIN { use Tools; } BEGIN { delete @ENV{ qw( IFS CDPATH ENV BASH_ENV ) }; use_ok( 'WWW::Mechanize' ); } my $server = LocalServer->spawn; isa_ok( $server, 'LocalServer' ); my $mech = WWW::Mechanize->new(); isa_ok( $mech, 'WWW::Mechanize', 'Created the object' ) or die; my $response = $mech->get( $server->url ); isa_ok( $response, 'HTTP::Response', 'Got back a response' ) or die; is( $mech->uri, $server->url, 'Got the correct page' ); ok( $response->is_success, 'Got local page' ) or die 'cannot even fetch local page'; ok( $mech->is_html, 'is HTML' ); is( $mech->value('upload'), '', 'Hopefully no upload happens'); $mech->field(query => 'foo'); # Filled the 'q' field $response = $mech->submit; isa_ok( $response, 'HTTP::Response', 'Got back a response' ); ok( $response->is_success, 'Can click "submit" ("submit" button)'); like($mech->content, qr/\bfoo\b/i, 'Found "Foo"'); is( $mech->value('upload'), '', 'No upload happens' ); SKIP: { skip 'Test::Memory::Cycle not installed', 1 unless $canTMC; memory_cycle_ok( $mech, 'Mech: no cycles' ); } ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������WWW-Mechanize-1.73/t/live/encoding.t����������������������������������������������������������������000644 �000767 �000024 �00000001343 12026436042 017462� 0����������������������������������������������������������������������������������������������������ustar�00ether���������������������������staff���������������������������000000 �000000 ������������������������������������������������������������������������������������������������������������������������������������������������������������������������#!/usr/bin/perl -w use strict; use warnings; use constant PAIRS => { 'http://delicious.com/' => 'utf-8', 'http://www.liveinternet.ru/users/dashdi/blog' => '(?:cp|windows-)1251', }; use Test::More tests => (4 * keys %{+PAIRS}) + 1; use Encode; BEGIN { use_ok( 'WWW::Mechanize' ); } my %pairs = %{+PAIRS}; for my $url ( sort keys %pairs ) { my $want_encoding = $pairs{$url}; my $mech = WWW::Mechanize->new; isa_ok( $mech, 'WWW::Mechanize' ); $mech->get( $url ); is( $mech->response->code, 200, "Fetched $url" ); like( $mech->res->content_charset, qr/$want_encoding/i, " ... Got encoding $want_encoding" ); ok( Encode::is_utf8( $mech->content ), 'Got back UTF-8' ); } ���������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������WWW-Mechanize-1.73/t/live/wikipedia.t���������������������������������������������������������������000644 �000767 �000024 �00000001367 12026436042 017650� 0����������������������������������������������������������������������������������������������������ustar�00ether���������������������������staff���������������������������000000 �000000 ������������������������������������������������������������������������������������������������������������������������������������������������������������������������#!perl -T use warnings; use strict; use constant LANGUAGES => qw( en it ja es nl pl ); use Test::More tests => 3 + (2 * scalar LANGUAGES); use lib 't'; BEGIN { use Tools; } BEGIN { use_ok( 'WWW::Mechanize' ); } my $mech = WWW::Mechanize->new; isa_ok( $mech, 'WWW::Mechanize', 'Created object' ); $mech->agent_alias( 'Windows IE 6' ); # Wikipedia 403s out obvious bots for my $lang ( LANGUAGES ) { my $start = "http://$lang.wikipedia.org/"; $mech->get( $start ); ok( $mech->success, "Got $start" ); my @links = $mech->links(); cmp_ok( scalar @links, '>', 50, "Over 50 links on $start" ); } SKIP: { skip 'Test::Memory::Cycle not installed', 1 unless $canTMC; memory_cycle_ok( $mech, 'No memory cycles found' ); } �������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������WWW-Mechanize-1.73/lib/WWW/�������������������������������������������������������������������������000755 �000767 �000024 �00000000000 12206033601 015530� 5����������������������������������������������������������������������������������������������������ustar�00ether���������������������������staff���������������������������000000 �000000 ������������������������������������������������������������������������������������������������������������������������������������������������������������������������WWW-Mechanize-1.73/lib/WWW/Mechanize/���������������������������������������������������������������000755 �000767 �000024 �00000000000 12206033601 017433� 5����������������������������������������������������������������������������������������������������ustar�00ether���������������������������staff���������������������������000000 �000000 ������������������������������������������������������������������������������������������������������������������������������������������������������������������������WWW-Mechanize-1.73/lib/WWW/Mechanize.pm�������������������������������������������������������������000644 �000767 �000024 �00000240637 12206032553 020012� 0����������������������������������������������������������������������������������������������������ustar�00ether���������������������������staff���������������������������000000 �000000 ������������������������������������������������������������������������������������������������������������������������������������������������������������������������package WWW::Mechanize; =head1 NAME WWW::Mechanize - Handy web browsing in a Perl object =head1 VERSION Version 1.73 =cut our $VERSION = '1.73'; =head1 SYNOPSIS C<WWW::Mechanize>, or Mech for short, is a Perl module for stateful programmatic web browsing, used for automating interaction with websites. Features include: =over 4 =item * All HTTP methods =item * High-level hyperlink and HTML form support, without having to parse HTML yourself =item * SSL support =item * Automatic cookies =item * Custom HTTP headers =item * Automatic handling of redirections =item * Proxies =item * HTTP authentication =back Mech supports performing a sequence of page fetches including following links and submitting forms. Each fetched page is parsed and its links and forms are extracted. A link or a form can be selected, form fields can be filled and the next page can be fetched. Mech also stores a history of the URLs you've visited, which can be queried and revisited. use WWW::Mechanize; my $mech = WWW::Mechanize->new(); $mech->get( $url ); $mech->follow_link( n => 3 ); $mech->follow_link( text_regex => qr/download this/i ); $mech->follow_link( url => 'http://host.com/index.html' ); $mech->submit_form( form_number => 3, fields => { username => 'mungo', password => 'lost-and-alone', } ); $mech->submit_form( form_name => 'search', fields => { query => 'pot of gold', }, button => 'Search Now' ); Mech is well suited for use in testing web applications. If you use one of the Test::*, like L<Test::HTML::Lint> modules, you can check the fetched content and use that as input to a test call. use Test::More; like( $mech->content(), qr/$expected/, "Got expected content" ); Each page fetch stores its URL in a history stack which you can traverse. $mech->back(); If you want finer control over your page fetching, you can use these methods. C<follow_link> and C<submit_form> are just high level wrappers around them. $mech->find_link( n => $number ); $mech->form_number( $number ); $mech->form_name( $name ); $mech->field( $name, $value ); $mech->set_fields( %field_values ); $mech->set_visible( @criteria ); $mech->click( $button ); L<WWW::Mechanize> is a proper subclass of L<LWP::UserAgent> and you can also use any of L<LWP::UserAgent>'s methods. $mech->add_header($name => $value); Please note that Mech does NOT support JavaScript, you need additional software for that. Please check L<WWW::Mechanize::FAQ/"JavaScript"> for more. =head1 IMPORTANT LINKS =over 4 =item * L<http://code.google.com/p/www-mechanize/issues/list> The queue for bugs & enhancements in WWW::Mechanize and Test::WWW::Mechanize. Please note that the queue at L<http://rt.cpan.org> is no longer maintained. =item * L<http://search.cpan.org/dist/WWW-Mechanize/> The CPAN documentation page for Mechanize. =item * L<http://search.cpan.org/dist/WWW-Mechanize/lib/WWW/Mechanize/FAQ.pod> Frequently asked questions. Make sure you read here FIRST. =back =cut use strict; use warnings; use HTTP::Request 1.30; use LWP::UserAgent 5.827; use HTML::Form 1.00; use HTML::TokeParser; use base 'LWP::UserAgent'; our $HAS_ZLIB; BEGIN { $HAS_ZLIB = eval 'use Compress::Zlib (); 1;'; } =head1 CONSTRUCTOR AND STARTUP =head2 new() Creates and returns a new WWW::Mechanize object, hereafter referred to as the "agent". my $mech = WWW::Mechanize->new() The constructor for WWW::Mechanize overrides two of the parms to the LWP::UserAgent constructor: agent => 'WWW-Mechanize/#.##' cookie_jar => {} # an empty, memory-only HTTP::Cookies object You can override these overrides by passing parms to the constructor, as in: my $mech = WWW::Mechanize->new( agent => 'wonderbot 1.01' ); If you want none of the overhead of a cookie jar, or don't want your bot accepting cookies, you have to explicitly disallow it, like so: my $mech = WWW::Mechanize->new( cookie_jar => undef ); Here are the parms that WWW::Mechanize recognizes. These do not include parms that L<LWP::UserAgent> recognizes. =over 4 =item * C<< autocheck => [0|1] >> Checks each request made to see if it was successful. This saves you the trouble of manually checking yourself. Any errors found are errors, not warnings. The default value is ON, unless it's being subclassed, in which case it is OFF. This means that standalone L<WWW::Mechanize>instances have autocheck turned on, which is protective for the vast majority of Mech users who don't bother checking the return value of get() and post() and can't figure why their code fails. However, if L<WWW::Mechanize> is subclassed, such as for L<Test::WWW::Mechanize> or L<Test::WWW::Mechanize::Catalyst>, this may not be an appropriate default, so it's off. =item * C<< noproxy => [0|1] >> Turn off the automatic call to the L<LWP::UserAgent> C<env_proxy> function. This needs to be explicitly turned off if you're using L<Crypt::SSLeay> to access a https site via a proxy server. Note: you still need to set your HTTPS_PROXY environment variable as appropriate. =item * C<< onwarn => \&func >> Reference to a C<warn>-compatible function, such as C<< L<Carp>::carp >>, that is called when a warning needs to be shown. If this is set to C<undef>, no warnings will ever be shown. However, it's probably better to use the C<quiet> method to control that behavior. If this value is not passed, Mech uses C<Carp::carp> if L<Carp> is installed, or C<CORE::warn> if not. =item * C<< onerror => \&func >> Reference to a C<die>-compatible function, such as C<< L<Carp>::croak >>, that is called when there's a fatal error. If this is set to C<undef>, no errors will ever be shown. If this value is not passed, Mech uses C<Carp::croak> if L<Carp> is installed, or C<CORE::die> if not. =item * C<< quiet => [0|1] >> Don't complain on warnings. Setting C<< quiet => 1 >> is the same as calling C<< $mech->quiet(1) >>. Default is off. =item * C<< stack_depth => $value >> Sets the depth of the page stack that keeps track of all the downloaded pages. Default is effectively infinite stack size. If the stack is eating up your memory, then set this to a smaller number, say 5 or 10. Setting this to zero means Mech will keep no history. =back To support forms, WWW::Mechanize's constructor pushes POST on to the agent's C<requests_redirectable> list (see also L<LWP::UserAgent>.) =cut sub new { my $class = shift; my %parent_parms = ( agent => "WWW-Mechanize/$VERSION", cookie_jar => {}, ); my %mech_parms = ( autocheck => ($class eq 'WWW::Mechanize' ? 1 : 0), onwarn => \&WWW::Mechanize::_warn, onerror => \&WWW::Mechanize::_die, quiet => 0, stack_depth => 8675309, # Arbitrarily humongous stack headers => {}, noproxy => 0, ); my %passed_parms = @_; # Keep the mech-specific parms before creating the object. while ( my($key,$value) = each %passed_parms ) { if ( exists $mech_parms{$key} ) { $mech_parms{$key} = $value; } else { $parent_parms{$key} = $value; } } my $self = $class->SUPER::new( %parent_parms ); bless $self, $class; # Use the mech parms now that we have a mech object. for my $parm ( keys %mech_parms ) { $self->{$parm} = $mech_parms{$parm}; } $self->{page_stack} = []; $self->env_proxy() unless $mech_parms{noproxy}; # libwww-perl 5.800 (and before, I assume) has a problem where # $ua->{proxy} can be undef and clone() doesn't handle it. $self->{proxy} = {} unless defined $self->{proxy}; push( @{$self->requests_redirectable}, 'POST' ); $self->_reset_page(); return $self; } =head2 $mech->agent_alias( $alias ) Sets the user agent string to the expanded version from a table of actual user strings. I<$alias> can be one of the following: =over 4 =item * Windows IE 6 =item * Windows Mozilla =item * Mac Safari =item * Mac Mozilla =item * Linux Mozilla =item * Linux Konqueror =back then it will be replaced with a more interesting one. For instance, $mech->agent_alias( 'Windows IE 6' ); sets your User-Agent to Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) The list of valid aliases can be returned from C<known_agent_aliases()>. The current list is: =over =item * Windows IE 6 =item * Windows Mozilla =item * Mac Safari =item * Mac Mozilla =item * Linux Mozilla =item * Linux Konqueror =back =cut my %known_agents = ( 'Windows IE 6' => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)', 'Windows Mozilla' => 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6', 'Mac Safari' => 'Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-us) AppleWebKit/85 (KHTML, like Gecko) Safari/85', 'Mac Mozilla' => 'Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.4a) Gecko/20030401', 'Linux Mozilla' => 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624', 'Linux Konqueror' => 'Mozilla/5.0 (compatible; Konqueror/3; Linux)', ); sub agent_alias { my $self = shift; my $alias = shift; if ( defined $known_agents{$alias} ) { return $self->agent( $known_agents{$alias} ); } else { $self->warn( qq{Unknown agent alias "$alias"} ); return $self->agent(); } } =head2 known_agent_aliases() Returns a list of all the agent aliases that Mech knows about. =cut sub known_agent_aliases { return sort keys %known_agents; } =head1 PAGE-FETCHING METHODS =head2 $mech->get( $uri ) Given a URL/URI, fetches it. Returns an L<HTTP::Response> object. I<$uri> can be a well-formed URL string, a L<URI> object, or a L<WWW::Mechanize::Link> object. The results are stored internally in the agent object, but you don't know that. Just use the accessors listed below. Poking at the internals is deprecated and subject to change in the future. C<get()> is a well-behaved overloaded version of the method in L<LWP::UserAgent>. This lets you do things like $mech->get( $uri, ':content_file' => $tempfile ); and you can rest assured that the parms will get filtered down appropriately. B<NOTE:> Because C<:content_file> causes the page contents to be stored in a file instead of the response object, some Mech functions that expect it to be there won't work as expected. Use with caution. =cut sub get { my $self = shift; my $uri = shift; $uri = $uri->url if ref($uri) eq 'WWW::Mechanize::Link'; $uri = $self->base ? URI->new_abs( $uri, $self->base ) : URI->new( $uri ); # It appears we are returning a super-class method, # but it in turn calls the request() method here in Mechanize return $self->SUPER::get( $uri->as_string, @_ ); } =head2 $mech->put( $uri, content => $content ) PUTs I<$content> to $uri. Returns an L<HTTP::Response> object. I<$uri> can be a well-formed URI string, a L<URI> object, or a L<WWW::Mechanize::Link> object. =cut sub put { my $self = shift; my $uri = shift; $uri = $uri->url if ref($uri) eq 'WWW::Mechanize::Link'; $uri = $self->base ? URI->new_abs( $uri, $self->base ) : URI->new( $uri ); # It appears we are returning a super-class method, # but it in turn calls the request() method here in Mechanize return $self->_SUPER_put( $uri->as_string, @_ ); } # Added until LWP::UserAgent has it. sub _SUPER_put { require HTTP::Request::Common; my($self, @parameters) = @_; my @suff = $self->_process_colonic_headers(\@parameters,1); return $self->request( HTTP::Request::Common::PUT( @parameters ), @suff ); } =head2 $mech->reload() Acts like the reload button in a browser: repeats the current request. The history (as per the L</back> method) is not altered. Returns the L<HTTP::Response> object from the reload, or C<undef> if there's no current request. =cut sub reload { my $self = shift; return unless my $req = $self->{req}; return $self->_update_page( $req, $self->_make_request( $req, @_ ) ); } =head2 $mech->back() The equivalent of hitting the "back" button in a browser. Returns to the previous page. Won't go back past the first page. (Really, what would it do if it could?) Returns true if it could go back, or false if not. =cut sub back { my $self = shift; my $stack = $self->{page_stack}; return unless $stack && @{$stack}; my $popped = pop @{$self->{page_stack}}; my $req = $popped->{req}; my $res = $popped->{res}; $self->_update_page( $req, $res ); return 1; } =head1 STATUS METHODS =head2 $mech->success() Returns a boolean telling whether the last request was successful. If there hasn't been an operation yet, returns false. This is a convenience function that wraps C<< $mech->res->is_success >>. =cut sub success { my $self = shift; return $self->res && $self->res->is_success; } =head2 $mech->uri() Returns the current URI as a L<URI> object. This object stringifies to the URI itself. =head2 $mech->response() / $mech->res() Return the current response as an L<HTTP::Response> object. Synonym for C<< $mech->response() >> =head2 $mech->status() Returns the HTTP status code of the response. This is a 3-digit number like 200 for OK, 404 for not found, and so on. =head2 $mech->ct() / $mech->content_type() Returns the content type of the response. =head2 $mech->base() Returns the base URI for the current response =head2 $mech->forms() When called in a list context, returns a list of the forms found in the last fetched page. In a scalar context, returns a reference to an array with those forms. The forms returned are all L<HTML::Form> objects. =head2 $mech->current_form() Returns the current form as an L<HTML::Form> object. =head2 $mech->links() When called in a list context, returns a list of the links found in the last fetched page. In a scalar context it returns a reference to an array with those links. Each link is a L<WWW::Mechanize::Link> object. =head2 $mech->is_html() Returns true/false on whether our content is HTML, according to the HTTP headers. =cut sub uri { my $self = shift; return $self->response->request->uri; } sub res { my $self = shift; return $self->{res}; } sub response { my $self = shift; return $self->{res}; } sub status { my $self = shift; return $self->{status}; } sub ct { my $self = shift; return $self->{ct}; } sub content_type { my $self = shift; return $self->{ct}; } sub base { my $self = shift; return $self->{base}; } sub is_html { my $self = shift; return defined $self->ct && ($self->ct eq 'text/html' || $self->ct eq 'application/xhtml+xml'); } =head2 $mech->title() Returns the contents of the C<< <TITLE> >> tag, as parsed by L<HTML::HeadParser>. Returns undef if the content is not HTML. =cut sub title { my $self = shift; return unless $self->is_html; if ( not defined $self->{title} ) { require HTML::HeadParser; my $p = HTML::HeadParser->new; $p->parse($self->content); $self->{title} = $p->header('Title'); } return $self->{title}; } =head1 CONTENT-HANDLING METHODS =head2 $mech->content(...) Returns the content that the mech uses internally for the last page fetched. Ordinarily this is the same as C<< $mech->response()->decoded_content() >>, but this may differ for HTML documents if L</update_html> is overloaded (in which case the value passed to the base-class implementation of same will be returned), and/or extra named arguments are passed to I<content()>: =over 2 =item I<< $mech->content( format => 'text' ) >> Returns a text-only version of the page, with all HTML markup stripped. This feature requires I<HTML::TreeBuilder> to be installed, or a fatal error will be thrown. This works only if the contents are HTML. =item I<< $mech->content( base_href => [$base_href|undef] ) >> Returns the HTML document, modified to contain a C<< <base href="$base_href"> >> mark-up in the header. I<$base_href> is C<< $mech->base() >> if not specified. This is handy to pass the HTML to e.g. L<HTML::Display>. This works only if the contents are HTML. =item I<< $mech->content( raw => 1 ) >> Returns C<< $self->response()->content() >>, i.e. the raw contents from the response. =item I<< $mech->content( decoded_by_headers => 1 ) >> Returns the content after applying all C<Content-Encoding> headers but with not additional mangling. =item I<< $mech->content( charset => $charset ) >> Returns C<< $self->response()->decoded_content(charset => $charset) >> (see L<HTTP::Response> for details). =back To preserve backwards compatibility, additional parameters will be ignored unless none of C<< raw | decoded_by_headers | charset >> is specified and the text is HTML, in which case an error will be triggered. =cut sub content { my $self = shift; my %parms = @_; my $content = $self->{content}; if (delete $parms{raw}) { $content = $self->response()->content(); } elsif (delete $parms{decoded_by_headers}) { $content = $self->response()->decoded_content(charset => 'none'); } elsif (my $charset = delete $parms{charset}) { $content = $self->response()->decoded_content(charset => $charset); } elsif ( $self->is_html ) { if ( exists $parms{base_href} ) { my $base_href = (delete $parms{base_href}) || $self->base; $content=~s/<head>/<head>\n<base href="$base_href">/i; } if ( my $format = delete $parms{format} ) { if ( $format eq 'text' ) { $content = $self->text; } else { $self->die( qq{Unknown "format" parameter "$format"} ); } } $self->_check_unhandled_parms( %parms ); } return $content; } =head2 $mech->text() Returns the text of the current HTML content. If the content isn't HTML, $mech will die. The text is extracted by parsing the content, and then the extracted text is cached, so don't worry about performance of calling this repeatedly. =cut sub text { my $self = shift; if ( not defined $self->{text} ) { require HTML::TreeBuilder; my $tree = HTML::TreeBuilder->new(); $tree->parse( $self->content ); $tree->eof(); $tree->elementify(); # just for safety $self->{text} = $tree->as_text(); $tree->delete; } return $self->{text}; } sub _check_unhandled_parms { my $self = shift; my %parms = @_; for my $cmd ( sort keys %parms ) { $self->die( qq{Unknown named argument "$cmd"} ); } } =head1 LINK METHODS =head2 $mech->links() Lists all the links on the current page. Each link is a WWW::Mechanize::Link object. In list context, returns a list of all links. In scalar context, returns an array reference of all links. =cut sub links { my $self = shift; $self->_extract_links() unless $self->{links}; return @{$self->{links}} if wantarray; return $self->{links}; } =head2 $mech->follow_link(...) Follows a specified link on the page. You specify the match to be found using the same parms that C<L<find_link()>> uses. Here some examples: =over 4 =item * 3rd link called "download" $mech->follow_link( text => 'download', n => 3 ); =item * first link where the URL has "download" in it, regardless of case: $mech->follow_link( url_regex => qr/download/i ); or $mech->follow_link( url_regex => qr/(?i:download)/ ); =item * 3rd link on the page $mech->follow_link( n => 3 ); =item * the link with the url $mech->follow_link( url => '/other/page' ); or $mech->follow_link( url => 'http://example.com/page' ); =back Returns the result of the GET method (an HTTP::Response object) if a link was found. If the page has no links, or the specified link couldn't be found, returns undef. =cut sub follow_link { my $self = shift; $self->die( qq{Needs to get key-value pairs of parameters.} ) if @_ % 2; my %parms = ( n=>1, @_ ); if ( $parms{n} eq 'all' ) { delete $parms{n}; $self->warn( q{follow_link(n=>"all") is not valid} ); } my $link = $self->find_link(%parms); if ( $link ) { return $self->get( $link->url ); } if ( $self->{autocheck} ) { $self->die( 'Link not found' ); } return; } =head2 $mech->find_link( ... ) Finds a link in the currently fetched page. It returns a L<WWW::Mechanize::Link> object which describes the link. (You'll probably be most interested in the C<url()> property.) If it fails to find a link it returns undef. You can take the URL part and pass it to the C<get()> method. If that's your plan, you might as well use the C<follow_link()> method directly, since it does the C<get()> for you automatically. Note that C<< <FRAME SRC="..."> >> tags are parsed out of the the HTML and treated as links so this method works with them. You can select which link to find by passing in one or more of these key/value pairs: =over 4 =item * C<< text => 'string', >> and C<< text_regex => qr/regex/, >> C<text> matches the text of the link against I<string>, which must be an exact match. To select a link with text that is exactly "download", use $mech->find_link( text => 'download' ); C<text_regex> matches the text of the link against I<regex>. To select a link with text that has "download" anywhere in it, regardless of case, use $mech->find_link( text_regex => qr/download/i ); Note that the text extracted from the page's links are trimmed. For example, C<< <a> foo </a> >> is stored as 'foo', and searching for leading or trailing spaces will fail. =item * C<< url => 'string', >> and C<< url_regex => qr/regex/, >> Matches the URL of the link against I<string> or I<regex>, as appropriate. The URL may be a relative URL, like F<foo/bar.html>, depending on how it's coded on the page. =item * C<< url_abs => string >> and C<< url_abs_regex => regex >> Matches the absolute URL of the link against I<string> or I<regex>, as appropriate. The URL will be an absolute URL, even if it's relative in the page. =item * C<< name => string >> and C<< name_regex => regex >> Matches the name of the link against I<string> or I<regex>, as appropriate. =item * C<< id => string >> and C<< id_regex => regex >> Matches the attribute 'id' of the link against I<string> or I<regex>, as appropriate. =item * C<< class => string >> and C<< class_regex => regex >> Matches the attribute 'class' of the link against I<string> or I<regex>, as appropriate. =item * C<< tag => string >> and C<< tag_regex => regex >> Matches the tag that the link came from against I<string> or I<regex>, as appropriate. The C<tag_regex> is probably most useful to check for more than one tag, as in: $mech->find_link( tag_regex => qr/^(a|frame)$/ ); The tags and attributes looked at are defined below, at L<< $mech->find_link() : link format >>. =back If C<n> is not specified, it defaults to 1. Therefore, if you don't specify any parms, this method defaults to finding the first link on the page. Note that you can specify multiple text or URL parameters, which will be ANDed together. For example, to find the first link with text of "News" and with "cnn.com" in the URL, use: $mech->find_link( text => 'News', url_regex => qr/cnn\.com/ ); The return value is a reference to an array containing a L<WWW::Mechanize::Link> object for every link in C<< $self->content >>. The links come from the following: =over 4 =item C<< <a href=...> >> =item C<< <area href=...> >> =item C<< <frame src=...> >> =item C<< <iframe src=...> >> =item C<< <link href=...> >> =item C<< <meta content=...> >> =back =cut sub find_link { my $self = shift; my %parms = ( n=>1, @_ ); my $wantall = ( $parms{n} eq 'all' ); $self->_clean_keys( \%parms, qr/^(n|(text|url|url_abs|name|tag|id|class)(_regex)?)$/ ); my @links = $self->links or return; my $nmatches = 0; my @matches; for my $link ( @links ) { if ( _match_any_link_parms($link,\%parms) ) { if ( $wantall ) { push( @matches, $link ); } else { ++$nmatches; return $link if $nmatches >= $parms{n}; } } } # for @links if ( $wantall ) { return @matches if wantarray; return \@matches; } return; } # find_link # Used by find_links to check for matches # The logic is such that ALL parm criteria that are given must match sub _match_any_link_parms { my $link = shift; my $p = shift; # No conditions, anything matches return 1 unless keys %$p; return if defined $p->{url} && !($link->url eq $p->{url} ); return if defined $p->{url_regex} && !($link->url =~ $p->{url_regex} ); return if defined $p->{url_abs} && !($link->url_abs eq $p->{url_abs} ); return if defined $p->{url_abs_regex} && !($link->url_abs =~ $p->{url_abs_regex} ); return if defined $p->{text} && !(defined($link->text) && $link->text eq $p->{text} ); return if defined $p->{text_regex} && !(defined($link->text) && $link->text =~ $p->{text_regex} ); return if defined $p->{name} && !(defined($link->name) && $link->name eq $p->{name} ); return if defined $p->{name_regex} && !(defined($link->name) && $link->name =~ $p->{name_regex} ); return if defined $p->{tag} && !($link->tag && $link->tag eq $p->{tag} ); return if defined $p->{tag_regex} && !($link->tag && $link->tag =~ $p->{tag_regex} ); return if defined $p->{id} && !($link->attrs->{id} && $link->attrs->{id} eq $p->{id} ); return if defined $p->{id_regex} && !($link->attrs->{id} && $link->attrs->{id} =~ $p->{id_regex} ); return if defined $p->{class} && !($link->attrs->{class} && $link->attrs->{class} eq $p->{class} ); return if defined $p->{class_regex} && !($link->attrs->{class} && $link->attrs->{class} =~ $p->{class_regex} ); # Success: everything that was defined passed. return 1; } # Cleans the %parms parameter for the find_link and find_image methods. sub _clean_keys { my $self = shift; my $parms = shift; my $rx_keyname = shift; for my $key ( keys %$parms ) { my $val = $parms->{$key}; if ( $key !~ qr/$rx_keyname/ ) { $self->warn( qq{Unknown link-finding parameter "$key"} ); delete $parms->{$key}; next; } my $key_regex = ( $key =~ /_regex$/ ); my $val_regex = ( ref($val) eq 'Regexp' ); if ( $key_regex ) { if ( !$val_regex ) { $self->warn( qq{$val passed as $key is not a regex} ); delete $parms->{$key}; next; } } else { if ( $val_regex ) { $self->warn( qq{$val passed as '$key' is a regex} ); delete $parms->{$key}; next; } if ( $val =~ /^\s|\s$/ ) { $self->warn( qq{'$val' is space-padded and cannot succeed} ); delete $parms->{$key}; next; } } } # for keys %parms return; } # _clean_keys() =head2 $mech->find_all_links( ... ) Returns all the links on the current page that match the criteria. The method for specifying link criteria is the same as in C<L</find_link()>>. Each of the links returned is a L<WWW::Mechanize::Link> object. In list context, C<find_all_links()> returns a list of the links. Otherwise, it returns a reference to the list of links. C<find_all_links()> with no parameters returns all links in the page. =cut sub find_all_links { my $self = shift; return $self->find_link( @_, n=>'all' ); } =head2 $mech->find_all_inputs( ... criteria ... ) find_all_inputs() returns an array of all the input controls in the current form whose properties match all of the regexes passed in. The controls returned are all descended from HTML::Form::Input. If no criteria are passed, all inputs will be returned. If there is no current page, there is no form on the current page, or there are no submit controls in the current form then the return will be an empty array. You may use a regex or a literal string: # get all textarea controls whose names begin with "customer" my @customer_text_inputs = $mech->find_all_inputs( type => 'textarea', name_regex => qr/^customer/, ); # get all text or textarea controls called "customer" my @customer_text_inputs = $mech->find_all_inputs( type_regex => qr/^(text|textarea)$/, name => 'customer', ); =cut sub find_all_inputs { my $self = shift; my %criteria = @_; my $form = $self->current_form() or return; my @found; foreach my $input ( $form->inputs ) { # check every pattern for a match on the current hash my $matched = 1; foreach my $criterion ( sort keys %criteria ) { # Sort so we're deterministic my $field = $criterion; my $is_regex = ( $field =~ s/(?:_regex)$// ); my $what = $input->{$field}; $matched = defined($what) && ( $is_regex ? ( $what =~ $criteria{$criterion} ) : ( $what eq $criteria{$criterion} ) ); last if !$matched; } push @found, $input if $matched; } return @found; } =head2 $mech->find_all_submits( ... criteria ... ) C<find_all_submits()> does the same thing as C<find_all_inputs()> except that it only returns controls that are submit controls, ignoring other types of input controls like text and checkboxes. =cut sub find_all_submits { my $self = shift; return $self->find_all_inputs( @_, type_regex => qr/^(submit|image)$/ ); } =head1 IMAGE METHODS =head2 $mech->images Lists all the images on the current page. Each image is a WWW::Mechanize::Image object. In list context, returns a list of all images. In scalar context, returns an array reference of all images. =cut sub images { my $self = shift; $self->_extract_images() unless $self->{images}; return @{$self->{images}} if wantarray; return $self->{images}; } =head2 $mech->find_image() Finds an image in the current page. It returns a L<WWW::Mechanize::Image> object which describes the image. If it fails to find an image it returns undef. You can select which image to find by passing in one or more of these key/value pairs: =over 4 =item * C<< alt => 'string' >> and C<< alt_regex => qr/regex/, >> C<alt> matches the ALT attribute of the image against I<string>, which must be an exact match. To select a image with an ALT tag that is exactly "download", use $mech->find_image( alt => 'download' ); C<alt_regex> matches the ALT attribute of the image against a regular expression. To select an image with an ALT attribute that has "download" anywhere in it, regardless of case, use $mech->find_image( alt_regex => qr/download/i ); =item * C<< url => 'string', >> and C<< url_regex => qr/regex/, >> Matches the URL of the image against I<string> or I<regex>, as appropriate. The URL may be a relative URL, like F<foo/bar.html>, depending on how it's coded on the page. =item * C<< url_abs => string >> and C<< url_abs_regex => regex >> Matches the absolute URL of the image against I<string> or I<regex>, as appropriate. The URL will be an absolute URL, even if it's relative in the page. =item * C<< tag => string >> and C<< tag_regex => regex >> Matches the tag that the image came from against I<string> or I<regex>, as appropriate. The C<tag_regex> is probably most useful to check for more than one tag, as in: $mech->find_image( tag_regex => qr/^(img|input)$/ ); The tags supported are C<< <img> >> and C<< <input> >>. =back If C<n> is not specified, it defaults to 1. Therefore, if you don't specify any parms, this method defaults to finding the first image on the page. Note that you can specify multiple ALT or URL parameters, which will be ANDed together. For example, to find the first image with ALT text of "News" and with "cnn.com" in the URL, use: $mech->find_image( image => 'News', url_regex => qr/cnn\.com/ ); The return value is a reference to an array containing a L<WWW::Mechanize::Image> object for every image in C<< $self->content >>. =cut sub find_image { my $self = shift; my %parms = ( n=>1, @_ ); my $wantall = ( $parms{n} eq 'all' ); $self->_clean_keys( \%parms, qr/^(n|(alt|url|url_abs|tag)(_regex)?)$/ ); my @images = $self->images or return; my $nmatches = 0; my @matches; for my $image ( @images ) { if ( _match_any_image_parms($image,\%parms) ) { if ( $wantall ) { push( @matches, $image ); } else { ++$nmatches; return $image if $nmatches >= $parms{n}; } } } # for @images if ( $wantall ) { return @matches if wantarray; return \@matches; } return; } # Used by find_images to check for matches # The logic is such that ALL parm criteria that are given must match sub _match_any_image_parms { my $image = shift; my $p = shift; # No conditions, anything matches return 1 unless keys %$p; return if defined $p->{url} && !($image->url eq $p->{url} ); return if defined $p->{url_regex} && !($image->url =~ $p->{url_regex} ); return if defined $p->{url_abs} && !($image->url_abs eq $p->{url_abs} ); return if defined $p->{url_abs_regex} && !($image->url_abs =~ $p->{url_abs_regex} ); return if defined $p->{alt} && !(defined($image->alt) && $image->alt eq $p->{alt} ); return if defined $p->{alt_regex} && !(defined($image->alt) && $image->alt =~ $p->{alt_regex} ); return if defined $p->{tag} && !($image->tag && $image->tag eq $p->{tag} ); return if defined $p->{tag_regex} && !($image->tag && $image->tag =~ $p->{tag_regex} ); # Success: everything that was defined passed. return 1; } =head2 $mech->find_all_images( ... ) Returns all the images on the current page that match the criteria. The method for specifying image criteria is the same as in C<L</find_image()>>. Each of the images returned is a L<WWW::Mechanize::Image> object. In list context, C<find_all_images()> returns a list of the images. Otherwise, it returns a reference to the list of images. C<find_all_images()> with no parameters returns all images in the page. =cut sub find_all_images { my $self = shift; return $self->find_image( @_, n=>'all' ); } =head1 FORM METHODS These methods let you work with the forms on a page. The idea is to choose a form that you'll later work with using the field methods below. =head2 $mech->forms Lists all the forms on the current page. Each form is an L<HTML::Form> object. In list context, returns a list of all forms. In scalar context, returns an array reference of all forms. =cut sub forms { my $self = shift; $self->_extract_forms() unless $self->{forms}; return @{$self->{forms}} if wantarray; return $self->{forms}; } sub current_form { my $self = shift; if ( !$self->{current_form} ) { $self->form_number(1); } return $self->{current_form}; } =head2 $mech->form_number($number) Selects the I<number>th form on the page as the target for subsequent calls to C<L</field()>> and C<L</click()>>. Also returns the form that was selected. If it is found, the form is returned as an L<HTML::Form> object and set internally for later use with Mech's form methods such as C<L</field()>> and C<L</click()>>. Emits a warning and returns undef if no form is found. The first form is number 1, not zero. =cut sub form_number { my ($self, $form) = @_; # XXX Should we die if no $form is defined? Same question for form_name() my $forms = $self->forms; if ( $forms->[$form-1] ) { $self->{current_form} = $forms->[$form-1]; return $self->{current_form}; } return; } =head2 $mech->form_name( $name ) Selects a form by name. If there is more than one form on the page with that name, then the first one is used, and a warning is generated. If it is found, the form is returned as an L<HTML::Form> object and set internally for later use with Mech's form methods such as C<L</field()>> and C<L</click()>>. Returns undef if no form is found. =cut sub form_name { my ($self, $form) = @_; my $temp; my @matches = grep {defined($temp = $_->attr('name')) and ($temp eq $form) } $self->forms; my $nmatches = @matches; if ( $nmatches > 0 ) { if ( $nmatches > 1 ) { $self->warn( "There are $nmatches forms named $form. The first one was used." ) } return $self->{current_form} = $matches[0]; } return; } =head2 $mech->form_id( $name ) Selects a form by ID. If there is more than one form on the page with that ID, then the first one is used, and a warning is generated. If it is found, the form is returned as an L<HTML::Form> object and set internally for later use with Mech's form methods such as C<L</field()>> and C<L</click()>>. Returns undef if no form is found. =cut sub form_id { my ($self, $formid) = @_; my $temp; my @matches = grep { defined($temp = $_->attr('id')) and ($temp eq $formid) } $self->forms; if ( @matches ) { $self->warn( 'There are ', scalar @matches, " forms with ID $formid. The first one was used." ) if @matches > 1; return $self->{current_form} = $matches[0]; } else { $self->warn( qq{ There is no form with ID "$formid"} ); return undef; } } =head2 $mech->form_with_fields( @fields ) Selects a form by passing in a list of field names it must contain. If there is more than one form on the page with that matches, then the first one is used, and a warning is generated. If it is found, the form is returned as an L<HTML::Form> object and set internally for later used with Mech's form methods such as C<L</field()>> and C<L</click()>>. Returns undef if no form is found. Note that this functionality requires libwww-perl 5.69 or higher. =cut sub form_with_fields { my ($self, @fields) = @_; die 'no fields provided' unless scalar @fields; my @matches; FORMS: for my $form (@{ $self->forms }) { my @fields_in_form = $form->param(); for my $field (@fields) { next FORMS unless grep { $_ eq $field } @fields_in_form; } push @matches, $form; } my $nmatches = @matches; if ( $nmatches > 0 ) { if ( $nmatches > 1 ) { $self->warn( "There are $nmatches forms with the named fields. The first one was used." ) } return $self->{current_form} = $matches[0]; } else { $self->warn( qq{There is no form with the requested fields} ); return undef; } } =head1 FIELD METHODS These methods allow you to set the values of fields in a given form. =head2 $mech->field( $name, $value, $number ) =head2 $mech->field( $name, \@values, $number ) Given the name of a field, set its value to the value specified. This applies to the current form (as set by the L</form_name()> or L</form_number()> method or defaulting to the first form on the page). The optional I<$number> parameter is used to distinguish between two fields with the same name. The fields are numbered from 1. =cut sub field { my ($self, $name, $value, $number) = @_; $number ||= 1; my $form = $self->current_form(); if ($number > 1) { $form->find_input($name, undef, $number)->value($value); } else { if ( ref($value) eq 'ARRAY' ) { $form->param($name, $value); } else { $form->value($name => $value); } } } =head2 $mech->select($name, $value) =head2 $mech->select($name, \@values) Given the name of a C<select> field, set its value to the value specified. If the field is not C<< <select multiple> >> and the C<$value> is an array, only the B<first> value will be set. [Note: the documentation previously claimed that only the last value would be set, but this was incorrect.] Passing C<$value> as a hash with an C<n> key selects an item by number (e.g. C<< {n => 3} >> or C<< {n => [2,4]} >>). The numbering starts at 1. This applies to the current form. If you have a field with C<< <select multiple> >> and you pass a single C<$value>, then C<$value> will be added to the list of fields selected, without clearing the others. However, if you pass an array reference, then all previously selected values will be cleared. Returns true on successfully setting the value. On failure, returns false and calls C<< $self>warn() >> with an error message. =cut sub select { my ($self, $name, $value) = @_; my $form = $self->current_form(); my $input = $form->find_input($name); if (!$input) { $self->warn( qq{Input "$name" not found} ); return; } if ($input->type ne 'option') { $self->warn( qq{Input "$name" is not type "select"} ); return; } # For $mech->select($name, {n => 3}) or $mech->select($name, {n => [2,4]}), # transform the 'n' number(s) into value(s) and put it in $value. if (ref($value) eq 'HASH') { for (keys %$value) { $self->warn(qq{Unknown select value parameter "$_"}) unless $_ eq 'n'; } if (defined($value->{n})) { my @inputs = $form->find_input($name, 'option'); my @values = (); # distinguish between multiple and non-multiple selects # (see INPUTS section of `perldoc HTML::Form`) if (@inputs == 1) { @values = $inputs[0]->possible_values(); } else { foreach my $input (@inputs) { my @possible = $input->possible_values(); push @values, pop @possible; } } my $n = $value->{n}; if (ref($n) eq 'ARRAY') { $value = []; for (@$n) { unless (/^\d+$/) { $self->warn(qq{"n" value "$_" is not a positive integer}); return; } push @$value, $values[$_ - 1]; # might be undef } } elsif (!ref($n) && $n =~ /^\d+$/) { $value = $values[$n - 1]; # might be undef } else { $self->warn('"n" value is not a positive integer or an array ref'); return; } } else { $self->warn('Hash value is invalid'); return; } } # hashref if (ref($value) eq 'ARRAY') { $form->param($name, $value); return 1; } $form->value($name => $value); return 1; } =head2 $mech->set_fields( $name => $value ... ) This method sets multiple fields of the current form. It takes a list of field name and value pairs. If there is more than one field with the same name, the first one found is set. If you want to select which of the duplicate field to set, use a value which is an anonymous array which has the field value and its number as the 2 elements. # set the second foo field $mech->set_fields( $name => [ 'foo', 2 ] ); The fields are numbered from 1. This applies to the current form. =cut sub set_fields { my $self = shift; my %fields = @_; my $form = $self->current_form or $self->die( 'No form defined' ); while ( my ( $field, $value ) = each %fields ) { if ( ref $value eq 'ARRAY' ) { $form->find_input( $field, undef, $value->[1])->value($value->[0] ); } else { $form->value($field => $value); } } # while } # set_fields() =head2 $mech->set_visible( @criteria ) This method sets fields of the current form without having to know their names. So if you have a login screen that wants a username and password, you do not have to fetch the form and inspect the source (or use the F<mech-dump> utility, installed with WWW::Mechanize) to see what the field names are; you can just say $mech->set_visible( $username, $password ); and the first and second fields will be set accordingly. The method is called set_I<visible> because it acts only on visible fields; hidden form inputs are not considered. The order of the fields is the order in which they appear in the HTML source which is nearly always the order anyone viewing the page would think they are in, but some creative work with tables could change that; caveat user. Each element in C<@criteria> is either a field value or a field specifier. A field value is a scalar. A field specifier allows you to specify the I<type> of input field you want to set and is denoted with an arrayref containing two elements. So you could specify the first radio button with $mech->set_visible( [ radio => 'KCRW' ] ); Field values and specifiers can be intermixed, hence $mech->set_visible( 'fred', 'secret', [ option => 'Checking' ] ); would set the first two fields to "fred" and "secret", and the I<next> C<OPTION> menu field to "Checking". The possible field specifier types are: "text", "password", "hidden", "textarea", "file", "image", "submit", "radio", "checkbox" and "option". C<set_visible> returns the number of values set. =cut sub set_visible { my $self = shift; my $form = $self->current_form; my @inputs = $form->inputs; my $num_set = 0; for my $value ( @_ ) { # Handle type/value pairs an arrayref if ( ref $value eq 'ARRAY' ) { my ( $type, $value ) = @$value; while ( my $input = shift @inputs ) { next if $input->type eq 'hidden'; if ( $input->type eq $type ) { $input->value( $value ); $num_set++; last; } } # while } # by default, it's a value else { while ( my $input = shift @inputs ) { next if $input->type eq 'hidden'; $input->value( $value ); $num_set++; last; } # while } } # for return $num_set; } # set_visible() =head2 $mech->tick( $name, $value [, $set] ) "Ticks" the first checkbox that has both the name and value associated with it on the current form. Dies if there is no named check box for that value. Passing in a false value as the third optional argument will cause the checkbox to be unticked. =cut sub tick { my $self = shift; my $name = shift; my $value = shift; my $set = @_ ? shift : 1; # default to 1 if not passed # loop though all the inputs my $index = 0; while ( my $input = $self->current_form->find_input( $name, 'checkbox', $index ) ) { # Can't guarantee that the first element will be undef and the second # element will be the right name foreach my $val ($input->possible_values()) { next unless defined $val; if ($val eq $value) { $input->value($set ? $value : undef); return; } } # move onto the next input $index++; } # while # got self far? Didn't find anything $self->warn( qq{No checkbox "$name" for value "$value" in form} ); } # tick() =head2 $mech->untick($name, $value) Causes the checkbox to be unticked. Shorthand for C<tick($name,$value,undef)> =cut sub untick { shift->tick(shift,shift,undef); } =head2 $mech->value( $name [, $number] ) Given the name of a field, return its value. This applies to the current form. The optional I<$number> parameter is used to distinguish between two fields with the same name. The fields are numbered from 1. If the field is of type file (file upload field), the value is always cleared to prevent remote sites from downloading your local files. To upload a file, specify its file name explicitly. =cut sub value { my $self = shift; my $name = shift; my $number = shift || 1; my $form = $self->current_form; if ( $number > 1 ) { return $form->find_input( $name, undef, $number )->value(); } else { return $form->value( $name ); } } # value =head2 $mech->click( $button [, $x, $y] ) Has the effect of clicking a button on the current form. The first argument is the name of the button to be clicked. The second and third arguments (optional) allow you to specify the (x,y) coordinates of the click. If there is only one button on the form, C<< $mech->click() >> with no arguments simply clicks that one button. Returns an L<HTTP::Response> object. =cut sub click { my ($self, $button, $x, $y) = @_; for ($x, $y) { $_ = 1 unless defined; } my $request = $self->current_form->click($button, $x, $y); return $self->request( $request ); } =head2 $mech->click_button( ... ) Has the effect of clicking a button on the current form by specifying its name, value, or index. Its arguments are a list of key/value pairs. Only one of name, number, input or value must be specified in the keys. =over 4 =item * C<< name => name >> Clicks the button named I<name> in the current form. =item * C<< number => n >> Clicks the I<n>th button in the current form. Numbering starts at 1. =item * C<< value => value >> Clicks the button with the value I<value> in the current form. =item * C<< input => $inputobject >> Clicks on the button referenced by $inputobject, an instance of L<HTML::Form::SubmitInput> obtained e.g. from $mech->current_form()->find_input( undef, 'submit' ) $inputobject must belong to the current form. =item * C<< x => x >> =item * C<< y => y >> These arguments (optional) allow you to specify the (x,y) coordinates of the click. =back =cut sub click_button { my $self = shift; my %args = @_; for ( keys %args ) { if ( !/^(number|name|value|input|x|y)$/ ) { $self->warn( qq{Unknown click_button parameter "$_"} ); } } for ($args{x}, $args{y}) { $_ = 1 unless defined; } my $form = $self->current_form or $self->die( 'click_button: No form has been selected' ); my $request; if ( $args{name} ) { $request = $form->click( $args{name}, $args{x}, $args{y} ); } elsif ( $args{number} ) { my $input = $form->find_input( undef, 'submit', $args{number} ); $request = $input->click( $form, $args{x}, $args{y} ); } elsif ( $args{input} ) { $request = $args{input}->click( $form, $args{x}, $args{y} ); } elsif ( $args{value} ) { my $i = 1; while ( my $input = $form->find_input(undef, 'submit', $i) ) { if ( $args{value} && ($args{value} eq $input->value) ) { $request = $input->click( $form, $args{x}, $args{y} ); last; } $i++; } # while } # $args{value} return $self->request( $request ); } =head2 $mech->submit() Submits the page, without specifying a button to click. Actually, no button is clicked at all. Returns an L<HTTP::Response> object. This used to be a synonym for C<< $mech->click( 'submit' ) >>, but is no longer so. =cut sub submit { my $self = shift; my $request = $self->current_form->make_request; return $self->request( $request ); } =head2 $mech->submit_form( ... ) This method lets you select a form from the previously fetched page, fill in its fields, and submit it. It combines the form_number/form_name, set_fields and click methods into one higher level call. Its arguments are a list of key/value pairs, all of which are optional. =over 4 =item * C<< fields => \%fields >> Specifies the fields to be filled in the current form. =item * C<< with_fields => \%fields >> Probably all you need for the common case. It combines a smart form selector and data setting in one operation. It selects the first form that contains all fields mentioned in C<\%fields>. This is nice because you don't need to know the name or number of the form to do this. (calls C<L</form_with_fields()>> and C<L</set_fields()>>). If you choose this, the form_number, form_name, form_id and fields options will be ignored. =item * C<< form_number => n >> Selects the I<n>th form (calls C<L</form_number()>>). If this parm is not specified, the currently-selected form is used. =item * C<< form_name => name >> Selects the form named I<name> (calls C<L</form_name()>>) =item * C<< form_id => ID >> Selects the form with ID I<ID> (calls C<L</form_id()>>) =item * C<< button => button >> Clicks on button I<button> (calls C<L</click()>>) =item * C<< x => x, y => y >> Sets the x or y values for C<L</click()>> =back If no form is selected, the first form found is used. If I<button> is not passed, then the C<L</submit()>> method is used instead. If you want to submit a file and get its content from a scalar rather than a file in the filesystem, you can use: $mech->submit_form(with_fields => { logfile => [ [ undef, 'whatever', Content => $content ], 1 ] } ); Returns an L<HTTP::Response> object. =cut sub submit_form { my( $self, %args ) = @_; for ( keys %args ) { if ( !/^(form_(number|name|fields|id)|(with_)?fields|button|x|y)$/ ) { # XXX Why not die here? $self->warn( qq{Unknown submit_form parameter "$_"} ); } } my $fields; for (qw/with_fields fields/) { if ($args{$_}) { if ( ref $args{$_} eq 'HASH' ) { $fields = $args{$_}; } else { die "$_ arg to submit_form must be a hashref"; } last; } } if ( $args{with_fields} ) { $fields || die q{must submit some 'fields' with with_fields}; $self->form_with_fields(keys %{$fields}) or die "There is no form with the requested fields"; } elsif ( my $form_number = $args{form_number} ) { $self->form_number( $form_number ) or die "There is no form numbered $form_number"; } elsif ( my $form_name = $args{form_name} ) { $self->form_name( $form_name ) or die qq{There is no form named "$form_name"}; } elsif ( my $form_id = $args{form_id} ) { $self->form_id( $form_id ) or die qq{There is no form with ID "$form_id"}; } else { # No form selector was used. # Maybe a form was set separately, or we'll default to the first form. } $self->set_fields( %{$fields} ) if $fields; my $response; if ( $args{button} ) { $response = $self->click( $args{button}, $args{x} || 0, $args{y} || 0 ); } else { $response = $self->submit(); } return $response; } =head1 MISCELLANEOUS METHODS =head2 $mech->add_header( name => $value [, name => $value... ] ) Sets HTTP headers for the agent to add or remove from the HTTP request. $mech->add_header( Encoding => 'text/klingon' ); If a I<value> is C<undef>, then that header will be removed from any future requests. For example, to never send a Referer header: $mech->add_header( Referer => undef ); If you want to delete a header, use C<delete_header>. Returns the number of name/value pairs added. B<NOTE>: This method was very different in WWW::Mechanize before 1.00. Back then, the headers were stored in a package hash, not as a member of the object instance. Calling C<add_header()> would modify the headers for every WWW::Mechanize object, even after your object no longer existed. =cut sub add_header { my $self = shift; my $npairs = 0; while ( @_ ) { my $key = shift; my $value = shift; ++$npairs; $self->{headers}{$key} = $value; } return $npairs; } =head2 $mech->delete_header( name [, name ... ] ) Removes HTTP headers from the agent's list of special headers. For instance, you might need to do something like: # Don't send a Referer for this URL $mech->add_header( Referer => undef ); # Get the URL $mech->get( $url ); # Back to the default behavior $mech->delete_header( 'Referer' ); =cut sub delete_header { my $self = shift; while ( @_ ) { my $key = shift; delete $self->{headers}{$key}; } return; } =head2 $mech->quiet(true/false) Allows you to suppress warnings to the screen. $mech->quiet(0); # turns on warnings (the default) $mech->quiet(1); # turns off warnings $mech->quiet(); # returns the current quietness status =cut sub quiet { my $self = shift; $self->{quiet} = $_[0] if @_; return $self->{quiet}; } =head2 $mech->stack_depth( $max_depth ) Get or set the page stack depth. Use this if you're doing a lot of page scraping and running out of memory. A value of 0 means "no history at all." By default, the max stack depth is humongously large, effectively keeping all history. =cut sub stack_depth { my $self = shift; $self->{stack_depth} = shift if @_; return $self->{stack_depth}; } =head2 $mech->save_content( $filename, %opts ) Dumps the contents of C<< $mech->content >> into I<$filename>. I<$filename> will be overwritten. Dies if there are any errors. If the content type does not begin with "text/", then the content is saved in binary mode (i.e. C<binmode()> is set on the output filehandle). Additional arguments can be passed as I<key>/I<value> pairs: =over =item I<< $mech->save_content( $filename, binary => 1 ) >> Filehandle is set with C<binmode> to C<:raw> and contents are taken calling C<< $self->content(decoded_by_headers => 1) >>. Same as calling: $mech->save_content( $filename, binmode => ':raw', decoded_by_headers => 1 ); This I<should> be the safest way to save contents verbatim. =item I<< $mech->save_content( $filename, binmode => $binmode ) >> Filehandle is set to binary mode. If C<$binmode> begins with ':', it is passed as a parameter to C<binmode>: binmode $fh, $binmode; otherwise the filehandle is set to binary mode if C<$binmode> is true: binmode $fh; =item I<all other arguments> are passed as-is to C<< $mech->content(%opts) >>. In particular, C<decoded_by_headers> might come handy if you want to revert the effect of line compression performed by the web server but without further interpreting the contents (e.g. decoding it according to the charset). =back =cut sub save_content { my $self = shift; my $filename = shift; my %opts = @_; if (delete $opts{binary}) { $opts{binmode} = ':raw'; $opts{decoded_by_headers} = 1; } open( my $fh, '>', $filename ) or $self->die( "Unable to create $filename: $!" ); if ((my $binmode = delete($opts{binmode}) || '') || ($self->content_type() !~ m{^text/})) { if (length($binmode) && (substr($binmode, 0, 1) eq ':')) { binmode $fh, $binmode; } else { binmode $fh; } } print {$fh} $self->content(%opts) or $self->die( "Unable to write to $filename: $!" ); close $fh or $self->die( "Unable to close $filename: $!" ); return; } =head2 $mech->dump_headers( [$fh] ) Prints a dump of the HTTP response headers for the most recent response. If I<$fh> is not specified or is undef, it dumps to STDOUT. Unlike the rest of the dump_* methods, $fh can be a scalar. It will be used as a file name. =cut sub _get_fh_default_stdout { my $self = shift; my $p = shift || ''; if ( !$p ) { return \*STDOUT; } elsif ( !ref($p) ) { open my $fh, '>', $p or $self->die( "Unable to write to $p: $!" );; return $fh; } else { return $p; } } sub dump_headers { my $self = shift; my $fh = $self->_get_fh_default_stdout(shift); print {$fh} $self->response->headers_as_string; return; } =head2 $mech->dump_links( [[$fh], $absolute] ) Prints a dump of the links on the current page to I<$fh>. If I<$fh> is not specified or is undef, it dumps to STDOUT. If I<$absolute> is true, links displayed are absolute, not relative. =cut sub dump_links { my $self = shift; my $fh = shift || \*STDOUT; my $absolute = shift; for my $link ( $self->links ) { my $url = $absolute ? $link->url_abs : $link->url; $url = '' if not defined $url; print {$fh} $url, "\n"; } return; } =head2 $mech->dump_images( [[$fh], $absolute] ) Prints a dump of the images on the current page to I<$fh>. If I<$fh> is not specified or is undef, it dumps to STDOUT. If I<$absolute> is true, links displayed are absolute, not relative. =cut sub dump_images { my $self = shift; my $fh = shift || \*STDOUT; my $absolute = shift; for my $image ( $self->images ) { my $url = $absolute ? $image->url_abs : $image->url; $url = '' if not defined $url; print {$fh} $url, "\n"; } return; } =head2 $mech->dump_forms( [$fh] ) Prints a dump of the forms on the current page to I<$fh>. If I<$fh> is not specified or is undef, it dumps to STDOUT. =cut sub dump_forms { my $self = shift; my $fh = shift || \*STDOUT; for my $form ( $self->forms ) { print {$fh} $form->dump, "\n"; } return; } =head2 $mech->dump_text( [$fh] ) Prints a dump of the text on the current page to I<$fh>. If I<$fh> is not specified or is undef, it dumps to STDOUT. =cut sub dump_text { my $self = shift; my $fh = shift || \*STDOUT; my $absolute = shift; print {$fh} $self->text, "\n"; return; } =head1 OVERRIDDEN LWP::UserAgent METHODS =head2 $mech->clone() Clone the mech object. The clone will be using the same cookie jar as the original mech. =cut sub clone { my $self = shift; my $clone = $self->SUPER::clone(); $clone->cookie_jar( $self->cookie_jar ); return $clone; } =head2 $mech->redirect_ok() An overloaded version of C<redirect_ok()> in L<LWP::UserAgent>. This method is used to determine whether a redirection in the request should be followed. Note that WWW::Mechanize's constructor pushes POST on to the agent's C<requests_redirectable> list. =cut sub redirect_ok { my $self = shift; my $prospective_request = shift; my $response = shift; my $ok = $self->SUPER::redirect_ok( $prospective_request, $response ); if ( $ok ) { $self->{redirected_uri} = $prospective_request->uri; } return $ok; } =head2 $mech->request( $request [, $arg [, $size]]) Overloaded version of C<request()> in L<LWP::UserAgent>. Performs the actual request. Normally, if you're using WWW::Mechanize, it's because you don't want to deal with this level of stuff anyway. Note that C<$request> will be modified. Returns an L<HTTP::Response> object. =cut sub request { my $self = shift; my $request = shift; $request = $self->_modify_request( $request ); if ( $request->method eq 'GET' || $request->method eq 'POST' ) { $self->_push_page_stack(); } return $self->_update_page($request, $self->_make_request( $request, @_ )); } =head2 $mech->update_html( $html ) Allows you to replace the HTML that the mech has found. Updates the forms and links parse-trees that the mech uses internally. Say you have a page that you know has malformed output, and you want to update it so the links come out correctly: my $html = $mech->content; $html =~ s[</option>.{0,3}</td>][</option></select></td>]isg; $mech->update_html( $html ); This method is also used internally by the mech itself to update its own HTML content when loading a page. This means that if you would like to I<systematically> perform the above HTML substitution, you would overload I<update_html> in a subclass thusly: package MyMech; use base 'WWW::Mechanize'; sub update_html { my ($self, $html) = @_; $html =~ s[</option>.{0,3}</td>][</option></select></td>]isg; $self->WWW::Mechanize::update_html( $html ); } If you do this, then the mech will use the tidied-up HTML instead of the original both when parsing for its own needs, and for returning to you through L</content>. Overloading this method is also the recommended way of implementing extra validation steps (e.g. link checkers) for every HTML page received. L</warn> and L</die> would then come in handy to signal validation errors. =cut sub update_html { my $self = shift; my $html = shift; $self->_reset_page; $self->{ct} = 'text/html'; $self->{content} = $html; return; } =head2 $mech->credentials( $username, $password ) Provide credentials to be used for HTTP Basic authentication for all sites and realms until further notice. The four argument form described in L<LWP::UserAgent> is still supported. =cut sub credentials { my $self = shift; # The latest LWP::UserAgent also supports 2 arguments, # in which case the first is host:port if (@_ == 4 || (@_ == 2 && $_[0] =~ /:\d+$/)) { return $self->SUPER::credentials(@_); } @_ == 2 or $self->die( 'Invalid # of args for overridden credentials()' ); return @$self{qw( __username __password )} = @_; } =head2 $mech->get_basic_credentials( $realm, $uri, $isproxy ) Returns the credentials for the realm and URI. =cut sub get_basic_credentials { my $self = shift; my @cred = grep { defined } @$self{qw( __username __password )}; return @cred if @cred == 2; return $self->SUPER::get_basic_credentials(@_); } =head2 $mech->clear_credentials() Remove any credentials set up with C<credentials()>. =cut sub clear_credentials { my $self = shift; delete @$self{qw( __username __password )}; } =head1 INHERITED UNCHANGED LWP::UserAgent METHODS As a subclass of L<LWP::UserAgent>, WWW::Mechanize inherits all of L<LWP::UserAgent>'s methods. Many of which are overridden or extended. The following methods are inherited unchanged. View the L<LWP::UserAgent> documentation for their implementation descriptions. This is not meant to be an inclusive list. LWP::UA may have added others. =head2 $mech->head() Inherited from L<LWP::UserAgent>. =head2 $mech->post() Inherited from L<LWP::UserAgent>. =head2 $mech->mirror() Inherited from L<LWP::UserAgent>. =head2 $mech->simple_request() Inherited from L<LWP::UserAgent>. =head2 $mech->is_protocol_supported() Inherited from L<LWP::UserAgent>. =head2 $mech->prepare_request() Inherited from L<LWP::UserAgent>. =head2 $mech->progress() Inherited from L<LWP::UserAgent>. =head1 INTERNAL-ONLY METHODS These methods are only used internally. You probably don't need to know about them. =head2 $mech->_update_page($request, $response) Updates all internal variables in $mech as if $request was just performed, and returns $response. The page stack is B<not> altered by this method, it is up to caller (e.g. L</request>) to do that. =cut sub _update_page { my ($self, $request, $res) = @_; $self->{req} = $request; $self->{redirected_uri} = $request->uri->as_string; $self->{res} = $res; $self->{status} = $res->code; $self->{base} = $res->base; $self->{ct} = $res->content_type || ''; if ( $res->is_success ) { $self->{uri} = $self->{redirected_uri}; $self->{last_uri} = $self->{uri}; } if ( $res->is_error ) { if ( $self->{autocheck} ) { $self->die( 'Error ', $request->method, 'ing ', $request->uri, ': ', $res->message ); } } $self->_reset_page; # Try to decode the content. Undef will be returned if there's nothing to decompress. # See docs in HTTP::Message for details. Do we need to expose the options there? my $content = $res->decoded_content(); $content = $res->content if (not defined $content); $content .= _taintedness(); if ($self->is_html) { $self->update_html($content); } else { $self->{content} = $content; } return $res; } # _update_page our $_taintbrush; # This is lifted wholesale from Test::Taint sub _taintedness { return $_taintbrush if defined $_taintbrush; # Somehow we need to get some taintedness into our $_taintbrush. # Let's try the easy way first. Either of these should be # tainted, unless somebody has untainted them, so this # will almost always work on the first try. # (Unless, of course, taint checking has been turned off!) $_taintbrush = substr("$0$^X", 0, 0); return $_taintbrush if _is_tainted( $_taintbrush ); # Let's try again. Maybe somebody cleaned those. $_taintbrush = substr(join('', grep { defined } @ARGV, %ENV), 0, 0); return $_taintbrush if _is_tainted( $_taintbrush ); # If those don't work, go try to open some file from some unsafe # source and get data from them. That data is tainted. # (Yes, even reading from /dev/null works!) for my $filename ( qw(/dev/null / . ..), values %INC, $0, $^X ) { if ( open my $fh, '<', $filename ) { my $data; if ( defined sysread $fh, $data, 1 ) { $_taintbrush = substr( $data, 0, 0 ); last if _is_tainted( $_taintbrush ); } } } # Sanity check die "Our taintbrush should have zero length!" if length $_taintbrush; return $_taintbrush; } sub _is_tainted { no warnings qw(void uninitialized); return !eval { join('', shift), kill 0; 1 }; } # _is_tainted =head2 $mech->_modify_request( $req ) Modifies a L<HTTP::Request> before the request is sent out, for both GET and POST requests. We add a C<Referer> header, as well as header to note that we can accept gzip encoded content, if L<Compress::Zlib> is installed. =cut sub _modify_request { my $self = shift; my $req = shift; # add correct Accept-Encoding header to restore compliance with # http://www.freesoft.org/CIE/RFC/2068/158.htm # http://use.perl.org/~rhesa/journal/25952 if (not $req->header( 'Accept-Encoding' ) ) { # "identity" means "please! unencoded content only!" $req->header( 'Accept-Encoding', $HAS_ZLIB ? 'gzip' : 'identity' ); } my $last = $self->{last_uri}; if ( $last ) { $last = $last->as_string if ref($last); $req->header( Referer => $last ); } while ( my($key,$value) = each %{$self->{headers}} ) { if ( defined $value ) { $req->header( $key => $value ); } else { $req->remove_header( $key ); } } return $req; } =head2 $mech->_make_request() Convenience method to make it easier for subclasses like L<WWW::Mechanize::Cached> to intercept the request. =cut sub _make_request { my $self = shift; return $self->SUPER::request(@_); } =head2 $mech->_reset_page() Resets the internal fields that track page parsed stuff. =cut sub _reset_page { my $self = shift; $self->{links} = undef; $self->{images} = undef; $self->{forms} = undef; $self->{current_form} = undef; $self->{title} = undef; $self->{text} = undef; return; } =head2 $mech->_extract_links() Extracts links from the content of a webpage, and populates the C<{links}> property with L<WWW::Mechanize::Link> objects. =cut my %link_tags = ( a => 'href', area => 'href', frame => 'src', iframe => 'src', link => 'href', meta => 'content', ); sub _extract_links { my $self = shift; $self->{links} = []; if ( defined $self->{content} ) { my $parser = HTML::TokeParser->new(\$self->{content}); while ( my $token = $parser->get_tag( keys %link_tags ) ) { my $link = $self->_link_from_token( $token, $parser ); push( @{$self->{links}}, $link ) if $link; } # while } return; } my %image_tags = ( img => 'src', input => 'src', ); sub _extract_images { my $self = shift; $self->{images} = []; if ( defined $self->{content} ) { my $parser = HTML::TokeParser->new(\$self->{content}); while ( my $token = $parser->get_tag( keys %image_tags ) ) { my $image = $self->_image_from_token( $token, $parser ); push( @{$self->{images}}, $image ) if $image; } # while } return; } sub _image_from_token { my $self = shift; my $token = shift; my $parser = shift; my $tag = $token->[0]; my $attrs = $token->[1]; if ( $tag eq 'input' ) { my $type = $attrs->{type} or return; return unless $type eq 'image'; } require WWW::Mechanize::Image; return WWW::Mechanize::Image->new({ tag => $tag, base => $self->base, url => $attrs->{src}, name => $attrs->{name}, height => $attrs->{height}, width => $attrs->{width}, alt => $attrs->{alt}, }); } sub _link_from_token { my $self = shift; my $token = shift; my $parser = shift; my $tag = $token->[0]; my $attrs = $token->[1]; my $url = $attrs->{$link_tags{$tag}}; my $text; my $name; if ( $tag eq 'a' ) { $text = $parser->get_trimmed_text("/$tag"); $text = '' unless defined $text; my $onClick = $attrs->{onclick}; if ( $onClick && ($onClick =~ /^window\.open\(\s*'([^']+)'/) ) { $url = $1; } } # a # Of the tags we extract from, only 'AREA' has an alt tag # The rest should have a 'name' attribute. # ... but we don't do anything with that bit of wisdom now. $name = $attrs->{name}; if ( $tag eq 'meta' ) { my $equiv = $attrs->{'http-equiv'}; my $content = $attrs->{'content'}; return unless $equiv && (lc $equiv eq 'refresh') && defined $content; if ( $content =~ /^\d+\s*;\s*url\s*=\s*(\S+)/i ) { $url = $1; $url =~ s/^"(.+)"$/$1/ or $url =~ s/^'(.+)'$/$1/; } else { undef $url; } } # meta return unless defined $url; # probably just a name link or <AREA NOHREF...> require WWW::Mechanize::Link; return WWW::Mechanize::Link->new({ url => $url, text => $text, name => $name, tag => $tag, base => $self->base, attrs => $attrs, }); } # _link_from_token sub _extract_forms { my $self = shift; my @forms = HTML::Form->parse( $self->content, $self->base ); $self->{forms} = \@forms; for my $form ( @forms ) { for my $input ($form->inputs) { if ($input->type eq 'file') { $input->value( undef ); } } } return; } =head2 $mech->_push_page_stack() The agent keeps a stack of visited pages, which it can pop when it needs to go BACK and so on. The current page needs to be pushed onto the stack before we get a new page, and the stack needs to be popped when BACK occurs. Neither of these take any arguments, they just operate on the $mech object. =cut sub _push_page_stack { my $self = shift; my $req = $self->{req}; my $res = $self->{res}; return unless $req && $res && $self->stack_depth; # Don't push anything if it's a virgin object my $stack = $self->{page_stack} ||= []; if ( @{$stack} >= $self->stack_depth ) { shift @{$stack}; } push( @{$stack}, { req => $req, res => $res } ); return 1; } =head2 warn( @messages ) Centralized warning method, for diagnostics and non-fatal problems. Defaults to calling C<CORE::warn>, but may be overridden by setting C<onwarn> in the constructor. =cut sub warn { my $self = shift; return unless my $handler = $self->{onwarn}; return if $self->quiet; return $handler->(@_); } =head2 die( @messages ) Centralized error method. Defaults to calling C<CORE::die>, but may be overridden by setting C<onerror> in the constructor. =cut sub die { my $self = shift; return unless my $handler = $self->{onerror}; return $handler->(@_); } # NOT an object method! sub _warn { require Carp; return &Carp::carp; ## no critic } # NOT an object method! sub _die { require Carp; return &Carp::croak; ## no critic } 1; # End of module __END__ =head1 WWW::MECHANIZE'S GIT REPOSITORY WWW::Mechanize is hosted at GitHub, though the bug tracker still lives at Google Code. Repository: L<https://github.com/libwww-perl/WWW-Mechanize>. Bugs: L<http://code.google.com/p/www-mechanize/issues>. =head1 OTHER DOCUMENTATION =head2 I<Spidering Hacks>, by Kevin Hemenway and Tara Calishain I<Spidering Hacks> from O'Reilly (L<http://www.oreilly.com/catalog/spiderhks/>) is a great book for anyone wanting to know more about screen-scraping and spidering. There are six hacks that use Mech or a Mech derivative: =over 4 =item #21 WWW::Mechanize 101 =item #22 Scraping with WWW::Mechanize =item #36 Downloading Images from Webshots =item #44 Archiving Yahoo! Groups Messages with WWW::Yahoo::Groups =item #64 Super Author Searching =item #73 Scraping TV Listings =back The book was also positively reviewed on Slashdot: L<http://books.slashdot.org/article.pl?sid=03/12/11/2126256> =head1 ONLINE RESOURCES AND SUPPORT =over 4 =item * WWW::Mechanize mailing list The Mech mailing list is at L<http://groups.google.com/group/www-mechanize-users> and is specific to Mechanize, unlike the LWP mailing list below. Although it is a users list, all development discussion takes place here, too. =item * LWP mailing list The LWP mailing list is at L<http://lists.perl.org/showlist.cgi?name=libwww>, and is more user-oriented and well-populated than the WWW::Mechanize list. =item * Perlmonks L<http://perlmonks.org> is an excellent community of support, and many questions about Mech have already been answered there. =item * L<WWW::Mechanize::Examples> A random array of examples submitted by users, included with the Mechanize distribution. =back =head1 ARTICLES ABOUT WWW::MECHANIZE =over 4 =item * L<http://www-128.ibm.com/developerworks/linux/library/wa-perlsecure.html> IBM article "Secure Web site access with Perl" =item * L<http://www.oreilly.com/catalog/googlehks2/chapter/hack84.pdf> Leland Johnson's hack #84 in I<Google Hacks, 2nd Edition> is an example of a production script that uses WWW::Mechanize and HTML::TableContentParser. It takes in keywords and returns the estimated price of these keywords on Google's AdWords program. =item * L<http://www.perl.com/pub/a/2004/06/04/recorder.html> Linda Julien writes about using HTTP::Recorder to create WWW::Mechanize scripts. =item * L<http://www.developer.com/lang/other/article.php/3454041> Jason Gilmore's article on using WWW::Mechanize for scraping sales information from Amazon and eBay. =item * L<http://www.perl.com/pub/a/2003/01/22/mechanize.html> Chris Ball's article about using WWW::Mechanize for scraping TV listings. =item * L<http://www.stonehenge.com/merlyn/LinuxMag/col47.html> Randal Schwartz's article on scraping Yahoo News for images. It's already out of date: He manually walks the list of links hunting for matches, which wouldn't have been necessary if the C<find_link()> method existed at press time. =item * L<http://www.perladvent.org/2002/16th/> WWW::Mechanize on the Perl Advent Calendar, by Mark Fowler. =item * L<http://www.linux-magazin.de/Artikel/ausgabe/2004/03/perl/perl.html> Michael Schilli's article on Mech and L<WWW::Mechanize::Shell> for the German magazine I<Linux Magazin>. =back =head2 Other modules that use Mechanize Here are modules that use or subclass Mechanize. Let me know of any others: =over 4 =item * L<Finance::Bank::LloydsTSB> =item * L<HTTP::Recorder> Acts as a proxy for web interaction, and then generates WWW::Mechanize scripts. =item * L<Win32::IE::Mechanize> Just like Mech, but using Microsoft Internet Explorer to do the work. =item * L<WWW::Bugzilla> =item * L<WWW::CheckSite> =item * L<WWW::Google::Groups> =item * L<WWW::Hotmail> =item * L<WWW::Mechanize::Cached> =item * L<WWW::Mechanize::FormFiller> =item * L<WWW::Mechanize::Shell> =item * L<WWW::Mechanize::Sleepy> =item * L<WWW::Mechanize::SpamCop> =item * L<WWW::Mechanize::Timed> =item * L<WWW::SourceForge> =item * L<WWW::Yahoo::Groups> =item * L<WWW::Scripter> =back =head1 ACKNOWLEDGEMENTS Thanks to the numerous people who have helped out on WWW::Mechanize in one way or another, including Kirrily Robert for the original C<WWW::Automate>, Lyle Hopkins, Damien Clark, Ansgar Burchardt, Gisle Aas, Jeremy Ary, Hilary Holz, Rafael Kitover, Norbert Buchmuller, Dave Page, David Sainty, H.Merijn Brand, Matt Lawrence, Michael Schwern, Adriano Ferreira, Miyagawa, Peteris Krumins, Rafael Kitover, David Steinbrunner, Kevin Falcone, Mike O'Regan, Mark Stosberg, Uri Guttman, Peter Scott, Phillipe Bruhat, Ian Langworth, John Beppu, Gavin Estey, Jim Brandt, Ask Bjoern Hansen, Greg Davies, Ed Silva, Mark-Jason Dominus, Autrijus Tang, Mark Fowler, Stuart Children, Max Maischein, Meng Wong, Prakash Kailasa, Abigail, Jan Pazdziora, Dominique Quatravaux, Scott Lanning, Rob Casey, Leland Johnson, Joshua Gatcomb, Julien Beasley, Abe Timmerman, Peter Stevens, Pete Krawczyk, Tad McClellan, and the late great Iain Truskett. =head1 COPYRIGHT Copyright (c) 2005-2010 Andy Lester. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself. =cut �������������������������������������������������������������������������������������������������WWW-Mechanize-1.73/lib/WWW/Mechanize/Cookbook.pod���������������������������������������������������000644 �000767 �000024 �00000004406 12026436042 021717� 0����������������������������������������������������������������������������������������������������ustar�00ether���������������������������staff���������������������������000000 �000000 ������������������������������������������������������������������������������������������������������������������������������������������������������������������������=head1 NAME WWW::Mechanize::Cookbook - Recipes for using WWW::Mechanize =head1 INTRODUCTION First, please note that many of these are possible just using L<LWP::UserAgent>. Since C<WWW::Mechanize> is a subclass of L<LWP::UserAgent>, whatever works on C<LWP::UserAgent> should work on C<WWW::Mechanize>. See the L<lwpcook> man page included with the L<libwww-perl> distribution. =head1 BASICS =head2 Launch the WWW::Mechanize browser use WWW::Mechanize; my $mech = WWW::Mechanize->new( autocheck => 1 ); The C<< autocheck => 1 >> tells Mechanize to die if any IO fails, so you don't have to manually check. It's easier that way. If you want to do your own error checking, leave it out. =head2 Fetch a page $mech->get( "http://search.cpan.org" ); print $mech->content; C<< $mech->content >> contains the raw HTML from the web page. It is not parsed or handled in any way, at least through the C<content> method. =head2 Fetch a page into a file Sometimes you want to dump your results directly into a file. For example, there's no reason to read a JPEG into memory if you're only going to write it out immediately. This can also help with memory issues on large files. $mech->get( "http://www.cpan.org/src/stable.tar.gz", ":content_file" => "stable.tar.gz" ); =head2 Fetch a password-protected page Generally, just call C<credentials> before fetching the page. $mech->credentials( 'admin' => 'password' ); $mech->get( 'http://10.11.12.13/password.html' ); print $mech->content(); =head1 LINKS =head2 Find all image links Find all links that point to a JPEG, GIF or PNG. my @links = $mech->find_all_links( tag => "a", url_regex => qr/\.(jpe?g|gif|png)$/i ); =head2 Find all download links Find all links that have the word "download" in them. my @links = $mech->find_all_links( tag => "a", text_regex => qr/\bdownload\b/i ); =head1 APPLICATIONS =head2 Check all pages on a web site Use Abe Timmerman's L<WWW::CheckSite> L<http://search.cpan.org/dist/WWW-CheckSite/> =head1 SEE ALSO L<WWW::Mechanize> =head1 AUTHORS Copyright 2005-2010 Andy Lester C<< <andy@petdance.com> >> Later contributions by Peter Scott, Mark Stosberg and others. See Acknowledgements section in L<WWW::Mechanize> for more. =cut ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������WWW-Mechanize-1.73/lib/WWW/Mechanize/Examples.pod���������������������������������������������������000644 �000767 �000024 �00000037164 12206031123 021725� 0����������������������������������������������������������������������������������������������������ustar�00ether���������������������������staff���������������������������000000 �000000 ������������������������������������������������������������������������������������������������������������������������������������������������������������������������=head1 NAME WWW::Mechanize::Examples - Sample programs that use WWW::Mechanize =head1 SYNOPSIS Plenty of people have learned WWW::Mechanize, and now, you can too! Following are user-supplied samples of WWW::Mechanize in action. If you have samples you'd like to contribute, please send 'em to C<< <andy@petdance.com> >>. You can also look at the F<t/*.t> files in the distribution. Please note that these examples are not intended to do any specific task. For all I know, they're no longer functional because the sites they hit have changed. They're here to give examples of how people have used WWW::Mechanize. Note that the examples are in reverse order of my having received them, so the freshest examples are always at the top. =head2 Starbucks Density Calculator, by Nat Torkington Here's a pair of programs from Nat Torkington, editor for O'Reilly Media and co-author of the I<Perl Cookbook>. =over 4 Rael [Dornfest] discovered that you can easily find out how many Starbucks there are in an area by searching for "Starbucks". So I wrote a silly scraper for some old census data and came up with some Starbucks density figures. There's no meaning to these numbers thanks to errors from using old census data coupled with false positives in Yahoo search (e.g., "Dodie Starbuck-Your Style Desgn" in Portland OR). But it was fun to waste a night on. Here are the top twenty cities in descending order of population, with the amount of territory each Starbucks has. E.g., A New York NY Starbucks covers 1.7 square miles of ground. New York, NY 1.7 Los Angeles, CA 1.2 Chicago, IL 1.0 Houston, TX 4.6 Philadelphia, PA 6.8 San Diego, CA 2.7 Detroit, MI 19.9 Dallas, TX 2.7 Phoenix, AZ 4.1 San Antonio, TX 12.3 San Jose, CA 1.1 Baltimore, MD 3.9 Indianapolis, IN 12.1 San Francisco, CA 0.5 Jacksonville, FL 39.9 Columbus, OH 7.3 Milwaukee, WI 5.1 Memphis, TN 15.1 Washington, DC 1.4 Boston, MA 0.5 =back C<get_pop_data> #!/usr/bin/perl -w use WWW::Mechanize; use Storable; $url = 'http://www.census.gov/population/www/documentation/twps0027.html'; $m = WWW::Mechanize->new(); $m->get($url); $c = $m->content; $c =~ m{<A NAME=.tabA.>(.*?)</TABLE>}s or die "Can't find the population table\n"; $t = $1; @outer = $t =~ m{<TR.*?>(.*?)</TR>}gs; shift @outer; foreach $r (@outer) { @bits = $r =~ m{<TD.*?>(.*?)</TD>}gs; for ($x = 0; $x < @bits; $x++) { $b = $bits[$x]; @v = split /\s*<BR>\s*/, $b; foreach (@v) { s/^\s+//; s/\s+$// } push @{$data[$x]}, @v; } } for ($y = 0; $y < @{$data[0]}; $y++) { $data{$data[1][$y]} = { NAME => $data[1][$y], RANK => $data[0][$y], POP => comma_free($data[2][$y]), AREA => comma_free($data[3][$y]), DENS => comma_free($data[4][$y]), }; } store(\%data, "cities.dat"); sub comma_free { my $n = shift; $n =~ s/,//; return $n; } C<plague_of_coffee> #!/usr/bin/perl -w use WWW::Mechanize; use strict; use Storable; $SIG{__WARN__} = sub {} ; # ssssssh my $Cities = retrieve("cities.dat"); my $m = WWW::Mechanize->new(); $m->get("http://local.yahoo.com/"); my @cities = sort { $Cities->{$a}{RANK} <=> $Cities->{$b}{RANK} } keys %$Cities; foreach my $c ( @cities ) { my $fields = { 'stx' => "starbucks", 'csz' => $c, }; my $r = $m->submit_form(form_number => 2, fields => $fields); die "Couldn't submit form" unless $r->is_success; my $hits = number_of_hits($r); # my $ppl = sprintf("%d", 1000 * $Cities->{$c}{POP} / $hits); # print "$c has $hits Starbucks. That's one for every $ppl people.\n"; my $density = sprintf("%.1f", $Cities->{$c}{AREA} / $hits); print "$c : $density\n"; } sub number_of_hits { my $r = shift; my $c = $r->content; if ($c =~ m{\d+ out of <b>(\d+)</b> total results for}) { return $1; } if ($c =~ m{Sorry, no .*? found in or near}) { return 0; } if ($c =~ m{Your search matched multiple cities}) { warn "Your search matched multiple cities\n"; return 0; } if ($c =~ m{Sorry we couldn.t find that location}) { warn "No cities\n"; return 0; } if ($c =~ m{Could not find.*?, showing results for}) { warn "No matches\n"; return 0; } die "Unknown response\n$c\n"; } =head2 pb-upload, by John Beppu This program takes filenames of images from the command line and uploads them to a www.photobucket.com folder. John Beppu, the author, says: =over 4 I had 92 pictures I wanted to upload, and doing it through a browser would've been torture. But thanks to mech, all I had to do was `./pb.upload *.jpg` and watch it do its thing. It felt good. If I had more time, I'd implement WWW::Photobucket on top of WWW::Mechanize. =back #!/usr/bin/perl -w -T use strict; use WWW::Mechanize; my $login = "login_name"; my $password = "password"; my $folder = "folder"; my $url = "http://img78.photobucket.com/albums/v281/$login/$folder/"; # login to your photobucket.com account my $mech = WWW::Mechanize->new(); $mech->get($url); $mech->submit_form( form_number => 1, fields => { password => $password }, ); die unless ($mech->success); # upload image files specified on command line foreach (@ARGV) { print "$_\n"; $mech->form_number(2); $mech->field('the_file[]' => $_); $mech->submit(); } =head2 listmod, by Ian Langworth Ian Langworth contributes this little gem that will bring joy to beleaguered mailing list admins. It discards spam messages through mailman's web interface. #!/arch/unix/bin/perl use strict; use warnings; # # listmod - fast alternative to mailman list interface # # usage: listmod crew XXXXXXXX # die "usage: $0 <listname> <password>\n" unless @ARGV == 2; my ($listname, $password) = @ARGV; use CGI qw(unescape); use WWW::Mechanize; my $m = WWW::Mechanize->new( autocheck => 1 ); use Term::ReadLine; my $term = Term::ReadLine->new($0); # submit the form, get the cookie, go to the list admin page $m->get("https://lists.ccs.neu.edu/bin/admindb/$listname"); $m->set_visible( $password ); $m->click; # exit if nothing to do print "There are no pending requests.\n" and exit if $m->content =~ /There are no pending requests/; # select the first form and examine its contents $m->form_number(1); my $f = $m->current_form or die "Couldn't get first form!\n"; # get me the base form element for each email item my @items = map {m/^.+?-(.+)/} grep {m/senderbanp/} $f->param or die "Couldn't get items in first form!\n"; # iterate through items, prompt user, commit actions foreach my $item (@items) { # show item info my $sender = unescape($item); my ($subject) = [$f->find_input("senderbanp-$item")->value_names]->[1] =~ /Subject:\s+(.+?)\s+Size:/g; # prompt user my $choice = ''; while ( $choice !~ /^[DAX]$/ ) { print "$sender\: '$subject'\n"; $choice = uc $term->readline("Action: defer/accept/discard [dax]: "); print "\n\n"; } # set button $m->field("senderaction-$item" => {D=>0,A=>1,X=>3}->{$choice}); } # submit actions $m->click; =head2 ccdl, by Andy Lester Steve McConnell, author of the landmark I<Code Complete> has put up the chapters for the 2nd edition in PDF format on his website. I needed to download them to take to Kinko's to have printed. This little program did it for me. #!/usr/bin/perl -w use strict; use WWW::Mechanize; my $start = "http://www.stevemcconnell.com/cc2/cc.htm"; my $mech = WWW::Mechanize->new( autocheck => 1 ); $mech->get( $start ); my @links = $mech->find_all_links( url_regex => qr/\d+.+\.pdf$/ ); for my $link ( @links ) { my $url = $link->url_abs; my $filename = $url; $filename =~ s[^.+/][]; print "Fetching $url"; $mech->get( $url, ':content_file' => $filename ); print " ", -s $filename, " bytes\n"; } =head2 quotes.pl, by Andy Lester This was a program that was going to get a hack in I<Spidering Hacks>, but got cut at the last minute, probably because it's against IMDB's TOS to scrape from it. I present it here as an example, not a suggestion that you break their TOS. Last I checked, it didn't work because their HTML didn't match, but it's still good as sample code. #!/usr/bin/perl -w use strict; use WWW::Mechanize; use Getopt::Long; use Text::Wrap; my $match = undef; my $random = undef; GetOptions( "match=s" => \$match, "random" => \$random, ) or exit 1; my $movie = shift @ARGV or die "Must specify a movie\n"; my $quotes_page = get_quotes_page( $movie ); my @quotes = extract_quotes( $quotes_page ); if ( $match ) { $match = quotemeta($match); @quotes = grep /$match/i, @quotes; } if ( $random ) { print $quotes[rand @quotes]; } else { print join( "\n", @quotes ); } sub get_quotes_page { my $movie = shift; my $mech = WWW::Mechanize->new; $mech->get( "http://www.imdb.com/search" ); $mech->success or die "Can't get the search page"; $mech->submit_form( form_number => 2, fields => { title => $movie, restrict => "Movies only", }, ); my @links = $mech->find_all_links( url_regex => qr[^/Title] ) or die "No matches for \"$movie\" were found.\n"; # Use the first link my ( $url, $title ) = @{$links[0]}; warn "Checking $title...\n"; $mech->get( $url ); my $link = $mech->find_link( text_regex => qr/Memorable Quotes/i ) or die qq{"$title" has no quotes in IMDB!\n}; warn "Fetching quotes...\n\n"; $mech->get( $link->[0] ); return $mech->content; } sub extract_quotes { my $page = shift; # Nibble away at the unwanted HTML at the beginnning... $page =~ s/.+Memorable Quotes//si; $page =~ s/.+?(<a name)/$1/si; # ... and the end of the page $page =~ s/Browse titles in the movie quotes.+$//si; $page =~ s/<p.+$//g; # Quotes separated by an <HR> tag my @quotes = split( /<hr.+?>/, $page ); for my $quote ( @quotes ) { my @lines = split( /<br>/, $quote ); for ( @lines ) { s/<[^>]+>//g; # Strip HTML tags s/\s+/ /g; # Squash whitespace s/^ //; # Strip leading space s/ $//; # Strip trailing space s/"/"/g; # Replace HTML entity quotes # Word-wrap to fit in 72 columns $Text::Wrap::columns = 72; $_ = wrap( '', ' ', $_ ); } $quote = join( "\n", @lines ); } return @quotes; } =head2 cpansearch.pl, by Ed Silva A quick little utility to search the CPAN and fire up a browser with a results page. #!/usr/bin/perl # turn on perl's safety features use strict; use warnings; # work out the name of the module we're looking for my $module_name = $ARGV[0] or die "Must specify module name on command line"; # create a new browser use WWW::Mechanize; my $browser = WWW::Mechanize->new(); # tell it to get the main page $browser->get("http://search.cpan.org/"); # okay, fill in the box with the name of the # module we want to look up $browser->form_number(1); $browser->field("query", $module_name); $browser->click(); # click on the link that matches the module name $browser->follow_link( text_regex => $module_name ); my $url = $browser->uri; # launch a browser... system('galeon', $url); exit(0); =head2 lj_friends.cgi, by Matt Cashner #!/usr/bin/perl # Provides an rss feed of a paid user's LiveJournal friends list # Full entries, protected entries, etc. # Add to your favorite rss reader as # http://your.site.com/cgi-bin/lj_friends.cgi?user=USER&password=PASSWORD use warnings; use strict; use WWW::Mechanize; use CGI; my $cgi = CGI->new(); my $form = $cgi->Vars; my $agent = WWW::Mechanize->new(); $agent->get('http://www.livejournal.com/login.bml'); $agent->form_number('3'); $agent->field('user',$form->{user}); $agent->field('password',$form->{password}); $agent->submit(); $agent->get('http://www.livejournal.com/customview.cgi?user='.$form->{user}.'&styleid=225596&checkcookies=1'); print "Content-type: text/plain\n\n"; print $agent->content(); =head2 Hacking Movable Type, by Dan Rinzel use strict; use WWW::Mechanize; # a tool to automatically post entries to a moveable type weblog, and set arbitrary creation dates my $mech = WWW::Mechanize->new(); my $entry; $entry->{title} = "Test AutoEntry Title"; $entry->{btext} = "Test AutoEntry Body"; $entry->{date} = '2002-04-15 14:18:00'; my $start = qq|http://my.blog.site/mt.cgi|; $mech->get($start); $mech->field('username','und3f1n3d'); $mech->field('password','obscur3d'); $mech->submit(); # to get login cookie $mech->get(qq|$start?__mode=view&_type=entry&blog_id=1|); $mech->form_name('entry_form'); $mech->field('title',$entry->{title}); $mech->field('category_id',1); # adjust as needed $mech->field('text',$entry->{btext}); $mech->field('status',2); # publish, or 1 = draft $results = $mech->submit(); # if we're ok with this entry being datestamped "NOW" (no {date} in %entry) # we're done. Otherwise, time to be tricksy # MT returns a 302 redirect from this form. the redirect itself contains a <body onload=""> handler # which takes the user to an editable version of the form where the create date can be edited # MT date format of YYYY-MM-DD HH:MI:SS is the only one that won't error out if ($entry->{date} && $entry->{date} =~ /^\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}/) { # travel the redirect $results = $mech->get($results->{_headers}->{location}); $results->{_content} =~ /<body onLoad="([^\"]+)"/is; my $js = $1; $js =~ /\'([^']+)\'/; $results = $mech->get($start.$1); $mech->form_name('entry_form'); $mech->field('created_on_manual',$entry->{date}); $mech->submit(); } =head2 get-despair, by Randal Schwartz Randal submitted this bot that walks the despair.com site sucking down all the pictures. use strict; $|++; use WWW::Mechanize; use File::Basename; my $m = WWW::Mechanize->new; $m->get("http://www.despair.com/indem.html"); my @top_links = @{$m->links}; for my $top_link_num (0..$#top_links) { next unless $top_links[$top_link_num][0] =~ /^http:/; $m->follow_link( n=>$top_link_num ) or die "can't follow $top_link_num"; print $m->uri, "\n"; for my $image (grep m{^http://store4}, map $_->[0], @{$m->links}) { my $local = basename $image; print " $image...", $m->mirror($image, $local)->message, "\n" } $m->back or die "can't go back"; } ������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������WWW-Mechanize-1.73/lib/WWW/Mechanize/FAQ.pod��������������������������������������������������������000644 �000767 �000024 �00000037423 12206031123 020554� 0����������������������������������������������������������������������������������������������������ustar�00ether���������������������������staff���������������������������000000 �000000 ������������������������������������������������������������������������������������������������������������������������������������������������������������������������=head1 NAME WWW::Mechanize::FAQ - Frequently Asked Questions about WWW::Mechanize =head1 How to get help with WWW::Mechanize If your question isn't answered here in the FAQ, please turn to the communities at: =over =item * L<http://perlmonks.org> =item * The libwww-perl mailing list at L<http://lists.perl.org> =back =head1 JavaScript =head2 I have this web page that has JavaScript on it, and my Mech program doesn't work. That's because WWW::Mechanize doesn't operate on the JavaScript. It only understands the HTML parts of the page. =head2 I thought Mech was supposed to work like a web browser. It does pretty much, but it doesn't support JavaScript. I added some basic attempts at picking up URLs in C<window.open()> calls and return them in C<< $mech->links >>. They work sometimes. Since Javascript is completely visible to the client, it cannot be used to prevent a scraper from following links. But it can make life difficult. If you want to scrape specific pages, then a solution is always possible. One typical use of Javascript is to perform argument checking before posting to the server. The URL you want is probably just buried in the Javascript function. Do a regular expression match on C<< $mech->content() >> to find the link that you want and C<< $mech->get >> it directly (this assumes that you know what you are looking for in advance). In more difficult cases, the Javascript is used for URL mangling to satisfy the needs of some middleware. In this case you need to figure out what the Javascript is doing (why are these URLs always really long?). There is probably some function with one or more arguments which calculates the new URL. Step one: using your favorite browser, get the before and after URLs and save them to files. Edit each file, converting the argument separators ('?', '&' or ';') into newlines. Now it is easy to use diff or comm to find out what Javascript did to the URL. Step 2 - find the function call which created the URL - you will need to parse and interpret its argument list. The Javascript Debugger in the Firebug extension for Firefox helps with the analysis. At this point, it is fairly trivial to write your own function which emulates the Javascript for the pages you want to process. Here's another approach that answers the question, "It works in Firefox, but why not Mech?" Everything the web server knows about the client is present in the HTTP request. If two requests are identical, the results should be identical. So the real question is "What is different between the mech request and the Firefox request?" The Firefox extension "Tamper Data" is an effective tool for examining the headers of the requests to the server. Compare that with what LWP is sending. Once the two are identical, the action of the server should be the same as well. I say "should", because this is an oversimplification - some values are naturally unique, e.g. a SessionID, but if a SessionID is present, that is probably sufficient, even though the value will be different between the LWP request and the Firefox request. The server could use the session to store information which is troublesome, but that's not the first place to look (and highly unlikely to be relevant when you are requesting the login page of your site). Generally the problem is to be found in missing or incorrect POSTDATA arguments, Cookies, User-Agents, Accepts, etc. If you are using mech, then redirects and cookies should not be a problem, but are listed here for completeness. If you are missing headers, C<< $mech->add_header >> can be used to add the headers that you need. =head2 Which modules work like Mechanize and have JavaScript support? In no particular order: L<Gtk2::WebKit::Mechanize>, L<Win32::IE::Mechanize>, L<WWW::Mechanize::Firefox>, L<WWW::Scripter>, L<WWW::Selenium> =head1 How do I do X? =head2 Can I do [such-and-such] with WWW::Mechanize? If it's possible with LWP::UserAgent, then yes. WWW::Mechanize is a subclass of L<LWP::UserAgent>, so all the wondrous magic of that class is inherited. =head2 How do I use WWW::Mechanize through a proxy server? See the docs in L<LWP::UserAgent> on how to use the proxy. Short version: $mech->proxy(['http', 'ftp'], 'http://proxy.example.com:8000/'); or get the specs from the environment: $mech->env_proxy(); # Environment set like so: gopher_proxy=http://proxy.my.place/ wais_proxy=http://proxy.my.place/ no_proxy="localhost,my.domain" export gopher_proxy wais_proxy no_proxy =head2 How can I see what fields are on the forms? Use the mech-dump utility, optionally installed with Mechanize. $ mech-dump --forms http://search.cpan.org Dumping forms GET http://search.cpan.org/search query= mode=all (option) [*all|module|dist|author] <NONAME>=CPAN Search (submit) =head2 How do I get Mech to handle authentication? use MIME::Base64; my $agent = WWW::Mechanize->new(); my @args = ( Authorization => "Basic " . MIME::Base64::encode( USER . ':' . PASS ) ); $agent->credentials( ADDRESS, REALM, USER, PASS ); $agent->get( URL, @args ); If you want to use the credentials for all future requests, you can also use the L<LWP::UserAgent> C<default_header()> method instead of the extra arguments to C<get()> $mech->default_header( Authorization => 'Basic ' . encode_base64( USER . ':' . PASSWORD ) ); =head2 How can I get WWW::Mechanize to execute this JavaScript? You can't. JavaScript is entirely client-based, and WWW::Mechanize is a client that doesn't understand JavaScript. See the top part of this FAQ. =head2 How do I check a checkbox that doesn't have a value defined? Set it to the value of "on". $mech->field( my_checkbox => 'on' ); =head2 How do I handle frames? You don't deal with them as frames, per se, but as links. Extract them with my @frame_links = $mech->find_link( tag => "frame" ); =head2 How do I get a list of HTTP headers and their values? All L<HTTP::Headers> methods work on a L<HTTP::Response> object which is returned by the I<get()>, I<reload()>, I<response()/res()>, I<click()>, I<submit_form()>, and I<request()> methods. my $mech = WWW::Mechanize->new( autocheck => 1 ); $mech->get( 'http://my.site.com' ); my $response = $mech->response(); for my $key ( $response->header_field_names() ) { print $key, " : ", $response->header( $key ), "\n"; } =head2 How do I enable keep-alive? Since L<WWW::Mechanize> is a subclass of L<LWP::UserAgent>, you can use the same mechanism to enable keep-alive: use LWP::ConnCache; ... $mech->conn_cache(LWP::ConnCache->new); =head2 How can I change/specify the action parameter of an HTML form? You can access the action of the form by utilizing the L<HTML::Form> object returned from one of the specifying form methods. Using C<< $mech->form_number($number) >>: my $mech = WWW::mechanize->new; $mech->get('http://someurlhere.com'); # Access the form using its Zero-Based Index by DOM order $mech->form_number(0)->action('http://newAction'); #ABS URL Using C<< $mech->form_name($number) >>: my $mech = WWW::mechanize->new; $mech->get('http://someurlhere.com'); #Access the form using its Zero-Based Index by DOM order $mech->form_name('trgForm')->action('http://newAction'); #ABS URL =head2 How do I save an image? How do I save a large tarball? An image is just content. You get the image and save it. $mech->get( 'photo.jpg' ); $mech->save_content( '/path/to/my/directory/photo.jpg' ); You can also save any content directly to disk using the C<:content_file> flag to C<get()>, which is part of L<LWP::UserAgent>. $mech->get( 'http://www.cpan.org/src/stable.tar.gz', ':content_file' => 'stable.tar.gz' ); =head2 How do I pick a specific value from a C<< <select> >> list? Find the C<HTML::Form::ListInput> in the page. my ($listbox) = $mech->find_all_inputs( name => 'listbox' ); Then create a hash for the lookup: my %name_lookup; @name_lookup{ $listbox->value_names } = $listbox->possible_values; my $value = $name_lookup{ 'Name I want' }; If you have duplicate names, this method won't work, and you'll have to loop over C<< $listbox->value_names >> and C<< $listbox->possible_values >> in parallel until you find a matching name. =head2 How do I get Mech to not follow redirects? You use functionality in LWP::UserAgent, not Mech itself. $mech->requests_redirectable( [] ); Or you can set C<max_redirect>: $mech->max_redirect( 0 ); Both these options can also be set in the constructor. Mech doesn't understand them, so will pass them through to the LWP::UserAgent constructor. =head1 Why doesn't this work: Debugging your Mechanize program =head2 My Mech program doesn't work, but it works in the browser. Mechanize acts like a browser, but apparently something you're doing is not matching the browser's behavior. Maybe it's expecting a certain web client, or maybe you've not handling a field properly. For some reason, your Mech problem isn't doing exactly what the browser is doing, and when you find that, you'll have the answer. =head2 My Mech program gets these 500 errors. A 500 error from the web server says that the program on the server side died. Probably the web server program was expecting certain inputs that you didn't supply, and instead of handling it nicely, the program died. Whatever the cause of the 500 error, if it works in the browser, but not in your Mech program, you're not acting like the browser. See the previous question. =head2 Why doesn't my program handle this form correctly? Run F<mech-dump> on your page and see what it says. F<mech-dump> is a marvelous diagnostic tool for figuring out what forms and fields are on the page. Say you're scraping CNN.com, you'd get this: $ mech-dump http://www.cnn.com/ GET http://search.cnn.com/cnn/search source=cnn (hidden readonly) invocationType=search/top (hidden readonly) sites=web (radio) [*web/The Web ??|cnn/CNN.com ??] query= (text) <NONAME>=Search (submit) POST http://cgi.money.cnn.com/servlets/quote_redirect query= (text) <NONAME>=GET (submit) POST http://polls.cnn.com/poll poll_id=2112 (hidden readonly) question_1=<UNDEF> (radio) [1/Simplistic option|2/VIEW RESULTS] <NONAME>=VOTE (submit) GET http://search.cnn.com/cnn/search source=cnn (hidden readonly) invocationType=search/bottom (hidden readonly) sites=web (radio) [*web/??CNN.com|cnn/??] query= (text) <NONAME>=Search (submit) Four forms, including the first one duplicated at the end. All the fields, all their defaults, lovingly generated by HTML::Form's C<dump> method. If you want to run F<mech-dump> on something that doesn't lend itself to a quick URL fetch, then use the C<save_content()> method to write the HTML to a file, and run F<mech-dump> on the file. =head2 Why don't https:// URLs work? You need either L<IO::Socket::SSL> or L<Crypt::SSLeay> installed. =head2 Why do I get "Input 'fieldname' is readonly"? You're trying to change the value of a hidden field and you have warnings on. First, make sure that you actually mean to change the field that you're changing, and that you don't have a typo. Usually, hidden variables are set by the site you're working on for a reason. If you change the value, you might be breaking some functionality by faking it out. If you really do want to change a hidden value, make the changes in a scope that has warnings turned off: { local $^W = 0; $agent->field( name => $value ); } =head2 I tried to [such-and-such] and I got this weird error. Are you checking your errors? Are you sure? Are you checking that your action succeeded after every action? Are you sure? For example, if you try this: $mech->get( "http://my.site.com" ); $mech->follow_link( "foo" ); and the C<get> call fails for some reason, then the Mech internals will be unusable for the C<follow_link> and you'll get a weird error. You B<must>, after every action that GETs or POSTs a page, check that Mech succeeded, or all bets are off. $mech->get( "http://my.site.com" ); die "Can't even get the home page: ", $mech->response->status_line unless $mech->success; $mech->follow_link( "foo" ); die "Foo link failed: ", $mech->response->status_line unless $mech->success; =head2 How do I figure out why C<< $mech->get($url) >> doesn't work? There are many reasons why a C<< get() >> can fail. The server can take you to someplace you didn't expect. It can generate redirects which are not properly handled. You can get time-outs. Servers are down more often than you think! etc, etc, etc. A couple of places to start: =over 4 =item 1 Check C<< $mech->status() >> after each call =item 2 Check the URL with C<< $mech->uri() >> to see where you ended up =item 3 Try debugging with C<< LWP::Debug >>. =back If things are really strange, turn on debugging with C<< use LWP::Debug qw(+); >> Just put this in the main program. This causes LWP to print out a trace of the HTTP traffic between client and server and can be used to figure out what is happening at the protocol level. It is also useful to set many traps to verify that processing is proceeding as expected. A Mech program should always have an "I didn't expect to get here" or "I don't recognize the page that I am processing" case and bail out. Since errors can be transient, by the time you notice that the error has occurred, it might not be possible to reproduce it manually. So for automated processing it is useful to email yourself the following information: =over 4 =item * where processing is taking place =item * An Error Message =item * $mech->uri =item * $mech->content =back You can also save the content of the page with C<< $mech->save_content( 'filename.html' ); >> =head2 I submitted a form, but the server ignored everything! I got an empty form back! The post is handled by application software. It is common for PHP programmers to use the same file both to display a form and to process the arguments returned. So the first task of the application programmer is to decide whether there are arguments to processes. The program can check whether a particular parameter has been set, whether a hidden parameter has been set, or whether the submit button has been clicked. (There are probably other ways that I haven't thought of). In any case, if your form is not setting the parameter (e.g. the submit button) which the web application is keying on (and as an outsider there is no way to know what it is keying on), it will not notice that the form has been submitted. Try using C<< $mech->click() >> instead of C<< $mech->submit() >> or vice-versa. =head2 I've logged in to the server, but I get 500 errors when I try to get to protected content. Some web sites use distributed databases for their processing. It can take a few seconds for the login/session information to percolate through to all the servers. For human users with their slow reaction times, this is not a problem, but a Perl script can outrun the server. So try adding a C<sleep(5)> between logging in and actually doing anything (the optimal delay must be determined experimentally). =head2 Mech is a big memory pig! I'm running out of RAM! Mech keeps a history of every page, and the state it was in. It actually keeps a clone of the full Mech object at every step along the way. You can limit this stack size with the C<stack_depth> parm in the C<new()> constructor. If you set stack_size to 0, Mech will not keep any history. =head1 AUTHOR Copyright 2005-2009 Andy Lester C<< <andy at petdance.com> >> =cut ���������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������WWW-Mechanize-1.73/lib/WWW/Mechanize/Image.pm�������������������������������������������������������000644 �000767 �000024 �00000005007 12026436042 021023� 0����������������������������������������������������������������������������������������������������ustar�00ether���������������������������staff���������������������������000000 �000000 ������������������������������������������������������������������������������������������������������������������������������������������������������������������������package WWW::Mechanize::Image; # vi:et:sw=4 ts=4 use strict; use warnings; =head1 NAME WWW::Mechanize::Image - Image object for WWW::Mechanize =head1 SYNOPSIS Image object to encapsulate all the stuff that Mech needs =head1 Constructor =head2 new() Creates and returns a new C<WWW::Mechanize::Image> object. my $image = WWW::Mechanize::Image->new( { url => $url, base => $base, tag => $tag, name => $name, # From the INPUT tag height => $height, # optional width => $width, # optional alt => $alt, # optional } ); =cut sub new { my $class = shift; my $parms = shift || {}; my $self = bless {}, $class; for my $parm ( qw( url base tag height width alt name ) ) { # Check for what we passed in, not whether it's defined $self->{$parm} = $parms->{$parm} if exists $parms->{$parm}; } # url and tag are always required for ( qw( url tag ) ) { exists $self->{$_} or die "WWW::Mechanize::Image->new must have a $_ argument"; } return $self; } =head1 Accessors =head2 $link->url() URL from the link =head2 $link->base() Base URL to which the links are relative. =head2 $link->name() Name for the field from the NAME attribute, if any. =head2 $link->tag() Tag name (either "image" or "input") =head2 $link->height() Image height =head2 $link->width() Image width =head2 $link->alt() ALT attribute from the source tag, if any. =cut sub url { return ($_[0])->{url}; } sub base { return ($_[0])->{base}; } sub name { return ($_[0])->{name}; } sub tag { return ($_[0])->{tag}; } sub height { return ($_[0])->{height}; } sub width { return ($_[0])->{width}; } sub alt { return ($_[0])->{alt}; } =head2 $link->URI() Returns the URL as a L<URI::URL> object. =cut sub URI { my $self = shift; require URI::URL; my $URI = URI::URL->new( $self->url, $self->base ); return $URI; } =head2 $link->url_abs() Returns the URL as an absolute URL string. =cut sub url_abs { my $self = shift; return $self->URI->abs; } =head1 SEE ALSO L<WWW::Mechanize> and L<WWW::Mechanize::Link> =head1 COPYRIGHT & LICENSE Copyright 2004-2010 Andy Lester. This program is free software; you can redistribute it and/or modify it under the terms of either: =over 4 =item * the GNU General Public License as published by the Free Software Foundation; either version 1, or (at your option) any later version, or =item * the Artistic License version 2.0. =back =cut 1; �������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������WWW-Mechanize-1.73/lib/WWW/Mechanize/Link.pm��������������������������������������������������������000644 �000767 �000024 �00000004603 12026436042 020677� 0����������������������������������������������������������������������������������������������������ustar�00ether���������������������������staff���������������������������000000 �000000 ������������������������������������������������������������������������������������������������������������������������������������������������������������������������package WWW::Mechanize::Link; use strict; use warnings; =head1 NAME WWW::Mechanize::Link - Link object for WWW::Mechanize =head1 SYNOPSIS Link object to encapsulate all the stuff that Mech needs but nobody wants to deal with as an array. =head1 Constructor =head2 new() my $link = WWW::Mechanize::Link->new( { url => $url, text => $text, name => $name, tag => $tag, base => $base, attr => $attr_href, } ); For compatibility, this older interface is also supported: new( $url, $text, $name, $tag, $base, $attr_href ) Creates and returns a new C<WWW::Mechanize::Link> object. =cut sub new { my $class = shift; my $self; # The order of the first four must stay as they are for # compatibility with older code. if ( ref $_[0] eq 'HASH' ) { $self = [ @{$_[0]}{ qw( url text name tag base attrs ) } ]; } else { $self = [ @_ ]; } return bless $self, $class; } =head1 Accessors =head2 $link->url() URL from the link =head2 $link->text() Text of the link =head2 $link->name() NAME attribute from the source tag, if any. =head2 $link->tag() Tag name (one of: "a", "area", "frame", "iframe" or "meta"). =head2 $link->base() Base URL to which the links are relative. =head2 $link->attrs() Returns hash ref of all the attributes and attribute values in the tag. =cut sub url { return ($_[0])->[0]; } sub text { return ($_[0])->[1]; } sub name { return ($_[0])->[2]; } sub tag { return ($_[0])->[3]; } sub base { return ($_[0])->[4]; } sub attrs { return ($_[0])->[5]; } =head2 $link->URI() Returns the URL as a L<URI::URL> object. =cut sub URI { my $self = shift; require URI::URL; my $URI = URI::URL->new( $self->url, $self->base ); return $URI; } =head2 $link->url_abs() Returns a L<URI::URL> object for the absolute form of the string. =cut sub url_abs { my $self = shift; return $self->URI->abs; } =head1 SEE ALSO L<WWW::Mechanize> and L<WWW::Mechanize::Image> =head1 COPYRIGHT & LICENSE Copyright 2004-2010 Andy Lester. This program is free software; you can redistribute it and/or modify it under the terms of either: =over 4 =item * the GNU General Public License as published by the Free Software Foundation; either version 1, or (at your option) any later version, or =item * the Artistic License version 2.0. =back =cut # vi:et:sw=4 ts=4 1; �����������������������������������������������������������������������������������������������������������������������������WWW-Mechanize-1.73/etc/www-mechanize-logo.png�������������������������������������������������������000644 �000767 �000024 �00000114117 12026436042 021315� 0����������������������������������������������������������������������������������������������������ustar�00ether���������������������������staff���������������������������000000 �000000 ������������������������������������������������������������������������������������������������������������������������������������������������������������������������PNG  ��� IHDR�����a���Q���gAMA��OX2���tEXtSoftware�Adobe ImageReadyqe<��IDATxbk#vψG=#A[Fi?'\jH1~�` ?X F3)22T T .TTIG+Q0 F_8�0#}Ύ:h4i*.�\kJ:7L/i,xiXgcn u/Mddce~jrjZxʧz02?^PUM3Lfg3_8� @$'@k%mơg ?B;>9'!6|[dG4"$ ߪ-vźB H_  L39 p/8Hd g)�@@_w Z r6MSPts)c0)b B)$/rGӫ}qdb4%3mqM#h7 qc@GMSA) Di0*εj ܻy@32>q;@1D'ϻŷs}ff&7~g`,$?a.#j1_!c-zoTsz}Ep?#\=RVpM Hr`>$ `1!nX k=x)E > eZ 09|�R@u¿AHB f= ݳ&< XLJZ!b 9"_#cQ Li@4 + Hk%)!2Rq$b ZIl]1 Cըa#uqmT{I(GQf}z̙pb"e@-DFZĠ+=$i?b )ߠ:p"C //!P⪘Bӊ U̔T(lX~?ޠʘM23Bff`bffp|u00FE$#LQ'Tt"FLIM z.(ZUr a=Xʜ?%{H2f6 zXE ˳^_ Ue~_[x*DRA@F2Bf`0]7c&+ ?;;sq0pp+bP `eHon,P62#Bp0?,\U0ÐcUhϨ71ẀOрZ]L<ZQ�fW^28'R1OPP/??1|>˅h.2T0 VrLh8~;c;';B JX9+cf5|8B-S<\=P'Yt|{6bP*X.R?J?}csYdԴ. bz4ӺXk:#r4[f cd H+hP nWgA+D01 l31Vnl5C;`vv6n+fVȬdHo2f9LXjh$[*Vl!#,r%2>R#A~ n.I:&ew_Fg5/ e3B=hH/47r 2B  XIx";B+`Bn4 XLLo.2rqs1q3ppa3=ePe =eQ\1b)) ZH&׈׊V/ZIWh9ROU#9D_0# :31h+"҇adK>P+!!2 ?@*_/p47_+?D {02b@x +Cb[[)~=׿rrq p*e0D-+enN<3+ 3rf]y11b/Ph 91bIz)T22 ;t.+U -=b_i P{Ì Hu3feA6\GOiy5 Flk2"z0ߠ ~昑*hp?=s-o2|<~ZAUzT)*`3@�&A3i x)z&ȥ Ҙiq1l 4 |WQގhjUl_6zilY-Rrdg_}bRce874Y$+ŀ~CHCtv?[KACI_Q/U-HBQ~Ḽr@V3PrYY@eLZ!E}ͩn- �$@1�5D ^G%겈W3y5OX!+e z̐lP`a=e&FFzSfZF2)#EYJbF-B_T}jO?J mS0hT!A gz( 76nĥѕ#o7T6lg#lTQz[1Gca6V 2 Y A;x0+NvhJFaҿHlP [h!;� I2Rf23-\C+d`/O4 -bbP{ː!lfX)L><HȌ5 'Ne00[Ok%W }'ӓX;-vEHl0ed@vfd&b A5x/?ʗ7p% FY [s;@u&L 6,H43fB=3 ULh}+g�b! w`feVƼJR98y 3x[BṴ l2WFF@<*'W ׶нȩo/fd@lB>D?QV@#oaRo<%}VӰX h(fb fE+h&Fؚ&f6]F ׷�fEa2#R=ÒȪ޲h&fh^`E' ^ VҠ!m6dRfa2|{#Z^)chfBM8\|MOre6wX.b�aو<ȳ=ZWM ž Nv ŚqBv |} kU?*HG?8g06+3 4o lZO2C1RMQ ^i_L3`C >h^T)9A{!eИ? Rf,b_;ƆlD&'m3`~&y=/q"x*i41 و\A PWu-,!_EkH3St#9tb'z3ghp j> ۹<%H~A?? 4m 1C*eQ-gB?*flۈ"9f�B_~/^17 A[882h>ΌW t/d4lܔԻ#&Qǐ0$3ֽȕ+#8*$68F�RI(A(1 7I8GoT`6%LH= /h?(h匣,]7+fXڅ^Y;} ^@a;| 3A0F$f|3l_4Og0K K%̀V93!U$U�;{A :bSR.>qtLq@bK�!յ2nwٹd�^06i \ rMnFA\-Hk_x}w8qio6~,P;;!i:TJ!kQj߬_1Y`ja�/+c8f+gSg!uAv0 ftDZ+ν4YAyc=V~֜0w-9@C,01ny31t쟡T(?Iu1 .5/SjZZ*B. Gv* t'z/HZR_?@: Xh t<.>�6TΡ!�lX Yuw^Y۱:g{;Z2f~C1OyX,MR[+ $|/'[-݂5g.ˀ/ĥ+, mCKꡟvxf[婩Ry(_pw- 0 P e S/[t B2@sH*3kVCɆaO|1}Xgx], 3c'2' qiGy`HE+F#j*0b (EIy7iu:oar=(Pkͤ;hV $ZTY˳@>6oQL,8@3]JWK a_*z&.YrPPVA%z,}N0!J5? w/`KMX1MI;ڃIϓZk7EB�#_EUSkm=cϩOV o&ro6 LS1asӬt\jbq4mڤe t8p<o' ^ f�Li)PԡS nX+i>֫<}b;n׉v!ke[ #&` f =Go>a$:~_AtNEvGLsf4m%z\9|G0Rp 0[.ip zཷVU1D.z^;-ygx]�M2^춅Fe4l'黮Ͽ ]Qf~*q1W^}݅Hۗ�x0K63A%.Se4{,<甭ִ ׵TC$`9mvZ:%WjpZ!z\4KW\?bٓ?+uGX8Dw>祮nJڰ/qgF#T{X^cب%9'Z�ȸ V@%�j@C./P-S|e#WuU\!P@\)E;cg$2`*LaXb n^@ٲ0{!9a~#xu2GP}+hm"l"4((iEmzЊPEP% = ""D/ zVD$(YJ&&E*FuN3%dwg͛p2;Fo>|世12s@@n7, ahZH$D"@FJ_O8}~klC6;L&$d28ԋI b1i\jr9;|>PV$ЅaDwFe$+ݟ@uޖ5uYdsPm sq2›$kmnFc_pNH1SĽDҮ̫ѳwbJcaݔ�CKQ*"<p#!d5#mVi<r=`wwoñܓ$ (\ :~C1}V/?vQ71jO9>nj{B-[ݗJps6^}Y <7\PqŁ&`<? 1 (Aңׂ,:u T|nEOw]epc"g9=żJpDžRƼ֧S'A,`Z|b)6`?K[6e26Crk\LNRvhe9:ך�J8O�.`AOe&}LU‚ |E_ ܠD@̰#7`ۣX!Gm s0ed Ie`8A]pBH޲<+___366f6|||rފnݼy3ýb`3} 1[ eJ52l"Q'`nyGj!֌ :gpZק "  -bxk\+<c[W ``btܫR(gs|+p.,St28WV3HV2paKb8缕. x4/><t0P`ϡ%pV h C/%6xCWW7 ђ?zQk1?r yn{V1#7{HH�22Kօ5)y.7~&x JXeL)0=<d(|1EF|!Wн5$`bJ'FqJ\^h(}C4|4haj?@o1a\@;N篟0| ߼g6oA+$6 bP ;vJCH0 �];KAH(h6bVV6ւ,E,Z&( J *(>4cfvonsOrlvvvs0XI1Ӯ O gW aqd%aމ<JuAW. z>3\Bdy@hk_c- "eʝzyqY!!d_Z)yhgF}1Iu Ie\P}GLNAemnsfmn^/[sP4-C#wK PXiIVW`>>$vm!Ki�{n0EwG|>#YJ䴡N9ps}I)sLJ 䭭MϪn65mFA09Y\'FF4p1c6~^;l8ߠCFTsk9�Fӿ~Z]P]QҠzb@ 1sOqC!#ŧ99u B4WAgv[2j"2^^0/g<P60Q{h́|DlWl~ @ڵ$ٻgHlL"Bce ?hL1Jƈ @ugvv;pɅno73Ѣo?KwryeѨ8)0n"þDhp:`Xz6Q赹:@&cEWl^[F .M69Bp̖ږZoc.W ux@I2[M%k:{Owu 7=]J%8ȟ\7\ +kd'@2 4#vJv'r8_'#o)F oF<B@qbHIZg()nn!&S/JS~=ED%\>KDQ2(Kz?0d gr>l(+|{(yZY5l׎wuZ�' o~z[OKpkD 辠@SpEzhI/ XJ^cPJ!4V܊53zl pfӍ|Uܺ1&͖&n3+o5;[r*wJ>+�gDO ҆")EA4(`|O㓉Hb>ᣏ&MLErPK;춐 MeI33;sΙ߬lı]Ͽ|f}!O/d״AmCqYR6 (~8 GU['�Tu˨DGLJQULKw̏{PǐžvT"{h!QّGY!kJr9gP܄[JKP~=׌X* pRu&*E6\39|4yK/͓02y)c' 2}p p+[Ze~ޜ<LOMQDH*ED Af}5!<e8n5'؉r `1Q6 &GJ(R@ e'޶]m5b.Іp(= Z.i&XjRMvgj JmcX(5(pc \p2qΞľ8'$*icd댦I�džu<R[>~zP.c-s3.,X_-yȏ/o%M qru⅊@x-}ߝ3,o5,c|c}i-e~*K6lw,zV.o+Ng0J ޭe+h']˥p!gMg&m~#5ɦ_,\;Gθ8NZ:Zp&\Qc;6?.m+aC?(&0!jJAbXO/`L#&c$&'z4"@) `l! -elrhٙfnQV6Koݝ9ʬrKsLg2i钔Q_Yx-D|2TAijnXYUňnv!G)ZjOYUl_+iG2Nי 4-*#YInO{Wַqf9<Ԧƌg644pY-9]XijL.%dv3R+_!|_LS[n^e-\$ԭ877 ttkR9ua@];iٶIPPgsn_ZW12CC>,֤H'jrr  3hW'٣4B$9ȶ>0={d0>nrj$l?@](*Y]pW(x|`BOh.=.] E{t9I*X\,Ct]jB-;|?D{uZֺF|d#奺Β8(??=y.+P5u}N&TaǍ>Jss[fi=~U$b(6600p+?O8Gbi跰QaD0eD|>3?INel({o['؞b' 0띙h&K+ 'W'k<B6@9y;خr0DnȲJ'esC]* ?@9oO�ή-$0 ;{_]5W,-B$ zz!ħꭠ $=ERF-E(`f^Q)]fЅu8s?gs?+ET}uQW'^.=fEwd؋v?cy]){ڢ9`bQ媂ME;3"'vuoYH0T1W-rw$kec^eO'-棦&kx]#Q0P{g"^I6w\sXsu:B].d?|KصՁ݌H<s^iyU7E4>xĨ/@یޞH»Av{a6׺ԍ}OX餍^ Px/w<gFa8{;=Írݫ7T+ҺqԦE<\U杓LUffuO߷&epT4}o 39^mOg EL'KpZ=ה9QSUo@tb՘1y(vīr>9dPI: [t^|Lmx?: Lv?/`Æ b$HҼl5'%]z χ!M3ntCNY%JZcW!&.ӛLyl=_K}OBfϢ"yEMCQTmF=U F�06 0"x,)&A+B#Bط~k>S!Zҁ`I ԡ3*#O�\;VB/FU 25:/h37P=5bbB=_ N/ذҦo6ZP<,Vb�6rXՃ2÷?<"I\akܺ[6m| $AxWW[?eg8~#d(TUܺ[�*ls{3;1wxsq#<@ }b5)`{R YXofQwEe>YӮ9[̞I<x+epI+fА6+÷!tU̠y,T &7?R١CnF(gX31Q!ɆBTS` 0(.FcCؠL!AaN&+ehll7[t^@qA�TV@NLՁ&ZaCal -Plo `s3pTyx3$|5b&T)#P|5xXpIM/Kb)\ oz/�F  0\%//4̌T!3@6 DgvZG*�15# g`#2vT \]y}Mē8)`;vgY6 i2mb^KoL 9B+iNAj<xz+Yȅ&%\~h2a@>plm<Ϻ4,j<]8k9E4(ka!R 2ŌKIuZ/p\3f\LM~]ɦYC]BT~ڬiO*E̜|*<oҗ6Δuq~{::'IF)ؘ/! x&0 rEJYDNL_=20Y؄bTyM9Z]ʦk捜>}G{o9!h#qt\i^b0bq!1."L:YզZ'%Ydp '2.xp~UdW._?(]23g!qY i[<:v3]qɁ SQJKAF1{"R)jU_(玻[usHt (Ð.;|:BR[:fhO($(|(".\ԖqYOĦq禚ş1]v]PZSH Vw܁$ ̽^wwtXl&<(e�a�ejGPؔw.kDp]uF $ئ~\- oeW�fQm៱y'[{BE(iG]j>�qodw,Uro.mem϶gQm}]g7EVf5cz1/Fs\Y#4ߢYFF*F#x^w s7 4w8XDz^W/.Ⱥg,M,[M->xLvP�V4a:1{R˸u,[@z*u<@T!r׋)>{=?S\J|B![|Πg3| EcےUP 㽽7豙_څÅ]7zdž_E %FК#zV^XVȍ('c.JRd7K@.*g4z�;?_Gۻ"3b~J-l4mٯg.1g.t <A`C}(*;QP�B@K={BWD};L54=Џ>e gYhkU;�fU'~sD鳝=:`-P< †O[㻤F>j0v}MAKL+b-҆PAUE(K_ G>ZPI]rkٽ\penvvwٽxŇx^ޤt] D1 '\Q!n+Ou!dXA͑f�RNa Jf;ǔtQ�<=l좼P)SOdxV~Eʊyj|T~(28(;Zrݶ"!Ukj.G޳j>?RCh+QZ]Q `oVҧH:,2/^ΓgD{u> gwTAśL^3声ffw鞰6 `&e Ttf:` ۹=gWiKpPdcCtK!|+Hf _ckr,Herq15%tأf^8FN󗯤Sz ">QjOOُziUjثwȠB߾{_+&Wv2:o8a#l39}R:TA{ury?>=פNS1mO XGv4,MLvyd>%d@͹b3풆/ee vfI{zw,%kO:6d@uX.Qw`cZvtF~r!iE^4(O=oCmLg(' 8c@qn6E- +- qZą q!qx� VU[v4Nώlꏜؙf<vn�dPӲXn"2I,A2Y%%B ?%e\GXRy3XsWR1B N6T40Wե C[#{W :1Ζ;$H XfDFC֥6 N ٖCw 쎵;@][nWd+NȘ^aI>e1UXS=lD| mc)�O?{Nֿo2L'l4?Bͦ�yjx4)1>ah ˢj">b[ZKx]mkEBKϒNзTzGS"2: ٖtPP4f޳ׯ^,x?j=e9?D隓�Ӝ’܉HTH#1I-@FB h\{A3.*:UF `g:"i/5M:<~:\4z-ww Max̷ccۀE8^x K?tT Ճd/횉 =qԘsb P%\woAA4'@*ybw]Ŭjs}\MVms/�e4iS7IcM[BqS/} ހާx-\Uq*FV7:/vNEU|f>ϮZ@9Z!+.۳ Y>sP_Oq`(=*f}d &7NpDhcTh,1EQ�T-Lp\aDk'h4lå_֘^�6`G>tB޿yd e`YdžRsVzk_FW=<rTj< r%?>Km^PLorqF>=m�hE5_.:_ރVojeyM=d̀}Lv" zl>̱I&=�[VSpӊ~Ԅ`4v8֥Ҝ%򽬲Z) 4?w r�J# 2d@bV78*KkWI)'cL.v&�f/24-0z9S l^<a10Z�>v 2!c{?1^<h@ėůj<͟UA溛WE_iȍ5$O oMkϢ_ WOөe1ƷQE0N uط v,E6D�W4H#�j A@$:*%" Nvy>'3KVd)ݛ7lc֥3 z@/U)r;4*N<̎�<3_49&Bk}t&!g*;'xOM31fVD5kS1'&pէ*YeZMb5c!juNC"ĭ%VDuc=`t#BPQ['؎: KGW/^p\;=_>M2eNOl֢ݗbsK}o-<|Do>Zw](U ]}J^ UĊzQCQCzg#O}#7{5ΈRjQ Ni3oX40lJ�($p"^JbDYώ9/ "Mr)c)O9Y?tuĖI)b,м"NnMKFj>幧"qB9lE6~qy-&3F ލ}~cvH&iWIסYoBOt6skRi#02<gNJ`dV [IBwn.#eŮmh0o"B@:aqLR13MJ2r)lZ+usf-  h✋9mL5JRܷq8?v-MQE)ʂE"t uMx~{H�tZU ؎m LkR#9Μ8>VmlD0MH <X8֍Srh"q@rR"rǯĚaC.�ee15+YJ?#1Y"=SƋPսlGS^e ,r0bc9ib$a�pf-z(Om׳e#ط߹(Q2LI$IgDk6/kB:H*\!&T >3 .6''ƣ4⏽z7c#ΓgcEnmx.Ѳy%=ۼ".q}Gpsj&˓}ό ?lu_(#ΚRg <�B >sc{ ,qH#ͨMDz̃\J=K눃ڲR"D2o{woRբ8e ٕR,0CMmִ7 JG9Ճ� O0$/ftu#Yy Z̋}܁-Y9r#lg^k''pT�R:׊Z CeUs_ nCPsg|􍮬g4*%l.k>RI$ETk٦@ |z$D Ua~pTUA9_ 7\»7/N8`�o˘*`jZdڴŴڃ Jڛq,?P^R$q$xP?4*)Imv3&dh Ͳwvgw&B(ۤ2~6]m2 73(r f"O Py�Ht˖ #1ɽ 0j[I>xUj4ˠ k *pi+==)EIzWA&`}0m(h�фǭQxnXK+~ ^N=2EKwp8wo%JwUutΒ`I`OR'*| T;l_(姏zgfDmfnN+M4hw_#l g7^C!0x^Cvb `f{v[@vxt3Uұ1A*3;4k*&r9,\\UAC%N_QPz\,TDvJe|A�^ֆ8O׆_?RCbC(#T#CCcCTv[@ǮMN:t/RD6o;698DUW3.8uȶr$obT62nWO' +tvݾ3!qU(lvnݼ/B zsdI-0#i8߷ _\V l5"`V�Ebf֠ m/UE c7AyX�]kSQ&6%`$ɐ .]$".v7 :iU,JF&ĘyϹ&6yWSL};;(bSi$[W wPy@/ Yզ)vJ-&NjӦ":7=UEM˦6Mך2DM]B>gYBv\"ORtf~RQ�Au5* jr^ x9OΠ.ZtN93hz+l<6;֩'I>tsWŀ7 bgq6*.O^D|jk7a'`ܛÏKr ΰ*QRhE #+Ɨgx(](ĩ7씶`5h~&d2)bo[7(he0ED�1p\-"^tܟ,ԙՇ[]T;DAж p`=_h|?gp?9k43%"\pa<| /Ž*+6RHb&m?Yz'~GTI&"$`ukn"}52Ƚ\)3eI\5{,[Yʅ,-V7|-m Rٶ)R{NtC 20y0Vs)lR)oNeK-!KkC@eghGN 9`M  նiwזu~기wI+/Ucxvkdרf㗧j-�k4ωMĈ%* C"H1 ufbdg`ae`gfd@ bA$D[q:vwދ"K;]c{EQFG5PcH&zs\ qp$14WKULvY}ˤ;{}X/) ,TʦRS|dVUc]cІUy;%XkEfpn~&`w9 2߳�u2¯mOi7d}%EQ 7yO8ن# "_C[l(-^נgC#.=ʛ7P|z, z t>C0T6K'cigЧLf# wWo,q|{ K!:E\WvA46FD0suaqŠ/stxob2c"t(ƏD 'nXH㕹]&B=0 9 t]Y4N@kۡn9zsUcF hm\;WwZP,> c)@P96Qj>?.a5Z:: e`^Xw=T: TRkJ`[�À@'�Ohqqz,f)M&.fqx\,T�j ST1fe�]KkQ>4Ʊ6Jl((ڵą_Wn\J6 n7څVF*1apwi;m@3w{wZ[^ IOELP7Y;_ {RUA۠!UdW(, /ζ)+)xMQubF-kQq 1TCaOV&yA(Ej|8~Zk3MLΫhr74bŽ` r:0<:q 507D1VL%f:"*J  ;LF vᓚ3fϦs6Aa~6ƞըs</("? Ң}(mJ+5H䧂CPڈ]k*$E_E-Nr݈*fi:7g5(ay}><,f$E*A~AϜ/capb?|@SZ%T7׶(Qz>�ke:8G @?'uQ:I.]oǶy*fX<*-Cf8m_9NMIV<DN~5 xͽzo,̕]q#]H"$ĭx)z{E# *i~BKCO^8:=Nd+,{7ҵ6NyYJ(=;\'䵨ɻ~ ݭ-SgVH5ǜaQa3]Jqi?hhW�ʮ(ޙ$Z4Xl"bZS\ _PD4X;܋5"~nKՆHCmIhffދi^x=7~)/@3x$S`ri 8rNP:RrUdƎ$�*H1G f/5%=]sI67NCZ�Y^kc(NN"HSW괸N}IY>t꭫EfgH]ryuC1ջ}ԍJ♕j]*è,H�H7 U�dϝYy>/!ug ao%gnO7rG޳#j{sb:7GYjFiS=&T5Q'E\h1?%KZz3v%il;BⒺe&@xe Fvj?0cTZC.bvMK8:߶mjR(]T`ϥR`=,RؔJoseԲ 8BLDDgOVS~"I}Q[@lp11< SsDSZ١nV-gkXSǗ2y%tl̺Ŝm٦?Q3`K;8 KԱJ{B�_gvMʟJ)lilSjٷ2b�//v͚٪pB,S\PٵYH]p썂-O[kV#d (+0 6D5im,Z`h{PP(ӣWfsP z" &VLԒf4$;q73M;mByo{o}/Fe:MCp5b  'hIlO+ց.̧bKzI'޼]r97"J#zFӵ"W7b+&(9]T/tCXn|eV1޺?RlZ/Z_6Be?K8b#JJ4j#ꖷߧ]!\ }D[)SozT;]~V ,e:r`&Hi/['_5n3{.mC}GyąY S6›T6&i7֞ Y*Z%"f ",q"P%7Tv+UC@vZ2isg{WπAh` xsxǷ<+0;/!^}Ƃ7 jK|~'lM A)eL)]ᄎV\ץ TKC SRB"WGr %/�Dˋ6o.yܯ9J%^_ @ciK_'tA/ш)Eb2&>i/ P,J(ΕK ,8c|]1m"o} )hT H P*@DGPABJI�$31oݱ3w ,kmff߼2p4cV'rpW ϙ'Bk+rLYjYs1K^kfR/$�<U,4;eIs4+CDf6+�PrRՕO̲--3:ȨLzVy7ʉ^}+5dD|ř ۇR6KnڢԜ0vm~ynYQ4OdmV{!!rm7?{" !hJ.ְ-x,bn8B[~vGп@9Zh"aSiCU{ oYin5{z I)SKF6,XJW Q5a>5>E0Ӣ"Hm-&ɨ|N b 麎r K[4RQV)1?nSzS3[<bȉ�ē|&gY9Hu!sD1'MWy6sHaY[T5"'Ry^9SXL!tBgWU)\{Fd=]6<u 2T*Aʕ 9m0j S%29&iFT;U[; 0?ݦ :~$RVE$Q*4j! Ğ.W  ]tQt("!vǵk7d۹uhS٫7_FBڊi @JSғRK3Ph'3me TmRdP갶8R>1s,"F~z,i.D: c TR^8ԥ}ծ;$ea`)/{QsV  ߁])Z־ҳZ@,-k R,ݬCo lV`Dɪd-a׀ Μ"_D}(C*cko`מt=h__pX-6]b+hCLFTcq,6tֹgRq_QHQ~oy80/J"@A I{{8wރ;TR>U3y߭gk#2,90MR3u;**V6@S8+OL8M p뵊v ñ26/IZ+ѹ ȱ<_bwgZ 0JTSMhmq6)&C5p"u&ϴZ EΓeo>}kgivKg xvKҋ=J&&s#բ3 9ga?eq\]iԹ�]_kArIZ|Q >AD_b`TBFy6IvCȑ͟(O/y=nu;A!Ga@x u$tpsQF(u7t[zH@%[R{b+I-6N?r${Җ/c߽l [v:o=ml !P>(Ɨ{Cqg Q~syO+ yg 씔r~ L69č5n{ BՆ/cLbބ3&a&l[kί=nJ֐EXP:@$dH.%OZFzI [Cm1qBAkU.d}7RLMT չ&|Oc0qfx5< RՆGXsLfTA*pz3{:yllmm4+$=xb{UG"ql1=v=Ͷؚ^:QŃU|6«CxTg^t`ՠqlf}8n$bRxW+ʎvŞR)#,mш|İfު/c 5BR*fc=9_gj-O7_cT(f݁|>qa G:/I3Sfu),4Pmh#߇O4Eɪ<c_8ޤ <<[-) *n4ƥsֺa\R7ąF&4>j (zh"Ы3ss+�ιsf7C.&."ۡqdΞHM::UC13\9B4Qk5<.AB \nKv #@F[Ӆc97  3FK("kqGL_MiR٠.7[NQ4:$z/;v%h>P-? ZKXlV` ]n۷/w"ӁYb �*ݸ ,|{z)xMpñ/i%=)J2 KQ9x7qA€. 22#C =h~*.d&=?Z”~2 IX pXdN�H'MA.Gl)rd2x-]c,J:ݝkm3p >>*"BZ-#A1೴GB $iF뤲9)d2 _0Xmt8s9 dkM1? ΀Z>3(Jpe5JaҌqSn:s$A  ^>[Lڟ̌mD,a`3 8s"NTQ@eՋ-i {m: _ߪ eO,EK08VUz " Ńtw٣ED},ˑkdM/2T{zH͢�fG{#�mWDv-Q)4Fڒh"VH$GϞx\<w1BR%z)iiQJ?uޛmiZ63yߛ}. qt:,YVdw~b׆cB{Xz dQ'kJl�XOG`lqE-VAz\T8Aۈ8BmJ}E`tEu<2kzF <R*Uˁ2l}Ŷ LcM:L׾/jT68] Lؑy=kӐvnj հf8@ sfg\ a)B6H0bc<MOcJs;Jt>*|-Τ) %TmV3A nuzd; ]u؉?=p~z/21As0R[w*px 98g C~ wrJGCPdn#av3k^iI޹ @ \.77=;OeܝVPb N;#SfXA/`8]ɼaAPv'ᠵ Νh*ǴvCa'ghubE wcA8s!ۑ?^$&=/ud9Upޫ$ó39f—Ϊv|>}NeY)Nq]bsXAv)[( QrN1k y>3W )f1C&[�ʮ0 ?&T*Z6҂ZCXlANBC3h: -hD#CLip$9K!$>>=OzhZIo$w/t)Xfd@e *-$Җ)ŊPNQS jye /y8-/#_|^/:S}5�b!P C?y\[8_^6L=$""^.GFYן|/Zd`*Ǻ{Me pfU]̔_;A[jϣPS6.8v]A˧8~,K7[Oı<Ls) `?3uf3qPgZ{v0q|ۚ9u Cux}1`_<8$~ }Ȁ \*sOcv.G czM;ڳ]Q;mcYCى4׶ yd*itw t%V2B""/4H2NDRKo^nkk3VW䴫*S/uh>ADչ2{D⭰A-bh"غ1i}G6nYc'avwPt"E�=}'</kRޛxda 78i{6<*ljʨeI%4Ɩf[:U^r LUxkgi S B QP" F"gBpPwќ;Kr띏!pewf)Ugtu '.~p?fh#B3i%&((`#KgrEUk*>IҘdq9/7K8\iZ̩p*]pd8eAw51Z=^{iu_FjK3.nԯAo慽G& ֍pEN/DؼnlB0Ө2 ʃ?�wx!Icfl]ɘXW ֗g"`0Hvw [:|4+~I@g֜$n3yLQ$Bǣ@̓j%X\\Ph al'LE,tuLRY;D`nmm�--f"k #N$]NiJt5~o'Ȉ\VR}O,dƲ_z+)հdoH9 x=:uTW(Tʰƺ&<9&*}) hƮKY?F,&&�o&+ *5b'B@&!*gݵHoD}7Ea\/x G{v T*/p`_={Kt*A95(u+d-X'7Re IC; fظUas+hT)C w#Qz@AkM V,d3`J &3D͔y-GT3d;{ !ÃV*(=PBla I0} o JCo?|'/7'^Vo18 m㯐#5x #`0d虳\9V27aɆ#`@tcL^'R&6`ًw8{eB^]] � 4x.~~?DP% bZ511{x >1~貅T� \UΠzQ ,lD%  yzhGX*]{qV�8&(6 *ERAQ<b*KOѓ` xRJ!'Aг-ѐnllK M[SCw&i45vyoޛMlj'~ϊdSd EEwsKrkfڣbXY0z证 +T`B0qf-㙭?M/Ӈkl(.2m8=ypkCQ[Ru]kYR(Nّs]2> *6aY6S9gp{.gyȻ7RҵL\+Lz-6oٰ픈^=OO:bcd$gS?gq?%ԧbcLOʋ`:[AG!-8W telccu;uGo )B!;\uul8#o^$KV%1z~"L١� @W0ˆrCذOQi>GfegӔPYBfx8ש*ex< > 8/e()^bR󎞣p(,,ݛpe< +G_xDK]SCÔeeJWInq I_ӡ9*n|Lz/<\ԄYpQ"rxLTC }}\ h~z,f  wz?K燝7ޫ3 92By8Gy@6eєQG0KleCrzHݎ ٵ4FGEZnfTHF^R2`"XѠ_aATB/QAgBd)^j./mڦt=m^c۷=yys3+C=A:>v|W 2<gR]..WU ~r?l%0:06_);[!֜. tIe*Dبo"hKqT�Xd7ST+Kz)G{0NY+ 8/˺HA5 %-[Zqqk幞&~D^6*ZDͪ UR=18INnΚ fr A8lJEp&Db߭Q? >Sk5pXmpF`VB.;S.P!#ddHh{�wSش-�PfҒ NHH( pމS|AE,~0]LM0£5;PoЪUISF&$&+=?/^{m=| `W<g3%EqoڒTG3O hXi'F3jS;.q[o@VmLx]Ɂ~Xӄ&od9W.C[mX41 9\>Or}!H@-":A73u* Ez =g/sSaAkTHNA۷z|풜 }#wmFxL4'PfUN(Ks*1q+d\qolE3?356:2Ab4ۘ*&P̕WjJD+݋ae(0n0;W*{lw=E&& 瞱'0{D (+Ӄ)nۼ2B@<=;Pv-MDQL&%"v碠VՅOJʵ�7BEDtF҇Pi5H*mXҴ3gr&m:003{EBA쵈;}T1Qg#DlHƢeѰ,jeMZD;PiQ>|'\=T!r@NȐ _o~Dv:Rz\<Xɠ. GIƺcJ]-uKvZXi 75شhR8uGEХt8yGOi8T]K|SAMF<6F䘹UD݅ʟkx`y@�";r1DˣMNJ*[[EaFIvڋN U:EF@0TݕYD-todH ]T22r>3zG9֖wc[U3]" ARp׬S e]x2 ^MA'w|hpAO^/|X#s9#kFD:793yn]σFu=$ᱼA](AhVJOgr9YE+p8fX/{ƲxPM3.rCLsz28INk.9,rzZ1b`y&H [AecukɰQm`2 SEÈkGogI,R?IJSW-pMv ~?ZZ8flψ�]kA~m (x((bUxOAT<A(񤅖^ *VCP,9x0 ZERj[&&yogvgvv}{} J(/+Bn-L^Vª5J((h\en$pPHBMb 1~W0K;9 ]DFSg" -0pl:A_%D~:BW4طfR{pipcM!yHY'\c_}!<59HoדӔ@P@[T1<~T}K8q> mg~PgBl ̎swV*KuÞu@7m=g/aǪ;}yX>C% gtL,]'JFFш0I &4C2q՞9μSE&fCO8i`)mV r ~{.ȍd`Uj-d]Zd&ą|{qd&62hEZ10,o`DŽPZ9{Nub= $yQ s|SK5Nj ~ -xFbĉ<Q?UQNIh\#`[pP [jT hVDPm{mmzw')Z6ۑcxAqX9Tt`�o{(&nĦd`r;~Ps%8[uܜP^AAZؽ��]OkAT Z=K*So=z#;x3ؓ z)( CVbRd}oƔ ð$̾yf޼nXr~+ZaVmm_2yw4ZHВMO`xOY=_ZF*1d2 wB*-U(M— !VGh̍pi챊o>+4$dz{#,nU"Vy�a/MC;iaMR.+>7[xA*,aXCD;vH"(E$eV)O!<2 #{�KhV~@f&GfT!6,3 %YZ# Gkd</0WW JNP`MӃ6քz8bR!]Jy4DW+?LYb878p{ lR"b6X0pp!#Zc..`R JJQfEhl-T-FOdbE8|TG^ Eȱlt(8Tx )XP2!3 P΂ɩ]M!QWhJ #xK~R#C o$?tvZw=-rx rP$•IpڑǒQ6|ݞ-y3plZ1a 'fcgϑNQc&|}+86(owݔ>w<:%PnCu#wb{&i`c(CJ?ow=. #�ggA IQAF@FTQl| B_l @+ Qbc7f&d^2L23™f^?<<vesκϮ&^;;~b|4Bs o}Qa]E5*UE͵B-`%i10*-RBT"?6f~>O~&?PI[ z $Hlv;/ j0lY_lné1]=Ö<OmPH{U9>k'ހz%Z0 GӳtMKw=Y^݉hmsH_#}�X+tA_ozc=C-aK&9 <zF48E{Y_sac.y64Ȼ80H#ls ^tv͒L9ZGj@y!L\\a# UC/x`t0gnSXJ>AA=TvujwڎdU# ( F#pvVg+X ) Ј](DLS>KgҨG?͗@(wCS�33�+C9/s> u9m)R"!`,S,,�H,<� HH6Q;bG%͐Ċc?wɬl/szm{j2i6PG/O΍TSKRiQb眱Ki%cnxS-ccg=&PH~rBH'�k2�(?di7i{ԆOK#c<p^%a``uWL=Fh۝4L+%\_|mp pk-ڟ2gH0XU靅N EiێרWnpWhg;aa6K -:Av?̪ Y,ȋ̐ k*$q~ XR3#Kv2%Ds%sa#OFDo)(ӭ88XeSݎiŀq@A.Bڈ's57H$ƉV^x`Ւ l> ˬ-ћhIbU޲TQ뗅m� `Ow]V$eqȺ&aQ/qpnW�֮d7 *".�G\NmRS@DBThQ zC(RfɌͯ.l޺ؒw'O_ku;մe۶xtt;wiҺJsb (0BP(Cڕ.K ޗU ڻb@Ȟ@}6o1=4P)əE83)s:�D}qf#1L0 rpKYB-w:p-]DcE\|wAۃ{XIY 盲4C2Dșj 0taW@�b_V+rh11 ٮ7?r&9R)Č<GNagrn\ )u4>^I k {J0thl܂g=ĭE4|*9 S@?7ame0[IaQ{Ƨe!i[|̣?hva(۴ ;|o Y*Hy_Ư؍J>sboÖ:~m̕?Dnvi?as>jpC&`"i ɋ`ݳf MV{QA_бhY҉DATx�v\'gY0*wO=-0K*mGZX'1ޓ)؅D;'ٝ iQcxWS{$f'ۧ58[b o8zfi4O6:Z=>4Ppywo8Y29ܩP:EHNc<&#. Jj2EV=@ ) ΞkT8_ s@rAtӤPi:|k ^×J\!0NAY.0O�ڮA 2lgba'YYXc&h xb޼z9Ϸ(|,`Q\|} -/ڹAc * i = ET%HvГ9m,9/1qk@G[*F8�ЍԪ(C@'@dҪ)=␎V^�'<֠B`~}4:ip 0LF; ^UIHU4dj찆SۿU+6u=Y&絼' 0.2K0K RpJ=-\utʃ|`qa`@EE L x +[h5N'- [U5|<W,k. TEcj/ږe UMSu[b$mÖp1F)4T_",[z +DҭJSAU=0|7v-3O&%P�c|H$_lއEB3TzVV5S*iLF `'pfe7v$$TMՎ*4Ҫ ;vX0/Ɵ_#MQY#Q*9(7OlbB� gA=R'?a-}HR9#֓!sHE6@FE^4#Uhif,Sǽݝ=/' ndu'CSPךwn8-gD{x@&h%l_'([0 B'1`&;TxRmf%.Qa;xVr[=?=g|>^2'ZŌ\9< �ػza(B م?cq&,N$W(oi.$wy{~V6|͊dYdUuVfы;!.2M)D{(YS$[2~A6eG#j…i3n�tR}Z#NT7A~DW"1/Nd=Cj|a[ijʕCW!,hnW{ub4DŽSM5NՇ#K7K:[CO h_n�bNqq!p6# g Tegg_X$Yw"ܕ3�CL` PFj=FRHC۰mUCbVo7ßWȰŀ У3!U9 s"ɑzUȍs5_=ۖ{(spOx` �-|ڶ ؇!a"?zf􊙨S)�sH@qyOls\#CL$  dK "E("k^T8޽R΄tʝoKmWÙV[�`8ԁr/;pUa6o\Kʖ`m~�(OsKTJOH5(j'P~ z5=Q3K3b/ma0 E`=c~o޿7?~߿Ͽb̪85QW2&L]̃CQ RFۤiV2TEy}U6JV!WQ"[( _SE$/T2xi?Ȉ}/j?G{ɣ` S�:< w)C+`  ľ1 +�OkV !lF&&ƿ?9mV@Y>_e, Ll¨{{+Ƚgm4!1,#aH(H  @^e;% ћ^!3[e?$ hCӈ,|6†Pp1X^G,a퐌Jya�~u3>/ћ?*gdu2�z bz^/1kaVl~�V~j\Xq;of/F&vA|g^3rNjy,NЌo{J~SEa8!#WQN¬1!Dg@o</Ħ ya�@߮grR&Ƕ 5_zͰ3rO6p6>, ]DpQ&&&k/߼ϟ3~ �23331r|Ϗ ~pcfdb{F3f`DG,BVqjN?JА6jBSOH ۖѫf}v�j#"F(=3|ʰ5+a]J;F}Y[@� XUf^6|~?^3R2ro5#e]&޽X HhgFP+E5gMoU܃WʈRy"*d7и{\n~l(`H.iWZŌ<\9G%c[[Poիd???ӷo?C,�9@^0.VԠyhFT9{Y<u`I\?fifnaXWDW ]{U)YwP;*4 AX+cĐ5fFS&~(`U?}fw]{@{/J;R+ZoZo� �{g0 I#S .&1Q!,B`^f 3 E `Ay~* &eY q1!2cA3k3溎L5!ꆤlwN$ڪhNyVkg$p`s،W?^넨'-p[qu c 3lb{&duN,^~<(,G '\PMuߗri] _CU�`aËžk{p d.20Pz?qOu2x!Dgo4LUCxtp=ݲD" 5�" KU{6 d%)t,'<L>]s*.cVhR#ƽhR][S^rY?IXSNtZ7k()G {-堀\Sbk5v#z 0AxKC?^,-U4{21,>Xx'etY{:u!FxfTOg8gA8Bk2.g#ָ0B"QMw7/Tel6_6~o5?n)5C>S(> ϳ΃~=hbWϯN_Ot:(,?ru`V8 {d ί6PSCo .'vΑ1;֌w> Ff\ !R[YWg5lʖ*NjzW>]u{C͏-sV\tWSKdǂ10۽71'7C9)8DtʲRoJ<=Nxw.+�@U?V̱2{@QAtu<WĻ` gM�jk{:s�-Ma-jkLvёS-lqq�C,o\*'[%%0v|wX86|`ʌ]G/-9wO(�{g CQ$VJ6..;�A:ABW2pFXJȀF:X5>$?C5 gd? y wO�fdaW֘/QVJ z Yy~;u됶V×Չ>F: ܵQ;�A6W QPQm&RA&] 61xcNX(4 |9ezWʶAEDmh 6NrÏ lhP6jBn6J+ɚK#~!SҴ>@yy/vEc2)RVXl�m^sVd<s ; ;lWl3Xhz !L2QH &ovYb~Ujn9l7bzRʰW35inA|Se$&ǼHL" P8uRo2"Ղ+pLL?[ٹcҶucOΟ � E)j&t(ZlDQws1UZx gWfc8u:EQZ2I�0=hP׌06K6im6!uUw.PǘʜB[n=>ҫy3~fIE"U= S|̻�UTAW^xv{؃{>WƠEc̠JZ93Ah`"nyݣi^5hn3 wR`[9CKW4[l6E2r =;X�L_î!m WgG%9+wzy3ڐ%5S\@kBFzʦ5%S69hޏۂ҂5R؍xTT<K#+{| +V9?z~6'(4 ? 泰.9@qfl�PhϙJ.=d0}E]ԀlE~!;qccQ0+hl͸h9p5+e { gb06V#"kn^Ȩ!Lj~?} wfLL{mQRfeކV r'VUb?ƪ?a|1e.` T2r @3EQ匯AQ1!Wa88ؠ_FPE A5FAdG&۷dõX),?q$j .;G(#BF#vIӥBFg|3P+gb q`A3VƠ{|H UG{?)⤂ϟVʣ`|23zO5M*e 0*f+ }bKi ؓWҐ0*jP 99PVwR;Q PE×±w>Vȣ`R+gl.!>M+d V̠^?0}z1ŕF++iXE-&&�fMS7|uW>T/_?>}=Z)Q0 H+e\5+d pV � P%׮mj\Q*iXE +kd�Gꡍi ^@R|MkF+Q0 F921hrͣj @X+f\f`% }cضj!J[*$%Er'?C} ~`$`G(Ԩn3@�᭘{o~N_@ZA⪤qUڣƍGXbB` 6]*f�Y1#Wί^}�\zv /^k]UGNY%IlE=X G(õ"U1Q3\t/7o>�O09Q@5~&F|oLd? @L&Q0 <CATX9@x+f֭;VĂ+iА6Ƞ/_ 1P%۷Vңd?RW)#_O, U6Jw"`~�VJw~>:]xh!`�)51����IENDB`�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������WWW-Mechanize-1.73/bin/mech-dump��������������������������������������������������������������������000755 �000767 �000024 �00000007644 12026436042 016674� 0����������������������������������������������������������������������������������������������������ustar�00ether���������������������������staff���������������������������000000 �000000 ������������������������������������������������������������������������������������������������������������������������������������������������������������������������#!/usr/bin/perl -w =head1 NAME mech-dump - Dumps information about a web page =cut use warnings; use strict; use WWW::Mechanize (); use Getopt::Long; use Pod::Usage; use HTTP::Cookies; my @actions; my $absolute; my $user; my $pass; my $agent; my $agent_alias; my $cookie_filename; GetOptions( 'user=s' => \$user, 'password=s' => \$pass, headers => sub { push( @actions, \&dump_headers ) }, forms => sub { push( @actions, \&dump_forms ) }, links => sub { push( @actions, \&dump_links ) }, images => sub { push( @actions, \&dump_images ) }, all => sub { push( @actions, \&dump_headers, \&dump_forms, \&dump_links, \&dump_images ) }, text => sub { push( @actions, \&dump_text ) }, absolute => \$absolute, 'agent=s' => \$agent, 'agent-alias=s' => \$agent_alias, 'cookie-file=s' => \$cookie_filename, help => sub { pod2usage(1); }, ) or pod2usage(2); =head1 SYNOPSIS mech-dump [options] [file|url] Options: --headers Dump HTTP response headers --forms Dump table of forms (default action) --links Dump table of links --images Dump table of images --all Dump all four of the above, in that order --text Dumps the textual part of the web page --user=user Set the username --password=pass Set the password --cookie-file=filename Set the filename to use for persistent cookies --agent=agent Specify the UserAgent to pass --agent-alias=alias Specify the alias for the UserAgent to pass. Pick one of: * Windows IE 6 * Windows Mozilla * Mac Safari * Mac Mozilla * Linux Mozilla * Linux Konqueror --absolute Show URLs as absolute, even if relative in the page --help Show this message The order of the options specified is relevant. Repeated options get repeated dumps. =cut my $uri = shift or die "Must specify a URL or file to check. See --help for details.\n"; if ( -e $uri ) { require URI::file; $uri = URI::file->new_abs( $uri )->as_string; } @actions = (\&dump_forms) unless @actions; my $mech = WWW::Mechanize->new(); if ( defined $agent ) { $mech->agent( $agent ); } elsif ( defined $agent_alias ) { $mech->agent_alias( $agent_alias ); } if ( defined $cookie_filename ) { my $cookies = HTTP::Cookies->new( file => $cookie_filename, autosave => 1, ignore_discard => 1 ); $cookies->load() ; $mech->cookie_jar($cookies); } else { $mech->cookie_jar(undef) ; } $mech->env_proxy(); my $response = $mech->get( $uri ); if (!$response->is_success and defined ($response->www_authenticate)) { if (!defined $user or !defined $pass) { die("Page requires username and password, but none specified.\n"); } $mech->credentials($user,$pass); $response = $mech->get( $uri ); $response->is_success or die "Can't fetch $uri with username and password\n", $response->status_line, "\n"; } $mech->is_html or die qq{$uri returns type "}, $mech->ct, qq{", not "text/html"\n}; while ( my $action = shift @actions ) { $action->( $mech ); print "\n" if @actions; } sub dump_headers { my $mech = shift; $mech->dump_headers( undef ); return; } sub dump_forms { my $mech = shift; $mech->dump_forms( undef, $absolute ); return; } sub dump_links { my $mech = shift; $mech->dump_links( undef, $absolute ); return; } sub dump_images { my $mech = shift; $mech->dump_images( undef, $absolute ); return; } sub dump_text { my $mech = shift; $mech->dump_text(); return; } =head1 SEE ALSO L<WWW::Mechanize> ������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������