ruby-spider-0.4.4/CHANGES

2009-05-21
* fixed an issue with robots.txt on ssl hosts
* fixed an issue with pulling robots.txt from disallowed hosts
* fixed a documentation error with ExpiredLinks
* Many thanks to Brian Campbell

2008-10-09
* fixed a situation with nested slashes in urls, thanks to Sander van der Vliet and John Buckley

2008-07-06
* Trap interrupts and shutdown gracefully
* Support for custom urls-to-crawl objects
* Example AmazonSQS urls-to-crawl support (next_urls_in_sqs.rb)

2007-11-09:
* Handle redirects that assume a base URL.

2007-11-08:
* Move spider_instance.rb, robot_rules.rb, and included_in_memcached.rb into spider subdirectory.

2007-11-02:
* Memcached support.

2007-10-31:
* Add `setup' and `teardown' handlers.
* Can set the headers for a HTTP request.
* Changed :any to :every .
* Changed the arguments to the :every, :success, :failure, and code handler.

2007-10-23:
* URLs without a page component but with a query component.
* HTTP Redirect.
* HTTPS.
* Version 0.2.1 .

2007-10-22:
* Use RSpec to ensure that it mostly works.
* Use WEBrick to create a small test server for additional testing.
* Completely re-do the API to prepare for future expansion.
* Add the ability to apply each URL to a series of custom allowed?-like matchers.
* BSD license.
* Version 0.2.0 .

2007-03-30:
* Clean up the documentation.

2007-03-28:
* Change the tail recursion to a `while' loop, to please Ruby.
* Documentation.
* Initial release: version 0.1.0 .

ruby-spider-0.4.4/doc/fr_method_index.html