--- changes per libwhisker release -------------------------------------

[] libwhisker 2.5

- LibWhisker is now licensed under the 'simplified' (2-clause) BSD license.
- Added the {whisker}->{allow_short_reads} option, which will return success if some body data is read but it is less than the server-advertised content length. Thanks to Dave Lodge for the suggestion.
- Thomas Reinke pointed out that cookie_parse() was lowercasing cookie names, which causes problems if the server is case-sensitive with cookie names.
- Fixed a documentation typo for uri_absolute(). Thanks to Sullo for pointing it out.
- A bug in http_fixup_request() would append a port to the Host header even if there was already one. Thanks to Sullo for reporting it.
- Francisco Amato recommended two new anti-IDS modes that involve using 0x0d and 0x0b as request separator/spacers. IDS modes 'A' and 'B' were added, respectively.

----------------------------------------------------------------------------

[] libwhisker 2.4

- Minor code change to utils_delete_lowercase_key(), but it doesn't change the functionality. Mostly just performance.
- Modifications to Makefile.pl. I have become a big fan of the three-argument open() variant, but that's not backwards compatible with older perl versions. So I switched everything back to the two-argument version.
- More modifications to Makefile.pl, having to do with backwards compatibility with older perl versions.
- Major overhaul to utils_request_clone(). Basically changed it to fully copy the source request elements into the destination request, while deep copying embedded arrays and such.
  This is different from the previous behavior, where utils_request_clone() would only copy a few specific values from the source to the destination. After thinking about it for quite a while, I decided the previous functionality was not very useful and had shortcomings. The current change in functionality only affects people who set unique values in the destination request *prior* to cloning. Under the old functionality, the unique values could be carried over. Under the new version, they will be clobbered/deleted to match the source request.
- POD docs for utils_array_shuffle() were wrong; the function takes \@array as a parameter, not @array.
- Added new option: {whisker}->{save_raw_chunks}. When set to a value of 1, the raw chunked data, including chunk sizes, will be saved to {whisker}->{data}. Normally libwhisker interprets the chunk sizes and stitches just the raw data together on your behalf; use this option if you just want the raw chunked server response.
- Added new option: {whisker}->{hide_chunked_responses}. By default, when libwhisker gets a chunked response, it will interpret the chunks into the final output; however, the original 'Transfer-Encoding: chunked' header is left and the Content-Length header is not set. If you set the {hide_chunked_responses} option to 1, libwhisker will clean up the response so that it resembles a regular non-chunked response...namely, libwhisker deletes the Transfer-Encoding header and adds an appropriate Content-Length header. Thus the fact that the server used chunked encoding is completely normalized out and hidden from the application.
- Changes to _http_do_request_ex() and http_read_body() in order to account for the above two new options.
- http_construct_headers() incorrectly included empty headers if a header name was found in {whisker}->{header_order} but was not actually set.
- http_do_request() wasn't correctly returning the value returned by http_do_request_ex(), so {whisker}->{invalid_protocol_return_value} wasn't actually being honored. All fixed now.
- If the server didn't return a Connection header, then http_do_request_ex() looked into the request for a Connection header; when doing so, it assumed only one header would exist, and that it would be named exactly 'Connection'. This has been changed to use utils_find_lowercase_key() and to account for the possibility that multiple Connection header values might be defined (in which case, it uses the first one).
- SSL requests silently went through and failed if an SSL library wasn't available; now stream_new() should return an error.
- Added ssl_is_available() function as an official way to check whether SSL is installed. No more relying on the $LW_SSL_LIB global variable!
- This is a preemptive notice that $LW_SSL_LIB might be going away, and that you should no longer use it! Use the new ssl_is_available() instead.
- Changed how optional modules are loaded. MIME::Base64 and MD5 are only loaded when you try to use a related function; they are no longer loaded automatically every time you use LW2. This helps speed up load time, but the first call to md5() or [encode|decode]_base64() will have a small lag while the module is initially loaded. If you anticipate needing these functions and can't tolerate the latency of the first call, call them ahead of time with empty/test data to force the module load before they are needed in latency-critical operations. Note that other libwhisker functions operate in the same manner: the internal pure-perl MD5, MD4, and DES/NTLM crypto code incurs latency upon the very first function call because the code has to be compiled before use.
- The %LW2::AVAILABLE hash has been deprecated, since it was redundant with Perl's normal global symbol table.
  You can get the same information by checking for the module's $VERSION variable.
- A major overhaul was made in the underlying stream/network communication code. Non-blocking connect() support has been implemented on Windows, which should speed up error conditions and enforce timeouts on Windows platforms. Sockets are left in a non-blocking state for normal TCP connections; they are put back into blocking mode for everything else. Thus the stream code had to be updated to account for EWOULDBLOCK non-blocking conditions.
- $LW_NONBLOCK_CONNECT is now 1/enabled by default. If you want to turn off non-blocking connects, set this to 0. If Libwhisker encounters errors during non-blocking connect operations, it will still degrade to regular (blocking) connections (with an eval+alarm wrapper).
- While not a change to the code itself, I just want to officially note that I am no longer testing Libwhisker2 on platforms other than Windows and Linux. I just don't have the time to account for more OSes. This has been unofficial for quite a while; every so often, I would do a sanity check by running Libwhisker2 on various platforms (IRIX, Tru64, Solaris, etc.) and checking that most things were compatible. I have downsized my lab and purged those platforms, so I no longer have the ability to test them. I am still willing to support them, if someone else will run Libwhisker on them and send me bug reports. :)
- A new $LW2::_SSL_LIBRARY variable was added, for internal purposes only. Use ssl_is_available(), as I make no guarantees that $_SSL_LIBRARY will stick around.
- Blocking connects had an error, due to the use of a return in an eval block. The end result was that failed connect() attempts were being treated as successful, and then failing later down the line when the HTTP request write failed. Since the error appeared to happen during the write, and not during the connect, the default retry value was kicking in and the process was repeating a second time, making it a double-whammy.
  This would only occur on systems where libwhisker downgraded to a blocking connect, and a connection attempt was made to a closed or non-existent host/port combo.
- It kind of goes without saying, but I'm going to say it anyway: functions and variables beginning with '_' (underscore) are internal only and can be subject to change without notice or backwards compatibility. In short, you should not be using them; if you feel a certain internal function/variable is vital to your application and there is no way to achieve the functionality with other official libwhisker resources, please email me and we will come up with a way to resolve it.
- The stream code wasn't updating the connect count ("syns"), which was causing {whisker}->{stats_syns} to always be zero.
- Net::SSLeay SSL sockets are now closed with the SSL_shutdown function, which is more courteous than just closing the raw TCP connection. This allows the SSL close_notify alerts to be sent.
- SSL keep-alive support has been added! It is currently only supported with Net::SSLeay, and is disabled by default. If you wish to enable it, you need to set $LW::LW_SSL_KEEPALIVE=1. This results in a drastic performance increase if you are making multiple requests to the same SSL server, and the server supports keep-alive connections (i.e. HTTP/1.1). This isn't enabled by default because I haven't been able to thoroughly test it against a large number of different SSL server implementations. And obviously keep-alive support only matters if you and the server use the proper HTTP headers to indicate the connection should be kept alive...
- http_do_request() had a small change in order to make sure ssl_save_info was always honored with the new SSL keep-alive feature.
- http_do_request[_ex]() had a string case comparison bug in handling keep-alives when the server didn't respond with an official Connection header. The result was that connections were not kept alive, even though they could have been.
- http_do_request[_ex]() used a non-robust method to locate Connection headers in determining whether or not to close the connection. It's been changed to use utils_find_lowercase_key(). The close code was also refactored to take into account some other close situations.
- dump()/_dumpd() was modified to no longer escape NULLs (\x00) as "\0", since that is a kludge shorthand which can backfire if numbers follow it.
- dump()/_dump() didn't print out hash entries which had an empty key.
- perltidy was used on all the src/ files. It was about that time...using multiple different text editors on multiple different platforms has resulted in a whitespace mess.
- There is now a libwhisker2 test harness! Well, the beginnings of one, anyway. Essentially, this is your standard fare of tests to ensure the library functions are operating as expected, and that no regression errors creep into the mix. The new test/ directory houses the test harness and associated files. Included with the test harness is testserver.pl, a testcase web server which can be used to feed premade HTTP response testcases or create ad-hoc testcase responses based on URL designations.
- uri_split() didn't set {whisker}->{ssl}=0 when splitting a non-HTTPS URL into a request hash that previously had SSL enabled.
- Lots of POD documentation clarifications, additions, and updates.
- Modified uri_absolute() to not add the port (":443") into the URL for HTTPS URLs (because it's redundant).
- Modified uri_normalize() to now preserve any URL parameters and fragments, so things like "http://server/a/b/../p?foo" will now come out as "http://server/a/p?foo", while "http://server/a/b/../p?foo=/../" will also correctly come out as "http://server/a/p?foo=/../" and not "http://server/a/".
- uri_strip_path_parameters() was not correctly returning any trailing slashes. This was actually caused by Perl's split(), which ignores trailing elements if no 3rd parameter limit is defined.
  Supposedly using a -1 for the limit fixes this in more recent Perls, but I'm not sure how backwards-compatible this is with older Perls. If split() behaves differently in this respect, I will likely create a run-time test to determine the available split behavior and act accordingly.
- uri_parse_parameters() incorrectly took a shortcut and returned an empty parameter set if there was a single name parameter without a value.
- uri_parse_parameters() had a little bit of refactoring, in order to use uri_unescape() rather than internally duplicating the same functionality.
- uri_escape() didn't escape the #, /, or \ characters (which have special meaning to web browsers). I also added the @ and ; characters to the list to always escape, since they could have special meaning depending on where they are used (username or password embedded into the URL, path parameter, etc.). Better safe than sorry (and it doesn't hurt anything to encode them, other than wasting an extra two bytes per character).
- utils_lowercase_keys() had an improper test to determine if the key needed lower-casing (it looked for tr/A-Z//c rather than tr/A-Z//).
- utils_find_lowercase_keys() (and utils_find_key()) were modified to account for times when two unique normal keys result in the same lowercase name. Previously, libwhisker just returned the value(s) associated with the first key that matched. Now the function gathers all values of all possible matches, and returns a value/array based on the final superset.
- utils_getline() and utils_getline_crlf() were modified to use \x0d\x0a instead of \r\n, in order to be more portable.
- Added 'my' to limit the scope of the $POS internal position variables used by utils_getline() and utils_getline_crlf().
- _stream_buffer_read() used len() instead of length().
- http_req2line() incorrectly added {uri_user} if {uri_password} was set, resulting in a string which looked like "user:user" rather than "user:pass".
- Major overhaul to http_fixup_request() to make it more robust, and to clean out any lingering values which may conflict with having an RFC-compliant request. http_fixup_request() will now forcibly make the request as HTTP-compliant as possible.
- Changes to all the various cookie functions, in order to accommodate the default domain and URL of a cookie and otherwise be RFC 2109 compliant. All of the information regarding how Libwhisker handles cookies is now available in the new docs/cookies.txt file. All changes are backwards compatible with previous 2.4 formats and functionality, with a single exception: cookie domains in the form of 'http://server.com/' are no longer accepted (they were never legal, and I'm not sure why I implemented that superfluous parsing to begin with). Also, the 'expires' cookie value is now always undefined.
- cookie_write() didn't ignore the 'secure' restriction when $override was true.
- Added cookie_get_names() function, so you can get the names for use with cookie_get() without having to access the raw $jar structure (which should be treated as an opaque object and not used directly).
- Added cookie_get_valid_names() function, which gives you the list of cookies which qualify as valid for the specified domain and URL.
- Modified cookie_set() to delete the given cookie name if the cookie value is empty or undefined.
- Added utils_carp() and utils_croak(), which act like the respective functions in the Carp module (except not quite as configurable/flexible).
- Changed all internal use of die() and warn() to use the new utils_croak() and utils_carp() functions, respectively.
- Added time_mktime(), which is similar to the mktime function in the POSIX module or the timelocal() function in the Time::Local module. Namely, it converts a set of values (such as those returned by localtime/gmtime) back to a single seconds value.
- Added time_gmtolocal(), which converts a GMT seconds value to a local timezone value.
- Tweaked the internal _http_get*() functions to not enter an infinite loop when reading from a partially filled buffer stream.
- The internal _http_getall() function didn't clear stream->{bufin}, which is technically wrong but never caused a problem since _http_getall() was only used in situations where we read until EOF and then close the stream (and thus never use {bufin} again).
- http_construct_headers() did not print out all values of a multi-value header if the header_order explicitly only printed out some (but not all) of the values; the remaining values were ignored.
- Changed how the max_size parameter of the internal _http_getall() function is handled. Since it's an internal function, you shouldn't be using it anyway. :)
- http_read_body() now clears {whisker}->{data} for all code paths, which it should have been doing but only did some of the time; this caused a bug when the length parameter == 0.
- http_read_body() now shortcuts out if the supplied length parameter is negative.
- The max_size calculations when handling chunked bodies in http_read_body() were off, causing all kinds of read problems when a max_size was specified in the request. Note that when you specify a max_size, you run the risk of interrupting the chunk processing, which means the connection has to be closed and you lose any keep-alive advantages.
- Just a quick note that http_read_body() doesn't quite honor {max_size} if {save_raw_chunks} is enabled. Only the size of the actual data value is computed against the max_size limit; the extra bytes comprising the chunk size values, as well as any trailing headers, are not counted against the max_size value. This might change in the future, but I feel this is an acceptable caveat for now.
- Change to decode_unicode() in order to get rid of a Perl warning involving pack().
- http_reset() was modified to forcibly clear the internal HTTP host cache.
- Turns out Net::SSL die()s during connection attempts if they don't succeed, and libwhisker wasn't trapping the die(). Basically just wrapped the connect in an eval{}.
- Sprinkled in some missing binmode()s to appease the spawn of Gates.
- http_do_request_ex() incorrectly set {whisker}->{http_message} instead of {message} when dealing with HTTP/0.9 requests.
- There was a call to Net::SSLeay::ctrl() which was causing occasional errors. The code was introduced way back in Libwhisker 1.5 for SSL session resuming, and has been carried through since then. It's now been removed.
- Added the {whisker}->{shortcut_on_404} option, which causes http_do_request() to *NOT* read the response body content and instead return immediately on a 404 (all headers are still read and returned like normal; just the body content is skipped). This can be a useful speed improvement for CGI scanners built on Libwhisker, since a 404 normally indicates the file isn't found, and there's no point in reading the body content. (Ultimately it's essentially the same outcome as a HEAD request; but in this case, it's like a HEAD request for 404s and a GET request for everything else, without having to make two different requests to get the body content for non-404 responses.)
- Sprinkled in some more error checking for Net::SSLeay functions. Apparently not all functions return the error value; it has to be checked for separately.
- The Net::SSLeay stream had a bug when used to connect via a proxy, resulting in a malformed CONNECT request.
- Auth_brute_force() was incorrectly calling auth_set_header() instead of auth_set().
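The chunked-response handling described in the 2.4 entries for {save_raw_chunks} and {hide_chunked_responses} can be pictured with the small sketch below. It is written in Python rather than Perl and is not libwhisker's code; decode_chunked(), read_chunked_response(), and the save_raw/hide_chunked flags are hypothetical stand-ins for the two {whisker} options, and trailer headers are simply ignored.

```python
# Sketch only: NOT libwhisker's implementation. The function names and the
# save_raw / hide_chunked flags are hypothetical stand-ins for the
# {whisker}->{save_raw_chunks} and {whisker}->{hide_chunked_responses} options.


def decode_chunked(raw: bytes) -> bytes:
    """Stitch together the data portions of an HTTP/1.1 chunked body."""
    data = bytearray()
    pos = 0
    while True:
        eol = raw.index(b"\r\n", pos)  # raises ValueError on a truncated body
        # The chunk size is hex; drop any ";name=value" chunk extensions.
        size = int(raw[pos:eol].split(b";", 1)[0].strip(), 16)
        pos = eol + 2
        if size == 0:
            # Last chunk reached; trailer headers are ignored in this sketch.
            return bytes(data)
        data += raw[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF


def read_chunked_response(headers: dict, raw: bytes,
                          save_raw: bool = False,
                          hide_chunked: bool = False):
    """Return (headers, body) for a chunked response."""
    headers = dict(headers)  # leave the caller's dict untouched
    if save_raw:
        # Like save_raw_chunks: keep the wire format, chunk sizes included.
        return headers, raw
    body = decode_chunked(raw)
    if hide_chunked:
        # Like hide_chunked_responses: make it look like a plain response.
        for name in [k for k in headers if k.lower() == "transfer-encoding"]:
            del headers[name]
        headers["Content-Length"] = str(len(body))
    return headers, body


raw = b"5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n"
print(read_chunked_response({"Transfer-Encoding": "chunked"}, raw))
# -> ({'Transfer-Encoding': 'chunked'}, b'Hello, world')
print(read_chunked_response({"Transfer-Encoding": "chunked"}, raw, hide_chunked=True))
# -> ({'Content-Length': '12'}, b'Hello, world')
```

The default path mirrors the described default behavior (data stitched together, Transfer-Encoding left in place, no Content-Length). In this sketch save_raw wins if both flags are set; how libwhisker itself combines the two options is not stated here.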

----------------------------------------------------------------------------

[] libwhisker 2.3

- utils_find_key() had a bug which caused lowercase comparisons (which is fine for utils_find_lowercase_key(), but not utils_find_key()).
- http_req2line() didn't take SSL into account when {include_host_in_uri} was set.
- David Maciejak pointed out that there was no way to use {whisker}->{bind_socket} to bind a specific address but not a port (i.e. you were always required to specify a port). So I added the '*' option to {whisker}->{bind_port}, which will attempt to find a valid port to bind to.
- The default behavior of {whisker}->{bind_socket} has been changed to use '*' as the value of {whisker}->{bind_port} if {bind_port} is not explicitly set. The prior behavior was to use port 14011 if {bind_port} was not set.
- Added some error checking on the {whisker}->{bind_socket} values in _stream_socket_alloc().
- Added a new buffer stream type. Basically, if you define {whisker}->{buffer_stream}, you get a stream which acts much like an echo server. First it starts out empty. Then, whatever you write to it is turned around and available when you read from it. This functionality is only useful for those using the low-level stream abstractions in situations where you want to stuff dynamic data into a stream and then pass it to a function which reads from the stream.
- Changed the stream key generation order around a bit. Prior to this version, SSL streams trumped file streams...which is incorrect. The new order of priority is: buffer, file, SSL, UDP, TCP.
- Added http_resp2line() function, which is the complement to http_req2line().
- Added utils_flatten_lwhash() function, which takes a %request or %response hash and recreates the approximate HTTP request/response text.
- The {whisker}->{data_sock} option turned out to not be exactly ideal, since the stream wrapper might have buffered some of the socket data.
  So now when you set {whisker}->{data_sock}, you get {whisker}->{data_sock} (the old, not recommended way) and a newer {whisker}->{data_stream} response item. This is only for people who don't want libwhisker to read the body of an HTTP response, and would rather have direct access to the stream/socket to read it themselves.
- Antonio "s4tan" Parata pointed out that uri_get_dir() returned the original URI when the URI didn't have a directory (e.g. "index.html"). This is obviously incorrect, as "index.html" would not be the directory. For the cases where no '/' is found in the URI, the return value will now be an empty string.
- Uber at hush.com pointed out a bug in crawl() when {use_params} is enabled. Essentially a shortcut was taken prematurely, before the use_params processing occurred.
- Dave King found a bug in _stream_is_valid(), due to my incorrect use of 'last' in a 'do/while' block (which is NOT an actual loop!).
- Uber at hush.com found that http_fixup_request() was setting Content-Encoding, rather than Content-Type, on POST requests. I'm amazed and embarrassed that I never noticed this before, especially given that many servers still accept the bunk Content-Encoding and lack of Content-Type as valid.

----------------------------------------------------------------------------

[] libwhisker 2.2

- Sullo pointed out that the api_demo.pl script uses save_ssl_info, rather than ssl_save_info. As it turns out, api_demo.pl was horribly out of date. It's been given an overhaul using the new libwhisker 2.x programming semantics.
- Sullo also pointed out that ssl_save_info wasn't working. This was because _http_do_request_ex() was erasing the SSL data previously set by http_do_request().
- Added 'use_referrers' crawl() config option, which is enabled by default. This causes the crawler to send appropriate HTTP referrers for all crawled links.
- Changed some internals of crawl() to keep the url_queue inside the crawl object, which makes it accessible to the source callback function.
- Added 'install_lw1' option to Makefile to automatically install the LW(1) compatibility bridge file (which emulates Libwhisker 1.x, letting Libwhisker 1.x programs use the Libwhisker 2.x library transparently).
- Added cookie_new_jar() function, to handle creating new jars (which are still hashes at this point, but you should still use the function).
- More documentation updates.
- http_fixup_request() now forces the correct POST data length, and does some extra checks to make sure leftover POST headers from previous requests are dealt with.
- Added utils_delete_lowercase_key() function for deleting hash keys without worrying about the capitalization.
- Small change to http_read_headers() to reset the match position pointer.
- sgt_b made me realize the need for a utils_find_key() function, which is similar to utils_find_lowercase_key() but is case-sensitive. Sure, a quick hash lookup would find the key too, but utils_find_key() has the added bonus of dereferencing anonymous arrays of multiple header values, if encountered.
- Dave King's query on handling form data returned from forms_read() led me to create a forms_walkthrough.txt document and the form_demo.pl script.
- Found a bug in _forms_parse_callback() (used internally by forms_read()) which caused it to mishandle textareas. The html_find_tags() function already does the work of finding the closing tag, so _forms_parse_callback() doesn't need to do it. The HTML parser must do it this way, since tags within a <textarea> are not to be parsed.
- Bug in uri_parse_parameters(), which took a shortcut exit if a '&' wasn't found. The shortcut should actually be triggered on a '='.
- Bug in html_find_tags() which caused non-value attributes to be saved with an empty string value. Now non-value attributes are saved with an undef value.
- For some reason an extra empty element was being introduced into the hash during forms_write(). I've traced it and can't figure out where it's coming from. In the meantime, I've added a check to make sure any empty elements don't show up in the output.
- Mathieu Dessus pointed out a bug where the older anti_ids() function was still being called, rather than the renamed encode_anti_ids(). You still use {whisker}->{anti_ids} to set the values though...

----------------------------------------------------------------------------

[] libwhisker 2.1

- Sullo pointed out that $LW_HAS_SSL has disappeared. Forgot to document that. Use $LW_SSL_LIB instead.
- Changed a (len!=0) to (len>0) check in the chunk decoder, to be more robust.
- Added html_link_extractor() function, which uses code already present in the crawl module.
- The regex was a bit broken in encode_uri_randomhex(). Pointed out by John McDonald.
- John also found a typo in encode_anti_ids(), causing it to call the non-existent function encode_randomase().
- New Makefile.pl build environment.
- Bug in forms_read() and _forms_callback() which prevented the proper storage of multiple forms.

----------------------------------------------------------------------------

[] libwhisker 2.0

- Libwhisker 2.0 is officially dubbed LW2. Below are the incompatible changes from libwhisker 1.x. There were lots of general changes, but only the non-backwards-compatible ones are documented.
- The following were renamed:

    {whisker}->{req_spacer*}           => {whisker}->{http_space*}
    {whisker}->{http_ver}              => {whisker}->{version}
    {whisker}->{http_protocol}         => {whisker}->{protocol}
    {whisker}->{uri_param}             => {whisker}->{parameters}
    {whisker}->{recv_header_order}     => {whisker}->{header_order}
    {whisker}->{http_resp_message}     => {whisker}->{message}
    {whisker}->{INITIAL_MAGIC}         => {whisker}->{MAGIC}
    {whisker}->{sockstate}             => {whisker}->{socket_state}
    utils_lowercase_(hashkeys|headers) => utils_lowercase_keys
    utils_split_uri                    => uri_split
    utils_join_uri                     => uri_join
    utils_normalize_uri                => uri_normalize
    utils_absolute_uri                 => uri_absolute
    utils_get_dir                      => uri_get_dir
    utils_unidecode_uri                => decode_unicode
    anti_ids                           => encode_anti_ids
    bruteurl                           => utils_bruteurl
    auth_set_header                    => auth_set
    encode_str2uri                     => encode_uri_hex
    encode_str2ruri                    => encode_uri_randomhex
    dumper                             => dump
    dumper_writefile                   => dump_writefile

- The following are now deprecated (along with their functionality):

    {whisker}->{method_postfix}
    {whisker}->{http_req_trailer}
    {whisker}->{queue_md5}           (use {request_fingerprint})
    {whisker}->{http_resp}           (use {code})
    {whisker}->{retry_errors}
    {whisker}->{ids_session_splice}
    do_auth                          (use auth_set)
    upload_file
    download_file                    (use get_page_to_file)
    md5_perl                         (use md5)
    md4_perl                         (use md4)
    (en|de)code_base64_perl          (use (en|de)code_base64)
    crawl_get_config
    crawl_set_config

- {whisker}->{parameters} will not be included if it's an empty string.
- {whisker}->{normalize_incoming_headers} now changes AA-Bb-cc-dD to Aa-Bb-Cc-Dd, instead of the prior AA-Bb-Cc-DD.
- The invalid HTTP response error message does not include the invalid response (but it's still in {whisker}->{data}).
- IDS session splicing is deprecated. Most IDSes do stream reassembly anyway, so this is not a big loss. The deprecation is due to limitations of the current stream implementation. It will reappear in future versions.
- cookie_* now operates independently of the actual Set-Cookie header.
  http_do_request() now has internal magic, so that all cookies are saved and processed regardless of header capitalization, normalization, and duplication (including the default ignore_duplicate_headers).
- Lots of the global variables were changed/renamed or removed. See globals.pl for details.
- Crawl was completely rebuilt to be more object-ish (the use of so many global variables made it hard to have multiple crawl sessions going at once). If you were using crawl(), then you will need to review the new way of calling crawl() and accessing related data. All the crawl data structures (and locations) were changed, as was the format for configuring the crawler and callbacks.
- Dumper() returns undef on error, instead of the string 'ERROR'.
- html_find_tags() takes a few more optional parameters. Using a tag map can lead to speed increases by reducing the number of times the callback function is actually called.
- The libwhisker 1.x series did not properly generate forms structures (via forms_read()). This was corrected, but the generated structure, while now accurate per the documentation, is not backwards-compatible.
- Authorization is now handled via auth_set(), and not merely by the presence of the Authorization header. Also, the internal {whisker}->{ntlm_*} keys relating to NTLM authentication have been deleted. You shouldn't have been using them anyway. :)
- Socket timeout values are read from {whisker}->{timeout}, and are saved per stream. The global $TIMEOUT variable no longer exists.
- HTML rewriting via html_find_tags() is now done by calling html_find_tags_rewrite() within your callback function. The return value of the callback is ignored (and thus not required, unlike LW1.x).
- auth_set() will now call http_reset() whenever any NTLM-based authentication is used. This is because NTLM is a connection-based authentication, and thus all connections need to start from scratch when NTLM is enabled.
- The ETag header is now normalized to ETag, and not Etag.
- All new POD documentation, which follows the more standard format for use with pod2man.
- utils_find_lowercase_keys() will now dereference multi-value entries and return a full array if it is called in array context.
- A bug in Crypt::SSLeay (Net::SSL) 0.51 (and probably prior) causes it to puke when it is used in proxy mode. Hopefully it will be fixed in future versions.
- Turns out the Net::SSLeay implementation of MD5 was returning bad hashes (it truncated them at the first NULL byte). Use of Net::SSLeay::md5 has been discontinued permanently.

---------------------------------------------------------------------------
This file is for maintainers of a software distribution which would like to include Libwhisker. I thought I'd document a few things in order to make your life easier.
---------------------------------------------------------------------------
If there are any changes which would make your life as a distribution package maintainer easier, feel free to let me know. I will try to accommodate as best I can. I appreciate your time and effort in helping distribute libwhisker2, so I would like to make the process as painless as possible for you.
---------------------------------------------------------------------------
First, here's a brief recommended description of Libwhisker to use. If it's too long, then just use the first two sentences.

Libwhisker is a Perl library useful for HTTP testing scripts. It contains a pure-Perl implementation of functionality found in the LWP, URI, Digest::MD5, Digest::MD4, Data::Dumper, Authen::NTLM, HTML::Parser, HTML::FormParser, CGI::Upload, MIME::Base64, and Getopt::Std modules.
Libwhisker is designed to be portable (a single perl file), fast (general benchmarks show libwhisker is faster than LWP), and flexible (great care was taken to ensure the library does exactly what you want it to do, even if it means breaking the protocol). --------------------------------------------------------------------------- You need to make sure the LW2.pm is installed in the system perl module directory. I very, very, *very* STRONGLY suggest that you deprecate any Libwhisker 1.x packages and instead install the LW.pm included in the Libwhisker 2.x compat/ directory (this can be done with the 'install_lw1' Makefile.pl command). The compatibility LW.pm provides the normal Libwhisker 1.x functionality using the Libwhisker 2.x library. This allows both Libwhisker 1.x and 2.x support to be contained in a single Libwhisker package (preferred). Documentation for LW2 is embedded inside the final LW2.pm (unless you build with the 'nopod' option). You should use pod2man and store the resulting output amongst the local collection of Perl module manpages. The Makefile.pl will automatically install the POD-generated man page (assuming you haven't used the 'nopod' option) when you run the 'install' command. The files in docs/ and scripts/ are for programming references, and as such, should not be installed with a normal package. If you truly do want to include supporting files, then I recommend only the following: docs/crawler.txt docs/whisker_hash.txt docs/logo-builton.gif docs/logo-name.gif docs/logo-plain.gif scripts/api_demo.pl scripts/crawl_demo.pl scripts/simple_demo.pl --------------------------------------------------------------------------- libwhisker2-perl-2.5/KNOWNBUGS000066400000000000000000000066641133410331400160520ustar00rootroot00000000000000Known broken stuff as of v2.4: - HTML parser still screws up on a few corner test cases (particular types of invalid HTML). Fortunately these are relatively abnormal and rare. - XML parser is nowhere near functional.
- Net::SSL (Crypt::SSLeay) has a bug in version 0.51 when used in proxy mode. I included a patch in the compat/ directory which temporarily fixes the bug, but it has the side effect of disabling LWP HTTPS support. Precompiled binary users are out of luck until an updated version is available, and I'm not so sure this is really ever going to be fixed. - Proxy-auth does not work with Net::SSL; the auth data needs to be set using %ENV instead of the Proxy-Authorization header which Libwhisker normally sends. Not sure if this will ever be fixed, as it would require significant architecture changes to accommodate. Use Net::SSLeay instead. - NTLM proxy-auth is not supported for SSL connections. - The combination of Paros HTTP proxy, Net::SSLeay, and SSL connections does not currently work. Paros only returns an "Error: 1" message, so I'm not even sure where to begin on figuring out what's going wrong. Net::SSL works just fine (well, assuming you applied the patch referred to above). Caveats as of v2.4: - Cookie support does not yet account for expiration via the Expires value. - Various perl warnings when running under -w. If you send me a copy of the warnings (warning and line number reported), I can fix them. Since warnings are generated at runtime, it's hard to traverse every possible code path and find the warnings. - LW2::dump() sometimes messes up on the visual layout of the resulting code...however, this does not affect the final representation/eval'ing, so it's merely cosmetic. - The internal session cache can grow quite big if you scan thousands of hosts in a row without calling LW2::http_reset() or exiting. In order to clear out any leftover session information (when you're all done with a host), be sure to call LW2::http_reset() every once in a while. - Be careful when trying to unset/remove various {whisker} request values.
If it's not defined by default, then you should delete the key, rather than setting it to 0, since some libwhisker functions only check to see if a key is defined (they don't actually look at the value). - The SSL libraries do not allow persistent connections, so all connections will be closed at the end. - NTLM web auth through an NTLM auth'd proxy is a toss-up, largely because I'm not sure what the most appropriate route is. Libwhisker's implementation is how it *should* work IMHO, but I couldn't get it to actually work through MS ISA proxy (although, I couldn't get IE to work correctly through MS ISA proxy with double-NTLM auth either, which leads me to suspect MS has issues somewhere...). The general problem is that ISA proxy doesn't seem to keep-alive the connection between the proxy and web server, so persistent requests from the client will have to re-auth to the web server. Technically, since everything is kept-alive, once all the NTLM rigmarole for both proxy and web server is done, I should be able to just fire requests off like normal (i.e. no auth) until the connection is closed. So, in short, Microsoft screwed up their own implementation of double-NTLM auth in ISA Server, and I don't really have another NTLM-capable proxy to test with to ensure LW2 is perfectly implemented...but it should be. :) libwhisker2-perl-2.5/LICENSE000066400000000000000000000024441133410331400155470ustar00rootroot00000000000000Copyright (c) 2009, Jeff Forristal (wiretrip.net) All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: - Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. - Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. libwhisker2-perl-2.5/Makefile000066400000000000000000000004361133410331400162010ustar00rootroot00000000000000# this is a passthru makefile for libwhisker2 DESTDIR= lib: perl Makefile.pl lib build: perl Makefile.pl lib install: export DESTDIR perl Makefile.pl install clean: perl Makefile.pl clean nopod: perl Makefile.pl nopod uninstall: export DESTDIR perl Makefile.pl uninstall libwhisker2-perl-2.5/Makefile.pl000066400000000000000000000314571133410331400166220ustar00rootroot00000000000000#!/usr/bin/perl # # Generic perl application Makefile # # Copyright (c) 2009, Jeff Forristal (wiretrip.net) # All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions # are met: # # - Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # # - Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. 
# # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS # "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT # LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS # FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE # COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, # INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, # BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT # LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN # ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE # POSSIBILITY OF SUCH DAMAGE. $VERSION = '2.5'; # version of the app $PACKAGE = 'LW2'; # name of the app $TARGET = 'LW2.pm'; # target build filename $SRCDIR = 'src'; # dir containing .pl parts $MAIN = 'globals.pl_'; # main app logic/global library logic $HEADER = 'header.pod'; # POD/file header $FOOTER = 'footer.pod'; # POD/file footer $LIBRARY = 1; # is it a library? $HASPOD = 1; # does it have embedded POD? 
$DESTDIR = ''; # installation directory prefix #### supported build options ######################################### # general commands supported by this makefile $COMMANDS{clean} = \&command_clean; $COMMANDS{lib} = \&command_build if($LIBRARY); $COMMANDS{build} = \&command_build if(!$LIBRARY); $COMMANDS{install} = \&command_install; $COMMANDS{uninstall} = \&command_uninstall; $COMMANDS{support} = \&command_support; $COMMANDS{sockdiag} = \&command_socket_diag; $COMMANDS{nopod} = \&command_strip_pod if($HASPOD); # commands specific to this app $COMMANDS{install_lw1} = \&command_install_compat; #### external modules ################################################ # modules to check for and track if they are installed # # Module values: # 0 = just try to load module, but don't error if not available # 1 = abort build if module isn't available %MODULES = ( 'Socket' => 0, 'MIME::Base64' => 0, 'MD5' => 0, 'Net::SSLeay' => 0, 'Net::SSL' => 0, 'POSIX' => 0 ); #### end config ###################################################### $|++; # internal vars %BUILD = (); $CWD = (); $COMMAND = ''; %DESCRIPTIONS = (); # first check arguments if($ARGV[0] eq ''){ print STDOUT "$PACKAGE version $VERSION build options:\n\n"; # load the command descriptions while(<DATA>){ tr/\r\n//d; my ($name,$desc)=split(/\t/,$_,2); $DESCRIPTIONS{$name}=$desc; } foreach (keys %COMMANDS){ print STDOUT "- Makefile.pl $_"; if(defined $DESCRIPTIONS{$_}){ print STDOUT "\t",$DESCRIPTIONS{$_}; } print STDOUT "\n"; } print STDOUT "\n"; exit; } # the makefile requires Config, Cwd, and Pod::Man modules $MODULES{Config} = 0; $MODULES{Cwd} = 0; $MODULES{'Pod::Man'} = 0; # next check for external modules foreach (keys %MODULES){ eval "use $_;"; if(!$@){ $MODULES{$_}++; } else { if($MODULES{$_}>0){ print STDERR "Error: module '$_' required.\n"; exit; } } } # adjust DESTDIR, if needed $DESTDIR = $ENV{DESTDIR} if(defined $ENV{DESTDIR}); # parse command line build options while($COMMAND = shift @ARGV){ if(defined
$COMMANDS{$COMMAND}){ $COMMANDS{$COMMAND}->(); } else { print STDERR "Error: bad build command '$COMMAND'\n"; exit; } } exit; ######################################################################### sub command_clean { unlink $TARGET if(-e $TARGET); print STDOUT "Clean.\n"; } sub command_install { command_install_library() if($LIBRARY); command_install_pod() if($HASPOD); } sub command_uninstall { command_uninstall_library() if($LIBRARY); command_uninstall_pod() if($HASPOD); } sub command_install_pod { return if(!$HASPOD); if($MODULES{'Pod::Man'}==0){ print STDERR "WARNING: Pod::Man not available; man page not installed\n"; return; } command_build() if(!-e $TARGET); die("Can not install without Config.pm") if($MODULES{Config}==0); $CWD=&cwd if($MODULES{Cwd}>0); my $where=$DESTDIR . $Config{'man3direxp'}; my $t = $TARGET; if($LIBRARY){ $t="$PACKAGE.3pm"; } else { $t=~s/\.pl$//i; $t.='.3'; } if(!-e $where){ print STDOUT "WARNING!\n\n", "The local man3 site directory does not exist:\n", "$where\n\nPlease create this directory and try again.\n\n"; exit; } my $parser = Pod::Man->new ( release => $VERSION, section => 3, name => $PACKAGE ); open(IN,'<'.$TARGET)||puke($TARGET); $temp = <IN>; if($temp=~m/^# NOPOD NOTICE:/){ print STDERR "Pod has been stripped; not installing man page\n"; return; } chdir($where); open(OUT,'>'.$t)||die("Can't open $where/$t for write"); chmod 0644, $t; $parser->parse_from_filehandle(\*IN,\*OUT); close(IN); close(OUT); if(-s "$t"){ print STDOUT "$t installed to $where\n"; } else { print STDOUT "Error installing $t to $where\n"; } exit if($MODULES{Cwd}==0); chdir($CWD); } sub command_uninstall_pod { die("Can not uninstall without Config.pm") if($MODULES{Config}==0); $CWD=&cwd if($MODULES{Cwd}>0); my $where=$DESTDIR .
$Config{'man3direxp'}; my $t = $TARGET; if($LIBRARY){ $t="$PACKAGE.3pm"; } else { $t=~s/\.pl$//i; $t.='.3'; } chdir($where); if(-e $t){ unlink $t; print STDOUT "$t uninstalled.\n"; } else { print STDOUT "$t not installed.\n"; } exit if($MODULES{Cwd}==0); chdir($CWD); } sub command_install_library { return if(!$LIBRARY); command_build() if(!-e $TARGET); die("Can not install without Config.pm") if($MODULES{Config}==0); $CWD=&cwd if($MODULES{Cwd}>0); my $where=$DESTDIR . $Config{'installsitelib'}; if(!-e $where){ print STDOUT "WARNING!\n\n", "The local perl site directory does not exist:\n", "$where\n\nPlease create this directory and try again.\n\n"; exit; } open(IN,'<'.$TARGET)||puke($TARGET); chdir($where); open(OUT,'>'.$TARGET)||die("Can't open $where/$TARGET for write"); chmod 0755, $TARGET; while(<IN>){ print OUT; } close(IN); close(OUT); if(-s "$TARGET"){ print STDOUT "$TARGET installed to $where\n"; } else { print STDOUT "Error installing $TARGET to $where\n"; } exit if($MODULES{Cwd}==0); chdir($CWD); } sub command_uninstall_library { die("Can not uninstall without Config.pm") if($MODULES{Config}==0); $CWD=&cwd if($MODULES{Cwd}>0); my $where=$DESTDIR .
$Config{'installsitelib'}; chdir($where); if(-e $TARGET){ unlink $TARGET; print STDOUT "$PACKAGE uninstalled.\n"; } else { print STDOUT "$PACKAGE not installed.\n"; } exit if($MODULES{Cwd}==0); chdir($CWD); } sub command_build { $CWD=&cwd if($MODULES{Cwd}>0); # open target file for output open(OUT,'>'.$TARGET)||die("Can't open $TARGET for write"); chmod 0755, $TARGET; # print out the shebang line print OUT "#!",$^X,"\n"; # embed the package and version info print OUT "# $PACKAGE version $VERSION\n"; # switch to the src directory opendir(DIR,$SRCDIR); chdir($SRCDIR); # print out the initial header and infoz readlib($HEADER,1); if($LIBRARY){ print OUT "package $PACKAGE;\n"; print OUT "\$",$PACKAGE,"::VERSION=\"$VERSION\";\n"; } else { print OUT "\$VERSION=\"$VERSION\";\n"; } print OUT "\$PACKAGE='",$PACKAGE,"';\n"; # handle main logic print OUT "\n"; if($LIBRARY){ print OUT "BEGIN {\n"; print OUT "package $PACKAGE;\n"; print OUT "\$PACKAGE='",$PACKAGE,"';\n"; } readlib($MAIN,0); if($LIBRARY){ print OUT "\n} # BEGIN\n\n"; } # handle all the source files &readlibs; # and now the footer readlib($FOOTER,1); print OUT "1;\n" if($LIBRARY); # we're all done; print status and cleanup print STDOUT "$PACKAGE built.\n"; close(OUT); closedir(DIR); exit if($MODULES{Cwd}==0); chdir($CWD); } sub command_strip_pod { return if(!$HASPOD); command_build() if(!-e $TARGET); open(OUT,'>'."$TARGET.nopod") || die("Couldn't open $TARGET.nopod"); open(IN,'<'.$TARGET) || puke($TARGET); &strip_pod; close(OUT); close(IN); unlink $TARGET; rename "$TARGET.nopod", $TARGET; chmod 0755, $TARGET; print STDOUT "POD removed.\n"; } sub command_support { print STDOUT "Perl $] on '$^O'\n"; print STDOUT "Architecture: '$Config{archname}'\n" if($MODULES{Config}>0); print STDOUT "\n"; foreach $lib (keys %MODULES){ print STDOUT $lib, ' ->', ' 'x(20-length($lib)); if($MODULES{$lib}>0){ print STDOUT 'yes'; my $name = $lib.'::VERSION'; if(defined $$name){ print STDOUT ' (version '.$$name.')'; } print "\n"; } else 
{ print STDOUT "no\n"; } } } sub command_socket_diag { use Socket; use POSIX; my @what=qw( INADDR_ANY INADDR_BROADCAST INADDR_LOOPBACK INADDR_NONE AF_INET PF_INET MSG_OOB SOCK_DGRAM SOCK_RAW SOCK_SEQPACKET SOCK_STREAM SOL_SOCKET SOMAXCONN F_GETFL F_SETFL O_NONBLOCK EINPROGRESS EWOULDBLOCK SO_BROADCAST SO_KEEPALIVE SO_LINGER SO_OOBINLINE SO_RCVBUF SO_RCVLOWAT SO_RCVTIMEO SO_REUSEADDR SO_SNDBUF SO_SNDLOWAT SO_SNDTIMEO SO_TYPE SO_USELOOPBACK ); print STDOUT "\nPerl: $^O\n"; print STDOUT "Perl version: $]\n"; print STDOUT "Uname processor: ",`uname -m`; print STDOUT "Uname kernel: ",`uname -r`; print STDOUT "Socket defines:\n\n"; map { verify($_) } @what; } ######################################################################### sub command_install_compat { die("Can not install without Config.pm") if($MODULES{Config}==0); $CWD=&cwd if($MODULES{Cwd}>0); my $where=$DESTDIR . $Config{'installsitelib'}; if(!-e $where){ print STDOUT "WARNING!\n\n", "The local perl site directory does not exist:\n", "$where\n\nPlease create this directory and try again.\n\n"; exit; } open(IN,'<'.'compat/LW.pm')||puke('compat/LW.pm'); chdir($where); open(OUT,'>'.'LW.pm')||die("Can't open $where/LW.pm for write"); chmod 0755, 'LW.pm'; while(<IN>){ print OUT; } close(IN); close(OUT); if(-s "LW.pm"){ print STDOUT "LW.pm bridge installed to $where\n"; } else { print STDOUT "Error installing LW.pm to $where\n"; } exit if($MODULES{Cwd}==0); chdir($CWD); } ######################################################################### sub puke { my $file = shift; print STDERR "Build error: missing/corrupted file $file\n"; eval "close(OUT)"; exit; } sub readlib { my $file=shift; my $replace_flag=shift||0; return if(defined $BUILD{$file}); puke($file) if(!-e $file); $BUILD{$file}++; open(IN,'<'.$file)||puke($file); while(<IN>){ next if(m/^#GPL/ || m/^#LIC/ || m/^#BSD/); s/\r\n$/\n/; if($replace_flag){ s/\$VERSION/$VERSION/g; s/\$TARGET/$TARGET/g; s/\$PACKAGE/$PACKAGE/g; } print OUT $_; } close(IN); } sub readlibs
{ my $file; my @FF=(); while($file=readdir(DIR)){ next if($file=~/^\./); next if($file eq $MAIN); next if($file eq $HEADER); next if($file eq $FOOTER); next if($file =~ /^_/); push(@FF,$file); } my @FE = sort @FF; foreach $file (@FE){ readlib($file,0); } } sub strip_pod { my $inpod=0; my $last=''; # put a small notice in the file to keep people from wondering where # all the whitespace went... print OUT "# NOPOD NOTICE: the documentation and whitespace have been stripped\n"; print OUT "# from this file in order to reduce filesize.\n#\n\n"; my $IN_INITIAL_COMMENTS=1; while(<IN>){ s/^[ \t]+//; # remove leading whitespace my $line=$_; next if(m/^#/ && !$IN_INITIAL_COMMENTS); tr/\r\n//d; # remove CRLF if($IN_INITIAL_COMMENTS && !m/^#/){ $IN_INITIAL_COMMENTS=0; next; } next if($_ eq ''); $inpod=1 if($line=~/^=(head1|item|pod|back)/); if(!$inpod){ $line=~tr/\r//d; print OUT $line if(!($line eq "\n" && $last eq "\n")); } $inpod=0 if($line=~/^=cut/); $last = $line; } } sub verify { my $temp=-1; my $name=shift; eval { $temp=sprintf("%lu",&$name) }; $temp="\t$temp" if(length($name)<7); print STDOUT "\t$name:\t$temp\n"; } __DATA__ nopod Strip the POD documentation and whitespace clean Clean up the build tree lib Build the library build Build the application install Install the components to the Perl site directory uninstall Uninstall/remove the components from your system support List various external module support information sockdiag Diagnostics for troubleshooting Socket.pm problems install_lw1 Install the LW.pm compatibility bridge libwhisker2-perl-2.5/README000066400000000000000000000167611133410331400154310ustar00rootroot00000000000000------------------------------------------------------------------- Libwhisker official release v2.4 ------------------------------------------------------------------- What is Libwhisker: Libwhisker is a Perl module geared specifically for HTTP testing.
Libwhisker has a few design principles: - Portable: runs with 0 changes on Unix, Windows, etc (100% Perl) - Flexible: designed with a 'no rules' approach - Contained: designed to not require external modules when possible - Localized: does not require installation to use ------------------------------------------------------------------- README README README README README README README README README ------------------------------------------------------------------- "How do I run/use Libwhisker?" Libwhisker is not a program to run. It's a library for people to make programs with. There is nothing to 'run' in Libwhisker. If you're looking for a CGI scanner (whisker), you're in the wrong place. Whisker is separate from Libwhisker. ------------------------------------------------------------------- Information on Libwhisker library ------------------------------------------------------------------- Libwhisker's 'no rules' approach: Since this library is intended for testing, fuzzing, and quality assurance situations, odds are the library will need to be capable of handling protocol malformations and other wackiness. Many existing Perl libraries are not flexible when you try to break the protocol--they assume you want to make a legitimate request. Libwhisker, on the other hand, is designed to not impose any rules on the software, thus allowing it to do whatever you really want it to do, including stuff not normally considered 'legal' or 'sane' by RFC/protocol definition. ------------------------------------------------------------------- What Libwhisker can do for you: Do you have a demonstration program, application, or exploit that interacts over HTTP? Well, using Libwhisker means your program: - Can communicate over HTTP 0.9, 1.0, and 1.1 - Can use persistent connections (keep-alives) - Has proxy support - Has anti-IDS support - Has SSL support - Can receive chunked encoding - Has nonblock/timeout support built in (platform-dependent...)
- Has basic and NTLM authentication support (both server and proxy) That way you don't have to code it all yourself--use Libwhisker and all those features are transparently available automatically. So call now, operators are standing by. ------------------------------------------------------------------- Why not use other perl modules? Libwhisker actually combines the functionality of LWP, URI, HTML::Parser, MIME::Base64, and a handful of other modules into a single file that is approximately 105k (when POD is stripped). One of the annoyances of LWP et al. is that they require local system installation before they can be used--and that installation sometimes requires compilation of C code files. This can be a problem if you have a system that lacks a compiler (commercial unix platforms, Windows, etc), and it also makes portability very difficult. Libwhisker is 100% native Perl, so no additional compilers are necessary. It's one single 105k text file (i.e. very portable), which doesn't actually have to be installed--just put it in the same directory as your perl script and go! And since Libwhisker doesn't require external modules to work**, that means you should be good to go with a perl binary, the LW2.pm file, and your perl script--nothing else needed! Great for those 'security audit' situations where installing an entire perl distribution on a target system is out of the question... Of course, that doesn't mean that you shouldn't use LWP. Just keep in mind that LWP (and other modules, in general) were written to follow proper RFC protocols. This is fine and dandy; but if you're writing exploits, sometimes you need to purposefully break some aspect of the protocol, and typically the published Perl modules don't provide the capabilities to do this. And lastly, Libwhisker has been benchmarked against LWP--and it's been found to be almost three times as fast.
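A script that wants to stay portable in this way can check for optional modules at run time instead of requiring them up front. Makefile.pl follows the same idea when it evals a "use" for each entry in %MODULES. The code below is a minimal sketch of that pattern, not code taken from Libwhisker. The have_module() helper and the module list are illustrative assumptions.

```perl
#!/usr/bin/perl
# Sketch: detect optional modules at run time instead of requiring them.
# Assumption: have_module() and @optional are illustrative only; this is
# not Libwhisker's internal detection code.
use strict;
use warnings;

# Modules a Libwhisker-style script can use if present; none is mandatory.
my @optional = ('Socket', 'Net::SSLeay', 'Net::SSL');

# Returns 1 if the module loads, 0 otherwise, without aborting the script.
sub have_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Net::SSL -> Net/SSL.pm
    return eval { require $file; 1 } ? 1 : 0;
}

my %available = map { $_ => have_module($_) } @optional;

foreach my $module (@optional) {
    printf "%-12s %s\n", $module, $available{$module} ? 'yes' : 'no';
}

# Network functions need Socket; SSL additionally needs one SSL library.
my $can_network = $available{'Socket'};
my $can_ssl = $can_network && ($available{'Net::SSLeay'} || $available{'Net::SSL'});
print 'SSL support: ', ($can_ssl ? 'available' : 'not available'), "\n";
```

Because require runs inside eval, a missing SSL library is reported as "no" and the script keeps running. That matches the README's point that only the network or SSL features depend on those modules.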
** you do need a local Socket.pm for your system to use any of the network functionality; however, if you don't have Socket support, you can still use the Libwhisker utility/parsing functions without problem. ------------------------------------------------------------------- Note to Libwhisker 1.x users: Libwhisker 2.0 is *not* backwards-compatible with Libwhisker 1.x. A few things were moved around and renamed. Changing 'use LW' to 'use LW2' in your programs is not enough--you may need to make code changes to your program. See the 'CHANGES' file. However, there is now a 'bridge' LW.pm module in the compat/ directory, which will use LW2 (libwhisker 2.x) functions to emulate the LW (libwhisker 1.x) functionality. This should allow programs written to use LW (libwhisker 1.x) to use LW2 (libwhisker 2.x) without any changes. You can have the compatibility bridge automatically installed by using the 'install_lw1' Makefile.pl command. ------------------------------------------------------------------- How to use Libwhisker: Use the included scripts/api_demo.pl script to see how to make a basic request using the library. In addition, there is embedded POD documentation for most of the functions within LW2.pm. You should be able to use the LW2.pm by including it in the same directory as the script that requires it. Otherwise run "perl Makefile.pl install" to install it into your local perl module site directory. To use SSL support, you will need Net::SSLeay or Crypt::SSLeay, as well as OpenSSL installed. Libwhisker will still work without any of them; you just won't have any SSL support. And technically Libwhisker will still work without Socket support, but only its utility/parsing functions will be usable. Crypt::SSLeay (also known as Net::SSL) is available precompiled for the Windows ActiveState package; Unix platforms should use Net::SSLeay, which has many more features than Net::SSL. ------------------------------------------------------------------- Libwhisker is licensed under the simplified (2-clause) BSD license.
That means it's free for use and redistribution, including in commercial products, provided the copyright notice, list of conditions, and disclaimer are retained. See the LICENSE file for the full terms. ------------------------------------------------------------------- Tested platforms: Libwhisker has been successfully run on: - Linux, using perl 5.004 and higher - ActiveState Perl for Windows, based on perl 5.6.x and 5.005 - Sun Solaris, perl 5.004 and higher - SGI IRIX, perl 5.004 and higher Libwhisker does *not* run with perl 5.003 and earlier. There were too many bugs in 5.003 which would require too many workarounds to accommodate while still maintaining minimal code size and speed. ------------------------------------------------------------------- Feedback about Libwhisker: Send it to me directly at rfp@wiretrip.net (please use the word 'libwhisker' in the subject), or toss it out on the whisker-devel mailing list, whisker-devel@lists.sourceforge.net.
You can subscribe by going to the mailing list section at http://sourceforge.net/projects/whisker/ ------------------------------------------------------------------- libwhisker2-perl-2.5/compat/000077500000000000000000000000001133410331400160215ustar00rootroot00000000000000libwhisker2-perl-2.5/compat/Net_SSL.patch000066400000000000000000000006531133410331400203150ustar00rootroot00000000000000--- Crypt-SSLeay-0.51/lib/Net/SSL.orig Fri Dec 26 14:53:24 2003 +++ Crypt-SSLeay-0.51/lib/Net/SSL.pm Fri Dec 26 11:26:35 2003 @@ -337,7 +337,7 @@ my $realm = ""; my $length = 0; my $line = ""; - my $lwp_object = $self->get_lwp_object; + my $lwp_object; # = $self->get_lwp_object; my $iaddr = gethostbyname($host); $iaddr || die("can't resolve proxy server name: $host, $!"); libwhisker2-perl-2.5/compat/lw.pm000066400000000000000000000202021133410331400167750ustar00rootroot00000000000000# # This is a compatibility 'bridge' which will translate the # libwhisker 2.x API into libwhisker 1.x format. This should # only be used to support legacy programs which refuse to port # to LW2, but should be using LW2 over LW[1] because of bug fixes.
# package LW; require 'LW2.pm'; $LW::VERSION = '1.10'; $LW::BRIDGE = '2.0'; # # NOTE: The following two lines depend on external files; remove/comment # out if you need single-file portability # use strict; use vars qw(%available $LW_HAS_SOCKET $LW_HAS_SSL $TIMEOUT $LW_SSL_LIB $LW_NONBLOCK_CONNECT $FUNC %_deprec %crawl_server_tags %crawl_referrers %_remap_to %crawl_offsites %crawl_cookies %crawl_forms %_remap_from %crawl_linktags %crawl_config ); #### GLOBAL VARIABLE STUFF #### %available = (); $LW_HAS_SOCKET = (defined $Socket::VERSION)?1:0; $LW_HAS_SSL = ($LW2::LW_SSL_LIB>0)?1:0; $LW_SSL_LIB = $LW2::LW_SSL_LIB; $LW_NONBLOCK_CONNECT = $LW2::LW_NONBLOCK_CONNECT; %crawl_server_tags=(); %crawl_referrers=(); %crawl_offsites=(); %crawl_cookies=(); %crawl_forms=(); %crawl_linktags = %LW2::_crawl_linktags; %crawl_config = %LW2::_crawl_config; $TIMEOUT=10; # doesn't do anything #### BRIDGED FUNCTIONS #### # antiids.pl = DONE sub anti_ids { warn("Anti-IDS: session splicing is not supported") if($_[1]=~/9/); my $hr = $_[0]; _remap($hr); LW2::encode_anti_ids(@_); _remap_from($hr); } # auth.pl = DONE sub auth_set_header { goto &LW2::auth_set; } sub do_auth { goto &LW2::auth_set; } sub auth_brute_force { _remap($_[1]); goto &LW2::auth_brute_force; } # bruteurl.pl = DONE sub bruteurl { _remap($_[0]); goto &LW2::utils_bruteurl; } # cookie.pl = DONE sub cookie_read { goto &LW2::cookie_read; } sub cookie_write { goto &LW2::cookie_write; } sub cookie_parse { goto &LW2::cookie_parse; } sub cookie_get { goto &LW2::cookie_get; } sub cookie_set { goto &LW2::cookie_set; } # crawl.pl = DONE sub crawl_get_config { my $key=shift; return $crawl_config{$key}; } sub crawl_set_config { return if(!defined $_[0]); my %opts=@_; while( my($k,$v)=each %opts){ $crawl_config{lc($k)}=$v; } } sub crawl { # crawl changed *a lot*, so we have a lot of fixing to do... 
my ($START, $DEPTH, $TRACK, $HREQ)=@_; _remap($HREQ); my $CRAWL = LW2::crawl_new($START,$DEPTH,$HREQ,$TRACK); $crawl_config{ref_hin}=$CRAWL->{request}; $crawl_config{ref_hout}=$CRAWL->{response}; $crawl_config{ref_jar}=$CRAWL->{jar}; $crawl_config{ref_links}=$CRAWL->{urls}; my @p = LW2::uri_split($START); $crawl_config{host}=$p[2]; $crawl_config{port}=$p[3]; $crawl_config{start}=$p[0]; %{$CRAWL->{config}} = %crawl_config; %crawl_server_tags=(); %crawl_referrers=(); %crawl_offsites=(); %crawl_cookies=(); %crawl_forms=(); my $res = $CRAWL->{crawl}->(); return if(!defined $res); %crawl_server_tags = %{$CRAWL->{server_tags}}; %crawl_referrers = %{$CRAWL->{referrers}}; %crawl_offsites = %{$CRAWL->{offsites}}; %crawl_cookies = %{$CRAWL->{cookies}}; %crawl_forms = %{$CRAWL->{forms}}; } # dump.pl = DONE sub dumper { my $res = &LW2::dump(@_); $res = 'ERROR' if(!defined $res); return $res; } sub dumper_writefile { goto &LW2::dump_writefile; } # easy.pl = DONE sub upload_file { die(""); } sub get_page { _remap($_[1]); goto &LW2::get_page; } sub get_page_hash { _remap($_[1]); my $res = LW2::get_page_hash(@_); return _remap_from($res); } sub get_page_to_file { _remap($_[2]); goto &LW2::get_page_to_file; } sub download_file { _remap($_[2]); goto &LW2::get_page_to_file; } # encode.pl = DONE sub encode_base64 { goto &LW2::encode_base64; } sub encode_base64_perl { goto &LW2::encode_base64; } sub decode_base64 { goto &LW2::decode_base64; } sub decode_base64_perl { goto &LW2::decode_base64; } sub encode_str2uri { goto &LW2::encode_uri_hex; } sub encode_str2ruri { goto &LW2::encode_uri_randomhex; } sub encode_unicode { goto &LW2::encode_unicode; } # forms.pl = DONE sub forms_read { warn("LW1.x forms support was broken; LW2 is fixed, but not compatible"); goto &LW2::forms_read; } sub forms_write { warn("LW1.x forms support was broken; LW2 is fixed, but not compatible"); goto &LW2::forms_write; } # html.pl = DONE { $FUNC = ''; sub html_find_tags { my ($dr,$func)=@_; $FUNC = $func; 
LW2::html_find_tags($dr,\&_html_callback_wrapper); $FUNC = ''; } sub _html_callback_wrapper { return if($FUNC eq ''); my $res = &$FUNC(@_); LW2::_html_find_tags_adjust($res,0) if(defined $res && $res > 0); }} # http.pl = DONE sub http_reset { goto &LW2::http_reset; } sub http_init_request { my $href = shift; LW2::http_init_request($href); $href->{whisker}->{version}='1.0'; # default for LW1.x _remap_from($href); $href->{Connection}='close'; # default for LW1.x } sub http_do_request { my ($req,$resp,%conf)=@_; my ($k,$v); while(($k,$v)=each(%conf)){ $req->{whisker}->{$k}=$v; } _remap($req); my $res = LW2::http_do_request($req,$resp); _remap_from($resp); return $res; } sub http_fixup_request { my $req=shift; _remap($req); LW2::http_fixup_request($req); _remap_from($req); } # mdx.pl = DONE sub md5 { goto &LW2::md5; } sub md5_perl { goto &LW2::md5; } sub md4 { goto &LW2::md4; } sub md4_perl { goto &LW2::md4; } # multipart.pl = DONE sub multipart_set { goto &LW2::multipart_set; } sub multipart_get { goto &LW2::multipart_get; } sub multipart_setfile { goto &LW2::multipart_setfile; } sub multipart_getfile { goto &LW2::multipart_getfile; } sub multipart_boundary { goto &LW2::multipart_boundary; } sub multipart_write { goto &LW2::multipart_write; } sub multipart_read { goto &LW2::multipart_read; } sub multipart_read_data { goto &LW2::multipart_read_data; } sub multipart_files_list { goto &LW2::multipart_files_list; } sub multipart_params_list { goto &LW2::multipart_params_list; } # ntlm.pl = DONE sub ntlm_new { goto &LW2::ntlm_new; } sub ntlm_client { goto &LW2::ntlm_client; } # utils.pl = DONE sub utils_recperm { goto &LW2::utils_recperm; } sub utils_array_shuffle { goto &LW2::utils_array_shuffle; } sub utils_randstr { goto &LW2::utils_randstr; } sub utils_get_dir { goto &LW2::uri_get_dir; } sub utils_port_open { goto &LW2::utils_port_open; } sub utils_getline { goto &LW2::utils_getline; } sub utils_getline_crlf { goto &LW2::utils_getline_crlf; } sub utils_absolute_uri { goto
&LW2::uri_absolute; }
sub utils_normalize_uri { goto &LW2::uri_normalize; }
sub utils_save_page { goto &LW2::utils_save_page; }
sub utils_getopts { goto &LW2::utils_getopts; }
sub utils_unidecode_uri { goto &LW2::decode_unicode; }
sub utils_text_wrapper { goto &LW2::utils_text_wrapper; }
sub utils_lowercase_headers { goto &LW2::utils_lowercase_keys; }
sub utils_lowercase_hashkeys { goto &LW2::utils_lowercase_keys; }
sub utils_find_lowercase_key { goto &LW2::utils_find_lowercase_key; }
sub utils_join_uri { goto &LW2::uri_join; }
sub utils_split_uri {
    my $hr = $_[1];
    my @res = &LW2::uri_split(@_);
    _remap_from($hr);
    return @res;
}

#### COMPATIBILITY SUPPORT FUNCTIONS ####

%_remap_to = (
    'req_spacer'        => 'http_space1',
    'req_spacer2'       => 'http_space2',
    'http_ver'          => 'version',
    'http_protocol'     => 'protocol',
    'uri_param'         => 'parameters',
    'sockstate'         => 'socket_state',
    'recv_header_order' => 'header_order',
    'http_resp_message' => 'message'
);
%_remap_from = ();
while(my($k,$v)=each(%_remap_to)){ $_remap_from{$v}=$k; }

%_deprec = (
    'method_postfix'     => 1,
    'http_req_trailer'   => 1,
    'queue_md5'          => 1,
    'retry_errors'       => 1,
    'ids_session_splice' => 1
);

sub _remap_from { _remap($_[0],1); }

sub _remap {
    my $hr = shift;
    return undef if(!defined $hr || !ref($hr));
    my $from = shift||0;
    my $MAP = \%_remap_to;
    $MAP = \%_remap_from if($from || $hr->{whisker}->{MAGIC} eq '31340');
    my @k = keys %{ $hr->{whisker} };
    foreach(@k){
        $hr->{whisker}->{http_resp} = $hr->{whisker}->{code} if($_ eq 'code');
        warn("whisker option '$_' will be ignored") if(exists $_deprec{$_});
        next if(!defined $MAP->{$_});
        $hr->{whisker}->{ $MAP->{$_} } = $hr->{whisker}->{$_};
    }
}

1;

libwhisker2-perl-2.5/docs/FAQ.txt

Libwhisker 2.4 FAQ
-------------------------------------------------------------------------

Why does libwhisker exist when
there's already LWP?

LWP is a great package, but there are still many areas within it where it expects/forces you to follow the HTTP protocol. It imposes restrictions on what you can do, and this can be problematic if you are trying to create vulnerability exploit proofs of concept or HTTP fuzzers...two types of applications which traditionally *break* the protocol on purpose. Libwhisker was designed to give the application as much freedom as possible to do what it wants, even if that means breaking the HTTP protocol to the point of not working. This is libwhisker's "no rules" approach.

What SSL libraries do you support?

Net::SSLeay and Net::SSL (a component of the Crypt::SSLeay package).

Which is the preferred SSL library?

Net::SSLeay is the preferred library, but there are still issues with both Net::SSL/Crypt::SSLeay and Net::SSLeay. See the KNOWNBUGS file. SSL keep-alive support is currently only available with Net::SSLeay.

How come you don't support IO::Socket::SSL for SSL support?

IO::Socket::SSL, in its current state, uses Net::SSLeay under the hood. So if IO::Socket::SSL is installed, Net::SSLeay will be too. Thus we skip the overhead of dealing with IO::Socket::SSL and just go directly to Net::SSLeay.

How can I speed up my SSL connections?

If you're using Net::SSLeay, you can set $LW2::LW_SSL_KEEPALIVE=1 in order to enable HTTP keep-alives and connection reuse of SSL connections. If you are operating in a trusted environment, you can also set the {whisker}->{ssl_ciphers} value to a list of weak yet fast(er) ciphers. However, in doing so, you are compromising the security and integrity of the SSL connection.
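The two settings described in this answer can be sketched as follows. This is a hedged example rather than code taken from Libwhisker: it assumes the LW2 request-hash keys {whisker}->{host}, {whisker}->{port}, and {whisker}->{ssl}, uses www.example.com as a placeholder target, and only loads LW2 if it is installed, so it runs (without sending any request) either way.

```perl
use strict;
use warnings;

my (%request, %response);

# Load LW2 only if it is available, so this sketch still runs without it.
my $have_lw2 = eval { require LW2; 1 } ? 1 : 0;
LW2::http_init_request(\%request) if $have_lw2;   # default request values

# Enable keep-alives/connection reuse for SSL (honored only with Net::SSLeay).
# Set after loading LW2, in case the module assigns a default when it loads.
$LW2::LW_SSL_KEEPALIVE = 1;

# Assumed request keys; www.example.com is a placeholder target.
$request{whisker}->{host} = 'www.example.com';
$request{whisker}->{port} = 443;
$request{whisker}->{ssl}  = 1;

# Trusted environments only: these ciphers are fast but weak or unencrypted.
$request{whisker}->{ssl_ciphers} =
    'NULL:RC4-MD5:RC2-MD5:IDEA-CBC-MD5:RC4-SHA:EXPORT:!DES:!3DES';

# With LW2 loaded, the request would then be sent with:
#   LW2::http_do_request(\%request, \%response);
```

Leaving ssl_ciphers unset keeps the SSL library's default (stronger) cipher selection; only the keep-alive setting is free of security trade-offs.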
An example cipher list value of some of the faster (and insecure) ciphers would be:

  "NULL:RC4-MD5:RC2-MD5:IDEA-CBC-MD5:RC4-SHA:EXPORT:!DES:!3DES"

This list starts off with the no-encryption 'NULL' ciphers, then goes through the MD5 variants (MD5 is faster than SHA), falls back to a SHA variant, and uses any exportable cipher as a worst-case scenario, while completely disallowing DES and 3DES (which are horribly slow).

Why does libwhisker contain replacements for modules which are a part of the core perl suite?

The primary reason is my original goal of using libwhisker on systems which did not have a full perl distribution installed; rather, you could copy over just the perl executable and the immediately required modules and run everything out of the current directory. This was meant to support pen-testers and other folks who may have access to a system that doesn't contain perl, but who do not have sufficient privileges to install perl. The secondary reason has to do with the variations in what is considered to be the core perl distribution across all the different OS versions of the past 10+ years. Just because a module is considered a part of the perl core distribution in 5.8.0 doesn't mean it existed in 5.004. I've always tried to maintain compatibility with older versions of perl. The last time I tested, libwhisker functioned without errors on 5.004. Unfortunately, there were too many bugs and caveats in 5.003 to make it easy to support.

Are your pure-perl implementations of MD4, MD5, and DES/NTLM slow?

'Slow' is a relative term. Libwhisker's various pure-perl implementations are slow compared to their locally compiled binary counterparts; however, that's to be expected, and that's also why libwhisker attempts to use the external module versions before resorting to its internal pure-perl versions as a worst-case scenario.
That said, my benchmarks have shown that libwhisker's pure-perl implementations are faster than other pure-perl implementations found in CPAN. I've spent a considerable amount of time hand-optimizing the code in libwhisker to perform as fast as possible; libwhisker also generates and compiles all the code at runtime in order to optimize out all loops and function calls, which significantly reduces overhead. Sure, the code is not clean and doesn't follow quaint programming style practices, but it works as expected and really should never have to be revisited. And quite frankly, that's kind of the norm when it comes to optimized crypto algorithms.

libwhisker2-perl-2.5/docs/TESTED.txt

Libwhisker 2.4
-----------------------------------------------------------------

The following platforms have been tested and found to be working as expected (with exceptions noted below). All platforms were tested for both native HTTP and SSL support through Net::SSL and Net::SSLeay. Tests include the ability to make non-blocking connections with appropriate timeouts and SSL keep-alive support via Net::SSLeay. Unless otherwise indicated, the platform supports both SSL libraries and all functionality as expected. Note that exceptions are specific to the Perl version and platform indicated.
Windows 2000 w/ Cygwin Perl 5.8.7 [4]
Windows 2000 w/ ActiveState Perl 5.8.7 build 815 [1] [3]
Windows XP w/ ActiveState Perl 5.8.6 build 811 [2]
Debian Linux w/ Perl 5.8.4
Linux w/ Perl 5.6.2
Linux w/ Perl 5.00504 [1]
Linux w/ Perl 5.00405 [1]
Mac OSX w/ Perl 5.8.6

Exceptions:
[1] - Net::SSL does not honor timeout for nonblocking connections
[2] - Net::SSLeay and Net::SSL do not honor timeout for nonblocking connections
[3] - Net::SSL not tested on this platform + version
[4] - Net::SSLeay found to not work correctly on this platform + version

libwhisker2-perl-2.5/docs/cookies.txt

Libwhisker 2.4 cookie handling
------------------------------------------------------

This document describes how Libwhisker receives, handles, and creates HTTP cookies.

First a brief bit of history. The original cookie proposal was made by Netscape. Their proposed cookie implementation is often dubbed "version 0". It provides a very simple cookie handling mechanism that practically all servers, browsers, and proxies support. Then along came RFC 2109, which created version 1 cookies while still being backwards-compatible with version 0. The particular additions of RFC 2109 were extra attributes and the manner in which the Cookie header is returned to the server. Next came RFC 2965, which still uses version 1 cookies but now allows a server to send a Set-Cookie2 header. A few additional attributes were defined. RFC 2965 is still backwards-compatible with RFC 2109, which means it's still (somewhat) backwards-compatible with version 0 cookies.

In order to be the most widely compatible, Libwhisker mostly uses a version 0 approach to cookie handling. However, Libwhisker does make some attempts to parse certain version 1 attributes in order to offer more granular cookie support. Libwhisker also uses some of the later RFC suggestions on how to handle corner cases.
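To make the version 0 approach concrete, here is a minimal pure-Perl sketch of the rules in the list that follows. It is not Libwhisker's own cookie_parse() code: the helper names (parse_set_cookie, jar_set, domain_match, cookie_header) and the flat, name-keyed jar are hypothetical. Two details the list leaves open are filled in by assumption and flagged in comments: Path is matched as a plain string prefix, and a '.foo.com' domain is not matched against the bare host 'foo.com'.

```perl
use strict;
use warnings;

# Hypothetical helpers sketching the version 0 rules; not Libwhisker's code.

# Quoted values lose their quotes; otherwise a value stops at the first
# whitespace, comma, or semicolon (so only the first cookie is kept).
sub _token {
    my ($s) = @_;
    return $1 if $s =~ /^"([^"]*)"/;
    $s =~ /^([^\s,;]*)/;
    return $1;
}

# Returns (name, value, \%attrs), or an empty list for an empty/missing name.
sub parse_set_cookie {
    my ($header) = @_;
    my ($first, @parts) = split /;/, $header;
    return unless defined $first;
    my ($name, $rest) = $first =~ /^\s*([^=]*?)\s*=\s*(.*)$/ or return;
    return if $name eq '';                      # empty names abort parsing
    my $value = _token($rest);

    my %attr;
    for my $part (@parts) {
        my ($k, $v) = $part =~ /^\s*([^=\s]+)\s*(?:=\s*(.*?))?\s*$/ or next;
        $k = lc $k;
        next unless $k =~ /^(?:domain|path|max-age|secure)$/;  # others dropped
        next if exists $attr{$k};               # first duplicate wins
        $attr{$k} = defined $v ? _token($v) : 1;
    }
    # An empty Domain is treated as if no Domain was given.
    delete $attr{domain} if defined $attr{domain} && $attr{domain} eq '';
    # RFC 2965 rule: add the leading dot when it is missing.
    $attr{domain} = ".$attr{domain}"
        if defined $attr{domain} && $attr{domain} !~ /^\./;
    # A missing or non-absolute Path falls back to '/'.
    $attr{path} = '/' unless defined $attr{path} && $attr{path} =~ m{^/};
    return ($name, $value, \%attr);
}

# Store a cookie in a hypothetical jar keyed by cookie name.
sub jar_set {
    my ($jar, $header, $default_host) = @_;
    my ($name, $value, $attr) = parse_set_cookie($header) or return;
    if (defined $attr->{'max-age'} && $attr->{'max-age'} eq '0') {
        delete $jar->{$name};                   # Max-Age=0: delete immediately
        return;
    }
    # No Domain: use the default host; without one it stays undef (match all).
    $attr->{domain} = $default_host unless defined $attr->{domain};
    $jar->{$name} = { value => $value, %$attr };
    return;
}

sub domain_match {
    my ($domain, $host) = @_;
    return 1 unless defined $domain;            # no domain: matches every host
    if ($domain !~ /^\./) {                     # e.g. a default host or IP:
        return lc($host) eq lc($domain) ? 1 : 0;    # strict match only
    }
    # One level only: '.foo.com' matches 'a.foo.com', not 'a.b.foo.com'.
    # (Assumption: the bare host 'foo.com' is not matched either.)
    my ($parent) = $host =~ /^[^.]+(\..+)$/ or return 0;
    return lc($parent) eq lc($domain) ? 1 : 0;
}

# Version 0 Cookie header value: unquoted name=value pairs joined by '; '.
sub cookie_header {
    my ($jar, $host, $path, $ssl) = @_;
    my @pairs;
    for my $name (sort keys %$jar) {
        my $c = $jar->{$name};
        next unless domain_match($c->{domain}, $host);
        next unless index($path, $c->{path}) == 0;   # assumed prefix match
        next if $c->{secure} && !$ssl;               # Secure: SSL only
        push @pairs, "$name=$c->{value}";
    }
    return join '; ', @pairs;
}
```

For example, jar_set(\%jar, 'id=42; Domain=foo.com', 'www.foo.com') stores a cookie whose domain becomes '.foo.com', so cookie_header() sends it to 'a.foo.com' but not to 'a.b.foo.com'.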
So, here's a laundry list of Libwhisker's exact cookie handling functionality:

- Cookies are received from a Set-Cookie or Set-Cookie2 response header (libwhisker does not internally distinguish between the two)
- Set-Cookie[2] headers with multiple cookie values separated by commas are NOT supported; only the first cookie will be extracted/parsed
- Cookies are created in version 0 format, using the Cookie request header with one or more non-quoted name=value pairs separated by semicolons; the format specified in RFCs 2109 and 2965 is not used
- Libwhisker understands and acts upon the Domain, Path, Max-Age, and Secure cookie attributes
- Libwhisker accepts and parses cookie attribute values which use surrounding quotes (e.g. 'foo="bar"')
- Non-processed cookie attributes are permanently discarded and not available through the Libwhisker $jar or the cookie processing functions; if you wish to analyze all the cookie attributes (including those ignored by Libwhisker), then you will have to process the cookie values manually
- Only a Max-Age attribute value of 0 (zero) is interpreted, which results in the cookie being immediately deleted; no other handling of the Max-Age attribute is implemented
- No cookie expiration/timeout handling is done at any point, except the immediate deletion of a cookie if Max-Age=0
- Contrary to the RFCs, Libwhisker allows cookie names to start with a '$' character, to allow for additional flexibility in testing; normally this is not allowed, as names beginning with '$' acquired special meaning as of RFC 2109
- While Libwhisker does not specifically disallow any characters in cookie names or values, use of the characters '=', ';', and '"' can cause unexpected processing anomalies or silent application failures
- Empty cookie names are not allowed, and parsing will abort if one is seen
- If a cookie doesn't specify a domain attribute, then the default host name is used; if the default hostname doesn't have a leading
dot, then strict hostname matching only (no partial/sub domain matches) is performed; otherwise partial/sub domain matching is performed; IP addresses always use strict hostname matching only
- If a cookie specifies an empty string for the domain attribute, then it is treated as if it didn't specify a domain attribute
- Non-name values are terminated at the first whitespace, comma, or semicolon character encountered, or the end of the string; any remaining data between the point of termination and the next semicolon is discarded (e.g. the string "foo=bar baz;" will result in 'foo' with a value of 'bar')
- If the cookie doesn't specify a domain attribute, and a default host name is not explicitly provided by the parent application, then the cookie will match all domain names (and the domain name value will be undefined)
- If a cookie specifies a domain attribute, but the domain attribute doesn't include a leading dot, then the RFC 2965 rule of adding a leading dot is used
- Multi-level domain matches are not allowed, so ".foo.com" will match on "a.foo.com", but not "a.b.foo.com"; in order for "a.b.foo.com" to match, there needs to be a domain definition for ".b.foo.com"
- If a cookie doesn't specify a path attribute, then a default value of '/' is used
- If a cookie specifies a path attribute but it is not absolute (doesn't start with '/'), then the default value of '/' is used instead
- Per RFC 2965, successive duplicate attribute values are ignored (so a cookie with "foo=A; foo=B; foo=C" will result in a 'foo' value of 'A')

libwhisker2-perl-2.5/docs/crawler.txt

This file contains an explanation of the crawl variables. $CRAWL is assumed to be a $CRAWLER_OBJECT returned by crawl_new().
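As a sketch of how the structures documented below might be read back once $CRAWL->{crawl}->() has run, the hypothetical summarize_crawl() helper tallies the {track}, {errors}, and {server_tags} entries. A real $CRAWL needs LW2 and network access, so this hedged example uses a hand-built hash with the same documented keys; the URLs and counts are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: summarizes a $CRAWL object's documented result hashes.
sub summarize_crawl {
    my ($crawl) = @_;
    my (%by_code, @not_requested);
    for my $url (sort keys %{ $crawl->{track} }) {
        my $code = $crawl->{track}{$url};
        if ($code eq '?') {             # seen (or skipped) but never requested
            push @not_requested, $url;
        } else {
            $by_code{$code}++;          # count URLs per HTTP response code
        }
    }
    return {
        by_code       => \%by_code,
        not_requested => \@not_requested,
        errors        => scalar @{ $crawl->{errors} || [] },
        banners       => [ sort keys %{ $crawl->{server_tags} || {} } ],
    };
}

# Stand-in for an object returned by LW2::crawl_new(); values are made up.
my $crawl = {
    track => {
        'http://www.example.com/'        => 200,
        'http://www.example.com/a.html'  => 200,
        'http://www.example.com/missing' => 404,
        'http://www.example.com/doc.pdf' => '?',
    },
    errors      => [],
    server_tags => { 'Apache' => 4 },
};

my $summary = summarize_crawl($crawl);
for my $code (sort keys %{ $summary->{by_code} }) {
    print "$code: $summary->{by_code}{$code}\n";
}
```

Treating '?' separately matters because, per the save_skipped option, skipped URLs land in the same tracking hash as URLs that were actually requested.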
---------------------------------------------------------------------------
Crawl data structures
---------------------------------------------------------------------------

%$CRAWL->{config} - configuration values (see below); key=config key, value=value of key
&$CRAWL->{crawl} - subfunction which just calls LW2::crawl($CRAWL)
&$CRAWL->{reset} - subfunction which resets all the values in $CRAWL
%$CRAWL->{track} - All the URLs seen/requested; key=url, value=HTTP response code, or '?' if not actually requested
%$CRAWL->{request} - Libwhisker request hash used during crawling
%$CRAWL->{response} - Libwhisker response hash used during crawling
$CRAWL->{depth} - Default max depth set by crawl_new()
$CRAWL->{start} - Default start URL set by crawl_new()
@$CRAWL->{errors} - All errors encountered during crawling
@$CRAWL->{urls} - Temporary array used internally by crawl()
%$CRAWL->{server_tags} - Server banners encountered while crawling; key=banner, value=# times seen
%$CRAWL->{referrers} - Keeps track of who refers to what URL; key=target URL, value=anon array of all URLs that point to it
%$CRAWL->{offsites} - All URLs that point to other hosts; key=URL, value=# times seen
%$CRAWL->{non_http} - All non-http/https URLs found; key=URL, value=# times seen
%$CRAWL->{cookies} - All cookies encountered during crawling; key=cookie string, value=# times seen
%$CRAWL->{forms} - URLs which were the target of form
tags; key=URL, value=# times seen %$CRAWL->{jar} - Temporary hash used internally by crawl() to track cookies $CRAWL->{parsed_page_count} - The number of HTML pages parsed for URLs --------------------------------------------------------------------------- Crawl config options & values: --------------------------------------------------------------------------- You generally access the values below by: $CRAWL->{config}->{KEY}=VALUE; Where 'KEY' is the target key value (such as save_cookies), and VALUE is the config value for that key. --------------------------------------------------------------------------- save_cookies (value: 0 or 1) - save encountered cookies into %$CRAWL->{cookies}; key is entire cookie string, value is how many times cookie was encountered save_offsites (value: 0 or 1) - save all URLs not on this host to %$CRAWL->{offsites}; key is offsite URL, value is how many times it was referenced save_referrers (value: 0 or 1) - save the URLs that refer to the given URL in %$CRAWL->{referrers}; key is target URL, and the value is an anon array of all URLs that referred to it save_non_http (value: 0 or 1) - save any non-http/https URLs into %$CRAWL->{non_http}; basically all your ftp://, mailto:, and javascript: URLs, etc. 
follow_moves (value: 0 or 1) - crawl will transparently follow the URL given in a 30x move response

use_params (value: 0 or 1) - crawl will factor in URI parameters when considering if a URI is unique or not (otherwise parameters are discarded)

params_double_record (value: 0 or 1) - if both use_params and params_double_record are set, crawl will make two track entries for each URI which has parameters: one with and one without the parameters

reuse_cookies (value: 0 or 1) - crawl will resubmit any received/prior cookies, much like a browser would

skip_ext (value: anonymous hash) - the keys of the anonymous hash are file extensions that crawl() should skip trying to crawl; defaults to common binary/multimedia files (gif, jpg, pdf, etc.)

save_skipped (value: 0 or 1) - any URLs that are skipped via skip_ext, or are beyond the specified DEPTH, will be recorded in the tracking hash with a value of '?' (instead of an HTTP response code).

callback (value: 0 or \&sub) - crawl will call this function (if this is a reference to a function), passing it the current URI and the @ST array. If the function returns a TRUE value, then crawl will skip that URI. Set to value 0 (zero) if you do not want to use a callback.

netloc_bug (value: 0 or 1) - technically a URL of the form '//www.host.com/url' is valid; the scheme (http/https) is assumed. However, it's also possible to have bad relative references such as '//dir/file', which is similar in spirit to '/dir//file' (i.e. too many slashes). When netloc_bug is enabled, any URL of the form '//blah/url' will be turned into 'http://blah/url'. This option was formerly called 'slashdot_bug' in LW 1.x, since slashdot.org was the first site I encountered using it (it makes for a great way to catch web crawlers ;) Note that this is enabled by default.

source_callback (value: 0 or \&sub) - crawl will call this function (if this is a reference to a function), passing references to %hin and %hout, right before it parses the page for HTML links.
This allows the callback function to review or modify the HTML before it's parsed for links. The return value is ignored.

url_limit (value: integer) - number of URLs that crawl will queue up at one time; defaults to 1000

do_head (value: 0 or 1) - use HEAD requests to determine if a file has a content-type worth downloading. Potentially saves some time, assuming the server properly supports HEAD requests. Set to value 1 to use (0/off by default).

normalize_uri (value: 0 or 1) - when set, crawl() will normalize found URIs in order to ensure there are no duplicates (normalization means turning '/blah/../foo' and '/./foo' into '/foo')

libwhisker2-perl-2.5/docs/evil.htm

This is an example of an evil HTML file, intended to screw up non-robust HTML parsers. It's used as a test for LW::html_find_tags(). So let's get started...

<link show Some parsers < a href="/blank">skip this This is also a variable case link? yes?">link?

libwhisker2-perl-2.5/docs/forms_walkthrough.txt

Libwhisker 2.x forms walkthrough
---------------------------------------------------------------------------

This document discusses and demonstrates how to use the form parsing functionality contained in Libwhisker 2.x. It assumes Libwhisker version 2.2 or later.

First and foremost, using the Libwhisker forms functions requires a comfortable working knowledge of Perl anonymous structures and references. I recommend reading the first chapter of "Advanced Perl Programming" (O'Reilly) to brush up on anonymous data storage and reference concepts.

Alright, let's start.

---------------------------------------------------------------------------

Libwhisker stores all form data in a hash. Let's call this hash %FORM for now. The keys of the %FORM hash are one of three things:

- The name of the element, as specified by its 'name="..."' attribute
- The value "unknown#" (where # is an incrementing number starting at 0) for any elements which do not specify a 'name="..."' attribute
- The value "\0" (NULL) for the form tag data

Let's look at an example form, and see how this maps:

If we were to pass this form data to Libwhisker, the resulting hash would look like:

%FORM = ( "\0" => ..., # data for

tag
  "first-input"  => ...,  # data for first input/text tag with
                          # value 'one'
  "second-input" => ...,  # data for second input/text tag with
                          # value 'two'
  "unknown0"     => ...,  # data for third input/text tag with
                          # value 'three'
  "first-check"  => ...,  # data for first checkbox with value
                          # 'mycheck'
  "the-radio"    => ...,  # data for all the radio boxes named
                          # 'the-radio'
  "areatext"     => ...,  # data for the first textarea tag
  "unknown1"     => ...   # data for first input/submit tag
);

Right now we're not concerned with the actual data being stored; we're only focusing on the hash key names. The notable highlights to point out:

- The form tag data is under the key "\0" (NULL)
- Since no 'name=""' attribute was given for the third input/text and the input/submit elements, they were assigned names of "unknown0" and "unknown1", respectively
- Elements which use multiple tags under the same name (select/option, radio, etc.) all appear under a single entry of that name; in this case, the three "the-radio" radio buttons will all be contained under a single "the-radio" hash key

Methodically parsing a Libwhisker form structure is a simple matter of first handling the special key "\0", and then iterating over the hash and handling each key (except "\0") as necessary.

---------------------------------------------------------------------------

Let's move on to the next level of storage. For every key in the %FORM hash, Libwhisker creates an anonymous array. This array contains one entry for every tag with the given name. For many elements this will lead to an array with only a single entry; however, there will likely be many entries for radio buttons and select/options. There will also be multiple entries for tags having the same name.

Let's look at another example form:

If we were to pass this form data to Libwhisker, the resulting hash would look like:

%FORM = ( "\0" => ..., # data for the
tag, to be discussed # later "first-input" => [ ... # data for the first input box with value # "one" ], "unknown0" => [ ... # data for the second input box with value # "two" ], "the-radio" => [ ..., # data for radio box with value "1" ... # data for radio box with value "2" ], "the-select" => [ ..., # data for the tag ], "unknown1" => [ ... # data for the first submit element ] ); Note: At this point, the 'data' stored for each element (represented by '...') is still arbitrary and will be discussed later. Right now, we're just looking at how many sets of data are stored. Some highlights to note from this hash dump: - The "\0" key does not follow the same format as the rest of the keys (it is not a reference to an anonymous array); the "\0" key should always be processed by itself, and skipped for all other element processing - The "first-input", "unknown0", and "unknown1" keys have anonymous arrays with only a single element of data - The "the-radio" key is an anonymous array of two data elements, one for each encountered radio box - The "the-select" key is an anonymous array with a data set that contains an entry for the tag Hopefully this is still easy to understand thus far. All elements of the same name are put under the same hash key. The hash key points to an anonymous array which contains one entry for each tag/element encountered for that given element name. There is a corner case worth noting. Imagine the following HTML:
The tricky thing here is that all the elements are named "bar". How this is actually interpreted/handled is browser dependent; however, Libwhisker handles it by creating a single hash key "bar" and putting all the elements in its anonymous array, like so:

%FORM = ( "\0" => ..., # data for
tag "bar" => [ ..., # data for input/text ..., # data for textarea ..., # data for checkbox ... # data for submit ] ); It should now be easy to see how multiple elements are stored under a single key. Let's move on. --------------------------------------------------------------------------- Up until this point, we've been using '...' as a placeholder for the tag data. It's time now for us to define exactly what this is. All tags/elements, except for the "\0" entry, have a set of data associated with them. This set of data takes the form of an anonymous array. The anonymous array has three entries: the type of the tag, the value of the tag, and a reference to an anonymous array containing additional tag attributes (we will discuss this a bit later, below). Thus a 'data set' looks like: [ $type, $value, [ ] ] First we'll discuss the $type. This is a string value of 'select', '/select', 'option', 'textarea', or an input value of the form 'input-?', where '?' is the actual input type (such as "input-submit", "input-text", "input-checkbox", etc.). One important thing to note: the 'input-?' value uses the value specified by the 'type' attribute, with no sanity checking or case adjustment. E.g.: # type is "input-text" # type is "input-TEXT" # type is "input-foobar" # type is "input-foo Bar bAZ" When processing the $type, you should always lowercase the value and then compare it to valid known types. The next data set entry is the $value. This is simply the value of the tag/element, or undef if the tag/element doesn't have a value attribute or when a value doesn't otherwise apply. The actual value contained in $value can vary depending on the specific HTML tag being parsed. Let's do a quick run-through of possible values: ---- ---- The value will be data in the 'value="..."' attribute; if a value attribute doesn't exist, then the value will be undef. # $value equals "foobar" # $value equals "" # $value is undef ---- ---- This is identical to input/text. 
---- ---- This is identical to input/text; however, in order for radio boxes to work, you generally need a value attribute. The trick here is that there will be one data set for every radio box encountered, meaning every possible value will be enumerated. You can tell which one is actually selected by default by looking at the optional attributes (discussed later). ---- ---- Checkboxes don't necessarily need a value in order to work; the browser will typically submit a value of 'on' or '1' if the box is checked and a value is not specified. Therefore you'll often find the value to be undef, but if a value attribute is specified, then the value will be set accordingly. Like the radio box, you can tell if the box is checked by consulting the optional attributes (again, discussed later). ---- ---- Submit buttons may or may not have an actual value assigned to them. In general, it's the same as input/text. ---- tags. If the area between the tags is empty, then the value will be an empty string (""). If a tag, then the value will be undef. ---- and will always be undef, since these tags do not carry any actual value. The values of subsequent