Statistics-Descriptive-3.0605000755000764000764 012146730560 16521 5ustar00shlomifshlomif000000000000Statistics-Descriptive-3.0605/Changes000444000764000764 2000412146730560 20165 0ustar00shlomifshlomif000000000000Revision history for Perl extension Statistics::Descriptive. 3.0605 2013-05-21 - Add t/style-trailing-space.t . - Add t/cpan-changes.t . - Convert Changes to it. 3.0604 2012-07-14 - Correct a misspelling of "weight" in lib/Statistics/Descriptive/Smoother/Weightedexponential.pm - Thanks to Wilhelm for the report. - Update the scripts/tag-release.pl file for Mercurial. 3.0603 2012-05-15 - Use in_between to compare decimal numbers - Smoothing tests were failing because of rounding problems - Thanks to Andreas J. König for reporting it and to Fabio Ponciroli for fixing it. 3.0602 2012-05-12 - Correct a typo: - https://rt.cpan.org/Ticket/Display.html?id=77145 - Thanks to Salvatore Bonaccorso and the Debian Perl Group for the report. 3.0601 2012-05-11 - No longer using Test::Exception in the tests. - It was used by the tests and not specified in test_requires/build_requires. - Thanks to hsk@fli-leibniz.de for the report. 3.0600 2012-05-11 - Add the smoothing functionality. - Add the following public methods: add_data_with_samples(), set_smoother(), get_smoothed_data() to the main module. - Add the lib/Statistics/Descriptive/Smoother.pm and lib/Statistics/Descriptive/Smoother/Exponential.pm lib/Statistics/Descriptive/Smoother/Weightedexponential.pm modules. - Thanks to Fabio Ponciroli - Add the scripts/bump-version-number.pl to facilitate bumping the version number. 3.0500 2012-05-03 - Add the get_data_without_outliers() and the set_outlier_filter() methods. - See https://bitbucket.org/barbasa/perl-statistics-descriptive/overview - Thanks to Fabio Ponciroli 3.0400 2012-03-01 - Fix https://rt.cpan.org/Ticket/Display.html?id=74890 - some function should return undef() in list context so they can be easily assigned to values in hash initialisations. - thanks to SLAFFAN for a preliminary patch which was modified by SHLOMIF (the current Statistics-Descriptive maintainer). 3.0300 2012-02-11 - Now mean() and median() and other routines return undef() if there are no data. - Somewhat incompatible change: some methods that returned undef() under list context now return an empty list (so it will be false). - it is generally not recommended to call such methods in list context as they should always be called in scalar context. - Resolves https://rt.cpan.org/Ticket/Display.html?id=74693 - thanks to Shawn Laffan for the report and the patch. 3.0203 2011-11-17 - Fix https://rt.cpan.org/Ticket/Display.html?id=72495 . - percentile should not die and should return undef if there are no elements in the collection. 3.0202 2011-07-23 - Moved tag-release.pl to scripts/tag-release.pl (though we now use Mercurial instead of Subversion.) - Add t/mode.t to test the ->mode() method. - Documented ->mode() better. - Optimized ->mode(). 3.0201 2010-10-14 - Add some documentation clarifying the 0th percentile return, as it returns undef() for representing -inf: - Fix https://rt.cpan.org/Ticket/Display.html?id=62055 - Thanks to Dave Breimann for reporting it. - Add the tag-release.pl to tag a release using Subversion. 3.0200 2010-06-18 - Added skewness and kurtosis - https://rt.cpan.org/Ticket/Display.html?id=58187 - Thanks to Shawn Laffan. - Removed the Changes / Revision log from the .pm file. 3.0102 2010-06-15 - Add the $VERSION variable to Statistics::Descriptive::Sparse and Statistics::Descriptive::Full. This was done to silence the CPAN indexer. 3.0101 2010-06-15 - Moved the trimmed_mean caching test (that used the Benchmark.pm module) to rejects/descr.t , because it kept failing. 3.0100 2009-07-20 - Added the quantile method - thanks to Djibril Ousmanou (DJIBEL). - https://rt.cpan.org/Ticket/Display.html?id=47948 3.0000 2009-05-29 - Added tests (for ->count, ->sum, ->sumsq, ->min, ->max) - Localized the scope of $stat and other variables in t/descr.t - Got rid of AUTOLOAD in favour of individual accessors. - Converted many direct member accesses to the accessors. - Added ->frequency_distribution_ref() which deprecates frequency_distribution(). - Some refactoring of the lib/Statistics/Descriptive.pm module (without breaking the documented API). 2.9 2009-05-13 - Fixed bug https://rt.cpan.org/Public/Bug/Display.html?id=46026 : - standard_deviation failing due to a variance that got evaluated to 0 due to rounding errors. - Kwalitee : added a LICENSE section to the POD. - Kwalitee (CPANTS) : added an examples/ directory with a script. 2.8 2009-05-09 - Enabled "./Build runtest" and "./Build distruntest" (using Test::Run) in the distribution. - Fixed incomplete/broken tests in t/descr.t. 2.7 2009-05-03 - Converted the distribution to Build.PL and re-organized it to put everything under its proper place. Started maintaining it in: - http://svn.berlios.de/svnroot/repos/web-cpan/Statistics-Descriptive/ - Converted t/descr.t to use "use strict;" and "use warnings;". - Converted t/descr.t to use Test::More. - Cleaned up the "use" statement of lib/Statistics/Descriptive.pm. - Added more explicit dependencies (core, though) to Build.PL. - Fixed RT bug #34999: freq distribution generated too many bins. - https://rt.cpan.org/Ticket/Display.html?id=34999 - Added some keywords and resources to the META.yml, using Build.PL's meta_merge. - Fixed https://rt.cpan.org/Ticket/Display.html?id=32183 - more authoritative (and non-broken) link to the RFC. - Applied the patch in https://rt.cpan.org/Ticket/Display.html?id=9160 - {{#9160: Variance and Standard Deviation use costly pseudo-variance, instead of computing real variance}}. 2.6 2002-10-10 - Fixed caching in trimmed mean and modified code to allow trimming 0% from upper bound. Formerly if 0 was requested then it used the lower bound! - POD format patch from ddunlap 2.5 1999-05-12 - Forgot to document change in v2.4, which included fixing percentile so that it worked right and added to the test harness. - Modified frequency_distribution so that specific bins could be passed in. Fixed caching so that it actually works (it only used to get stuck returning the result of the first call). - Turned off caching for least_squares_fit because there's no way to generate a unique key for memorization. 2.3 1998-11-12 - Fix for frequency distribution. Changed Makefile.PL to ease ActiveState distribution of the module. Andrea's code for preventing division by zero and other improvements. He also wrote a great test bench. Added code from Warren Matthews to calculate percentile. 2.2 1998-02-23 - Multiple bug fixes: Fixed min/max bug with '0' vs defined. Provided fix for bug with AUTOLOAD/DESTROY/Carp problem. 2.1 1997-09-02 - Multiple bug fixes: Cleaned up syntax error with my scoping. Fixed errors in least_squares_fit and median methods 2.00 1997-08-20 - new version; created by h2xs 1.16 - Complete rewrite of OO interface by Colin Kuskie. - Now has 2 classes instead of 1.5, a base class without data storage and a class that inherits the base methods and extends them with data storage and more statistics. 1.1 1995-04-01 - Added LeastSquaresFit and FrequencyDistribution. 1.0 1995-03-01 - Released to comp.lang.perl and placed on archive sites. 0.20 1994-12-01 - Complete rewrite after extensive and invaluable e-mail correspondence with Anno Siegel. 0.10 1994-12-01 - Initital concept, released to perl5-porters list. - Jason Kastner Statistics-Descriptive-3.0605/MANIFEST000444000764000764 121512146730560 20006 0ustar00shlomifshlomif000000000000Build.PL Changes MANIFEST META.yml Makefile.PL README UserSurvey.txt examples/statistical-analysis.pl inc/Test/Run/Builder.pm lib/Statistics/Descriptive.pm lib/Statistics/Descriptive/Smoother.pm lib/Statistics/Descriptive/Smoother/Exponential.pm lib/Statistics/Descriptive/Smoother/Weightedexponential.pm rejects/descr.t scripts/tag-release.pl scripts/bump-version-number.pl t/lib/Utils.pm t/00-load.t t/cpan-changes.t t/descr.t t/descr_smooth_methods.t t/freq_distribution-1-rt-34999.t t/mode.t t/outliers.t t/pod-coverage.t t/pod.t t/quantile.t t/smoother.t t/smoother_exponential.t t/smoother_weightedexponential.t t/style-trailing-space.t META.json Statistics-Descriptive-3.0605/Build.PL000444000764000764 243612146730560 20157 0ustar00shlomifshlomif000000000000use strict; use warnings; use lib "./inc"; use Test::Run::Builder; my $builder = Test::Run::Builder->new( module_name => 'Statistics::Descriptive', license => 'perl', dist_author => 'Shlomi Fish ', dist_version_from => 'lib/Statistics/Descriptive.pm', configure_requires => { 'Module::Build' => '0.36', }, build_requires => { 'Benchmark' => 0, 'lib' => 0, 'Test::More' => 0, }, requires => { 'Carp' => 0, 'POSIX' => 0, 'strict' => 0, 'vars' => 0, 'warnings' => 0, }, add_to_cleanup => [ 'Statistics-Descriptive-*' ], create_makefile_pl => 'traditional', PL_files => {}, meta_merge => { resources => { repository => "https://bitbucket.org/shlomif/perl-statistics-descriptive", homepage => "http://web-cpan.berlios.de/modules/Statistics-Descriptive/", }, keywords => [ qw( average distribution mean median statistics stats stddev ), "standard deviation" ], }, ); $builder->create_build_script(); Statistics-Descriptive-3.0605/Makefile.PL000444000764000764 112112146730560 20623 0ustar00shlomifshlomif000000000000# Note: this file was auto-generated by Module::Build::Compat version 0.39_01 use ExtUtils::MakeMaker; WriteMakefile ( 'NAME' => 'Statistics::Descriptive', 'VERSION_FROM' => 'lib/Statistics/Descriptive.pm', 'PREREQ_PM' => { 'Benchmark' => 0, 'Carp' => 0, 'POSIX' => 0, 'Test::More' => 0, 'lib' => 0, 'strict' => 0, 'vars' => 0, 'warnings' => 0 }, 'INSTALLDIRS' => 'site', 'EXE_FILES' => [], 'PL_FILES' => {} ) ; Statistics-Descriptive-3.0605/UserSurvey.txt000444000764000764 211712146730560 21554 0ustar00shlomifshlomif000000000000Over the three years that I've owned Statistics::Descriptive I've often wondered how users of the module feel about it. I would appreciate it if you would answer the following questions: 1) What do you use Statistics::Descriptive for? (This is something that always amazes me. I think the coolest app so far is summarizing web server performance for the 1998 Olympics in Atlanta) 2) On what platform do you use Perl and Statistics::Descriptive? 3) Is the manpage helpful? (If not, please explain why). 4) Do you use the caching for Full method outputs such as trimmed_mean()? 5) Is there anything really annoying about Statistics::Descriptive? (aside from flakiness from the maintainer like surveys and bad releases) 6) Is there anything missing from Statistics::Descriptive? (I inherited it from Jason Kastner because I wanted to add a few features and he didn't have time to maintain the module anymore. After v2.2, any new features (2-3) that were added have come from requests from users.) 7) Any other comments you'd like to make that I haven't asked about? Statistics-Descriptive-3.0605/META.yml000444000764000764 275312146730560 20136 0ustar00shlomifshlomif000000000000--- abstract: 'Module of basic descriptive statistical functions.' author: - 'Shlomi Fish ' build_requires: Benchmark: 0 Test::More: 0 lib: 0 configure_requires: Module::Build: 0.36 dynamic_config: 1 generated_by: 'Module::Build version 0.3901, CPAN::Meta::Converter version 2.120921' keywords: - average - distribution - mean - median - statistics - stats - stddev - 'standard deviation' license: perl meta-spec: url: http://module-build.sourceforge.net/META-spec-v1.4.html version: 1.4 name: Statistics-Descriptive provides: Statistics::Descriptive: file: lib/Statistics/Descriptive.pm version: 3.0605 Statistics::Descriptive::Full: file: lib/Statistics/Descriptive.pm version: 3.0605 Statistics::Descriptive::Smoother: file: lib/Statistics/Descriptive/Smoother.pm version: 3.0605 Statistics::Descriptive::Smoother::Exponential: file: lib/Statistics/Descriptive/Smoother/Exponential.pm version: 3.0605 Statistics::Descriptive::Smoother::Weightedexponential: file: lib/Statistics/Descriptive/Smoother/Weightedexponential.pm version: 3.0605 Statistics::Descriptive::Sparse: file: lib/Statistics/Descriptive.pm version: 3.0605 requires: Carp: 0 POSIX: 0 strict: 0 vars: 0 warnings: 0 resources: homepage: http://web-cpan.berlios.de/modules/Statistics-Descriptive/ license: http://dev.perl.org/licenses/ repository: https://bitbucket.org/shlomif/perl-statistics-descriptive version: 3.0605 Statistics-Descriptive-3.0605/META.json000444000764000764 445012146730560 20302 0ustar00shlomifshlomif000000000000{ "abstract" : "Module of basic descriptive statistical functions.", "author" : [ "Shlomi Fish " ], "dynamic_config" : 1, "generated_by" : "Module::Build version 0.3901, CPAN::Meta::Converter version 2.120921", "keywords" : [ "average", "distribution", "mean", "median", "statistics", "stats", "stddev", "standard deviation" ], "license" : [ "perl_5" ], "meta-spec" : { "url" : "http://search.cpan.org/perldoc?CPAN::Meta::Spec", "version" : "2" }, "name" : "Statistics-Descriptive", "prereqs" : { "build" : { "requires" : { "Benchmark" : "0", "Test::More" : "0", "lib" : "0" } }, "configure" : { "requires" : { "Module::Build" : "0.36" } }, "runtime" : { "requires" : { "Carp" : "0", "POSIX" : "0", "strict" : "0", "vars" : "0", "warnings" : "0" } } }, "provides" : { "Statistics::Descriptive" : { "file" : "lib/Statistics/Descriptive.pm", "version" : "3.0605" }, "Statistics::Descriptive::Full" : { "file" : "lib/Statistics/Descriptive.pm", "version" : "3.0605" }, "Statistics::Descriptive::Smoother" : { "file" : "lib/Statistics/Descriptive/Smoother.pm", "version" : "3.0605" }, "Statistics::Descriptive::Smoother::Exponential" : { "file" : "lib/Statistics/Descriptive/Smoother/Exponential.pm", "version" : "3.0605" }, "Statistics::Descriptive::Smoother::Weightedexponential" : { "file" : "lib/Statistics/Descriptive/Smoother/Weightedexponential.pm", "version" : "3.0605" }, "Statistics::Descriptive::Sparse" : { "file" : "lib/Statistics/Descriptive.pm", "version" : "3.0605" } }, "release_status" : "stable", "resources" : { "homepage" : "http://web-cpan.berlios.de/modules/Statistics-Descriptive/", "license" : [ "http://dev.perl.org/licenses/" ], "repository" : { "url" : "https://bitbucket.org/shlomif/perl-statistics-descriptive" } }, "version" : "3.0605" } Statistics-Descriptive-3.0605/README000444000764000764 306712146730560 17544 0ustar00shlomifshlomif000000000000INSTALLATION To install this module, run the following commands: perl Build.PL ./Build ./Build test ./Build install SUPPORT AND DOCUMENTATION After installing, you can find documentation for this module with the perldoc command. perldoc Statistics::Descriptive You can also look for information at: RT, CPAN's request tracker http://rt.cpan.org/NoAuth/Bugs.html?Dist=Statistics-Descriptive AnnoCPAN, Annotated CPAN documentation http://annocpan.org/dist/Statistics-Descriptive CPAN Ratings http://cpanratings.perl.org/d/Statistics-Descriptive Search CPAN http://search.cpan.org/dist/Statistics-Descriptive COPYRIGHT AND LICENCE This program is released under the following license: perl ------------ Old README: =========== This new version (2.5) of Statistics::Descriptive contains: - frequency_distribution was modified so that specific bins could be defined. Also, caching was fixed so that it works. - Lots of changes to the test suite to enhance coverage. - Caching was removed from least_squares_fit since it didn't make sense. - Added user survey to distribution. List of methods by class: Statistics::Descriptive::Sparse ---------------------------------- add_data count mean sum variance pseudo_variance min max mindex maxdex standard_deviation sample_range Statistics::Descriptive::Full ---------------------------------- All methods above and: get_data sort_data presorted percentile median trimmed_mean harmonic_mean mode geometric_mean frequency_distribution least_squares_fit ckuskie@cpan.org Statistics-Descriptive-3.0605/lib000755000764000764 012146730560 17267 5ustar00shlomifshlomif000000000000Statistics-Descriptive-3.0605/lib/Statistics000755000764000764 012146730560 21421 5ustar00shlomifshlomif000000000000Statistics-Descriptive-3.0605/lib/Statistics/Descriptive.pm000444000764000764 10571512146730560 24446 0ustar00shlomifshlomif000000000000package Statistics::Descriptive; use strict; use warnings; ##This module draws heavily from perltoot v0.4 from Tom Christiansen. require 5.00404; ##Yes, this is underhanded, but makes support for me easier ##Not only that, but it's the latest "safe" version of ##Perl5. 01-03 weren't bug free. use vars (qw($VERSION $Tolerance $Min_samples_number)); $VERSION = '3.0605'; $Tolerance = 0.0; $Min_samples_number = 4; package Statistics::Descriptive::Sparse; use vars qw($VERSION); $VERSION = '3.0605'; use vars qw(%fields); use Carp; use Statistics::Descriptive::Smoother; sub _make_accessors { my ($pkg, $methods) = @_; no strict 'refs'; foreach my $method (@$methods) { *{$pkg."::".$method} = do { my $m = $method; sub { my $self = shift; if (@_) { $self->{$m} = shift; } return $self->{$m}; }; }; } return; } sub _make_private_accessors { my ($pkg, $methods) = @_; no strict 'refs'; foreach my $method (@$methods) { *{$pkg."::_".$method} = do { my $m = $method; sub { my $self = shift; if (@_) { $self->{$m} = shift; } return $self->{$m}; }; }; } return; } ##Define the fields to be used as methods %fields = ( count => 0, mean => undef, sum => undef, sumsq => undef, min => undef, max => undef, mindex => undef, maxdex => undef, sample_range => undef, variance => undef, ); __PACKAGE__->_make_accessors( [ grep { $_ ne "variance" } keys(%fields) ] ); __PACKAGE__->_make_accessors( ["_permitted"] ); __PACKAGE__->_make_private_accessors(["variance"]); sub new { my $proto = shift; my $class = ref($proto) || $proto; my $self = { %fields, }; bless ($self, $class); $self->_permitted(\%fields); return $self; } sub _is_permitted { my $self = shift; my $key = shift; return exists($self->_permitted()->{$key}); } sub add_data { my $self = shift; ##Myself my $oldmean; my ($min,$mindex,$max,$maxdex,$sum,$sumsq,$count); my $aref; if (ref $_[0] eq 'ARRAY') { $aref = $_[0]; } else { $aref = \@_; } ##If we were given no data, we do nothing. return 1 if (!@{ $aref }); ##Take care of appending to an existing data set if (!defined($min = $self->min())) { $min = $aref->[$mindex = 0]; } else { $mindex = $self->mindex(); } if (!defined($max = $self->max())) { $max = $aref->[$maxdex = 0]; } else { $maxdex = $self->maxdex(); } $sum = $self->sum(); $sumsq = $self->sumsq(); $count = $self->count(); ##Calculate new mean, sumsq, min and max; foreach ( @{ $aref } ) { $sum += $_; $sumsq += $_**2; $count++; if ($_ >= $max) { $max = $_; $maxdex = $count-1; } if ($_ <= $min) { $min = $_; $mindex = $count-1; } } $self->min($min); $self->mindex($mindex); $self->max($max); $self->maxdex($maxdex); $self->sample_range($max - $min); $self->sum($sum); $self->sumsq($sumsq); $self->mean($sum / $count); $self->count($count); ##indicator the value is not cached. Variance isn't commonly enough ##used to recompute every single data add. $self->_variance(undef); return 1; } sub standard_deviation { my $self = shift; ##Myself return undef if (!$self->count()); return sqrt($self->variance()); } ##Return variance; if needed, compute and cache it. sub variance { my $self = shift; ##Myself return undef if (!$self->count()); my $div = @_ ? 0 : 1; my $count = $self->count(); if ($count < 1 + $div) { return 0; } if (!defined($self->_variance())) { my $variance = ($self->sumsq()- $count * $self->mean()**2); # Sometimes due to rounding errors we get a number below 0. # This makes sure this is handled as gracefully as possible. # # See: # # https://rt.cpan.org/Public/Bug/Display.html?id=46026 if ($variance < 0) { $variance = 0; } $variance /= $count - $div; $self->_variance($variance); } return $self->_variance(); } ##Clear a stat. More efficient than destroying an object and calling ##new. sub clear { my $self = shift; ##Myself my $key; return if (!$self->count()); while (my($field, $value) = each %fields) { $self->{$field} = $value; } } 1; package Statistics::Descriptive::Full; use vars qw($VERSION); $VERSION = '3.0605'; use Carp; use POSIX (); use Statistics::Descriptive::Smoother; use vars qw(@ISA $a $b %fields); @ISA = qw(Statistics::Descriptive::Sparse); ##Create a list of fields not to remove when data is updated %fields = ( _permitted => undef, ##Place holder for the inherited key hash data => undef, ##Our data samples => undef, ##Number of samples for each value of the data set presorted => undef, ##Flag to indicate the data is already sorted _reserved => undef, ##Place holder for this lookup hash ); __PACKAGE__->_make_private_accessors( [qw(data samples frequency geometric_mean harmonic_mean least_squares_fit median mode skewness kurtosis ) ] ); __PACKAGE__->_make_accessors([qw(presorted _reserved _trimmed_mean_cache)]); sub _clear_fields { my $self = shift; # Empty array ref for holding data later! $self->_data([]); $self->_samples([]); $self->_reserved(\%fields); $self->presorted(0); $self->_trimmed_mean_cache(+{}); return; } ##Have to override the base method to add the data to the object ##The proxy method from above is still valid sub new { my $proto = shift; my $class = ref($proto) || $proto; # Create my self re SUPER my $self = $class->SUPER::new(); bless ($self, $class); #Re-anneal the object $self->_clear_fields(); return $self; } sub _is_reserved { my $self = shift; my $field = shift; return exists($self->_reserved->{$field}); } sub _delete_all_cached_keys { my $self = shift; KEYS_LOOP: foreach my $key (keys %{ $self }) { # Check each key in the object # If it's a reserved key for this class, keep it if ($self->_is_reserved($key) || $self->_is_permitted($key)) { next KEYS_LOOP; } delete $self->{$key}; # Delete the out of date cached key } return; } ##Clear a stat. More efficient than destroying an object and calling ##new. sub clear { my $self = shift; ##Myself my $key; if (!$self->count()) { return; } $self->_delete_all_cached_keys(); $self->SUPER::clear(); $self->_clear_fields(); } sub add_data { my $self = shift; my $aref; if (ref $_[0] eq 'ARRAY') { $aref = $_[0]; } else { $aref = \@_; } $self->SUPER::add_data($aref); ##Perform base statistics on the data push @{ $self->_data() }, @{ $aref }; ##Clear the presorted flag $self->presorted(0); $self->_delete_all_cached_keys(); return 1; } sub add_data_with_samples { my ($self,$aref_values) = @_; return 1 if (!@{ $aref_values }); my $aref_data = [map { keys %$_ } @{ $aref_values }]; my $aref_samples = [map { values %$_ } @{ $aref_values }]; $self->add_data($aref_data); push @{ $self->_samples() }, @{ $aref_samples }; return 1; } sub get_data { my $self = shift; return @{ $self->_data() }; } sub get_data_without_outliers { my $self = shift; if ($self->count() < $Statistics::Descriptive::Min_samples_number) { carp("Need at least $Statistics::Descriptive::Min_samples_number samples\n"); return; } if (!defined $self->{_outlier_filter}) { carp("Outliers filter not defined\n"); return; } my $outlier_candidate_index = $self->_outlier_candidate_index; my $possible_outlier = ($self->_data())->[$outlier_candidate_index]; my $is_outlier = $self->{_outlier_filter}->($self, $possible_outlier); return $self->get_data unless $is_outlier; # Removing the outlier from the dataset my @good_indexes = grep { $_ != $outlier_candidate_index } (0 .. $self->count() - 1); my @data = $self->get_data; my @filtered_data = @data[@good_indexes]; return @filtered_data; } sub set_outlier_filter { my ($self, $code_ref) = @_; if (!$code_ref || ref($code_ref) ne "CODE") { carp("Need to pass a code reference"); return; } $self->{_outlier_filter} = $code_ref; return 1; } sub _outlier_candidate_index { my $self = shift; my $mean = $self->mean(); my $outlier_candidate_index = 0; my $max_std_deviation = abs(($self->_data())->[0] - $mean); foreach my $idx (1 .. ($self->count() - 1) ) { my $curr_value = ($self->_data())->[$idx]; if ($max_std_deviation < abs($curr_value - $mean) ) { $outlier_candidate_index = $idx; $max_std_deviation = abs($curr_value - $mean); } } return $outlier_candidate_index; } sub set_smoother { my ($self, $args) = @_; $args->{data} = $self->_data(); $args->{samples} = $self->_samples(); $self->{_smoother} = Statistics::Descriptive::Smoother->instantiate($args); } sub get_smoothed_data { my ($self, $args) = @_; if (!defined $self->{_smoother}) { carp("Smoother object not defined\n"); return; } $self->{_smoother}->get_smoothed_data(); } sub sort_data { my $self = shift; if (! $self->presorted()) { ##Sort the data in descending order $self->_data([ sort {$a <=> $b} @{$self->_data()} ]); $self->presorted(1); ##Fix the maxima and minima indices $self->mindex(0); $self->maxdex($#{$self->_data()}); } return 1; } sub percentile { my $self = shift; my $percentile = shift || 0; ##Since we're returning a single value there's no real need ##to cache this. ##If the requested percentile is less than the "percentile bin ##size" then return undef. Check description of RFC 2330 in the ##POD below. my $count = $self->count(); if ((! $count) || ($percentile < 100 / $count)) { return; # allow for both scalar and list context } $self->sort_data(); my $num = $count*$percentile/100; my $index = &POSIX::ceil($num) - 1; my $val = $self->_data->[$index]; return wantarray ? ($val, $index) : $val ; } sub _calc_new_median { my $self = shift; my $count = $self->count(); ##Even or odd if ($count % 2) { return $self->_data->[($count-1)/2]; } else { return ( ($self->_data->[($count)/2] + $self->_data->[($count-2)/2] ) / 2 ); } } sub median { my $self = shift; return undef if !$self->count; ##Cached? if (! defined($self->_median())) { $self->sort_data(); $self->_median($self->_calc_new_median()); } return $self->_median(); } sub quantile { my ( $self, $QuantileNumber ) = @_; unless ( defined $QuantileNumber and $QuantileNumber =~ m/^0|1|2|3|4$/ ) { carp("Bad quartile type, must be 0, 1, 2, 3 or 4\n"); return; } # check data count after the args are checked - should help debugging return undef if !$self->count; $self->sort_data(); return $self->_data->[0] if ( $QuantileNumber == 0 ); my $count = $self->count(); return $self->_data->[ $count - 1 ] if ( $QuantileNumber == 4 ); my $K_quantile = ( ( $QuantileNumber / 4 ) * ( $count - 1 ) + 1 ); my $F_quantile = $K_quantile - POSIX::floor($K_quantile); $K_quantile = POSIX::floor($K_quantile); # interpolation my $aK_quantile = $self->_data->[ $K_quantile - 1 ]; return $aK_quantile if ( $F_quantile == 0 ); my $aKPlus_quantile = $self->_data->[$K_quantile]; # Calcul quantile my $quantile = $aK_quantile + ( $F_quantile * ( $aKPlus_quantile - $aK_quantile ) ); return $quantile; } sub _real_calc_trimmed_mean { my $self = shift; my $lower = shift; my $upper = shift; my $lower_trim = int ($self->count()*$lower); my $upper_trim = int ($self->count()*$upper); my ($val,$oldmean) = (0,0); my ($tm_count,$tm_mean,$index) = (0,0,$lower_trim); $self->sort_data(); while ($index <= $self->count() - $upper_trim -1) { $val = $self->_data()->[$index]; $oldmean = $tm_mean; $index++; $tm_count++; $tm_mean += ($val - $oldmean) / $tm_count; } return $tm_mean; } sub trimmed_mean { my $self = shift; my ($lower,$upper); #upper bound is in arg list or is same as lower if (@_ == 1) { ($lower,$upper) = ($_[0],$_[0]); } else { ($lower,$upper) = ($_[0],$_[1]); } # check data count after the args return undef if !$self->count; ##Cache my $thistm = join ':',$lower,$upper; my $cache = $self->_trimmed_mean_cache(); if (!exists($cache->{$thistm})) { $cache->{$thistm} = $self->_real_calc_trimmed_mean($lower, $upper); } return $cache->{$thistm}; } sub _test_for_too_small_val { my $self = shift; my $val = shift; return (abs($val) <= $Statistics::Descriptive::Tolerance); } sub _calc_harmonic_mean { my $self = shift; my $hs = 0; foreach my $item ( @{$self->_data()} ) { ##Guarantee that there are no divide by zeros if ($self->_test_for_too_small_val($item)) { return; } $hs += 1/$item; } if ($self->_test_for_too_small_val($hs)) { return; } return $self->count()/$hs; } sub harmonic_mean { my $self = shift; if (!defined($self->_harmonic_mean())) { $self->_harmonic_mean(scalar($self->_calc_harmonic_mean())); } return $self->_harmonic_mean(); } sub mode { my $self = shift; if (!defined ($self->_mode())) { my $mode = 0; my $occurances = 0; my %count; foreach my $item (@{ $self->_data() }) { my $count = ++$count{$item}; if ($count > $occurances) { $mode = $item; $occurances = $count; } } $self->_mode( ($occurances > 1) ? {exists => 1, mode => $mode} : {exists => 0,} ); } my $m = $self->_mode; return $m->{'exists'} ? $m->{mode} : undef; } sub geometric_mean { my $self = shift; return undef if !$self->count; if (!defined($self->_geometric_mean())) { my $gm = 1; my $exponent = 1/$self->count(); for my $val (@{ $self->_data() }) { if ($val < 0) { return undef; } $gm *= $val**$exponent; } $self->_geometric_mean($gm); } return $self->_geometric_mean(); } sub skewness { my $self = shift; if (!defined($self->_skewness())) { my $n = $self->count(); my $sd = $self->standard_deviation(); my $skew; # skip if insufficient records if ( $sd && $n > 2) { my $mean = $self->mean(); my $sum_pow3; foreach my $rec ( $self->get_data ) { my $value = (($rec - $mean) / $sd); $sum_pow3 += $value ** 3; } my $correction = $n / ( ($n-1) * ($n-2) ); $skew = $correction * $sum_pow3; } $self->_skewness($skew); } return $self->_skewness(); } sub kurtosis { my $self = shift; if (!defined($self->_kurtosis())) { my $kurt; my $n = $self->count(); my $sd = $self->standard_deviation(); if ( $sd && $n > 3) { my $mean = $self->mean(); my $sum_pow4; foreach my $rec ( $self->get_data ) { $sum_pow4 += ( ($rec - $mean ) / $sd ) ** 4; } my $correction1 = ( $n * ($n+1) ) / ( ($n-1) * ($n-2) * ($n-3) ); my $correction2 = ( 3 * ($n-1) ** 2) / ( ($n-2) * ($n-3) ); $kurt = ( $correction1 * $sum_pow4 ) - $correction2; } $self->_kurtosis($kurt); } return $self->_kurtosis(); } sub frequency_distribution_ref { my $self = shift; my @k = (); # Must have at least two elements if ($self->count() < 2) { return undef; } if ((!@_) && (defined $self->_frequency())) { return $self->_frequency() } my %bins; my $partitions = shift; if (ref($partitions) eq 'ARRAY') { @k = @{ $partitions }; return undef unless @k; ##Empty array if (@k > 1) { ##Check for monotonicity my $element = $k[0]; for my $next_elem (@k[1..$#k]) { if ($element > $next_elem) { carp "Non monotonic array cannot be used as frequency bins!\n"; return undef; } $element = $next_elem; } } %bins = map { $_ => 0 } @k; } else { return undef unless $partitions >= 1; my $interval = $self->sample_range() / $partitions; foreach my $idx (1 .. ($partitions-1)) { push @k, ($self->min() + $idx * $interval); } $bins{$self->max()} = 0; push @k, $self->max(); } ELEMENT: foreach my $element (@{$self->_data()}) { foreach my $limit (@k) { if ($element <= $limit) { $bins{$limit}++; next ELEMENT; } } } return $self->_frequency(\%bins); } sub frequency_distribution { my $self = shift; my $ret = $self->frequency_distribution_ref(@_); if (!defined($ret)) { return undef; } else { return %$ret; } } sub least_squares_fit { my $self = shift; return () if $self->count() < 2; ##Sigma sums my ($sigmaxy, $sigmax, $sigmaxx, $sigmayy, $sigmay) = (0,0,0,0,$self->sum); my ($xvar, $yvar, $err); ##Work variables my ($iter,$y,$x,$denom) = (0,0,0,0); my $count = $self->count(); my @x; ##Outputs my ($m, $q, $r, $rms); if (!defined $_[1]) { @x = 1..$self->count(); } else { @x = @_; if ( $self->count() != scalar @x) { carp "Range and domain are of unequal length."; return (); } } foreach $x (@x) { $y = $self->_data->[$iter]; $sigmayy += $y * $y; $sigmaxx += $x * $x; $sigmaxy += $x * $y; $sigmax += $x; $iter++; } $denom = $count * $sigmaxx - $sigmax*$sigmax; return () unless abs( $denom ) > $Statistics::Descriptive::Tolerance; $m = ($count*$sigmaxy - $sigmax*$sigmay) / $denom; $q = ($sigmaxx*$sigmay - $sigmax*$sigmaxy ) / $denom; $xvar = $sigmaxx - $sigmax*$sigmax / $count; $yvar = $sigmayy - $sigmay*$sigmay / $count; $denom = sqrt( $xvar * $yvar ); return () unless (abs( $denom ) > $Statistics::Descriptive::Tolerance); $r = ($sigmaxy - $sigmax*$sigmay / $count )/ $denom; $iter = 0; $rms = 0.0; foreach (@x) { ##Error = Real y - calculated y $err = $self->_data->[$iter] - ( $m * $_ + $q ); $rms += $err*$err; $iter++; } $rms = sqrt($rms / $count); $self->_least_squares_fit([$q, $m, $r, $rms]); return @{ $self->_least_squares_fit() }; } 1; package Statistics::Descriptive; ##All modules return true. 1; __END__ =head1 NAME Statistics::Descriptive - Module of basic descriptive statistical functions. =head1 SYNOPSIS use Statistics::Descriptive; $stat = Statistics::Descriptive::Full->new(); $stat->add_data(1,2,3,4); $mean = $stat->mean(); $var = $stat->variance(); $tm = $stat->trimmed_mean(.25); $Statistics::Descriptive::Tolerance = 1e-10; =head1 DESCRIPTION This module provides basic functions used in descriptive statistics. It has an object oriented design and supports two different types of data storage and calculation objects: sparse and full. With the sparse method, none of the data is stored and only a few statistical measures are available. Using the full method, the entire data set is retained and additional functions are available. Whenever a division by zero may occur, the denominator is checked to be greater than the value C<$Statistics::Descriptive::Tolerance>, which defaults to 0.0. You may want to change this value to some small positive value such as 1e-24 in order to obtain error messages in case of very small denominators. Many of the methods (both Sparse and Full) cache values so that subsequent calls with the same arguments are faster. =head1 METHODS =head2 Sparse Methods =over 5 =item $stat = Statistics::Descriptive::Sparse->new(); Create a new sparse statistics object. =item $stat->clear(); Effectively the same as my $class = ref($stat); undef $stat; $stat = new $class; except more efficient. =item $stat->add_data(1,2,3); Adds data to the statistics variable. The cached statistical values are updated automatically. =item $stat->count(); Returns the number of data items. =item $stat->mean(); Returns the mean of the data. =item $stat->sum(); Returns the sum of the data. =item $stat->variance(); Returns the variance of the data. Division by n-1 is used. =item $stat->standard_deviation(); Returns the standard deviation of the data. Division by n-1 is used. =item $stat->min(); Returns the minimum value of the data set. =item $stat->mindex(); Returns the index of the minimum value of the data set. =item $stat->max(); Returns the maximum value of the data set. =item $stat->maxdex(); Returns the index of the maximum value of the data set. =item $stat->sample_range(); Returns the sample range (max - min) of the data set. =back =head2 Full Methods Similar to the Sparse Methods above, any Full Method that is called caches the current result so that it doesn't have to be recalculated. In some cases, several values can be cached at the same time. =over 5 =item $stat = Statistics::Descriptive::Full->new(); Create a new statistics object that inherits from Statistics::Descriptive::Sparse so that it contains all the methods described above. =item $stat->add_data(1,2,4,5); Adds data to the statistics variable. All of the sparse statistical values are updated and cached. Cached values from Full methods are deleted since they are no longer valid. I =item $stat->add_data_with_samples([{1 => 10}, {2 => 20}, {3 => 30},]); Add data to the statistics variable and set the number of samples each value has been built with. The data is the key of each element of the input array ref, while the value is the number of samples: [{data1 => smaples1}, {data2 => samples2}, ...] =item $stat->get_data(); Returns a copy of the data array. =item $stat->get_data_without_outliers(); Returns a copy of the data array without outliers. The number minimum of samples to apply the outlier filtering is C<$Statistics::Descriptive::Min_samples_number>, 4 by default. A function to detect outliers need to be defined (see C), otherwise the function will return an undef value. The filtering will act only on the most extreme value of the data set (i.e.: value with the highest absolute standard deviation from the mean). If there is the need to remove more than one outlier, the filtering need to be re-run for the next most extreme value with the initial outlier removed. This is not always needed since the test (for example Grubb's test) usually can only detect the most exreme value. If there is more than one extreme case in a set, then the standard deviation will be high enough to make neither case an outlier. =item $stat->set_outlier_filter($code_ref); Set the function to filter out the outlier. C<$code_ref> is the reference to the subroutine implemeting the filtering function. Returns C for invalid values of C<$code_ref> (i.e.: not defined or not a code reference), C<1> otherwise. =over 4 =item Example #1: Undefined code reference my $stat = Statistics::Descriptive::Full->new(); $stat->add_data(1, 2, 3, 4, 5); print $stat->set_outlier_filter(); # => undef =item Example #2: Valid code reference sub outlier_filter { return $_[1] > 1; } my $stat = Statistics::Descriptive::Full->new(); $stat->add_data( 1, 1, 1, 100, 1, ); print $stat->set_outlier_filter( \&outlier_filter ); # => 1 my @filtered_data = $stat->get_data_without_outliers(); # @filtered_data is (1, 1, 1, 1) In this example the series is really simple and the outlier filter function as well. For more complex series the outlier filter function might be more complex (see Grubbs' test for outliers). The outlier filter function will receive as first parameter the Statistics::Descriptive::Full object, as second the value of the candidate outlier. Having the object in the function might be useful for complex filters where statistics property are needed (again see Grubbs' test for outlier). =back =item $stat->set_smoother({ method => 'exponential', coeff => 0, }); Set the method used to smooth the data and the smoothing coefficient. See C for more details. =item $stat->get_smoothed_data(); Returns a copy of the smoothed data array. The smoothing method and coefficient need to be defined (see C), otherwise the function will return an undef value. =item $stat->sort_data(); Sort the stored data and update the mindex and maxdex methods. This method uses perl's internal sort. =item $stat->presorted(1); =item $stat->presorted(); If called with a non-zero argument, this method sets a flag that says the data is already sorted and need not be sorted again. Since some of the methods in this class require sorted data, this saves some time. If you supply sorted data to the object, call this method to prevent the data from being sorted again. The flag is cleared whenever add_data is called. Calling the method without an argument returns the value of the flag. =item $stat->skewness(); Returns the skewness of the data. A value of zero is no skew, negative is a left skewed tail, positive is a right skewed tail. This is consistent with Excel. =item $stat->kurtosis(); Returns the kurtosis of the data. Positive is peaked, negative is flattened. =item $x = $stat->percentile(25); =item ($x, $index) = $stat->percentile(25); Sorts the data and returns the value that corresponds to the percentile as defined in RFC2330: =over 4 =item For example, given the 6 measurements: -2, 7, 7, 4, 18, -5 Then F(-8) = 0, F(-5) = 1/6, F(-5.0001) = 0, F(-4.999) = 1/6, F(7) = 5/6, F(18) = 1, F(239) = 1. Note that we can recover the different measured values and how many times each occurred from F(x) -- no information regarding the range in values is lost. Summarizing measurements using histograms, on the other hand, in general loses information about the different values observed, so the EDF is preferred. Using either the EDF or a histogram, however, we do lose information regarding the order in which the values were observed. Whether this loss is potentially significant will depend on the metric being measured. We will use the term "percentile" to refer to the smallest value of x for which F(x) >= a given percentage. So the 50th percentile of the example above is 4, since F(4) = 3/6 = 50%; the 25th percentile is -2, since F(-5) = 1/6 < 25%, and F(-2) = 2/6 >= 25%; the 100th percentile is 18; and the 0th percentile is -infinity, as is the 15th percentile, which for ease of handling and backward compatibility is returned as undef() by the function. Care must be taken when using percentiles to summarize a sample, because they can lend an unwarranted appearance of more precision than is really available. Any such summary must include the sample size N, because any percentile difference finer than 1/N is below the resolution of the sample. =back (Taken from: I, Section 11.3. Defining Statistical Distributions. RFC2330 is available from: L .) If the percentile method is called in a list context then it will also return the index of the percentile. =item $x = $stat->quantile($Type); Sorts the data and returns estimates of underlying distribution quantiles based on one or two order statistics from the supplied elements. This method use the same algorithm as Excel and R language (quantile B). The generic function quantile produces sample quantiles corresponding to the given probabilities. B<$Type> is an integer value between 0 to 4 : 0 => zero quartile (Q0) : minimal value 1 => first quartile (Q1) : lower quartile = lowest cut off (25%) of data = 25th percentile 2 => second quartile (Q2) : median = it cuts data set in half = 50th percentile 3 => third quartile (Q3) : upper quartile = highest cut off (25%) of data, or lowest 75% = 75th percentile 4 => fourth quartile (Q4) : maximal value Exemple : my @data = (1..10); my $stat = Statistics::Descriptive::Full->new(); $stat->add_data(@data); print $stat->quantile(0); # => 1 print $stat->quantile(1); # => 3.25 print $stat->quantile(2); # => 5.5 print $stat->quantile(3); # => 7.75 print $stat->quantile(4); # => 10 =item $stat->median(); Sorts the data and returns the median value of the data. =item $stat->harmonic_mean(); Returns the harmonic mean of the data. Since the mean is undefined if any of the data are zero or if the sum of the reciprocals is zero, it will return undef for both of those cases. =item $stat->geometric_mean(); Returns the geometric mean of the data. =item my $mode = $stat->mode(); Returns the mode of the data. The mode is the most commonly occuring datum. See L . If all values occur only once, then mode() will return undef. =item $stat->trimmed_mean(ltrim[,utrim]); C returns the mean with a fraction C of entries at each end dropped. C returns the mean after a fraction C has been removed from the lower end of the data and a fraction C has been removed from the upper end of the data. This method sorts the data before beginning to analyze it. All calls to trimmed_mean() are cached so that they don't have to be calculated a second time. =item $stat->frequency_distribution_ref($partitions); =item $stat->frequency_distribution_ref(\@bins); =item $stat->frequency_distribution_ref(); C slices the data into C<$partition> sets (where $partition is greater than 1) and counts the number of items that fall into each partition. It returns a reference to a hash where the keys are the numerical values of the partitions used. The minimum value of the data set is not a key and the maximum value of the data set is always a key. The number of entries for a particular partition key are the number of items which are greater than the previous partition key and less then or equal to the current partition key. As an example, $stat->add_data(1,1.5,2,2.5,3,3.5,4); $f = $stat->frequency_distribution_ref(2); for (sort {$a <=> $b} keys %$f) { print "key = $_, count = $f->{$_}\n"; } prints key = 2.5, count = 4 key = 4, count = 3 since there are four items less than or equal to 2.5, and 3 items greater than 2.5 and less than 4. C provides the bins that are to be used for the distribution. This allows for non-uniform distributions as well as trimmed or sample distributions to be found. C<@bins> must be monotonic and contain at least one element. Note that unless the set of bins contains the range that the total counts returned will be less than the sample size. Calling C with no arguments returns the last distribution calculated, if such exists. =item my %hash = $stat->frequency_distribution($partitions); =item my %hash = $stat->frequency_distribution(\@bins); =item my %hash = $stat->frequency_distribution(); Same as C except that returns the hash clobbered into the return list. Kept for compatibility reasons with previous versions of Statistics::Descriptive and using it is discouraged. =item $stat->least_squares_fit(); =item $stat->least_squares_fit(@x); C performs a least squares fit on the data, assuming a domain of C<@x> or a default of 1..$stat->count(). It returns an array of four elements C<($q, $m, $r, $rms)> where =over 4 =item C<$q and $m> satisfy the equation C($y = $m*$x + $q). =item C<$r> is the Pearson linear correlation cofficient. =item C<$rms> is the root-mean-square error. =back If case of error or division by zero, the empty list is returned. The array that is returned can be "coerced" into a hash structure by doing the following: my %hash = (); @hash{'q', 'm', 'r', 'err'} = $stat->least_squares_fit(); Because calling C with no arguments defaults to using the current range, there is no caching of the results. =back =head1 REPORTING ERRORS I read my email frequently, but since adopting this module I've added 2 children and 1 dog to my family, so please be patient about my response times. When reporting errors, please include the following to help me out: =over 4 =item * Your version of perl. This can be obtained by typing perl C<-v> at the command line. =item * Which version of Statistics::Descriptive you're using. As you can see below, I do make mistakes. Unfortunately for me, right now there are thousands of CD's with the version of this module with the bugs in it. Fortunately for you, I'm a very patient module maintainer. =item * Details about what the error is. Try to narrow down the scope of the problem and send me code that I can run to verify and track it down. =back =head1 AUTHOR Current maintainer: Shlomi Fish, L , C Previously: Colin Kuskie My email address can be found at http://www.perl.com under Who's Who or at: http://search.cpan.org/author/COLINK/. =head1 CONTRIBUTORS Fabio Ponciroli & Adzuna Ltd. team (outliers handling) =head1 REFERENCES RFC2330, Framework for IP Performance Metrics The Art of Computer Programming, Volume 2, Donald Knuth. Handbook of Mathematica Functions, Milton Abramowitz and Irene Stegun. Probability and Statistics for Engineering and the Sciences, Jay Devore. =head1 COPYRIGHT Copyright (c) 1997,1998 Colin Kuskie. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself. Copyright (c) 1998 Andrea Spinelli. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself. Copyright (c) 1994,1995 Jason Kastner. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself. =head1 LICENSE This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself. =cut Statistics-Descriptive-3.0605/lib/Statistics/Descriptive000755000764000764 012146730560 23702 5ustar00shlomifshlomif000000000000Statistics-Descriptive-3.0605/lib/Statistics/Descriptive/Smoother.pm000444000764000764 1034712146730560 26222 0ustar00shlomifshlomif000000000000package Statistics::Descriptive::Smoother; use strict; use warnings; use Carp; our $VERSION = '3.0605'; sub instantiate { my ($class, $args) = @_; my $method = delete $args->{method}; my $coeff = delete $args->{coeff} || 0; my $ra_samples = delete $args->{samples}; my $ra_data = delete $args->{data}; if ($coeff < 0 || $coeff > 1) { carp("Invalid smoothing coefficient C $coeff\n"); return; } if (@$ra_data < 2) { carp("Need at least 2 samples to smooth the data\n"); return; } $method = ucfirst(lc($method)); my $sub_class = __PACKAGE__."::$method"; eval "require $sub_class"; die "No such class $sub_class: $@" if $@; return $sub_class->_new({ data => $ra_data, samples => $ra_samples, count => scalar @$ra_data, coeff => $coeff, }); } sub get_smoothing_coeff { $_[0]->{coeff} } sub set_smoothing_coeff { my ($self, $coeff) = @_; if ($coeff < 0 || $coeff > 1) { carp("Invalid smoothing coefficient C $coeff\n"); return; } $self->{coeff} = $coeff; return 1; } 1; __END__ =head1 NAME Statistics::Descriptive::Smoother - Base module for smoothing statistical data =head1 SYNOPSIS use Statistics::Descriptive::Smoother; my $smoother = Statistics::Descriptive::Smoother->instantiate({ method => 'exponential', coeff => 0.5, data => [1, 2, 3, 4, 5], samples => [110, 120, 130, 140, 150], }); my @smoothed_data = $smoother->get_smoothed_data(); =head1 DESCRIPTION This module provide methods to smooth the trend of a series of statistical data. The methods provided are the C and the C (see respectively C and C for more details). This class is just a factory that will instantiate the object to perform the chosen smoothing algorithm. =head1 METHODS =over 5 =item Statistics::Descriptive::Smoother->instantiate({}); Create a new Smoother object. This method require several parameters: =over 5 =item method Method used for the smoothing. Allowed values are: C and C =item coeff Smoothing coefficient. It needs to be in the [0;1] range, otherwise undef will be reutrned. C<0> means that the series is not smoothed at all, while C<1> the series is universally equal to the initial unsmoothed value. =item data Array ref with the data of the series. At least 2 values are needed to smooth the series, undef is returned otherwise. =item samples Array ref with the samples each data value has been built with. This is an optional parameter since it is not used by all the smoothing algorithm. =back =item $smoother->get_smoothing_coeff(); Returns the smoothing coefficient. =item $smoother->set_smoothing_coeff(0.5); Set the smoothing coefficient value. It needs to be in the [0;1] range, otherwise undef will be reutrned. =back =head1 AUTHOR Fabio Ponciroli =head1 COPYRIGHT Copyright(c) 2012 by Fabio Ponciroli. =head1 LICENSE This file is licensed under the MIT/X11 License: http://www.opensource.org/licenses/mit-license.php. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. =cut Statistics-Descriptive-3.0605/lib/Statistics/Descriptive/Smoother000755000764000764 012146730560 25502 5ustar00shlomifshlomif000000000000Statistics-Descriptive-3.0605/lib/Statistics/Descriptive/Smoother/Weightedexponential.pm000444000764000764 1072312146730560 32227 0ustar00shlomifshlomif000000000000package Statistics::Descriptive::Smoother::Weightedexponential; use strict; use warnings; use Carp; use base 'Statistics::Descriptive::Smoother'; our $VERSION = '3.0605'; sub _new { my ($class, $args) = @_; if (scalar @{$args->{data}} != scalar @{$args->{samples}}) { carp("Number of data values and samples need to be the same\n"); return; } return bless $args || {}, $class; } # The name of the variables used in the code refers to the explanation in the pod sub get_smoothed_data { my ($self) = @_; my (@smoothed_values, @Wts); # W(0) = N(0) push @Wts, @{$self->{samples}}[0]; # S(0) = X(0) push @smoothed_values, @{$self->{data}}[0]; my $C = $self->get_smoothing_coeff(); foreach my $idx (1 .. ($self->{count} -1)) { my $Xt = $self->{data}->[$idx]; my $Nt = $self->{samples}->[$idx]; my $St_1 = $smoothed_values[-1]; my $Wt_1 = $Wts[-1]; push @Wts, $self->_get_Wt($Wt_1, $Nt); my $coeff_a = $self->_get_coeff_A($Wt_1, $Nt); my $coeff_b = $self->_get_coeff_B($Wt_1, $Nt); my $smoothed_value = ( $St_1 * $coeff_a + $Xt * $coeff_b ) / ( $coeff_a + $coeff_b ); push @smoothed_values, $smoothed_value; } return @smoothed_values; } sub _get_Wt { my ($self, $Wt_1, $Nt) = @_; my $C = $self->get_smoothing_coeff(); my $coeff_a = $self->_get_coeff_A($Wt_1, $Nt); my $coeff_b = $self->_get_coeff_B($Wt_1, $Nt);; return (($Wt_1 * $coeff_a + $Nt * $coeff_b)/($coeff_a + $coeff_b)); } sub _get_coeff_A { my ($self, $Wt_1, $Nt) = @_; my $C = $self->get_smoothing_coeff(); return $C * ( $Wt_1 / ($Wt_1 + $Nt) ); } sub _get_coeff_B { my ($self, $Wt_1, $Nt) = @_; my $C = $self->get_smoothing_coeff(); return (1 - $C) * ( $Nt / ($Nt + $Wt_1) ); } 1; __END__ =head1 NAME Statistics::Descriptive::Smoother::Weigthedexponential - Implement weighted exponential smoothing =head1 SYNOPSIS use Statistics::Descriptive::Smoother; my $smoother = Statistics::Descriptive::Smoother->instantiate({ method => 'weightedexponential', coeff => 0.5, data => [1, 2, 3, 4, 5], samples => [110, 120, 130, 140, 150], }); my @smoothed_data = $smoother->get_smoothed_data(); =head1 DESCRIPTION This module implement the weighted exponential smoothing algorithm to smooth the trend of a series of statistical data. This algorithm can help to control large swings in the unsmoothed data that arise from small samples for those data points. The algorithm implements the following formula: W(0) = N(0) W(t) = ( W(t-1) * CoeffA + N(t) * CoeffB ) / (CoeffA + CoeffB) CoeffA = C * ( W(t-1) / (W(t-1) + N(t) ) ) CoeffB = (1 - C) * ( N(t) * (W(t-1) + N(t)) ) S(t) = (S(t-1)*CoeffA + X(t)*CoeffB) / (CoeffA + CoeffB) where: =over 3 =item * t = index in the series =item * S(t) = smoothed series value at position t =item * C = smoothing coefficient. Value in the [0;1] range. C<0> means that the series is not smoothed at all, while C<1> the series is universally equal to the initial unsmoothed value. =item * X(t) = unsmoothed series value at position t =back =head1 METHODS =over 5 =item $stats->get_smoothed_data(); Returns a copy of the smoothed data array. =back =head1 AUTHOR Fabio Ponciroli =head1 COPYRIGHT Copyright(c) 2012 by Fabio Ponciroli. =head1 LICENSE This file is licensed under the MIT/X11 License: http://www.opensource.org/licenses/mit-license.php. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. =cut Statistics-Descriptive-3.0605/lib/Statistics/Descriptive/Smoother/Exponential.pm000444000764000764 627412146730560 30474 0ustar00shlomifshlomif000000000000package Statistics::Descriptive::Smoother::Exponential; use strict; use warnings; use base 'Statistics::Descriptive::Smoother'; our $VERSION = '3.0605'; sub _new { my ($class, $args) = @_; return bless $args || {}, $class; } # The name of the variables used in the code refers to the explanation in the pod sub get_smoothed_data { my ($self) = @_; my @smoothed_values; push @smoothed_values, @{$self->{data}}[0]; my $C = $self->get_smoothing_coeff(); foreach my $sample_idx (1 .. $self->{count} -1) { my $smoothed_value = $C * ($smoothed_values[-1]) + (1 - $C) * $self->{data}->[$sample_idx]; push @smoothed_values, $smoothed_value; } return @smoothed_values; } 1; __END__ =head1 NAME Statistics::Descriptive::Smoother::Exponential - Implement exponential smoothing =head1 SYNOPSIS use Statistics::Descriptive::Smoother; my $smoother = Statistics::Descriptive::Smoother->instantiate({ method => 'exponential', coeff => 0.5, data => [1, 2, 3, 4, 5], samples => [110, 120, 130, 140, 150], }); my @smoothed_data = $smoother->get_smoothed_data(); =head1 DESCRIPTION This module implement the exponential smoothing algorithm to smooth the trend of a series of statistical data. This algorithm works well for unsmoothed data build with big number of samples. If this is not the case you might consider using the C one. The algorithm implements the following formula: S(0) = X(0) S(t) = C*S(t-1) + (1-C)*X(t) where: =over 3 =item * t = index in the series =item * S(t) = smoothed series value at position t =item * C = smoothing coefficient. Value in the [0;1] range. C<0> means that the series is not smoothed at all, while C<1> the series is universally equal to the initial unsmoothed value. =item * X(t) = unsmoothed series value at position t =back =head1 METHODS =over 5 =item $stats->get_smoothed_data(); Returns a copy of the smoothed data array. =back =head1 AUTHOR Fabio Ponciroli =head1 COPYRIGHT Copyright(c) 2012 by Fabio Ponciroli. =head1 LICENSE This file is licensed under the MIT/X11 License: http://www.opensource.org/licenses/mit-license.php. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. =cut Statistics-Descriptive-3.0605/examples000755000764000764 012146730560 20337 5ustar00shlomifshlomif000000000000Statistics-Descriptive-3.0605/examples/statistical-analysis.pl000444000764000764 402112146730560 25173 0ustar00shlomifshlomif000000000000#!/usr/bin/perl # This script analyses two distributions from tab-separated input # and outputs some basic statistcs. # # It was used to analyse the distribution of the number of moves in # the solutions generated by http://fc-solve.berlios.de/ . use strict; use warnings; use Statistics::Descriptive; my $num_fields = 2; my @stats = (map { Statistics::Descriptive::Full->new(); } (1 .. $num_fields) ); while(my $line = ) { chomp($line); my @vals = split(/\t/, $line); foreach my $f (0 .. $num_fields-1) { $stats[$f]->add_data($vals[$f]); } } foreach my $f (0 .. $num_fields-1) { my $s = $stats[$f]; print "Field No. $f\n"; print "---------------------------\n"; print "Min: " , $s->min(), "\n"; print "Max: " , $s->max(), "\n"; print "Average: " , $s->mean(), "\n"; print "StdDev: " , $s->standard_deviation(), "\n"; print "Median: " , $s->median(), "\n"; print "\n"; } =head1 COPYRIGHT AND LICENSE Copyright (c) 2009 Shlomi Fish Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. =cut Statistics-Descriptive-3.0605/inc000755000764000764 012146730560 17272 5ustar00shlomifshlomif000000000000Statistics-Descriptive-3.0605/inc/Test000755000764000764 012146730560 20211 5ustar00shlomifshlomif000000000000Statistics-Descriptive-3.0605/inc/Test/Run000755000764000764 012146730560 20755 5ustar00shlomifshlomif000000000000Statistics-Descriptive-3.0605/inc/Test/Run/Builder.pm000444000764000764 316212146730560 23040 0ustar00shlomifshlomif000000000000package Test::Run::Builder; use strict; use warnings; use Module::Build; use vars qw(@ISA); @ISA = (qw(Module::Build)); sub ACTION_runtest { my ($self) = @_; my $p = $self->{properties}; $self->depends_on('code'); local @INC = @INC; # Make sure we test the module in blib/ unshift @INC, (File::Spec->catdir($p->{base_dir}, $self->blib, 'lib'), File::Spec->catdir($p->{base_dir}, $self->blib, 'arch')); $self->do_test_run_tests; } sub ACTION_distruntest { my ($self) = @_; $self->depends_on('distdir'); my $start_dir = $self->cwd; my $dist_dir = $self->dist_dir; chdir $dist_dir or die "Cannot chdir to $dist_dir: $!"; # XXX could be different names for scripts $self->run_perl_script('Build.PL') # XXX Should this be run w/ --nouse-rcfile or die "Error executing 'Build.PL' in dist directory: $!"; $self->run_perl_script('Build') or die "Error executing 'Build' in dist directory: $!"; $self->run_perl_script('Build', [], ['runtest']) or die "Error executing 'Build test' in dist directory"; chdir $start_dir; } sub do_test_run_tests { my $self = shift; require Test::Run::CmdLine::Iface; my $test_run = Test::Run::CmdLine::Iface->new( { 'test_files' => [glob("t/*.t")], } # 'backend_params' => $self->_get_backend_params(), ); return $test_run->run(); } sub ACTION_tags { return system(qw( ctags -f tags --recurse --totals --exclude=blib/ --exclude=t/lib --exclude=.svn --exclude='*~' --languages=Perl --langmap=Perl:+.t )); } 1; Statistics-Descriptive-3.0605/t000755000764000764 012146730560 16764 5ustar00shlomifshlomif000000000000Statistics-Descriptive-3.0605/t/descr.t000444000764000764 2427012146730560 20433 0ustar00shlomifshlomif000000000000#!/usr/bin/perl use strict; use warnings; use Test::More tests => 54; use lib 't/lib'; use Utils qw/is_between compare_hash_by_ranges/; use Benchmark; use Statistics::Descriptive; { # test #1 my $stat = Statistics::Descriptive::Full->new(); my @results = $stat->least_squares_fit(); # TEST ok (!scalar(@results), "Least-squares results on a non-filled object are empty."); # test #2 # data are y = 2*x - 1 $stat->add_data( 1, 3, 5, 7 ); @results = $stat->least_squares_fit(); # TEST is_deeply ( [@results[0..1]], [-1, 2], "least_squares_fit returns the correct result." ); } { # test #3 # test error condition on harmonic mean : one element zero my $stat = Statistics::Descriptive::Full->new(); $stat->add_data( 1.1, 2.9, 4.9, 0.0 ); my $single_result = $stat->harmonic_mean(); # TEST ok (!defined($single_result), "harmonic_mean is undefined if there's a 0 datum." ); } { # test #4 # test error condition on harmonic mean : sum of elements zero my $stat = Statistics::Descriptive::Full->new(); $stat->add_data( 1.0, -1.0 ); my $single_result = $stat->harmonic_mean(); # TEST ok (!defined($single_result), "harmonic_mean is undefined if the sum of the reciprocals is zero." ); } { # test #5 # test error condition on harmonic mean : sum of elements near zero my $stat = Statistics::Descriptive::Full->new(); local $Statistics::Descriptive::Tolerance = 0.1; $stat->add_data( 1.01, -1.0 ); my $single_result = $stat->harmonic_mean(); # TEST ok (! defined( $single_result ), "test error condition on harmonic mean : sum of elements near zero" ); } { # test #6 # test normal function of harmonic mean my $stat = Statistics::Descriptive::Full->new(); $stat->add_data( 1,2,3 ); my $single_result = $stat->harmonic_mean(); # TEST ok (scalar(abs( $single_result - 1.6363 ) < 0.001), "test normal function of harmonic mean", ); } { # test #7 # test stringification of hash keys in frequency distribution my $stat = Statistics::Descriptive::Full->new(); $stat->add_data(0.1, 0.15, 0.16, 1/3); my %f = $stat->frequency_distribution(2); # TEST compare_hash_by_ranges( \%f, [[0.216666,0.216667,3],[0.3333,0.3334,1]], "Test stringification of hash keys in frequency distribution", ); # test #8 ##Test memorization of last frequency distribution my %g = $stat->frequency_distribution(); # TEST is_deeply( \%f, \%g, "memorization of last frequency distribution" ); } { # test #9 # test the frequency distribution with specified bins my $stat = Statistics::Descriptive::Full->new(); my @freq_bins=(20,40,60,80,100); $stat->add_data(23.92, 32.30, 15.27, 39.89, 8.96, 40.71, 16.20, 34.61, 27.98, 74.40); my %f = $stat->frequency_distribution(\@freq_bins); # TEST is_deeply( \%f, { 20 => 3, 40 => 5, 60 => 1, 80 => 1, 100 => 0, }, "Test the frequency distribution with specified bins" ); } { # test #10 and #11 # Test the percentile function and caching my $stat = Statistics::Descriptive::Full->new(); $stat->add_data(-5,-2,4,7,7,18); ##Check algorithm # TEST is ($stat->percentile(50), 4, "percentile function and caching - 1", ); # TEST is ($stat->percentile(25), -2, "percentile function and caching - 2", ); } { # tests #12 and #13 # Check correct parsing of method parameters my $stat = Statistics::Descriptive::Full->new(); $stat->add_data(1,2,3,4,5,6,7,8,9,10); # TEST is( $stat->trimmed_mean(0.1,0.1), $stat->trimmed_mean(0.1), "correct parsing of method parameters", ); # TEST is ($stat->trimmed_mean(0.1,0), 6, "correct parsing of method parameters - 2", ); } { my $stat = Statistics::Descriptive::Full->new(); $stat->add_data((0.001) x 6); # TEST is_between ($stat->variance(), 0, 0.00001, "Workaround to avoid rounding errors that yield negative variance." ); # TEST is_between ($stat->standard_deviation(), 0, 0.00001, "Workaround to avoid rounding errors that yield negative std-dev." ); } { my $stat = Statistics::Descriptive::Full->new(); $stat->add_data(1, 2, 3, 5); # TEST is ($stat->count(), 4, "There are 4 elements." ); # TEST is ($stat->sum(), 11, "The sum is 11", ); # TEST is ($stat->sumsq(), 39, "The sum of squares is 39" ); # TEST is ($stat->min(), 1, "The minimum is 1." ); # TEST is ($stat->max(), 5, "The maximum is 5." ); } { # test #9 # test the frequency distribution with specified bins my $stat = Statistics::Descriptive::Full->new(); my @freq_bins=(20,40,60,80,100); $stat->add_data(23.92, 32.30, 15.27, 39.89, 8.96, 40.71, 16.20, 34.61, 27.98, 74.40); my $f_d = $stat->frequency_distribution_ref(\@freq_bins); # TEST is_deeply( $f_d, { 20 => 3, 40 => 5, 60 => 1, 80 => 1, 100 => 0, }, "Test the frequency distribution returned as a scalar reference" ); } { # test #9 # test the frequency distribution with specified bins my $stat = Statistics::Descriptive::Full->new(); $stat->add_data(2, 4, 8); # TEST is_between( $stat->geometric_mean(), (4-1e-4), (4+1e-4), "Geometric Mean Test #1", ) } { my $stat = Statistics::Descriptive::Full->new(); my $expected; $stat->add_data(1 .. 9, 100); # TEST $expected = 3.11889574523909; is_between ($stat->skewness(), $expected - 1E-13, $expected + 1E-13, "Skewness of $expected +/- 1E-13" ); # TEST $expected = 9.79924471616366; is_between ($stat->kurtosis(), $expected - 1E-13, $expected + 1E-13, "Kurtosis of $expected +/- 1E-13" ); $stat->add_data(100 .. 110); # now check that cached skew and kurt values are recalculated # TEST $expected = -0.306705104889384; is_between ($stat->skewness(), $expected - 1E-13, $expected + 1E-13, "Skewness of $expected +/- 1E-13" ); # TEST $expected = -2.09839497356215; is_between ($stat->kurtosis(), $expected - 1E-13, $expected + 1E-13, "Kurtosis of $expected +/- 1E-13" ); } { my $stat = Statistics::Descriptive::Full->new(); $stat->add_data(1,2); my $def; # TEST $def = defined $stat->skewness() ? 1 : 0; is ($def, 0, 'Skewness is undef for 2 samples' ); $stat->add_data (1); # TEST $def = defined $stat->kurtosis() ? 1 : 0; is ($def, 0, 'Kurtosis is undef for 3 samples' ); } { # This is a fix for: # https://rt.cpan.org/Ticket/Display.html?id=72495 # Thanks to Robert Messer my $stat = Statistics::Descriptive::Full->new(); my $ret = $stat->percentile(100); # TEST ok (!defined($ret), 'Returns undef and does not die.'); } # test stats when no data have been added { my $stat = Statistics::Descriptive::Full->new(); my ($result, $str); # An accessor method for _permitted would be handy, # or one to get all the stats methods my @methods = qw { mean sum variance standard_deviation min mindex max maxdex sample_range skewness kurtosis median harmonic_mean geometric_mean mode least_squares_fit percentile frequency_distribution }; # least_squares_fit is handled in an earlier test, so is actually a duplicate here #diag 'Results are undef when no data added'; # need to update next line when new methods are tested here # TEST:$method_count=18 foreach my $method (sort @methods) { $result = $stat->$method; # TEST*$method_count ok (!defined ($result), "$method is undef when object has no data."); } # quantile and trimmed_mean require valid args, so don't test in the method loop my $method = 'quantile'; $result = $stat->$method(1); # TEST ok (!defined ($result), "$method is undef when object has no data."); $method = 'trimmed_mean'; $result = $stat->$method(0.1); # TEST ok (!defined ($result), "$method is undef when object has no data."); } # test SD when only one value added { my $stat = Statistics::Descriptive::Full->new(); $stat->add_data( 1 ); my $result = $stat->standard_deviation(); # TEST ok ($result == 0, "SD is zero when object has one record."); } # Test function returns undef in list context when no data added. # The test itself is almost redundant. # Fixes https://rt.cpan.org/Ticket/Display.html?id=74890 { my $stat = Statistics::Descriptive::Full->new(); # TEST is_deeply( [ $stat->median(), ], [ undef() ], "->median() Returns undef in list-context.", ); # TEST is_deeply( [ $stat->standard_deviation(), ], [ undef() ], "->standard_deviation() Returns undef in list-context.", ); } { my $stats = Statistics::Descriptive::Full->new(); $stats->add_data_with_samples([{1 => 10}, {2 => 20}, {3 => 30}, {4 => 40}, {5 => 50}]); # TEST is_deeply( $stats->_data(), [ 1, 2, 3, 4, 5 ], 'add_data_with_samples: data set is correct', ); # TEST is_deeply( $stats->_samples(), [ 10, 20, 30, 40, 50 ], 'add_data_with_samples: samples are correct', ); } Statistics-Descriptive-3.0605/t/00-load.t000444000764000764 26312146730560 20423 0ustar00shlomifshlomif000000000000#!perl -T use Test::More tests => 1; BEGIN { use_ok( 'Statistics::Descriptive' ); } diag( "Testing Statistics::Descriptive $Statistics::Descriptive::VERSION, Perl $], $^X" ); Statistics-Descriptive-3.0605/t/style-trailing-space.t000444000764000764 73012146730560 23326 0ustar00shlomifshlomif000000000000#!/usr/bin/perl use strict; use warnings; use Test::More; eval "use Test::TrailingSpace"; if ($@) { plan skip_all => "Test::TrailingSpace required for trailing space test."; } else { plan tests => 1; } my $finder = Test::TrailingSpace->new( { root => '.', filename_regex => qr/(?:(?:\.(?:t|pm|pl|PL|yml|json|arc|vim))|README|Changes|LICENSE|MANIFEST)\z/, }, ); # TEST $finder->no_trailing_space( "No trailing space was found." ); Statistics-Descriptive-3.0605/t/pod-coverage.t000444000764000764 104712146730560 21663 0ustar00shlomifshlomif000000000000use strict; use warnings; use Test::More; # Ensure a recent version of Test::Pod::Coverage my $min_tpc = 1.08; eval "use Test::Pod::Coverage $min_tpc"; plan skip_all => "Test::Pod::Coverage $min_tpc required for testing POD coverage" if $@; # Test::Pod::Coverage doesn't require a minimum Pod::Coverage version, # but older versions don't recognize some common documentation styles my $min_pc = 0.18; eval "use Pod::Coverage $min_pc"; plan skip_all => "Pod::Coverage $min_pc required for testing POD coverage" if $@; all_pod_coverage_ok(); Statistics-Descriptive-3.0605/t/cpan-changes.t000444000764000764 26212146730560 21615 0ustar00shlomifshlomif000000000000#!/usr/bin/perl use strict; use warnings; use Test::More; eval 'use Test::CPAN::Changes'; plan skip_all => 'Test::CPAN::Changes required for this test' if $@; changes_ok(); Statistics-Descriptive-3.0605/t/outliers.t000444000764000764 1032212146730560 21172 0ustar00shlomifshlomif000000000000#!/usr/bin/perl use strict; use warnings; use Test::More tests => 11; use Statistics::Descriptive; sub foo {return;}; local $SIG{__WARN__} = sub { }; { # testing set_outlier_filter my $stat = Statistics::Descriptive::Full->new(); # TEST ok ( !defined($stat->set_outlier_filter()), 'set_outlier_filter: undef code reference value'); # TEST ok ( !defined($stat->set_outlier_filter(1)), 'set_outlier_filter: invalid code ref value'); # TEST is ( $stat->set_outlier_filter(\&foo), 1, 'set_outlier_filter: valid code reference - return value'); # TEST is ( $stat->{_outlier_filter}, \&foo, 'set_outlier_filter: valid code reference - internal'); } { # testing get_data_without_outliers without removing outliers my $stat = Statistics::Descriptive::Full->new(); # TEST ok ( !defined($stat->get_data_without_outliers()), 'get_data_without_outliers: insufficient samples'); $stat->add_data( 1, 2, 3, 4, 5 ); # TEST ok ( !defined($stat->get_data_without_outliers()), 'get_data_without_outliers: undefined filter'); # We force the filter function to never detect outliers... $stat->set_outlier_filter( sub {0} ); no warnings 'redefine'; local *Statistics::Descriptive::Full::_outlier_candidate_index = sub { 0 }; my @results = $stat->get_data_without_outliers(); #...we expect the data set to be unmodified # TEST is_deeply ( [@results], [1, 2, 3, 4, 5], 'get_data_without_outliers: no outliers', ); } { # testing get_data_without_outliers removing outliers my $stat = Statistics::Descriptive::Full->new(); # 100 is definitively the candidate to be an outlier in this series $stat->add_data( 1, 2, 3, 4, 100, 6, 7, 8 ); # We force the filter function to always detect outliers for this data set $stat->set_outlier_filter( sub {$_[1] > 0} ); my @results = $stat->get_data_without_outliers(); # Note that 100 has been filtered out from the data set # TEST is_deeply ( [@results], [1, 2, 3, 4, 6, 7, 8, ], 'get_data_without_outliers: remove outliers', ); } my ($first_val, $second_val); sub check_params { ($first_val, $second_val) = @_; } { # testing params passed to outlier filter my $stat = Statistics::Descriptive::Full->new(); # 100 is definitively the candidate to be an outlier in this series $stat->add_data( 1, 2, 3, 4, 100, 6, 7, 8 ); $stat->set_outlier_filter( \&check_params ); my @results = $stat->get_data_without_outliers(); # TEST isa_ok ($first_val, 'Statistics::Descriptive::Full', 'first param of outlier filter ok'); # TEST is ($second_val, 100, 'second param of outlier filter ok'); } { # testing _outlier_candidate_index my $stat = Statistics::Descriptive::Full->new(); # 100 is definitively the candidate to be an outlier in this series $stat->add_data( 1, 2, 3, 4, 100, 6, 7, 8 ); # TEST is ($stat->_outlier_candidate_index, 4, '_outlier_candidate_index' ); } =pod =head1 AUTHOR Fabio Ponciroli =head1 COPYRIGHT Copyright(c) 2012 by Fabio Ponciroli. =head1 LICENSE This file is licensed under the MIT/X11 License: http://www.opensource.org/licenses/mit-license.php. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. =cut Statistics-Descriptive-3.0605/t/pod.t000444000764000764 35012146730560 20046 0ustar00shlomifshlomif000000000000#!perl -T use strict; use warnings; use Test::More; # Ensure a recent version of Test::Pod my $min_tp = 1.22; eval "use Test::Pod $min_tp"; plan skip_all => "Test::Pod $min_tp required for testing POD" if $@; all_pod_files_ok(); Statistics-Descriptive-3.0605/t/smoother.t000444000764000764 1030512146730560 21165 0ustar00shlomifshlomif000000000000#!/usr/bin/perl use strict; use warnings; use Test::More tests => 11; use Statistics::Descriptive::Smoother; local $SIG{__WARN__} = sub { }; { #Test factory pattern my $smoother = Statistics::Descriptive::Smoother->instantiate({ method => 'exponential', coeff => 0, data => [1,2,3], samples => [100, 100, 100], }); # TEST isa_ok ($smoother, 'Statistics::Descriptive::Smoother::Exponential', 'Exponential class correctly created'); } { my $smoother = Statistics::Descriptive::Smoother->instantiate({ method => 'weightedExponential', coeff => 0, data => [1,2,3], samples => [100, 100, 100], }); # TEST isa_ok ($smoother, 'Statistics::Descriptive::Smoother::Weightedexponential', 'Weightedexponential class correctly created'); } { # Test invalid smoothing method eval { Statistics::Descriptive::Smoother->instantiate({ method => 'invalid_method', coeff => 0, data => [1,2,3], samples => [100, 100, 100], }); }; # TEST ok ($@, 'Invalid method'); } { #TODO get output from Carp #Test invalid coefficient my $smoother_neg = Statistics::Descriptive::Smoother->instantiate({ method => 'exponential', coeff => -123, data => [1,2,3], samples => [100, 100, 100], }); # TEST is ($smoother_neg, undef, 'Invalid coefficient: < 0'); my $smoother_pos = Statistics::Descriptive::Smoother->instantiate({ method => 'exponential', coeff => 123, data => [1,2,3], samples => [100, 100, 100], }); # TEST is ($smoother_pos, undef, 'Invalid coefficient: > 1'); } { #Test unsufficient number of samples my $smoother = Statistics::Descriptive::Smoother->instantiate({ method => 'exponential', coeff => 0, data => [1], samples => [100], }); # TEST is ($smoother, undef, 'Insufficient number of samples'); } { #Test smoothing coefficient accessors my $smoother = Statistics::Descriptive::Smoother->instantiate({ method => 'exponential', coeff => 0.5, data => [1,2,3], samples => [100, 100, 100], }); # TEST is ($smoother->get_smoothing_coeff(), 0.5, 'get_smoothing_coeff'); my $ok = $smoother->set_smoothing_coeff(0.7); # TEST ok ($ok, 'set_smoothing_coeff: set went fine'); # TEST is ($smoother->get_smoothing_coeff(), 0.7, 'set_smoothing_coeff: correct value set'); my $ok2 = $smoother->set_smoothing_coeff(123); # TEST is ($ok2, undef, 'set_smoothing_coeff: set failed'); # TEST is ($smoother->get_smoothing_coeff(), 0.7, 'set_smoothing_coeff: value not modified after failure'); } 1; =pod =head1 AUTHOR Fabio Ponciroli =head1 COPYRIGHT Copyright(c) 2012 by Fabio Ponciroli. =head1 LICENSE This file is licensed under the MIT/X11 License: http://www.opensource.org/licenses/mit-license.php. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. =cut Statistics-Descriptive-3.0605/t/quantile.t000444000764000764 347212146730560 21136 0ustar00shlomifshlomif000000000000#!/usr/bin/perl #================================================================== # Author : Djibril Ousmanou # Copyright : 2009 # Update : 20/07/2009 # AIM : Test quantile type 7 calcul #================================================================== use strict; use warnings; use Carp; use Test::More tests => 15; use Statistics::Descriptive; my @data1 = ( 1 .. 10 ); my @data2 = ( 601, 449, 424, 568, 569, 447, 425, 621, 616, 573, 584, 635, 480, 437, 724, 711, 717, 576, 724, 585, 458, 752, 753, 709, 584, 748, 628, 483, 739, 747, 694, 601, 758, 653, 487, 720, 750, 660, 588, 719, 631, 492, 584, 647, 548, 585, 649, 532, 492, 598, 653, 524, 567, 570, 506, 475, 640, 725, 688, 567, 634, 520, 488, 718, 769, 739, 576, 718, 527, 497, 698, 736, 785, 581, 733, 540, 537, 683, 691, 785, 588, 733, 531, 564, 581, 554, 765, 580, 626, 510, 533, 495, 470, 713, 571, 573, 476, 526, 441, 431, 686, 563, 496, 447, 518 ); my @data3 = qw/-9 2 3 44 -10 6 7/; my %DataTest = ( 'First sample test' => { 'Data' => \@data1, 'Test' => { '0' => '1', '1' => '3.25', '2' => '5.5', '3' => '7.75', '4' => '10', }, }, 'Second sample test' => { 'Data' => \@data2, 'Test' => { '0' => '424', '1' => '526', '2' => '584', '3' => '698', '4' => '785', }, }, 'Third sample test' => { 'Data' => \@data3, 'Test' => { '0' => '-10', '1' => '-3.5', '2' => '3', '3' => '6.5', '4' => '44', }, } ); # Test Quantile, foreach my $MessageTest ( sort keys %DataTest ) { my $stat = Statistics::Descriptive::Full->new(); $stat->add_data( @{ $DataTest{$MessageTest}->{Data} } ); for ( 0 .. 4 ) { is( $stat->quantile($_), $DataTest{$MessageTest}->{Test}{$_}, $MessageTest . ", Q$_" ); } } Statistics-Descriptive-3.0605/t/mode.t000444000764000764 135612146730560 20237 0ustar00shlomifshlomif000000000000#!/usr/bin/perl use strict; use warnings; use Test::More tests => 4; use Statistics::Descriptive; { my $stat = Statistics::Descriptive::Full->new(); $stat->add_data( 1, 10, 100 ); my $mode = $stat->mode(); # TEST ok (!defined($mode), "No mode for a flat distribution." ); my $second_mode = $stat->mode(); # TEST ok (!defined($second_mode), "No mode after a second call." ); } { my $stat = Statistics::Descriptive::Full->new(); $stat->add_data( 1, 5,5,5,10,19,19,30 ); my $mode = $stat->mode(); # TEST is ($mode, 5, "Mode is 5." ); my $second_mode = $stat->mode(); # TEST is ($second_mode, 5, "Second call mode is 5." ); } Statistics-Descriptive-3.0605/t/smoother_weightedexponential.t000444000764000764 732612146730560 25305 0ustar00shlomifshlomif000000000000#!/usr/bin/perl use strict; use warnings; use lib 't/lib'; use Utils qw/is_array_between/; use Test::More tests => 4; use Statistics::Descriptive::Smoother; local $SIG{__WARN__} = sub { }; my @original_data = (1 .. 10); my @original_samples = (100, 50, 100, 50, 100, 50, 100, 50, 100, 50,); { #Test no smoothing my $smoother = Statistics::Descriptive::Smoother->instantiate({ method => 'weightedExponential', coeff => 0, data => \@original_data, samples => \@original_samples, }); my @smoothed_data = $smoother->get_smoothed_data(); # When the smoothing coefficient is 0 the series is not smoothed # TEST is_deeply( \@smoothed_data, \@original_data, 'No smoothing C=0'); } { #Test max smoothing my $smoother = Statistics::Descriptive::Smoother->instantiate({ method => 'weightedExponential', coeff => 1, data => \@original_data, samples => \@original_samples, }); my @smoothed_data = $smoother->get_smoothed_data(); # When the smoothing coefficient is 1 the series is universally equal to the initial unsmoothed value my @expected_values = map { $original_data[0] } 1 .. $smoother->{count}; # TEST is_deeply( \@smoothed_data, \@expected_values, 'Max smoothing C=1'); } { #Test smoothing coeff 0.5 my $smoother = Statistics::Descriptive::Smoother->instantiate({ method => 'weightedExponential', coeff => 0.5, data => \@original_data, samples => \@original_samples, }); my @smoothed_data = $smoother->get_smoothed_data(); my @expected_values = ( 1, 1.33333333333333, 2.24242424242424, 2.85944551901999, 4.0651836704636, 4.75526654493058, 6.03174342835728, 6.7367839208657, 8.02706266125788, 8.73457937329917, ); # TEST is_array_between( \@smoothed_data, \@expected_values, 1E-13, 1E-13, 'Smoothing with C=0.5'); } { #Test different number of samples and data are not allowed my $smoother = Statistics::Descriptive::Smoother->instantiate({ method => 'weightedExponential', coeff => 0, data => [1,2,3,4], samples => [1,2,3], }); # TEST is ( $smoother, undef, 'Different number of samples and data'); } 1; =pod =head1 AUTHOR Fabio Ponciroli =head1 COPYRIGHT Copyright(c) 2012 by Fabio Ponciroli. =head1 LICENSE This file is licensed under the MIT/X11 License: http://www.opensource.org/licenses/mit-license.php. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. =cut Statistics-Descriptive-3.0605/t/descr_smooth_methods.t000444000764000764 465612146730560 23535 0ustar00shlomifshlomif000000000000#!/usr/bin/perl use strict; use warnings; use Test::More tests => 3; use Statistics::Descriptive; local $SIG{__WARN__} = sub { }; my @original_data = (1 .. 10); { # testing set_smoother my $stats = Statistics::Descriptive::Full->new(); $stats->add_data(\@original_data ); $stats->set_smoother({ method => 'exponential', coeff => 0, }); # TEST isa_ok ( $stats->{_smoother}, 'Statistics::Descriptive::Smoother::Exponential', 'set_smoother: smoother set correctly'); } { # testing get_smoothed_data my $stats = Statistics::Descriptive::Full->new(); # TEST is ( $stats->get_smoothed_data(), undef, 'get_smoothed_data: smoother needs to be defined'); $stats->add_data(\@original_data ); $stats->set_smoother({ method => 'exponential', coeff => 0.5, }); my @expected_values = ( 1, 1.5, 2.25, 3.125, 4.0625, 5.03125, 6.015625, 7.0078125, 8.00390625, 9.001953125, ); my @smoothed_data = $stats->get_smoothed_data(); # TEST is_deeply( \@smoothed_data, \@expected_values, 'Smoothing with C=0.5'); } =pod =head1 AUTHOR Fabio Ponciroli =head1 COPYRIGHT Copyright(c) 2012 by Fabio Ponciroli. =head1 LICENSE This file is licensed under the MIT/X11 License: http://www.opensource.org/licenses/mit-license.php. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. =cut Statistics-Descriptive-3.0605/t/freq_distribution-1-rt-34999.t000444000764000764 167512146730560 24333 0ustar00shlomifshlomif000000000000#!/usr/bin/perl use strict; use warnings; use Test::More tests => 2; use Statistics::Descriptive; my @data=( 601,449,424,568,569,447,425,621,616,573,584,635,480,437,724,711, 717,576,724,585,458,752,753,709,584,748,628,483,739,747,694,601, 758,653,487,720,750,660,588,719,631,492,584,647,548,585,649,532, 492,598,653,524,567,570,506,475,640,725,688,567,634,520,488,718, 769,739,576,718,527,497,698,736,785,581,733,540,537,683,691,785, 588,733,531,564,581,554,765,580,626,510,533,495,470,713,571,573, 476,526,441,431,686,563,496,447,518 ); my $stat = Statistics::Descriptive::Full->new(); $stat->add_data(@data); # I should get 20 partitions, shouldn't I? my %freqs=$stat->frequency_distribution (20); # TEST is (scalar(keys(%freqs)), 20, "We got 20 partitions" ); my $sum = 0; foreach my $v (values(%freqs)) { $sum += $v; } # TEST is ($sum, scalar(@data), "The total number of elements in the bins" ); Statistics-Descriptive-3.0605/t/smoother_exponential.t000444000764000764 616512146730560 23564 0ustar00shlomifshlomif000000000000#!/usr/bin/perl use strict; use warnings; use lib 't/lib'; use Utils qw/is_array_between/; use Test::More tests => 3; use Statistics::Descriptive::Smoother; my @original_data = (1 .. 10); my @original_samples = (3, 3, 3, 3, 3, 3, 3, 3, 3, 3,); { #Test no smoothing my $smoother = Statistics::Descriptive::Smoother->instantiate({ method => 'exponential', coeff => 0, data => \@original_data, samples => \@original_samples, }); my @smoothed_data = $smoother->get_smoothed_data(); # When the smoothing coefficient is 0 the series is not smoothed # TEST is_deeply( \@smoothed_data, \@original_data, 'No smoothing C=0'); } { #Test max smoothing my $smoother = Statistics::Descriptive::Smoother->instantiate({ method => 'exponential', coeff => 1, data => \@original_data, samples => \@original_samples, }); my @smoothed_data = $smoother->get_smoothed_data(); # When the smoothing coefficient is 1 the series is universally equal to the initial unsmoothed value my @expected_values = map { $original_data[0] } 1 .. $smoother->{count}; # TEST is_deeply( \@smoothed_data, \@expected_values, 'Max smoothing C=1'); } { #Test smoothing coeff 0.5 my $smoother = Statistics::Descriptive::Smoother->instantiate({ method => 'exponential', coeff => 0.5, data => \@original_data, samples => \@original_samples, }); my @smoothed_data = $smoother->get_smoothed_data(); my @expected_values = ( 1, 1.5, 2.25, 3.125, 4.0625, 5.03125, 6.015625, 7.0078125, 8.00390625, 9.001953125, ); # TEST is_array_between( \@smoothed_data, \@expected_values, 1E-13, 1E-13, 'Smoothing with C=0.5'); } 1; =pod =head1 AUTHOR Fabio Ponciroli =head1 COPYRIGHT Copyright(c) 2012 by Fabio Ponciroli. =head1 LICENSE This file is licensed under the MIT/X11 License: http://www.opensource.org/licenses/mit-license.php. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. =cut Statistics-Descriptive-3.0605/t/lib000755000764000764 012146730560 17532 5ustar00shlomifshlomif000000000000Statistics-Descriptive-3.0605/t/lib/Utils.pm000444000764000764 756712146730560 21344 0ustar00shlomifshlomif000000000000package Utils; use strict; use warnings; require Exporter; our @ISA = qw/Exporter/; our @EXPORT_OK = qw/is_between compare_hash_by_ranges is_array_between/; use Test::More; sub is_between { local $Test::Builder::Level = $Test::Builder::Level + 1; my ($have, $want_bottom, $want_top, $blurb) = @_; ok ( _is_between($have, $want_bottom, $want_top), $blurb ); } sub is_array_between { local $Test::Builder::Level = $Test::Builder::Level + 1; my ($got_array_ref, $expected_array_ref, $low_tolerance, $high_tolerance, $blurb) = @_; my $success = 1; if (scalar @$expected_array_ref != scalar @$got_array_ref) { $success = 0; diag('Arrays have different lengths'); } else { for my $idx (0 .. $#$got_array_ref) { my $expected_bottom = $expected_array_ref->[$idx] - $low_tolerance; my $expected_top = $expected_array_ref->[$idx] + $high_tolerance; unless (_is_between($got_array_ref->[$idx], $expected_bottom, $expected_top)) { $success = 0; diag(<<"EOF"); Value $idx is out of range: Got: [$got_array_ref->[$idx]] Expected: [$expected_bottom, $expected_top, $expected_array_ref->[$idx]] EOF last; } } } ok($success, $blurb); } sub compare_hash_by_ranges { local $Test::Builder::Level = $Test::Builder::Level + 1; my $got_hash_ref = shift; my $expected = shift; my $blurb = shift; my $got = [ map { [$_, $got_hash_ref->{$_} ] } sort { $a <=> $b } keys(%$got_hash_ref) ] ; my $success = 1; if (scalar(@$expected) != scalar(@$got)) { $success = 0; diag("Number of keys differ in hashes."); } else { COMPARE_KEYS: for my $idx (0 .. $#$got) { my ($got_key, $got_val) = @{$got->[$idx]}; my ($expected_bottom, $expected_top, $expected_val) = @{$expected->[$idx]}; if (! ( ($got_key >= $expected_bottom) && ($got_key <= $expected_top) && ($got_val == $expected_val) ) ) { $success = 0; diag(<<"EOF"); Key/Val pair No. $idx is out of range or wrong: Got: [$got_key, $got_val] Expected: [$expected_bottom, $expected_top, $expected_val] EOF last COMPARE_KEYS; } } } ok($success, $blurb); } sub _is_between { my ($have, $want_bottom, $want_top,) = @_; return (($have >= $want_bottom) && ($want_top >= $have)); } 1; =pod =head1 AUTHOR Shlomi Fish, L , C =head1 COPYRIGHT Copyright(c) 2012 by Shlomi Fish. =head1 LICENSE This file is licensed under the MIT/X11 License: http://www.opensource.org/licenses/mit-license.php. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. =cut Statistics-Descriptive-3.0605/scripts000755000764000764 012146730560 20210 5ustar00shlomifshlomif000000000000Statistics-Descriptive-3.0605/scripts/bump-version-number.pl000444000764000764 121012146730560 24610 0ustar00shlomifshlomif000000000000#!/usr/bin/perl use strict; use warnings; use File::Find::Object; use IO::All; my $tree = File::Find::Object->new({}, 'lib/'); my $version_n = shift(@ARGV); if (!defined($version_n)) { die "Specify version number as an argument! bump-version-number.pl '0.0.1'"; } while (my $r = $tree->next()) { if ($r =~ m{/\.svn\z}) { $tree->prune(); } elsif ($r =~ m{\.pm\z}) { my @lines = io->file($r)->getlines(); foreach (@lines) { s#(\$VERSION = '|^Version )\d+\.\d+(?:\.\d+)?('|)#$1 . $version_n . $2#e; } io->file($r)->print( @lines ); } } Statistics-Descriptive-3.0605/scripts/tag-release.pl000444000764000764 101312146730560 23066 0ustar00shlomifshlomif000000000000#!/usr/bin/perl use strict; use warnings; use IO::All; my ($version) = (map { m{\$VERSION *= *'([^']+)'} ? ($1) : () } io->file('lib/Statistics/Descriptive.pm')->getlines() ) ; if (!defined ($version)) { die "Version is undefined!"; } my $mini_repos_base = 'https://svn.berlios.de/svnroot/repos/web-cpan/Statistics-Descriptive'; my @cmd = ( "hg", "tag", "-m", "Tagging the Statistics-Descriptive release as $version", "releases/$version", ); print join(" ", @cmd), "\n"; exec(@cmd); Statistics-Descriptive-3.0605/rejects000755000764000764 012146730560 20160 5ustar00shlomifshlomif000000000000Statistics-Descriptive-3.0605/rejects/descr.t000444000764000764 132312146730560 21601 0ustar00shlomifshlomif000000000000{ # tests #14 # Make sure that trimmed_mean caching works but checking execution times # This test may fail on very fast machines but I'm not sure how to get # better timing without requiring extra modules to be added. my $stat = Statistics::Descriptive::Full->new(); ##Make this a really big array so that it takes some time to execute! $stat->add_data((1,2,3,4,5,6,7,8,9,10,11,12,13) x 10000); my ($t0,$t1,$td); my @t = (); foreach (0..1) { $t0 = new Benchmark; $stat->trimmed_mean(0.1,0.1); $t1 = new Benchmark; $td = timediff($t1,$t0); push @t, $td->cpu_p(); } # TEST ok ($t[1] < $t[0], "trimmed_mean caching works", ); }