Lingua-Identify-0.56/Changes

Revision history for Lingua-Identify

0.56  Sat Aug 17 20:22:16 WEST 2013
    - Added Czech (thanks to Roger Thompson)

0.55  Wed Jul 24 21:51:32 WEST 2013
    - Fixed the HI test that included some English text.

0.54  Thu May 30 17:40:24 WEST 2013
    - Added Welsh (Thanks to Kevin Donnelly)

0.53  Sat May 25 15:29:48 WEST 2013
    - Fix stupid failure in perl v5.18.

0.52  Tue Feb 12 21:03:20 WET 2013
    - Added Hindi (Thanks to Prashant Mathur)

0.51  Wed May 2 14:51:06 WEST 2012
    - Added Ukrainian and corrected Russian and Bulgarian training
      corpora. (Thanks to André Santos)

0.50  Wed Dec 7 15:43:48 WET 2011
    - update requirements, trying to solve some cpan testers issues

0.31  Wed Nov 23 22:22:03 WET 2011
    - some cpantesters complain about undeclared variables in test.
      Fix it;

0.30  Fri Aug 19 21:39:46 WEST 2011
    - added abstract to be shown on metacpan and other cpan indexers;

0.29  Tue Nov 23 15:40:17 WET 2010
    - added Greek. Thanks to Nikos Mastropavlos.

0.28  Mon Oct 18 11:15:36 WEST 2010
    - improved langof for the empty string on dev-perl.

0.27  Fri Oct 15 21:59:18 WEST 2010
    - ignore older language modules;

0.26  Fri May 21 18:01:32 WEST 2010
    - Added input encoding choice on langof_file;
    - Added input encoding choice on langident;
    - Added latin1 test;

0.25  Fri May 21 17:22:03 WEST 2010
    - Really dropped Text::ExtractWords dependency;
    - Some code rewrite for UTF-8 support (now is the default);
    - Dropped some known languages given lack of training corpora;
    - Added some new languages;
    - Added at least two tests per language (t/02 and t/06);

0.23  Fri Jan 16 21:55:00 2009
    - Drop Text::ExtractWords dependency.
      (it was giving segmentation fault, and it is not maintained)

0.22  Fri Jan 16 21:30:00 2009
    - fixed the extract_from option.
      (Text::ExtractWords doesn't handle big files)

0.21  Sat Dec 13 11:52:00 2008
    - Fixed tests that were using installed languages.
      (still needing fixes)

0.20  Thu Dec 11 19:39:00 2008
    - Fixed tests to use $^X instead of 'perl'.

0.19  Thu Oct 16 18:02:00 2008
    - changed maintainership information.

0.18  Thu Nov 03 08:32:00 2005
    + changes on Lingua::Identify
        - minor changes in the documentation
        - added tests for some corner cases (empty inputs, etc)
        - added tests to increase test coverage

0.17  Wed Nov 02 11:57:00 2005
    + changes on Lingua::Identify
        - added tests to see if every function is loaded
        - added method langof_file, to work with files
        - added tests for langof_file
        - added more tests to increase test coverage

0.16  Thu Aug 03 00:53:00 2005
    + changes on Lingua::Identify
        - minor changes in the documentation
    + changes on make-lingua-identify-language
        - added verbose mode (-V switch)
        - fixed another bug in the POD for Somali

0.15  Wed Aug 02 14:11:00 2005
    - more POD changes

0.14  Wed Aug 02 11:48:00 2005
    + changes on make-lingua-identify-language
        - fixed a bug that was printing POD wrongly
    + changes on languages modules
        - fixed the POD

0.13  Wed Aug 02 09:30:00 2005
    + changes on Lingua::Identify
        - added dummy mode, for debugging, so you know what's going on
          under the hood
        - fixed some minor glitches in the documentation
        - added some more tests
        - reorganized existing tests
        - added 7 more languages, thanks to George Wilson
          (ID, MS, RO, RU, SL, SO, SW)
    + changes on langident
        - added some crude form of support for big files, with the
          -s switch
    + changes on make-lingua-identify-language
        - fixed a potential glitch for languages with two letter codes
          that happen to be Perl reserved words
        - fixed some POD issues

0.12  Mon Jun 21 15:38:00 2005
    + changes on Lingua::Identify
        - minor changes in the documentation
        - added support for big inputs (through options 'extract-from'
          and 'max-size'; see HOW TO PERFORM IDENTIFICATION -> langof
          -> OPTIONS)
    + changes on langident
        - fixed a weird bug (results would vary for a file if there
          were others being processed too; that is, `langident a` and
          `langident a b` would return different results for "a")

0.11  Tue Dec 21 16:28:00 2004
    + changes on Lingua::Identify
        - added documentation detailing the confidence level
    + changes on make-lingua-identify-language
        - fixed the POD problem

0.10  Mon Dec 20 11:18:23 2004
    - added make-lingua-identify-language to the MANIFEST

0.09  Fri Dec 17 15:49:23 2004
    + changes on Lingua::Identify
        - minor changes in the documentation
    + added make-lingua-identify-language

0.08  Fri Nov 05 11:55:00 2004
    + changes on Lingua::Identify
        - fixed a glitch in the documentation (thanks to Matti Korttila)
    + changes on langident
        - fixed an embarrassing bug (thanks to Matti Korttila)
        - fixed a bug on the -m switch
        - added some validation for the -m switch

0.07  Thu Nov 04 01:57:02 2004
    + changes on Lingua::Identify
        - added 7 more languages (BG, GA, HR, HU, IS, PL, TR)
        - added pod-coverage.t to the tests directory
    + changes on langident
        - improved the documentation (and added examples to it)
        - added the -o switch (work only with specified languages)
        - added the -e switch (choose the methods to use)
        - added the -c switch (show confidence level for most probable
          language)
        - the -d switch (debug) now actually prints something (the
          values of the command-line options).
- command-line options are now parsed with Getopt::Std (much better) 0.06 Wed Nov 03 12:20:14 2004 + changes on Lingua::Identify - fixed a bug in the submodules documentation - added the function confidence 0.05 Mon Nov 01 20:00:00 2004 + changes on Lingua::Identify - some more changes in the documentation - language modules now have some generic documentation - added pod.t to the tests directory - added a section with examples in the documentation - added function name_of + changes on langident - added the -l switch: list all languages 0.04 Sat Oct 30 17:43:00 2004 + changes on Lingua::Identify - Perl required version is now 5.006 - tests for language manipulation are now automated - added 11 more languages (AF, BR, BS, CY, DA, EO, FI, FY, LA, NL, NO, SQ, SV) - added a small test to avoid confusing possible future submodules of Lingua::Identify with languages - improved the documentation + changes on langident - added the -m switch (limit number of languages shown) - langident version is now the version of Lingua::Identify 0.03 Wed Oct 27 10:13:10 2004 - major changes in the architecture - major changes in the documentation - more languages - more methods - more stuff (see the documentation) - more power 0.02 Mon Jul 26 23:43:39 2004 - basic functionality - added the `langident` command - improved documentation 0.01 Fri Oct 1 11:59:44 2004 - original version; created by h2xs 1.23 with options -XAn Lingua::Identify Lingua-Identify-0.56/langident000644 000765 000024 00000015404 11375536067 016432 0ustar00ambsstaff000000 000000 #!/usr/bin/perl use warnings; use strict; use 5.006; use Getopt::Std qw(getopts); use Lingua::Identify qw(:all); # get the user options our %opts = get_options(); # special requests show_help() if $opts{h}; show_version() if $opts{v}; show_languages() if $opts{l}; choose_languages() if $opts{o}; # check options for sanity if ($opts{m}) { $opts{m} > 0 || die "langident: please provide a positive value for -m\n"; $opts{a} = $opts{m}; } $opts{s} 
||= "\n";

# identify the language...
if (@ARGV) {                               # ...of files
  my @files = @ARGV;
  for (sort @files) {
    -f && -r && -s || next;
    if ($opts{E}) {
      open (FILE,"<:encoding($opts{E})" ,$_) || do {print STDERR "Could not open $_ ($!)\n" ; next};
    } else {
      open (FILE,"<:utf8" ,$_) || do {print STDERR "Could not open $_ ($!)\n" ; next};
    }
    local $/ = $opts{s};
    # read the file contents and identify their language
    print $_, ":", (join $", ident(<FILE>)), "\n";
    close (FILE);
  }
}
else {                                     # ...of STDIN
  binmode(STDIN, ":utf8");
  print ((join $", ident(<>)), "\n");
}

#
# subroutines
#

###

sub get_options {
  my %opts;
  getopts('acde:hlo:ps:vm:E:', \%opts );
  foreach my $key ( keys %opts ) {
    $opts{$key} = 1 unless defined $opts{$key}
  }
  debug_options( %opts ) if $opts{d};
  return %opts;
}

###

sub debug_options {
  my %opts = @_;
  #local $^W = 0;
  no warnings;
  {
    print STDERR <<"HERE";
-------------------------------------------------------------------------
                          Command line options
-------------------------------------------------------------------------
a $opts{a}
c $opts{c}
d $opts{d}
e $opts{e}
h $opts{h}
l $opts{l}
m $opts{m}
o $opts{o}
p $opts{p}
s $opts{s}
v $opts{v}
-------------------------------------------------------------------------
HERE
  }
}

###

sub show_help {
  die "Usage: langident file1 file2
   or: langident -a -p file
   or: cat file | langident

langident: identifies the languages files are written in

Options:
    -a             show all results
    -c             show confidence level
    -d             debug
    -E ENCODING    select the input encoding (defaults to UTF-8)
    -e METHODS     select the method(s) to use
    -h             display this message and exit
    -l             list available languages and exit
    -m NUMBER      sets maximum number of results (languages) to display
    -o LANGS       work only with specified languages
    -p             also show percentages
    -s SIZE        maximum size to examine
    -v             show version and exit
"
}

###

sub show_version {
  die "langident version 0.08 (Lingua::Identify ", Lingua::Identify->VERSION, ")\n";
}

###

# identify the language
sub ident {
  if ($opts{a} || $opts{p} || $opts{c}) {
    my @results = langof(get_config(),@_);
    @results || return ();

    if ($opts{m}) {
      my $m = $opts{m} >
@results / 2 ? @results - 1 : (($opts{m} * 2) - 1);
      @results = @results[0 .. $m];
    }

    my @confidence = $opts{c} ? scalar confidence(@results) * 100 : ();
    @results = @results[0,1] unless $opts{a};
    @results = grep /[a-z]{2}/, @results unless $opts{p};
    for (@results) {$_*=100 if /\d/}
    return (shift @results, @confidence, @results);
  }
  else {
    return (langof(get_config(),@_))[0];
  }
}

###

# HELP!!! I'm an innocent comment trapped in here by an evil programmer :-(
# Please get me out of here :-( My family is probably looking for me :-(

###

sub get_config {
  my %config;

  # get the methods to use
  if (defined $opts{e}) {
    my %methods;
    for (get_all_methods()) {
      $methods{$_} = 0
    }
    for (split /,/, $opts{e}) {
      # each entry is either "method" or "method=weight"
      my ($m, $v) = split /=/;
      defined $methods{$m} || next;
      $methods{$m} = $v || 1;
    }
    $config{'method'} = \%methods;
  }

  return \%config;
}

###

sub choose_languages {
  my $t = set_active_languages(split /,/, $opts{o});
  $opts{m} = $t if $opts{m} > $t;
}

# show all available languages
sub show_languages {
  for (sort (get_all_languages())) {
    print uc $_, " - ", ucfirst name_of($_) ,"\n";
  }
  exit;
}

__END__

=head1 NAME

langident - identifies the language files are written in

=head1 SYNOPSIS

  langident [OPTIONS] file1 [file2 ...]

=head1 DESCRIPTION

Identifies the language files are written in using Perl module
Lingua::Identify.

=head2 OPTIONS

=head2 -a

Show all results (not just the most probable language).

=head2 -c

Show confidence level for most probable language (it will be the first
value right after the most probable language).

=head2 -d

Debug (development only).

=head2 -E ENCODING

Select an input encoding. Defaults to UTF-8.

    # use ISO-8859-1 (latin1)
    langident -E ISO-8859-1 file

=head2 -e METHODS

Select the method(s) to use.
There are three ways of doing this:

    # simply using a method
    langident -e ngrams3 file

    # using several methods (separate them with a comma)
    langident -e prefixes3,suffixes3

    # using several methods and assign different weights to each of them
    langident -e smallwords=2,prefixes=1,ngrams3=1.3

The available methods are the following: B<smallwords>, B<prefixes1>,
B<prefixes2>, B<prefixes3>, B<prefixes4>, B<suffixes1>, B<suffixes2>,
B<suffixes3>, B<suffixes4>, B<ngrams1>, B<ngrams2>, B<ngrams3> and
B<ngrams4>.

=head2 -h

Display help message and exit.

=head2 -l

List all available languages and exit.

=head2 -m NUMBER

Set maximum number of results (languages) to display (shows the N most
probable languages, by descending order of probability). Overrides the -a
switch.

=head2 -o LANGUAGES

Only work with specified languages.

    # identify between Portuguese and English only
    langident -o pt,en *

=head2 -p

Also show percentages.

=head2 -s SIZE

Maximum size to examine.

=head2 -v

Show version and exit.

=head1 EXAMPLES

Use methods ngrams2 and ngrams1, giving ngrams2 twice the weight
(-e switch); output will include the three most probable languages
(-m switch) with their percentages (-p switch) and also the confidence
level (-c switch) of the first result.

    $ langident -e ngrams2=2,ngrams1 -c -p -m 3 README
    README:en 65.7209505939491 7.8971987481393 ga 4.11905889385895 tr 4.08487011400505
    $

=head1 TO DO

=over 6

=item * Add a switch to ignore HTML tags (and maybe other formats too)

=back

=head1 SEE ALSO

Lingua::Identify(3), Text::ExtractWords(3), Text::Ngram(3), Text::Affixes(3).

A linguist and/or a shrink.

The latest CVS version of C<Lingua::Identify> (which includes
I<langident>) can be obtained at

  http://natura.di.uminho.pt/natura/viewcvs.cgi/Lingua/Identify/

ISO 639 Language Codes, at http://www.w3.org/WAI/ER/IG/ert/iso639.htm

=head1 AUTHOR

Jose Alves de Castro, E<lt>cog@cpan.orgE<gt>

=head1 COPYRIGHT AND LICENSE

Copyright 2004 by Jose Alves de Castro

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.
=cut Lingua-Identify-0.56/lib/000755 000765 000024 00000000000 12203746720 015272 5ustar00ambsstaff000000 000000 Lingua-Identify-0.56/make-lingua-identify-language000644 000765 000024 00000024734 11700373713 022244 0ustar00ambsstaff000000 000000 #!/usr/bin/perl -s use strict; use POSIX; use Text::Ngram qw(ngram_counts); use Text::ExtractWords qw(words_count words_list); use Text::Affixes; my $version = '0.02'; our ($v,$h,$d,$u,$verbose,$locale); if ($locale) { if ($u && $locale !~ /utf/i) { verbose("setting locale: $locale.UTF-8"); POSIX::setlocale( &POSIX::LC_CTYPE, "$locale.UTF-8" ); } else { verbose("setting locale: $locale"); POSIX::setlocale( &POSIX::LC_CTYPE, $locale ); } use locale; } our $utf8 = undef; $utf8 = 1 if $u; show_help() if $h; show_version() if $v; unless (-d || @ARGV) { $d = 1; } if ($d) { my @languages = @ARGV || <*-*>; for (@languages) { /(.+)-(.+)/ || die "Can't figure out the language tag and name out of '$_'\n"; make_module($1, $2, <$_/*>); } } else { my $tag = shift || die "You must provide a language tag.\n"; my $language = shift || die "You must provide a language name.\n"; @ARGV || die "You must provide at least one file.\n"; verbose("Creating module for $tag for $language language"); make_module(lc($tag), lc($language), @ARGV); } ############### # subroutines # ############### sub verbose { print STDERR $_[0], "\n" if $verbose; } ### sub show_help { die "Usage: make-lingua-identify-language tag language file1 file2 or: make-lingua-identify-language -d directory1 directory2 make-lingua-identify-language: creates Lingua::Identify language modules Examples: make-lingua-identify-language en english file1 make-lingua-identify-language -d en-english/ pt-portuguese/ make-lingua-identify-language Options: -d directory mode -h displays this messages and exit -v show version and exit -verbose verbose mode -u unicode " } ### sub show_version { die "make-lingua-identify-language version $version\n"; } ### sub make_module { my ($tag, $name, @files) 
= @_;

  verbose("Studying $name ($tag)");

  # read all files in its directory
  my $text;
  my $meta = {
    'language_name' => $name,
    'sets'          => [],
  };
  $tag =~ /^..$/  and $meta->{'two_letter_code'}   = $tag;
  $tag =~ /^...$/ and $meta->{'three_letter_code'} = $tag;

  for (@files) {
    if ($utf8) {
      open( STDINO, '<:utf8', $_ ) || die $!;
    } else {
      open( STDINO, $_ ) || die $!;
    }
    if ($_ eq 'META.yml') {
      verbose("\tfound META.yml");
      while (<STDINO>) { # META.yml is processed here
        if (/^(\w+):\s*(\w+)$/) {
          # scalar entry, e.g. "two_letter_code: pt"
          $meta->{$1} = $2;
          verbose("\tassigned $1 to $2");
        }
        elsif (/^(\w+):\s*$/) {
          # list entry: one item per line, terminated by a blank line
          my $id = $1;
          while (<STDINO>) {
            last if /^\s*$/;
            if (/^\s*(\w*)\s*$/) {
              push @{$meta->{$id}}, $1;
              verbose("\tpushed $1 into $id");
            }
          }
        }
      }
    }
    else {
      if ($locale) {
        $text .= join "\n", map { lc } <STDINO>;
      }
      else {
        $text .= join "\n", <STDINO>;
      }
    }
    close STDINO;
  }

  # write some headers
  if ($utf8) {
    open( STDOUTO, ">:utf8" , ( uc $tag ) . ".pm" ) || die;
  } else {
    open( STDOUTO, ">" . ( uc $tag ) . ".pm" ) || die;
  }

  my $sets = join ", ", map { "'$_'" } @{$meta->{'sets'}};
  verbose("\t$sets");

  print STDOUTO "use utf8;\n" if $utf8;
  print STDOUTO "use strict;\n",
    "\n\${Lingua::Identify::languages{'_versions'}{'$tag'}} = '$version';\n",
    "\n\${Lingua::Identify::languages{'_names'}{'$tag'}} = '$name';\n",
    "\n\${Lingua::Identify::languages{'_sets'}{'$tag'}} = '$sets';\n";

  # write POD
  my $module_name = uc $tag;
  my $podname = ucfirst $name;
  print STDOUTO pod_unindent("
    =head1 NAME

    Lingua::Identify::$module_name - Meta-information on $podname

    =head1 SYNOPSIS

    Nothing here is meant for public consumption. This module is to be
    loaded by Lingua::Identify.

    =head1 DESCRIPTION

    Automatically generated. Do not change this module yourself unless
    you know what you're doing.

    =head1 SEE ALSO

    Lingua::Identify(3).
=head1 AUTHOR Jose Castro, C<< >> =head1 COPYRIGHT AND LICENSE Copyright (C) 2010 by Alberto Simoes This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available. =cut "); # write prefixes information verbose("\tstudying prefixes"); my $prefixes = get_prefixes( { min => 1, max => 4 }, $text ); for my $i ( 1 .. 4 ) { select STDOUTO; print "\n\${Lingua::Identify::languages{'prefixes$i'}{'$tag'}} = {\n"; my $total; for ( values %{ $$prefixes{$i} } ) { $total += $_; } for ( ( sort { $$prefixes{$i}{$b} <=> $$prefixes{$i}{$a} } keys %{ $$prefixes{$i} } )[ 0 .. 19 ] ) { print " '$_'\t=> ", $$prefixes{$i}{$_} / $total, ",\n"; } print "};\n"; } # write suffixes information verbose("\tstudying suffixes"); my $suffixes = get_suffixes( { min => 1, max => 4 }, $text ); for my $i ( 1 .. 4 ) { select STDOUTO; print "\n\${Lingua::Identify::languages{'suffixes$i'}{'$tag'}} = {\n"; my $total; for ( values %{ $$suffixes{$i} } ) { $total += $_; } for ( ( sort { $$suffixes{$i}{$b} <=> $$suffixes{$i}{$a} } keys %{ $$suffixes{$i} } )[ 0 .. 19 ] ) { print " '$_'\t=> ", $$suffixes{$i}{$_} / $total, ",\n"; } print "};\n"; } # write words information verbose("\tstudying words"); my $hash_w; if ($utf8) { $hash_w = my_words_count($text); } else { my %hash; words_count(\%hash, $text); $hash_w = \%hash; } my $total; for ( values %$hash_w ) { $total += $_; } print "\n\${Lingua::Identify::languages{'smallwords'}{'$tag'}} = {\n"; for (( sort { $hash_w->{$b} <=> $hash_w->{$a} } grep { !/(?:[_'",;:.«»0-9\(\)\[\]\{\}\/\\\!\?%]|^-|-$)/ } keys %$hash_w ) [ 0 .. 
19 ] ) { print " '$_'\t=> ", $hash_w->{$_} / $total, ",\n"; } print "};\n"; # write ngrams information my $f = sub { ngram_counts( {spaces => 0}, $_[0], $_[1]) }; get_ngrams($tag, 'ngrams', 1, 4, $f, $text ); # close the file close(STDOUT); } ############################### sub get_ngrams { my ($tag, $what, $min, $max, $function, $text) = @_; for my $gram ( $min .. $max ) { verbose("\tstudying $what $gram"); #my $hash_r = ngram_counts( $text, $gram ); my $hash = &{$function}($text, $gram); my $total; for ( values %$hash ) { $total += $_; } print "\n\${Lingua::Identify::languages{'$what$gram'}{'$tag'}} = {\n"; for ((sort { $$hash{$b} <=> $$hash{$a} } keys %$hash )[ 0 .. 19 ]) { print " '$_' => ", $$hash{$_} / $total, ",\n"; } print "};\n"; } } ### sub pod_unindent { ( local $_ = shift ) =~ s/^ +//mg; $_ } sub my_words_count { my $text = shift; my $count; for my $word (split /[\n\s]+/, $text) { $count->{$word}++; } return $count; } __END__ =head1 NAME make-lingua-identify-language - creates language modules for Lingua::Identify =head1 SYNOPSIS make-lingua-identify-language Language-Tag Language-Name file1 [file2 ...] or make-lingua-identify-language -d TAG1-LANGUAGE1/ [TAG2-LANGUAGE2/ ...] or make-lingua-identify =head1 DESCRIPTION Creates language modules to be used by Lingua::Identify. After creating the modules, you still have to install them. Please note that this script is still at an early stage. Please do not even look at the code... Without parameters, make-lingua-identify-language assumes mode -d and goes through all the directories in the current one. This is useful to be used in a directory where you something like this: . |-- en-english | `-- english.txt |-- fr-french | `-- french1.txt | `-- french2.txt `-- pt-portuguese `-- portuguese.txt =head2 OPTIONS =head2 -d Directory mode. Each parameter passed should be a directory whose name must be of the form tag-name (e.g., en-english/ ). 
Each of the directories passed should contain text files that can be
used to train Lingua::Identify.

=head2 -D

Debug mode. Only for development.

=head2 -h

Display help and exit.

=head2 -v

Show version and exit.

=head2 -verbose

Verbose mode.

=head2 -locale=C<< <locale> >>

Set a specific locale. With this option, your text is lowercased before
being analysed.

=head1 META.yml

C<META.yml> files are not parsed like other files: they are ignored as
training text.

In directory mode (C<-d> switch), C<META.yml> files are checked for
information on language codes and sets.

Here's a simple C<META.yml> for you to put in your directories:

    two_letter_code: pt
    three_letter_code: por
    sets:
      spoken_in_portugal

With that, the language will be identified with the two letter code "pt"
or the three letter code "por"; it will also be in the set
":spoken_in_portugal".

=head1 CONTRIBUTING WITH NEW LANGUAGES

Please do not contribute with modules you made yourself. It's easier to
contribute with unprocessed text, because that way new versions of
Lingua::Identify do not have to drop languages if I can't contact you at
that time.

Use I<make-lingua-identify-language> to create a new module for your own
personal use, if you must, but try to contribute with unprocessed text
rather than those modules.

=head1 SEE ALSO

Lingua::Identify(3), langident(1)

A linguist and/or a shrink.

The latest CVS version of C<Lingua::Identify> (which includes
I<make-lingua-identify-language>) can be obtained at

  http://natura.di.uminho.pt/natura/viewcvs.cgi/Lingua/Identify/

ISO 639 Language Codes, at http://www.w3.org/WAI/ER/IG/ert/iso639.htm

=head1 AUTHOR

Jose Alves de Castro, E<lt>cog@cpan.orgE<gt>

=head1 COPYRIGHT AND LICENSE

Copyright 2004-2005 by Jose Alves de Castro

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.
=cut Lingua-Identify-0.56/Makefile.PL000644 000765 000024 00000000641 11667706703 016511 0ustar00ambsstaff000000 000000 use 5.0005; use ExtUtils::MakeMaker; WriteMakefile ( NAME => 'Lingua::Identify', VERSION_FROM => 'lib/Lingua/Identify.pm', ABSTRACT_FROM => 'lib/Lingua/Identify.pm', PREREQ_PM => { 'Class::Factory::Util' => 1.6, 'Text::Affixes' => 0.07, 'Text::Ngram' => 0.13, }, 'EXE_FILES' => ['langident', 'make-lingua-identify-language'], ); Lingua-Identify-0.56/MANIFEST000644 000765 000024 00000003425 12203746720 015661 0ustar00ambsstaff000000 000000 Changes Makefile.PL MANIFEST README t/00-everything_loads.t t/01-language_manipulation.t t/02-language_identification.t t/03-compile-langident.t t/04-compile-make-lingua-identify-language.t t/05-dummy-mode.t t/06-file_identification.t t/07-method_manipulation.t t/08-corner-cases.t t/files/de lib/Lingua/Identify/DE.pm t/files/en lib/Lingua/Identify/EN.pm t/files/es lib/Lingua/Identify/ES.pm t/files/pt lib/Lingua/Identify/PT.pm t/files/la lib/Lingua/Identify/LA.pm t/files/bg lib/Lingua/Identify/BG.pm t/files/el lib/Lingua/Identify/EL.pm t/files/tr lib/Lingua/Identify/TR.pm t/files/it lib/Lingua/Identify/IT.pm t/files/hr lib/Lingua/Identify/HR.pm t/files/fr lib/Lingua/Identify/FR.pm t/files/no # missing NO.pm t/files/ro lib/Lingua/Identify/RO.pm t/files/da lib/Lingua/Identify/DA.pm t/files/hu lib/Lingua/Identify/HU.pm t/files/bs # lib/Lingua/Identify/BS.pm t/files/ms # missing MS.pm t/files/fy # missing FY.pm t/files/br # missing BR.pm t/files/nl lib/Lingua/Identify/NL.pm t/files/af # missing AF.pm t/files/cy lib/Lingua/Identify/CY.pm t/files/is # missing IS.pm t/files/fi lib/Lingua/Identify/FI.pm t/files/ga # missing GA.pm t/files/sv lib/Lingua/Identify/SV.pm t/files/eo # missing EO.pm t/files/pl lib/Lingua/Identify/PL.pm t/files/sq lib/Lingua/Identify/SQ.pm t/files/uk lib/Lingua/Identify/UK.pm t/files/sl lib/Lingua/Identify/SL.pm t/files/id lib/Lingua/Identify/ID.pm t/files/ru lib/Lingua/Identify/RU.pm t/files/hi 
lib/Lingua/Identify/HI.pm t/files/pt_big t/pod.t t/pod-coverage.t langident make-lingua-identify-language lib/Lingua/Identify.pm lib/Lingua/Identify/Nothing.pm META.yml Module meta-data (added by MakeMaker) t/files/pt_lt1 lib/Lingua/Identify/CS.pm t/files/cs META.json Module JSON meta-data (added by MakeMaker) Lingua-Identify-0.56/META.json000644 000765 000024 00000001650 12203746720 016147 0ustar00ambsstaff000000 000000 { "abstract" : "Language identification", "author" : [ "unknown" ], "dynamic_config" : 1, "generated_by" : "ExtUtils::MakeMaker version 6.72, CPAN::Meta::Converter version 2.132140", "license" : [ "unknown" ], "meta-spec" : { "url" : "http://search.cpan.org/perldoc?CPAN::Meta::Spec", "version" : "2" }, "name" : "Lingua-Identify", "no_index" : { "directory" : [ "t", "inc" ] }, "prereqs" : { "build" : { "requires" : { "ExtUtils::MakeMaker" : "0" } }, "configure" : { "requires" : { "ExtUtils::MakeMaker" : "0" } }, "runtime" : { "requires" : { "Class::Factory::Util" : "1.6", "Text::Affixes" : "0.07", "Text::Ngram" : "0.13" } } }, "release_status" : "stable", "version" : "0.56" } Lingua-Identify-0.56/META.yml000644 000765 000024 00000001004 12203746720 015770 0ustar00ambsstaff000000 000000 --- abstract: 'Language identification' author: - unknown build_requires: ExtUtils::MakeMaker: 0 configure_requires: ExtUtils::MakeMaker: 0 dynamic_config: 1 generated_by: 'ExtUtils::MakeMaker version 6.72, CPAN::Meta::Converter version 2.132140' license: unknown meta-spec: url: http://module-build.sourceforge.net/META-spec-v1.4.html version: 1.4 name: Lingua-Identify no_index: directory: - t - inc requires: Class::Factory::Util: 1.6 Text::Affixes: 0.07 Text::Ngram: 0.13 version: 0.56 Lingua-Identify-0.56/README000644 000765 000024 00000040513 11075671533 015414 0ustar00ambsstaff000000 000000 Lingua::Identify version ======================== =head1 NAME Lingua::Identify - Language identification =head1 SYNOPSIS use Lingua::Identify qw(:language_identification); $a = 
langof($textstring);   # gives the most probable language

or the complete way:

    @a = langof($textstring);   # gives pairs of languages / probabilities
                                # sorted from most to least probable

    %a = langof($textstring);   # gives a hash of language / probability

or the expert way (see section OPTIONS, under HOW TO PERFORM IDENTIFICATION)

    $a = langof( { method => [qw/smallwords prefixes2 suffixes2/] }, $text);

    $a = langof( { 'max-size' => 3_000_000 }, $text);

    $a = langof( { 'extract_from' => { 'head' => 1, 'tail' => 2 } }, $text);

=head1 DESCRIPTION

C<Lingua::Identify> identifies the language a given string or file is
written in.

See section WHY LINGUA::IDENTIFY for a list of C<Lingua::Identify>'s
strong points.

See section KNOWN LANGUAGES for a list of available languages and HOW TO
PERFORM IDENTIFICATION to know how to really use this module.

If you're in a hurry, jump to section EXAMPLES, way down below.

Also, don't forget to read the following section, A WARNING ON THE
ACCURACY OF LANGUAGE IDENTIFICATION METHODS.

=head1 A WARNING ON THE ACCURACY OF LANGUAGE IDENTIFICATION METHODS

Take a word that exists in two different languages, take a good look at
it and answer this question: "What language does this word belong to?".

You can't give an answer like "Language X", right? You can only say it
looks like any of a set of languages.

Similarly, it isn't always easy to identify the language of a text if the
only two active languages are very similar.

Now that we've taken out of the way the warning that language
identification is not 100% accurate, please keep reading the
documentation.

=head1 WHY LINGUA::IDENTIFY

You might be wondering why you should use Lingua::Identify instead of any
other tool for language identification.
Here's a list of Lingua::Identify's strong points:

=over 6

=item * it's free and it's open-source;

=item * it's portable (it's Perl, which means it will work on lots of
different platforms);

=item * 33 languages and growing;

=item * 4 different methods of language identification and growing (see
METHODS OF LANGUAGE IDENTIFICATION for more details on this one);

=item * it's a module, which means you can easily write your own
application (be it CGI, TK, whatever) around it;

=item * it comes with I<langident>, which means you don't actually need
to write your own application around it;

=item * it's flexible (at the moment, you can actually choose the methods
to use and their relevance, the max size of input to analyze each time
and which part(s) of the input to analyze)

=item * it supports big inputs (through the 'max-size' and 'extract_from'
options)

=item * it's easy to deal with languages (you can activate and deactivate
the ones you choose whenever you want to, which can improve your times
and accuracy);

=item * it's maintained.

=back

=head1 HOW TO PERFORM IDENTIFICATION

=head2 langof

To identify the language a given text is written in, use the I<langof>
function. To get a single value, do:

    $language = langof($text);

To get the most probable language and also the percentage of its
probability, do:

    ($language, $probability) = langof($text);

If you want a hash where each active language is mapped into its
percentage, use this:

    %languages = langof($text);

=head3 OPTIONS

I<langof> can also be given some configuration parameters, in this way:

    $language = langof(\%config, $text);

These parameters are detailed here:

=over 6

=item * B<extract_from>

When the size of the input exceeds C<max-size>, C<langof> analyzes only
the beginning of the file. You can specify which part of the file is
analyzed with the 'extract_from' option:

    langof( { 'extract_from' => 'tail' } , $text );

Possible values are 'head' and 'tail' (for now).
You can also specify more than one part of the file, so that text is
extracted from those parts:

    langof( { 'extract_from' => [ 'head', 'tail' ] } , $text );

(this will be useful when more than two possibilities exist)

You can also specify different values for each part of the file (not
necessarily for all of them):

    langof( { 'extract_from' => { head => 40, tail => 60 } } , $text);

The line above, for instance, retrieves 40% of the text from the
beginning and 60% from the end. Note, however, that those values are
relative weights, not percentages. You'd get the same behavior with:

    langof( { 'extract_from' => { head => 80, tail => 120 } } , $text);

The proportions would be the same.

=item * B<max-size>

By default, C<langof> analyzes only 1,000,000 bytes. You can specify how
many bytes (at the most) can be analyzed (if not enough exist, the whole
input is still analyzed).

    langof( { 'max-size' => 2000 }, $text);

If you want all the text to be analyzed, set max-size to 0:

    langof( { 'max-size' => 0 }, $text);

See also C<extract_from>.

=item * B<method>

You can choose which method or methods to use, and also the relevance of
each of them.

To choose a single method to use:

    langof( {method => 'smallwords' }, $text);

To choose several methods:

    langof( {method => [qw/prefixes2 suffixes2/]}, $text);

To choose several methods and give them different weight:

    langof( {method => {smallwords => 0.5, ngrams3 => 1.5} }, $text);

To see the list of available methods, see section METHODS OF LANGUAGE
IDENTIFICATION.

If no method is specified, the configuration for this parameter is the
following (this might change in the future):

    method => {
      smallwords => 0.5,
      prefixes2  => 1,
      suffixes3  => 1,
      ngrams3    => 1.3
    };

=item * B<mode>

By default, C<Lingua::Identify> assumes C<normal> mode, but others are
available.

In C<dummy> mode, instead of actually calculating anything,
C<Lingua::Identify> only does the preparation it has to and then returns
a bunch of information, including the list of the active languages, the
selected methods, etc. It also returns the text meant to be analysed.
Do be warned that, with I<langof_file>, the dummy mode still reads the
files; it simply doesn't calculate the language.

    langof( { 'mode' => 'dummy' }, $text);

This returns something like this:

    { 'methods'          => { 'smallwords' => '0.5',
                              'prefixes2'  => '1',
                            },
      'config'           => { 'mode' => 'dummy' },
      'max-size'         => 1000000,
      'active-languages' => [ 'es', 'pt' ],
      'text'             => $text,
      'mode'             => 'dummy',
    }

=back

=head2 langof_file

I<langof_file> works just like I<langof>, with the exception that it
receives filenames instead of text. It reads these files (if existing and
readable, of course) and parses their content.

Currently, I<langof_file> assumes the files are regular text. This may
change in the future and the files might be scanned to check their
filetype and then parsed to extract only their textual content (which
should be pretty useful so that you can perform language identification,
say, in HTML files, or PDFs).

To identify the language a file is written in:

    $language = langof_file($path);

To get the most probable language and also the percentage of its
probability, do:

    ($language, $probability) = langof_file($path);

If you want a hash where each active language is mapped into its
percentage, use this:

    %languages = langof_file($path);

If you pass more than one file to I<langof_file>, they will all be read
and their content merged and then parsed for language identification.

=head3 OPTIONS

I<langof_file> accepts all the options I<langof> does, so refer to those
first (up in this document).

    $language = langof_file(\%config, $path);

I<langof_file> currently only reads the first 10,000 bytes of each file.

=head2 confidence

After getting the results into an array, its first element is the most
probable language. That alone doesn't tell you whether it is very
probable or not.

You can find out more about how likely the results are to be accurate by
computing their confidence level.
    use Lingua::Identify qw/:language_identification/;
    my @results = langof($text);
    my $confidence_level = confidence(@results);
    # $confidence_level now holds a value between 0.5 and 1; the higher that
    # value, the more accurate the results seem to be

The formula used is pretty simple: p1 / (p1 + p2), where p1 is the
probability of the most likely language and p2 is the probability of the
language which came in second.

A couple of examples to illustrate this:

    English    50%
    Portuguese 10%
    ...

    confidence level: 50 / (50 + 10) = 0.83

Another example:

    Spanish    30%
    Portuguese 25%
    ...

    confidence level: 30 / (25 + 30) = 0.55

And a third one:

    French 10%
    German  5%
    ...

    confidence level: 10 / (10 + 5) = 0.67

As you can see, the first example is probably the most accurate one. Are
there any doubts? The English language has five times the probability of
the second language.

The second example is a bit more tricky. 55% confidence. The confidence
level is never below 50%, because p1 is always at least as large as p2. 55%
doesn't make anyone confident in the results, and one shouldn't be, with
results such as these.

Notice the third example. The confidence level goes up to 67%, but the
probability of French is a mere 10%. So what? It's twice as much as the
second language. The low probability may well be caused by a great number
of languages in play.

=head2 get_all_methods

Returns a list comprised of all the available methods for language
identification.

=head1 LANGUAGE IDENTIFICATION IN GENERAL

Language identification is based on patterns.

In order to identify the language a given text is written in, we repeat a
given process for each active language (see section LANGUAGE
MANIPULATION); in that process, we look for common patterns of that
language. Those patterns can be prefixes, suffixes, common words, ngrams or
even sequences of words.

After repeating the process for each language, the total score for each of
them is then used to compute the probability (in percentage) for each
language to be the one of that text.
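As a rough illustration of the process just described, and not the module's
actual implementation, the following self-contained sketch scores a text
against tiny, made-up trigram tables, turns the scores into probabilities,
and applies the p1 / (p1 + p2) confidence formula. The pattern tables, the
function names (trigrams, score_languages, to_probabilities, confidence_of)
and the behaviour with fewer than two candidates are all assumptions made
for the example.

```perl
use strict;
use warnings;

# Hypothetical pattern tables (trigram => weight). Real tables would come
# from training corpora; these values are invented for illustration only.
my %patterns = (
    en => { 'the' => 3, ' th' => 2, 'ing' => 2 },
    pt => { 'que' => 3, ' de' => 2, 'os ' => 2 },
);

# Split a text into overlapping character trigrams (spaces included).
sub trigrams {
    my ($text) = @_;
    $text = lc $text;
    $text =~ s/\s+/ /g;
    return map { substr $text, $_, 3 } 0 .. length($text) - 3;
}

# For each language, sum the weights of every known trigram in the text.
sub score_languages {
    my ($text, $tables) = @_;
    my %score = map { $_ => 0 } keys %$tables;
    for my $gram (trigrams($text)) {
        for my $lang (keys %$tables) {
            $score{$lang} += $tables->{$lang}{$gram} // 0;
        }
    }
    return %score;
}

# Normalise raw scores so they sum to 1, best language first.
# Returns a flat list: (lang1, p1, lang2, p2, ...).
sub to_probabilities {
    my (%score) = @_;
    my $total = 0;
    $total += $_ for values %score;
    return () unless $total;
    return map  { ($_, $score{$_} / $total) }
           sort { $score{$b} <=> $score{$a} or $a cmp $b } keys %score;
}

# Confidence as described in the text: p1 / (p1 + p2).
sub confidence_of {
    my @results = @_;
    return 0 if @results < 2;    # assumption: nothing identified
    return 1 if @results < 4;    # assumption: a single candidate has no rival
    my ($p1, $p2) = @results[1, 3];
    return 0 unless $p1 + $p2;
    return $p1 / ($p1 + $p2);
}

my @results = to_probabilities(
    score_languages('the thing is that the king is singing', \%patterns)
);
printf "%s %.2f\n", $results[0], confidence_of(@results);
```

Passing the tables in by reference keeps the scoring function independent
of any particular set of languages, which loosely mirrors the idea of
working only with the languages that are currently active.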
=head1 METHODS OF LANGUAGE IDENTIFICATION

C<Lingua::Identify> currently comprises four different ways for language
identification, in a total of thirteen variations of those.

The available methods are the following: B<smallwords>, B<prefixes1>,
B<prefixes2>, B<prefixes3>, B<prefixes4>, B<suffixes1>, B<suffixes2>,
B<suffixes3>, B<suffixes4>, B<ngrams1>, B<ngrams2>, B<ngrams3> and
B<ngrams4>.

Here's a more detailed explanation of each of those ways and those methods.

=head2 Small Word Technique - B<smallwords>

The "Small Word Technique" searches the text for the most common words of
each active language. These words are usually articles, pronouns, etc,
which happen to be (usually) the shortest words of the language; hence, the
method name.

This is usually a good method for big texts, especially if you happen to
have few languages active.

=head2 Prefix Analysis - B<prefixes1>, B<prefixes2>, B<prefixes3>, B<prefixes4>

This method analyses text for the common prefixes of each active language.

The methods are, respectively, for prefixes of size 1, 2, 3 and 4.

=head2 Suffix Analysis - B<suffixes1>, B<suffixes2>, B<suffixes3>, B<suffixes4>

Similar to the Prefix Analysis (see above), but instead analysing common
suffixes.

The methods are, respectively, for suffixes of size 1, 2, 3 and 4.

=head2 Ngram Categorization - B<ngrams1>, B<ngrams2>, B<ngrams3>, B<ngrams4>

Ngrams are sequences of tokens. You can think of them as syllables, but
they are also more than that, as they are made up not only of characters,
but also of spaces (delimiting or separating words).

Ngrams are a very good way of identifying languages, given that the most
common ones of each language are not generally very common in others.

This is usually the best method for small amounts of text or too many
active languages.

The methods are, respectively, for ngrams of size 1, 2, 3 and 4.

=head1 LANGUAGE MANIPULATION

When trying to perform language identification, C<Lingua::Identify> works
not with all available languages, but instead with the ones that are
active.

By default, all available languages are active, but that can be changed by
the user.

For your convenience, several methods regarding language manipulation were
created. In order to use them, load the module with the tag
:language_manipulation.
These methods work with the two-letter codes for languages.

=over 6

=item B<activate_language>

Activates a language

    activate_language('en');

    # or

    activate_language($_) for get_all_languages();

=item B<activate_all_languages>

Activates all languages

    activate_all_languages();

=item B<deactivate_language>

Deactivates a language

    deactivate_language('en');

=item B<deactivate_all_languages>

Deactivates all languages

    deactivate_all_languages();

=item B<get_all_languages>

Returns the codes of all available languages

    my @all_languages = get_all_languages();

=item B<get_active_languages>

Returns the codes of all active languages

    my @active_languages = get_active_languages();

=item B<get_inactive_languages>

Returns the codes of all inactive languages

    my @inactive_languages = get_inactive_languages();

=item B<is_active>

Returns 1 if the language is active, 0 otherwise

    if (is_active('en')) {
      # YOUR CODE HERE
    }

=item B<is_valid_language>

Returns 1 if the language exists, 0 otherwise

    if (is_valid_language('en')) {
      # YOUR CODE HERE
    }

=item B<set_active_languages>

Sets the active languages

    set_active_languages('en', 'pt');

    # or

    set_active_languages(get_all_languages());

=item B<name_of>

Given the two-letter tag of a language, returns its name

    my $language_name = name_of('pt');

=back

=head1 KNOWN LANGUAGES

Currently, C<Lingua::Identify> knows the following languages (33 total):

=over 6

=item AF - Afrikaans

=item BG - Bulgarian

=item BR - Breton

=item BS - Bosnian

=item CY - Welsh

=item DA - Danish

=item DE - German

=item EN - English

=item EO - Esperanto

=item ES - Spanish

=item FI - Finnish

=item FR - French

=item FY - Frisian

=item GA - Irish

=item HR - Croatian

=item HU - Hungarian

=item ID - Indonesian

=item IS - Icelandic

=item IT - Italian

=item LA - Latin

=item MS - Malay

=item NL - Dutch

=item NO - Norwegian

=item PL - Polish

=item PT - Portuguese

=item RO - Romanian

=item RU - Russian

=item SL - Slovene

=item SO - Somali

=item SQ - Albanian

=item SV - Swedish

=item SW - Swahili

=item TR - Turkish

=back

=head1 CONTRIBUTING WITH NEW LANGUAGES

Please do not contribute with modules you made yourself.
It's easier to contribute with unprocessed text, because that way new
versions of Lingua::Identify don't have to drop languages in case I can't
contact you by that time.

Use I<make-lingua-identify-language> to create a new module for your own
personal use, if you must, but try to contribute with unprocessed text
rather than those modules.

=head1 EXAMPLES

=head2 THE BASIC EXAMPLE

Check the language a given text file is written in:

    use Lingua::Identify qw/langof/;

    my $text = join "\n", <>;

    # identify the language by letting the module decide on the best way
    # to do so
    my $language = langof($text);

=head2 IDENTIFYING BETWEEN TWO LANGUAGES

Check the language a given text file is written in, supposing you happen to
know it's either Portuguese or English:

    use Lingua::Identify qw/langof set_active_languages/;
    set_active_languages(qw/pt en/);

    my $text = join "\n", <>;

    # identify the language by letting the module decide on the best way
    # to do so
    my $language = langof($text);

=head1 TO DO

=over 6

=item * WordNgrams based methods;

=item * More languages (always);

=item * File recognition and treatment;

=item * Deal with different encodings;

=item * Create sets of languages and allow their activation/deactivation;

=item * There should be a way of knowing the default configuration (other
than using the dummy mode, of course, or than accessing the variables
directly);

=item * Add a section about other similar tools.

=back

=head1 SEE ALSO

langident(1), Text::ExtractWords(3), Text::Ngram(3), Text::Affixes(3).

ISO 639 Language Codes, at http://www.w3.org/WAI/ER/IG/ert/iso639.htm

=head1 AUTHOR

Alberto Simoes, C<< >>

Jose Castro, C<< >>

=head1 COPYRIGHT & LICENSE

Copyright 2008 Alberto Simoes, All Rights Reserved.
Copyright 2004 Jose Castro, All Rights Reserved.

This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
Lingua-Identify-0.56/t/000755 000765 000024 00000000000 12203746720 014767 5ustar00ambsstaff000000 000000 Lingua-Identify-0.56/t/00-everything_loads.t000644 000765 000024 00000001240 10730327426 020735 0ustar00ambsstaff000000 000000 use Test::More tests => 16; BEGIN { use_ok('Lingua::Identify', ':all') }; can_ok(__PACKAGE__,'langof'); can_ok(__PACKAGE__,'langof_file'); can_ok(__PACKAGE__,'confidence'); can_ok(__PACKAGE__,'get_all_methods'); can_ok(__PACKAGE__,'activate_all_languages'); can_ok(__PACKAGE__,'deactivate_all_languages'); can_ok(__PACKAGE__,'get_all_languages'); can_ok(__PACKAGE__,'get_active_languages'); can_ok(__PACKAGE__,'get_inactive_languages'); can_ok(__PACKAGE__,'is_active'); can_ok(__PACKAGE__,'is_valid_language'); can_ok(__PACKAGE__,'activate_language'); can_ok(__PACKAGE__,'deactivate_language'); can_ok(__PACKAGE__,'set_active_languages'); can_ok(__PACKAGE__,'name_of'); Lingua-Identify-0.56/t/01-language_manipulation.t000644 000765 000024 00000004243 12203746024 021735 0ustar00ambsstaff000000 000000 #!/usr/bin/perl use Lingua::Identify ':language_manipulation'; use Test::More; my @languages = qw/pt en de bg da es it fr fi hr nl ro ru pl el la sq sv tr sl hu id uk hi cy cs/; plan tests => 23 + scalar(@languages); for (qw/zbr xx zz/, '') { is(is_valid_language($_), 0); } is_deeply( [ get_all_languages() ], [ get_active_languages() ]); is_deeply( [ sort ( get_all_languages() ) ], [ sort ( get_active_languages() ) ]); is_deeply( [ sort ( get_all_languages() ) ], [ sort @languages ]); is_deeply( [ sort ( deactivate_language('pt') ) ], [ sort grep {! /^pt$/ } @languages ]); is_deeply( [ sort ( get_active_languages() ) ], [ sort grep {! 
/^pt$/ } @languages ]); is_deeply( [ get_inactive_languages() ], [ qw/pt/ ]); is(is_active('pt'), 0); is_deeply( [ deactivate_all_languages() ], [ ]); is_deeply( [ get_inactive_languages() ], [ get_all_languages() ]); is_deeply( [ activate_language('pt') ], [ qw/pt/ ]); is(is_active('pt'), 1); is_deeply( [ sort ( set_active_languages(qw/pt ru/) ) ], [ qw/pt ru/ ]); is_deeply( [ sort ( get_active_languages() ) ], [ qw/pt ru/ ]); is_deeply( [ activate_all_languages() ], [ get_all_languages() ]); is(name_of('pt'), 'portuguese'); deactivate_all_languages(); is_deeply( [ get_active_languages() ], [ ]); is_deeply( [ activate_all_languages() ], [ get_all_languages ]); is_deeply( [ sort ( get_all_languages() ) ], [ sort @languages ]); is_deeply( [ sort ( get_active_languages() ) ], [ sort @languages ]); for (get_all_languages()) { is(is_valid_language($_), 1); } Lingua-Identify-0.56/t/02-language_identification.t000644 000765 000024 00000065057 12203746015 022241 0ustar00ambsstaff000000 000000 #!/usr/bin/perl use utf8; use Test::More tests => 4 + 3 * 26; BEGIN { use_ok('Lingua::Identify', qw/:language_manipulation :language_identification/) }; my %texts = ( cs => "Asi 4000 lidí v sobotu demonstrovaly v centru tureckého největšího města Istanbulu, aby vyjádřily podporu islámským konzervativcům v Egyptě. Ty podporuje i turecká vláda. V libyjském Benghází vybuchla bomba v před egyptským konzulátem. Tlaková vlna vyrazila okna budovy, okolních domů a poblíž zaparkovaných aut.", hi => "इस पोजीशन मे महिला पुरुष का चेहरे से बेहतर संपर्क रहता है. इस पोजीशन में दोनों के बीच काफी निकटता रहती है और यह पोजीशन उन जोड़ो के लिये बेहतर है जो रतिक्रीड़ा के दौरान एक दूसरे को चुंबन करने में ज्यादा रुचि रखते हैं. इस पोजीशन के लिये पुरुष किसी पलंग या उस जैसे किसी अन्य जगह पर पांव नीचे करके बैठ जाता है. फिर महिला उसके चेहरे की ओर अपना चेहरा करते हूए उसके लिंग के उपर या सामने अपने योनि को ले जाते हुए अपनी टांगे सामने फैला देती है. साथ ही महिला के हाथ पुरुष के शरीर से सहारा लेने के काम आते है. 
इस पोजीशन में रतिक्रीड़ा के दौरान पुरुष चाहे तो अपने हाथ पीछे कर सहारे के रुप में प्रयुक्त कर सकता है वहीं दूसरी ओर हाथों को महिला के कूल्हों या कमर के पास से सहारा देकर धक्कों में मदद के साथ गति भी बढ़ा सकता है. बदकिस्मती से चित्र में दिखाई गई पोजीशन धक्कों के हिसाब से उतनी बेहतर नहीं कही जा सकती (जितनी रेटिंग में दिखाई गई है) . पोजीशन के संपूर्ण आनंद के लिये स्टूल या चेयर का प्रयोग करें इसमेंमहिला को पांव के सहारे के ज्यादा सही अवसर होते हैं. इसके अलावा भी इसमें अपने हिसाब से बैठने की व्यवस्था बनाकर पोजीशन को ज्यादा आनंददायी बनाया जा सकता है.", uk => "Блокування роботи Верховної Ради тривало — депутати від опозиції і надалі вимагали негайно заслухати Генпрокурора та керівника Державної пенітенціарної служби щодо начебто засто ... ", el => "Μέτρα, το ύψος των οποίων ξεπερνά τα 13 δισ. ευρώ και δεν έχουν ακόμη προσδιοριστεί αν και περιλαμβάνονται από τον περασμένο Μάιο στο Μνημόνιο, θα ενσωματώνει το «μεσοπρόθεσμο δημοσιονομικό στρατηγικό πλαίσιο 2012-2014. Το κείμενο, που θα συνταχθεί από κοινού με την τρόικα, θα γίνει νόμος του κράτους στο τέλος Απριλίου του 2011. Το χρονοδιάγραμμα για την αποκάλυψη των κρυφών μέτρων του μνημονίου ανακοίνωσε ο Γιώργος Παπακωνσταντίνου, ενώ σχετική αναφορά έκαναν και οι εκπρόσωποι της τρόικας οι οποίοι μίλησαν για λήψη μέτρων που θα αντιστοιχούν στο 5% του ΑΕΠ. ", 'tr' => "Ölüm bu işin kaderinde var' diyordu 1 gün önce Başbakan.. Haklı çıktı.. Hepsi öldü. Aileler isyan etti böyle kadere.. Ama 23 madencinin durumu, isyan edilmeyecek gibi değildi. Türkİye 3 gündür yerin 540 metre dibinde mahsur kalmış 30 madenci için dua ediyordu. Dün sabah patlamanın olduğu kuyudan ocağa inen ekipler, acı gerçekle yüzleşti. Hiç biri kurtulamamıştı. Cesetler yeryüzüne çıkarıldıkça feryatlar göğe yükseldi.", ms => 'Ahli Parlimen Jerai, Mohd Firdaus Jaafar ketika mengulas isu itu berkata Umno sepatutnya bangun untuk menentang kemungkaran yang berlaku setelah dengan jelas melibatkan perjudian yang sememangnya diharamkan oleh agama Islam. 
Apakah pemimpin Umno ini tidak berasa malu apabila DAP sanggup menentang pemberian lesen judi tersebut sedangkan pemimpin-pemimpin Umno hanya mendiamkan diri. Kita juga ingin bertanya kemanakah perginya suara-suara yang sebelum ini melaung-laungkan perjuangan untuk agama sepertimana yang mereka uar-uarkan, katanya.', fy => "Us Heit, dy't yn de himelen is jins namme wurde hillige. Jins keninkryk komme. Jins wollen barre, allyk yn 'e himel sa ek op ierde. Jou ús hjoed ús deistich brea. En ferjou ús ús skulden, allyk ek wy ferjouwe ús skuldners. En lied ús net yn fersiking, mar ferlos ús fan 'e kweade.", cy => 'Gwraig Huan ap Gwydion, a vu un o ladd ei gwr, ag a ddyfod ei fyned ef i hely oddi gartref, ai dad ef Gwdion brenhin Gwynedd y gerddis bob tir yw amofyn, ac or diwedd y gwnaeth ef Gaergwdion (sef: via laactua( sy yn yr awyr yw geissio: ag yn y nef y cafas ei chwedyl , lle yr oedd ei enaid: am hynny y troes y wraig iefanc yn ederyn, a ffo rhag ei thad yn y gyfraith, ag a elwir er hynny hyd heddiw Twyll huan. ', br => "Ul lec'hienn gouestlet d'ar brezhoneg abaoe 1995 eo Kervarker.org. Amañ e kaver a bep seurt servijoù evit deskiñ pe peurzeskiñ ar yezh, evit an dudi hag evit kejañ gant brezhonegerien eus ar bed a-bezh. Evit tennañ splet eus ar gwellañ kinniget gant al lec'hienn-mañ, n'ho peus ken emezeliñ : digoust eo !", # bs => ' oči podigao prema meni, kao da se on zaista tako zove i kao da sam izgovorio nešto što se samo po sebi razumije. Je li on lud ili je potpuno otupio ležeći u tvrđavi u kojoj je, mojim odgojem, izbrisan,', eo => 'En multaj lokoj de Ĉinio estis temploj de drako-reĝo. Dum trosekeco oni preĝis en la temploj, ke la drako-reĝo donu pluvon al la homa mondo. Tiam drako estis simbolo de la supernatura estaĵo. Kaj pli poste, ĝi fariĝis prapatro de la plej altaj regantoj kaj simbolis la absolutan aŭtoritaton de feŭda imperiestro. La imperiestro pretendis, ke li estas filo de la drako. 
Ĉiuj liaj vivbezonaĵoj portis la nomon drako kaj estis ornamitaj per diversaj drakofiguroj. Nun ĉie en Ĉinio videblas drako-ornamentaĵoj, kaj cirkulas legendoj pri drakoj.', sq => 'Kryetari i Partisë Socialiste ka deklaruar në mënyrë të drejtpërdrejtë se është gati të pranojë marrëveshjen e propozuar nga ndërkombëtarët dhe Presidenti për hapjen e kutive të materiale zgjedhore që do të çonte edhe në përfundim e grevës së urisë dhe rinisjen e jetës parlamentare dhe politike në vend. "Unë jam i gatshëm që të pranoj marrëveshjen për hapjen e materialeve zgjedhore, më pas nëse aty del e nevojshme të hetohem kutitë e votave le të vendosë Komisioni i Venecias". Kështu deklaroi lideri socialist Edi Rama, dy orë pas deklaratës së Kryeministrit Berisha i cili përgënjështroi ambasadorin e OSBE-së, në vendin tonë, Robert Bosch, se ka një draft marrëveshje për zgjidhjen e krizës.', is => 'Alls fá 34 verkefni framlög frá Menningarráði Vestfjarða samtals að upphæð 15 milljónir, í fyrri styrkúthlutun ráðsins á árinu 2010. Styrkirnir eru á bilinu 75 þúsund til ein milljón króna. Umsóknir sem bárust að þessu sinni voru 78 og var samtals beðið um rúmar 55 milljónir í verkefnastyrki, en heildarupphæð fjárhagsáætlana var þrisvar sinnum hærri. Styrkirnir fara til margvíslegra verkefna í fjölbreyttum listgreinum og var áhersla að þessu sinni lögð á að styrkja verkefni sem fólu í sér nýsköpun og fjölgun atvinnutækifæra tengd listum og menningu, samvinnu og menningartengda ferðaþjónustu. „Vestfirskt menningarlíf lætur engan bilbug á sér finna og sóknarhugur og bjartsýni eru ríkjandi,“ segir í tilkynningu Menningarráðs.', hu => 'A magyar intézkedés által érintett tárcák vezetőinek elemzést kell készíteniük a helyzetről és ki kell dolgozniuk a szükséges törvényalkotási javaslatokat, amelyekkel minimalizálnák a magyar törvény szlovákiai hatásait és kockázatait. 
A kormány kezdeményezésére ezt követően rendkívüli ülést fog tartani a szlovák parlament is, hogy gyorsított eljárásban politikai választ fogadjon el.', af => 'Toe daal die HERE neer om die stad en die toring te besien waaraan die mensekinders gebou het. En die HERE sê: Daar is hulle nou een volk en het almal een taal! En dit is net die begin van hulle onderneming: nou sal niks vir hulle meer onmoontlik wees van wat hulle van plan is om te doen nie. Kom, laat Ons neerdaal en hulle taal daar verwar, sodat die een die taal van die ander nie kan verstaan nie. So het die HERE hulle dan daarvandaan oor die hele aarde verstrooi; en hulle het opgehou om die stad te bou. Daarom het hulle dit Babel genoem, want daar het die HERE die taal van die hele aarde verwar, en daarvandaan het die HERE hulle oor die hele aarde verstrooi', da => 'For to måneder siden var det tæt på, at der kom en ny ejer til indkøbscentret i centrum. Sådan lød det også for fire måneder siden. Men nu trækker det altså ud igen. Konkurskurator Lars Grøngaard har ellers været i gang med at finde nye ejere ganske længe, og siger i Folkebladet i dag, at der er interesserede til at købe Bytorv Horsens, deriblandt en række af de nuværende panthavere. Bytorv Horsens blev tilbage i maj 2007 solgt for 635 mill. kr. til EBH Ejendomme. Siden er andre dele af koncernen EBH Bank kollapset, og EBH Fonden er gået konkurs. Det fremgår af det senest tilgængelige årsregnskab fra 2008, at der er indgået et realkreditlån på 381 mill. kr., mens der i øvrigt er en samlet gæld for i alt 508 mill.', fi => 'Tanska nöyryyttää jälleen isolla kädellä yhtä kiekkoilun suurmaista. Tanska johtaa avauserän jälkeen Slovakiaa murskaavasti 6-0. Tanskalaiset tahkoivat ottelun alussa kiekkoa maaliin oikein urakalla. 6-0 tilanne oli jo tosiasia ajassa 13.42. Slovakia vaihtoi maalivahtiaan ajassa 4.40, jolloin tilanne oli 3-0. Peter Budaj sai väistyä Rastislav Stanan tieltä, joka imaisi vielä toiset kolme avauserässä. 
Kun tätä tilannetta katsoo, niin Leijona-ryhmän Tanska-tappio tuntuu varsin lievältä!', nl => 'Ambulancevliegtuig. Libië heeft een ambulancevliegtuig ter beschikking gesteld om het slachtoffertje naar Nederland te brengen. Naast zijn oom en tante is wordt hij ook begeleidt door een behandelend arts. Het toestel vertrekt om 10.00 uur vanuit Tripoli. Geheime locatie. Op uitdrukkelijk verzoek van de familie van Ruben wordt de aankomstplaats niet bekend gemaakt en zullen de media niet in de gelegenheid worden gesteld bij de aankomst aanwezig te zijn.', hr => 'Gradišćanske Hrvate u Austriji, Mađarskoj i Slovačkoj predstavlja osam folklornih i pjevačkih ansambala: Kolo Slavuj, Graničari, Štrabanci, Hajdenjaki, Čunovski bećari, Basbaritenori, Staro vino i Paxi. Kao što kaže jedan od glavnih organizatora, predsjednik društva Anno 93 Perica Mijić, emisija "Lijepom našom" obljubljena je u Hrvatskoj i dijaspori. Prije 15 godina je posljednji put gostovala u Beču, a sada je opet vrijeme da se emitira iz glavnoga grada Austrije, veli Mijić. Ulaznice se mogu nabaviti u Hrvatskom centru po cijeni od 25 eura.', sv => 'Det var i onsdags som den thailändska regeringen förklarade att det planerade nyvalet i november har blåsts av. Målet uppges vara att finna en annan väg till försoning, men beslutet ledde snabbt till att de redan långt gångna demonstrationerna trappades upp. Redan samma dag som påbudet meddelades hotade regeringen att från midnatt natten till torsdagen stoppa tillgången på el, telefon, mat och vatten för demonstrantlägret i centrala Bangkok. Hjälpte inte det kunde det bli aktuellt att ”med våld återta området.', sl => 'Letalske povezave pa bi bile še kako ugodne tudi za udeležence dveh dogodkov desetletja, univerzijado in evropsko prestolnico kulture. Zagotovo bosta ta dva dogodka nekakšen zrelostni izpit za mesto ob Dravi in hkrati najboljša priložnost, da dokažemo, da je Maribor zares mesto priložnosti. 
Za zdaj ocena ni najboljša, še vedno je preveč ‘soliranja’ in iskanja razlogov, zakaj kakšna zadeva ne bo uspela. Župan sam seveda ne bo mogel narediti veliko in če ne bomo stopili skupaj.', ro => 'Preşedintele Traian Băsescu a declarat aseară, într-o conferinţă de presă, că autorităţile „speră” ca în anul 2011 „să existe toate resursele necesare pentru a acoperi necesităţile bugetului de asigurări sociale”. „Pot spune doar intenţia, aşa cum am discutat cu Guvernul. Intenţia este să menţinem această reducere până la 31 decembrie 2010, dar este doar o intenţie, în speranţa că în bugetului anului 2011 vom avea resursele să acoperim integral necesităţile bugetului de asigurări sociale din bugetul de stat. Această acoperire depinde de foarte multe, de programul Guvernului de relansare a creşterii economice, de lupta împotriva evaziunii fiscale şi a contrabandei. Sunt foarte multe elemente, nu vreau să mă substitui programului pe care Guvernul îl va lansa odată cu aplicarea măsurilor din scrisoarea cu FMI", a precizat Traian Băsescu.', id => 'Disebutkannya, berdasarkan pernyataan Ketua Desk Pilkada Nasional I Gusti Putu Artha melihat sikap ngotot Komisi Pemilihan Umum Medan tetap menggelar pemungutan suara meski sejumlah masalah belum dituntaskan, sepertinya bakal banyak pihak agar pilkada tetap diulang. Apalagi KPU melihat banyaknya masalah yang dilakukan KPU Medan itu pastilah memberi peluang untuk pilkada harus diulang. Jika dilakukan pilkada ulang, dengan demikian KPU memastikan semua anggota Komisi Pemilihan Umum Medan dipecat', no => 'Norge har i alle år benyttet pengepolitikken til å stimulere sentralisering av privat og offentlig virksomhet til Oslofjord regionen. Prinsippet er det samme som vestlige land bruker som motkonjunkturpolitikk i finanskrisen. 
Kunnskapsløshet hos distriktsbefolkningen kan være årsaken til at denne utviklingen fortsetter å forsterke seg.Den britiske økonomen John Maynard Keynes formulerte en teori som litt forenklet sier at det offentlige kan påvirke den innenlandske etterspørselen etter varer og tjenester ved å øke offentlige utgifter i form av økte investeringer og økt offentlig etterspørsel etter varer og tjenester. På grunn av en positiv multiplikatoreffekt vil dette bidra til en selvforsterkende vekst i økonomien. Keynes teori har fått fornyet aktualitet i forbindelse med den pågående finanskrisen. Så å si alle nasjoner har brukt offentlige stimuleringspakker for å få fart på økonomien.', pl => 'Zgłoszenie chęci wzięcia udziału w Konkursie poprzez wysłanie e-maila na adres: do dnia 13 maja 2010 włącznie. W temacie korespondencji elektronicznej należy wpisać słowo „Konkurs”. W treści podać swoje imię, nazwisko oraz datę urodzenia. Każdy uczestnik Konkursu zobowiązany jest do posiadania aktywnej skrzynki mailowej, w celu komunikowania się z Organizatorem. Po otrzymaniu wiadomości z chęcią wzięcia udziału w Konkursie, Organizator przesyła uczestnikowi potwierdzenie wpisania na listę obecności. Wszelkie uwagi dotyczące listy obecności należy zgłaszać Organizatorowi w terminie do 13 maja 2010 włącznie. Po tym terminie na listę obecności nie będą nanoszone żadne zmiany.', ga => 'Lecht Fir Death forsind áth la Coin Culaind atchíi cách Cethern mac Fintain anair dorochair oc Smirommair. oca togail docer Luan oc techt immach assa thaig. fríth lecht Lóegaire Buadaig. fri Dún Lethglasse anair; bás Blaí Briuga tria chin mná i ndesciurt Oenaig Macha. Aided Cuscraid la Mac Cecht. de luin Cheltchair croda in t-echt. dorochair Mac Cecht iar tain.', la => 'Horum ego puer morum in limine iacebam miser, et huius harenae palaestra erat illa, ubi magis timebam barbarismum facere, quam cavebam, si facerem, non facientibus invidere. 
dico haec et confiteor tibi, deus meus, in quibus laudabar ab eis, quibus placere tunc mihi erat honeste vivere. non enim videbam voraginem turpitudinis, in quam proiectus eram ab oculis tuis. nam in illis iam quid me foedius fuit, ubi etiam talibus displicebam, fallendo innumerabilibus mendaciis et paedagogum et magistros et parentes, amore ludendi, studio spectandi nugatoria et imitandi ludicra inquietudine?', ru => 'При чем тут Генплан? Генплан как и СССР навегда ушел в прошлое. Как и старые схемы. Руководители России должны заботиться именно о собственной стране. А не спонсировать чужую экономику за счет собственных граждан, и не повышать конкурентоспособность чужих предприятий в ущерб своим. Если нет аналога в том что было утрачено (как верфь для авианосцев в Николаеве) - нужно строить заново, обеспечивая работой своих собственных сограждан а не чужих (судьба твоей страны меня не волнует, это ваши проблемы. Ничего кроме скорейшего развала на части и присоединения бга и востока к России лично я ей вообще не желаю)', it => "L'operazione di Boston ha interessato una casa di Watertown e una stazione di servizio nella zona residenziale di Brookline, dove le telecamere di una tv locale hanno ripreso la polizia locale che aiutava gli agenti dell'Fbi a perquisire un'auto. Indagini e perquisizioni anche a Long Island, nello stato di New York, e in New Jersey. In tutto sarebbero state perquisiti quattro edifici. In un comunicato, le autorità di Boston hanno specificato che non esistono minacce immediate alla sicurezza.", fr => "Une commission d'enquête sera créée, afin d'éclaircir les raisons de l'incident et d'en définir les responsabilités. L'entreprise publique Petroleos de Venezuela est l'opérateur de cette plateforme depuis 2009. Dans un communiqué, le groupe a immédiatement rappelé que ses activités d'exploration et de production de gaz et de pétrole étaient «conformes aux procédures et standards internationaux». 
Assumant toutefois sa part de responsabilité, Petroleos de Venezuela a entamé sa propre enquête.", 'es' => 'Un día después del ajuste draconiano en España, el Gobierno portugués que preside José Sócrates (socialista) ha aprobado un aumento generalizado de impuestos y un recorte drástico del gasto para ahorrar 2.100 millones de euros y reducir este año el déficit público al 7% del PIB, por debajo del 8,3% previsto inicialmente por el Ejecutivo. A diferencia de su vecino ibérico, el plan portugués ha sido pactado con el Partido Social Demócrata (PSD), principal fuerza de la oposición (conservadora). "Son necesarias para defender Portugal y defender la moneda única", ha justificado Sócrates.', 'de' => 'soviel nehmen darf, als man ihr giebt, wenn sie nur ihre Tugend behauptet? Das gilt auch fuer Minister und erlaubt mir, in dieser kargen Zeit unter Umstaenden auf mein Gehalt zu verzichten. Dafuer kannst du dir zuweilen ein gutes Bild kaufen, Fraenzchen. Du musst auch deine ehrbare Ergoetzung haben.', 'pt' => 'As armas e os barões assassinados, que da Ocidental praia Lusitana, Por mares que nunca antes foram navegados, Passaram além de uma tal Taprobana E em perigos e guerras esforçados Mais do que prometia a força humana, E entre gente remota edificaram Novo Reino, que tanto sublinharam; ', 'en' => "this is an example of an English text; hopefully, it won't be mistaken for a Gaelic text, this time! That is not the purpose for this line.", bg => 'Смисълът на правовата държава е не да защитава престъпниците. Смисълът и е да не позволи държавата да стане престъпник. Защото когато тя е такава, това е най-лошият възможен вариант за обществото. Именно поради тази причина, след векове на демократична еволюция, западът е стигнал до правовата държава. Тя не е наше изобретение, не е измислена от българите или от Тройната коалиция. Тя е разумният избор на доста по-мъдри от нас нации.', ); for my $lang (get_all_languages()) { die "\n\n*** $lang test is not available." 
unless exists($texts{$lang}); my @x = langof($texts{$lang}); is($x[0], $lang, "Identifying $lang text..."); cmp_ok($x[1],'>','0.14'); cmp_ok(confidence(@x),'>','0.50'); } my @pt = langof(<','0.14'); cmp_ok(confidence(@pt),'>','0.50'); Lingua-Identify-0.56/t/03-compile-langident.t000644 000765 000024 00000000413 11372773001 020763 0ustar00ambsstaff000000 000000 #!/usr/bin/perl use Test::More tests => 1; my $file = "langident"; print "bail out! Script file is missing!" unless -e $file; my $output = `$^X -c $file 2>&1`; print "bail out! Script file is missing!" unless like( $output, qr/syntax OK$/, 'script compiles' ); Lingua-Identify-0.56/t/04-compile-make-lingua-identify-language.t000644 000765 000024 00000000414 11372773022 024607 0ustar00ambsstaff000000 000000 #!/usr/bin/perl use Test::More tests => 1; my $file = "langident"; print "bail out! Script file is missing!" unless -e $file; my $output = `$^X -c $file 2>&1`; print "bail out! Script file is missing!" unless like( $output, qr/syntax OK$/, 'script compiles' ); Lingua-Identify-0.56/t/05-dummy-mode.t000644 000765 000024 00000021620 11374060465 017457 0ustar00ambsstaff000000 000000 #!/usr/bin/perl use utf8; use Test::More tests => 10; BEGIN { use_ok('Lingua::Identify', qw/:language_manipulation :language_identification/) }; my $text = ' As armas e os barões assinalados Que, da Ocidental praia Lusitana, Por mares nunca de antes navegados Passaram ainda além da Taprobana E em perigos e guerras esforçados Mais do que prometia a força humana, E entre gente remota edificaram Novo Reino, que tanto sublimaram; E também as memórias gloriosas Daqueles Reis que foram dilatando A Fé, o Império, e as terras viciosas De África e de Ásia andaram devastando, E aqueles que por obras valerosas Se vão da lei da Morte libertando: Cantando espalharei por toda parte, Se a tanto me ajudar o engenho e arte. 
Cessem do sábio Grego e do Troiano As navegações grandes que fizeram; Cale-se de Alexandro e de Trajano A fama das vitórias que tiveram; Que eu canto o peito ilustre Lusitano, A quem Neptuno e Marte obedeceram. Cesse tudo o que a Musa antiga canta, Que outro valor mais alto se alevanta. E vós, Tágides minhas, pois criado Tendes em mi um novo engenho ardente Se sempre, em verso humilde, celebrado Foi de mi vosso rio alegremente, Dai-me agora um som alto e sublimado, Um estilo grandíloco e corrente, Por que de vossas águas Febo ordene Que não tenham enveja às de Hipocrene. Dai-me húa fúria grande e sonorosa, E não de agreste avena ou frauta ruda, Mas de tuba canora e belicosa, Que o peito acende e a cor ao gesto muda; Dai-me igual canto aos feitos da famosa Gente vossa, que a Marte tanto ajuda; Que se espalhe e se cante no Universo, Se tão sublime preço cabe em verso. E vós, ó bem nascida segurança Da Lusitana antiga liberdade, E não menos certíssima esperança De aumento da pequena Cristandade; Vós, ó novo temor da Maura lança, Maravilha fatal da nossa idade, Dada ao mundo por Deus (que todo o mande, Pera do mundo a Deus dar parte grande); Vós, tenro e novo ramo florecente, De húa árvore, de Cristo mais amada Que nenhúa nascida no Ocidente, Cesárea ou Cristianíssima chamada, (Vede-o no vosso escudo, que presente Vos amostra a vitória já passada, Na qual vos deu por armas e deixou As que Ele pera Si na Cruz tomou); Vós, poderoso Rei, cujo alto Império O Sol, logo em nascendo, vê primeiro; Vê-o também no meio do Hemisfério, E, quando dece, o deixa derradeiro; Vós, que esperamos jugo e vitupério Do torpe lsmaelita cavaleiro, Do Turco Oriental e do Gentio Que inda bebe o licor do santo Rio: Inclinai por um pouco a majestade, Que nesse tenro gesto vos contemplo, Que já se mostra qual na inteira idade, Quando subindo ireis ao eterno Templo; Os olhos da real benignidade Ponde no chão: vereis um novo exemplo De amor dos pátrios feitos valerosos, Em versos devulgado 
numerosos. Vereis amor da pátria, não movido De prêmio vil, mas alto e quase eterno; Que não é prêmio vil ser conhecido Por um pregão do ninho meu paterno. Ouvi: vereis o nome engrandecido Daqueles de quem sois senhor superno, E julgareis qual é mais excelente, Se ser do mundo Rei, se de tal gente. Ouvi: que não vereis com vãs façanhas, Fantásticas, fingidas, mentirosas, Louvar os vossos, como nas estranhas Musas, de engrandecer-se desejosas: As verdadeiras vossas são tamanhas, Que excedem as sonhadas, fabulosas, Que excedem Rodamonte e o vão Rugeiro, E Orlando, inda que fora verdadeiro. Por estes vos darei um Nuno fero, Que fez ao Rei e ao Reino tal serviço, Um Egas e um Dom Fuas, que de Homero A cítara para eles só cobiço; Pois polos Doze Pares dar-vos quero Os Doze de Inglaterra e o seu Magriço; Dou-vos também aquele ilustre Gama, Que para si de Eneias toma a fama. Pois, se a troco de Carlos, Rei de França, Ou de César, quereis igual memória, Vede o primeiro Afonso, cuja lança Escura faz qualquer estranha glória; E aquele que a seu Reino a segurança Deixou, co a grande e próspera vitória; Outro Joanne, invicto cavaleiro; O quarto e quinto Afonsos e o terceiro. Nem deixarão meus versos esquecidos Aqueles que, nos Reinos lá da Aurora, Se fizeram por armas tão subidos, Vossa bandeira sempre vencedora: Um Pacheeo fortíssimo e os temidos Almeidas, por quem sempre o Tejo chora, Albuquerque terribil, Castro forte, E outros em quem poder não teve a morte. E, enquanto eu estes canto, e a vós não posso, Sublime Rei, que não me atrevo a tanto, Tomai as rédeas vós do Reino vosso: Dareis matéria a nunca ouvido canto. Comecem a sentir o peso grosso (Que polo mundo todo faça espanto) De exércitos e feitos singulares De África as terras e do Oriente os mares. 
'; my $t1 = langof( { 'mode' => 'dummy' }, $text); is_deeply( $t1 , { 'method' => { %Lingua::Identify::default_methods }, 'config' => { 'mode' => 'dummy', }, 'max-size' => 1000000, 'active-languages' => [ sort (get_all_languages()) ], 'text' => $text, 'mode' => 'dummy', }); $t1 = langof( { method => { smallwords => 1, prefixes2 => 2 }, 'mode' => 'dummy' }, $text); is_deeply( $t1 , { 'method' => { 'smallwords' => '1', 'prefixes2' => '2', }, 'config' => { 'mode' => 'dummy', 'method' => { 'smallwords' => '1', 'prefixes2' => '2', }, }, 'max-size' => 1000000, 'active-languages' => [ sort (get_all_languages()) ], 'text' => $text, 'mode' => 'dummy', }); $t1 = langof( { method => [ qw/smallwords prefixes2/ ], 'mode' => 'dummy' }, $text); is_deeply( $t1 , { 'method' => { 'smallwords' => '1', 'prefixes2' => '1', }, 'config' => { 'mode' => 'dummy', 'method' => [ 'smallwords' , 'prefixes2' , ], }, 'max-size' => 1000000, 'active-languages' => [ sort (get_all_languages()) ], 'text' => $text, 'mode' => 'dummy', }); $t1 = langof( { method => 'smallwords', 'mode' => 'dummy' }, $text); is_deeply( $t1 , { 'method' => { 'smallwords' => '1', }, 'config' => { 'mode' => 'dummy', 'method' => 'smallwords', }, 'max-size' => 1000000, 'active-languages' => [ sort (get_all_languages()) ], 'text' => $text, 'mode' => 'dummy', }); is_deeply( [ sort (set_active_languages( qw/pt es fr/ )) ] , [ sort qw/pt es fr/ ] ); is_deeply( [ sort (get_active_languages( )) ] , [ sort qw/pt es fr/ ] ); my $t2 = langof( { 'method' => 'smallwords', 'mode' => 'dummy' }, $text); is_deeply( $t2 , { 'method' => { 'smallwords' => '1', }, 'config' => { 'mode' => 'dummy', 'method' => 'smallwords', }, 'max-size' => 1000000, 'active-languages' => [ 'es', 'fr', 'pt', ], 'text' => $text, 'mode' => 'dummy', }); my $t3 = langof( { 'max-size' => 100, 'method' => 'smallwords', 'mode' => 'dummy' }, $text); is_deeply( $t3 , { 'method' => { 'smallwords' => '1', }, 'config' => { 'mode' => 'dummy', 'method' => 'smallwords', 
'max-size' => 100, }, 'max-size' => 100, 'active-languages' => [ 'es', 'fr', 'pt', ], 'text' => substr($text,0,100), 'mode' => 'dummy', }); $t3 = langof( { 'max-size' => 0, 'method' => 'smallwords', 'mode' => 'dummy' }, $text); is_deeply( $t3 , { 'method' => { 'smallwords' => '1', }, 'config' => { 'mode' => 'dummy', 'method' => 'smallwords', 'max-size' => 0, }, 'max-size' => 0, 'active-languages' => [ 'es', 'fr', 'pt', ], 'text' => $text, 'mode' => 'dummy', }); Lingua-Identify-0.56/t/06-file_identification.t000644 000765 000024 00000002425 12203746472 021376 0ustar00ambsstaff000000 000000 #!/usr/bin/perl use Test::More tests => 13 + 3 * 26; BEGIN { use_ok('Lingua::Identify', qw/:language_identification :language_manipulation/) }; for my $language (get_all_languages()) { die "**** Text file for $language language not available" unless -f "t/files/$language"; my @lang = langof_file("t/files/$language"); is($lang[0], $language, "Checking identified language is $language."); if (grep { $language eq $_ } (qw"sl cs")) { # Harder languages cmp_ok($lang[1],'>','0.15', "Checking probability for $language"); } else { cmp_ok($lang[1],'>','0.16', "Checking probability for $language"); } cmp_ok(confidence(@lang),'>','0.51', "Checking confidence for $language"); } # Some extra tests my @pt = langof_file({method=>'smallwords'},'t/files/pt_big'); is($pt[0],'pt'); cmp_ok($pt[1],'>','0.14'); cmp_ok(confidence(@pt),'>','0.50'); @pt = langof_file('t/files/pt_big'); is($pt[0],'pt'); cmp_ok($pt[1],'>','0.18'); cmp_ok(confidence(@pt),'>','0.51'); @pt = langof_file('t/files/en', 't/files/pt_big'); is($pt[0],'pt'); cmp_ok($pt[1],'>','0.13'); cmp_ok(confidence(@pt),'>','0.50'); # Encoding @pt = langof_file({encoding=>'ISO-8859-1'},'t/files/pt_lt1'); is($pt[0],'pt'); cmp_ok($pt[1],'>','0.16'); cmp_ok(confidence(@pt),'>','0.51'); Lingua-Identify-0.56/t/07-method_manipulation.t000644 000765 000024 00000000440 11372773560 021446 0ustar00ambsstaff000000 000000 #!/usr/bin/perl use Test::More 
tests => 2; BEGIN { use_ok('Lingua::Identify', qw/:language_identification/) }; is_deeply( [ get_all_methods ], [ qw/smallwords prefixes1 prefixes2 prefixes3 prefixes4 suffixes1 suffixes2 suffixes3 suffixes4 ngrams1 ngrams2 ngrams3 ngrams4/ ] ); Lingua-Identify-0.56/t/08-corner-cases.t000644 000765 000024 00000004321 11663270762 017774 0ustar00ambsstaff000000 000000 #!/usr/bin/perl use Test::More tests => 14; BEGIN { use_ok('Lingua::Identify', qw/:language_manipulation :language_identification/) }; # Check language of undef or space... is(langof(), undef, "Language of nothing is undefined"); my @undef = langof(); is_deeply( [ @undef ] , [ ] , "Language of nothing is nothing"); is(langof( { method => 'smallwords' }, ' '), undef, "Language of space is undefined"); # Check language for word 'melhor' # my @pt = langof( { method => 'suffixes4' }, 'melhor'); # is_deeply( [ @pt ], [ 'pt', 1 ], # "list of possible languages with 'melhor' word, using 'suffixes4' method."); # is_deeply(confidence(@pt), 1, # "Confidence for 'melhor' being portuguese using 'suffixes4' method."); my @xx = langof( { method => 'suffixes4' }, 'z'); is_deeply( [ @xx ], [ ]); is_deeply(confidence(@xx), 0 ); is_deeply( [ deactivate_all_languages() ], [ ]); is_deeply( [ get_active_languages() ], [ ]); my @pt = langof( { method => 'suffixes4' }, 'melhor'); is_deeply( [ @pt ], [ ]); is_deeply(confidence(@pt), 0 ); is_deeply( [ sort ( set_active_languages(qw/pt/) ) ], [ qw/pt/ ]); is_deeply( [ get_active_languages() ], [ qw/pt/ ]); @pt = langof( { method => 'suffixes4' }, 'zzzzzz'); is_deeply( [ @pt ], [ ]); is_deeply(confidence(@pt), 0 ); __END__ is_deeply( [ sort ( get_active_languages() ) ], [ qw/fr it/ ]); is_deeply( [ activate_all_languages() ], [ get_all_languages() ]); is(name_of('pt'), 'portuguese'); deactivate_all_languages(); is_deeply( [ get_active_languages() ], [ ]); is_deeply( [ activate_all_languages() ], [ get_all_languages ]); is_deeply( [ sort ( get_all_languages() ) ], [ sort 
@languages ]); is_deeply( [ sort ( get_active_languages() ) ], [ sort @languages ]); Lingua-Identify-0.56/t/files/000755 000765 000024 00000000000 12203746720 016071 5ustar00ambsstaff000000 000000 Lingua-Identify-0.56/t/pod-coverage.t000644 000765 000024 00000000254 10730327426 017531 0ustar00ambsstaff000000 000000 #!perl -T use Test::More; eval "use Test::Pod::Coverage 1.04"; plan skip_all => "Test::Pod::Coverage 1.04 required for testing POD coverage" if $@; all_pod_coverage_ok(); Lingua-Identify-0.56/t/pod.t000644 000765 000024 00000000214 10730327426 015734 0ustar00ambsstaff000000 000000 #!perl -T use Test::More; eval "use Test::Pod 1.14"; plan skip_all => "Test::Pod 1.14 required for testing POD" if $@; all_pod_files_ok(); Lingua-Identify-0.56/t/files/af000644 000765 000024 00000002352 11373302537 016405 0ustar00ambsstaff000000 000000 1. Die hele wêreld het net een taal gepraat. 2. Toe die mense ooswaarts getrek het, het hulle uitgekom by 'n vlakte in Sinar en daar gaan woon. 3. Hulle het op 'n keer vir mekaar gesê: "Kom ons maak stene en ons brand hulle hard." Hulle het toe stene in plaas van klip gebruik en asfalt in plaas van klei. 4. Toe sê hulle: "Kom ons bou vir ons 'n stad met 'n toring waarvan die punt tot in die hemel reik en ons maak so vir ons 'n naam. Dan sal ons nie oor die hele aarde versprei nie." 5. Die Here het afgekom om te kyk na die stad en die toring wat die mense vir hulle gebou het, 6. en toe sê Hy: "Hier is hulle een volk en almal het een taal. Hulle het nog maar net begin om iets te doen. Hierna sal niks wat hulle beplan vir hulle onmoontlik wees nie. 7. Kom laat Ons afgaan en verwarring bring in hulle taal, sodat die een nie die ander verstaan nie." 8. Die Here het hulle toe van daar af verstrooi oor die hele aarde, en hulle het opgehou met die bou van die stad. 9. 
Daarom het hulle die stad Babel {Die naam "Babel" en "in verwarring bring" klink amper eenders in die Hebreeus} genoem, want daar het die Here in die taal van die aarde verwarring gebring, en van daar af het Hy die mense oor die hele aarde verstrooi. Lingua-Identify-0.56/t/files/bg000644 000765 000024 00000005331 11375530366 016414 0ustar00ambsstaff000000 000000 Не би трябвало бюджетът за култура да се формира на остатъчен принцип. Това заяви във Варна президентът Георги Първанов, предаде репортер на радио „Фокус”-Варна. Президентът откри Четвъртия международен поетичен фестивал „Славянска прегръдка” в градската художествена галерия на Варна. Пред поетите и художниците Първанов заяви, че ако всички ние осъзнаем, че културата е този приоритет, който ще изведе България и ще я нареди в челните редици на държавите от Европейския съюз, тогава ще осъзнаем, че това е една от най-добрите антикризисни мерки. Първанов бе категоричен, че в 20 години преход да се промени ситуацията и да се реформира всичко в страната, като че ли е забравено едно нещо – да се реформира културата. Президентът Георги Първанов подчерта, че когато България е влязла в Европейския съюз, славянската азбука е била утвърдена като третата азбука, изписана на еврото, което утвърждава значимостта на славянската азбука. Първанов добави, че е приел да участва във фестивала без колебание, за да може да бъде част от това вълнение. Георги Първанов поздрави варненските управници като заяви, че когато един добър управленец реши, може да подкрепи културата. В Четвъртия международен поетичен фестивал „Славянска прегръдка” са взели участие 37 поети от 11 славянски държави. Освен прекия културен обмен, радетели на идеята във филиалите на славянските страни, станали вече и приятели на България и Варна, работят и по лобирането за кандидатура на Варна за Европейска столица на културата. 
Десислава СТОЯНОВА Lingua-Identify-0.56/t/files/br000644 000765 000024 00000001612 11373610544 016420 0ustar00ambsstaff000000 000000 Keleier al lec'hienn Kervarker N'eo ket ! Met ar foromoù, ne laran ket. Ne oant ket a-feson evit kelenn brezhoneg. re a dabutoù oa oa e-barzh. Gwir eo ez eo mod-kozh Kervarker.org. Setu perak emaon e soñj adsevel pep tra a-benn daou vloaz. Ha gant muioc'h a zafar e-barzh ! N'ho peus kont ebet e Kervarker c'hoazh ? Emezelit bremañ ! Digoust eo ha servijoù ouzhpenn ha suroc'h ho po : kinnig keleier war ar bennbajenn, postañ er foromoù gant ho lesanv, resev keleier dre bostel, kemm an etrefas, ... Klask zo war ur sklolaer(ez) brezhoneger(ez) e Bruz (35) Keleier all 50 bugel zo e klas divyezhek ar skol-mamm Jacques Prévert e Bruz ha nac'het e vez gant an ensellerezh digeriñ ur c'hlas nevez war digarez ne vefe skolaer bezhoneger vak ebet ganto ! Setu maz oc'h dedennet gant ar post (ha "contractuel" e vefec'h), kasit ho kinnig d'ar rektordi ha kit e darempred gant Div Yezh. Trugarez. Lingua-Identify-0.56/t/files/bs000644 000765 000024 00000003370 11373610042 016415 0ustar00ambsstaff000000 000000 Najžešća stranačka borba vodit će se na području Zapadnohercegovačke, Hercegbosanske i Posavske županije gdje će i nakon ovih izbora vlast pokušati formirati HDZ BiH i HKDU na jednoj, odnosno HDZ 1990., HSP i Lijanovići na drugoj strani. Te će tri županije s hrvatskom većinom biti prave političke arene u kojima će se političke stranke, kao gladijatori, boriti svim raspoloživim demokratskim i prljavim oružjima. Tu se ne radi samo o preuzimanju ili zadržavanju vlasti, ulog je u svakom smislu preveliki. U Središnjoj Bosni, kao i u Hercegovačko-neretvanskoj županiji formiranje vlasti ovisit će u konačnici o bošnjačko-hrvatskoj koaliciji tako da će u tom smislu biti od iznimnog značaja odnos SDA i SBiH, odnosno Sulejmana Tihića i Harisa Silajdžića pa možda i Fahrudina Radončića, spram jednoga od dva HDZ-a. 
U Središnjoj Bosni i Posavskoj županiji moguće je da ključnu ulogu o sudjelovanju Čovićevog ili Ljubićevog HDZ-a u vlasti s Bošnjacima, kao i do sada, odigra HSS-NHI, ako postigne očekivane rezultate i dobije nekoliko zastupnika u županijskim skupštinama. Na entitetskoj i državnoj razini borba za raspodjelu ministarskih dužnosti vodit će se između HDZ-a BiH i HDZ-a 1990. i HSP-a i to će opet ovisiti od odnosa s bošnjačkim strankama u FBiH odnosno od odnosa sa srpskim i bošnjačkim strankama na državnoj razini. Prema trenutačnom raspoloženju i odnosima između najjačih hrvatskih i bošnjačkih stranaka, pod uvjetom da hrvatske stranke poluče makar približno iste izborne rezultate, u FBiH vlast bi mogli kreirati devedesetka i pravaši, bez Čovićevih kadrova, ali zato objektivno Božo Ljubić i Zvonko Jurišić teško u takvim relacijama mogu računati na ministarske pozicije na državnoj razini. Lingua-Identify-0.56/t/files/cs000644 000765 000024 00000005444 12203745753 016435 0ustar00ambsstaff000000 000000 Káhira - Egyptským ozbrojeným složkám se podle světových agentur podařilo v sobotu odpoledne vyklidit káhirskou mešitu, kterou ráno obsadily stovky stoupenců sesazeného prezidenta Muhammada Mursího. Armáda s policií budovu nejprve obklíčily a během obležení se ozývala v místě i střelba. Poradce prozatímního prezidenta Mustafa Higází prohlásil, že Egypt čelí v současnosti "extremistickým silám", na což reaguje bezpečnostními opatřeními, která jsou v souladu s platnými zákony. V mešitě bylo asi 700 příznivců konzervativního Muslimského bratrstva, z jehož řad vzešel i sesazený Mursí. Podle médií vypukla série přestřelek mezi ozbrojenci v minaretu mešity a vojáky v okolí mešity. Minaret je také jediným místem, které se bezpečnostním silám zatím obsadit nepodařilo. Vojáci s islamisty, kteří se zabarikádovali uvnitř, v průběhu dne vyjednávali a přesvědčovali je, aby budovu opustili. Desítky Egypťanů tak učinily rychle, větší množství jich ale dlouho ještě zůstávalo uvnitř. 
Mnozí nechtěli místo opustit z obav ze zatčení. Zatímco ženy a děti mohly odejít hned, muži byli nejprve vyslýcháni, což mezi lidmi ve svatyni na Ramsesovu náměstí vyvolávalo neklid. Ramsesovo náměstí bylo jedním z center pátečních potyček, mešita tehdy fungovala i jako márnice a nemocnice dohromady, brzy se zaplnila těly mrtvých, zraněných i těch, kteří se zde jen chtěli ukrýt před násilím, popsala BBC. Premiér: Bratrstvo rozpustit! Egyptský premiér Hazím Bibláví mezitím přišel s návrhem, aby bylo Muslimské bratrstvo legálně rozpuštěno. Podrobnosti již Bibláví předložil ministrovi sociálních věcí, který má na starosti vydávání povolení pro nevládní organizace. "Nyní se celá věc důkladně studuje," uvedl premiérův mluvčí. Muslimské bratrstvo na sociální síti Facebook oznámilo, že při pátečních střetech na káhirském Ramsesově náměstí zemřel i syn jednoho z vůdců Muslimského bratrstva Muhammada Badího. Celkem tento den přišlo po celém Egyptě o život 173 lidí, z toho 95 v centru Káhiry. Uvedlo to egyptské ministerstvo zdravotnictví. Zahynul mimo jiní Amar Badí, syn duchovního vůdce Muslimského bratrstva Muhammada Badího. Zraněných je podle údajů ministerstva 1330 po celém zemi a 596 při potyčkách v egyptské metropoli. Egyptské bezpečnostní složky zároveň oznámily, že zadržely bratra vůdce teroristické sítě Al-Káida Ajmána Zavahrího, který v zemi žije. Muhammad Zavahrí byl údajně zadržen na kontrolním stanovišti v Gíze nedaleko Káhiry. Policie nezveřejnila žádné podrobnosti, důvodem jeho vzetí do vazby má být ale údajně jeho podpora Muslimskému bratrstvu a Muhammadu Mursímu.Lingua-Identify-0.56/t/files/cy000644 000765 000024 00000004045 11373611600 016425 0ustar00ambsstaff000000 000000 Eryr Gwernabwy, wedi bod yn hir yn briod a'r Eryres, a bod iddo lawer o blant o honi, a'i marw hi, a bod o hono yntau yn hir yn weddw, a aeth i briodi gyda Dallhuan Cwmcawlyd. 
Ond rhag iddo blant o honi, a dirywio o'i rywogaeth, efe a aeth yn nghyntaf at Hynafiaid y Byd i ofyn ei hoedran hi. Ac yn nghyntaf, efe a aeth at Garw Rhedynfre, ac a'i cafodd yn gorwedd wrth hen geltffinen o dderwen, ac a ofynodd iddo oedran y Ddallhuan. Y Carw a'i hatebodd, "Mi a welais y dderwen hon yn fesen y sy yr awrhon ar lawr heb na dail na rhisgl arni; ac ni bu arni draul yn y byd, ond fy mod I yn ymrwbio ynddi bob dydd wrth godi, ac ni welais I erioed y Ddallhuan yn hy^n nac iau nag ydyw heddyw; ond y mae un sydd yn hy^n na myfi, a hwnw ydyw Gleisiad Glyn Llifon." Aeth yr Eryr at y Gleisiad, a holodd yntau, ac efe a atebodd, "Mi a wn fy mod I yn flwydd oed am bob gem sydd ar fy nghroen, ac am bob gronyn sydd yn fy mol, ac ni welais I erioed mo'r Ddallhuan ond yr un modd! Ond y mae un sydd hy^n na mi, a hwnw ydyw Mwyalchen Cilgwri." Yr Eryr a aeth i edrych am y Fwyalchen, a chafodd ef yn eistedd ar gareg fechan, ac a roddodd yr un gofyniad iddo yntau. Ebai y Fwyalchen, "A weli di y gareg yma sydd danaf?—nid ydyw fwy nag a fedr dyn gario yn eilaw, ac mi a'i gwelais yn llwyth cant o ychain! ac ni bu arni draul erioed, ond fy ngwaith I yn sychu fy mhig arni bob nos, ac yn taro blaen fy adenydd ynddi wrth ymgodi yn y bore, ac nid adnabum I y Ddallhuan na hyu nac iau nag ydyw hi heddyw. Ond y mae un hy^n na myfi, a hwnw ydyw Llyffant Cors Fochno; ac oni wyr hwnw ei hoedran hi nis gwyr neb." Yna aeth yr Eryr at y Llyffant, a rhoddodd yr un gofyniad iddo yntau. Ac efe a atebodd, "Ni fwyteais I ddim erioed ond a fwyteais o'r ddaear, ac ni fwyteais haner fy nigon o hono, ac a weli di y ddau fryn yna sydd wrth y gors?—mi a welais y fan yna yn dir gwastad, ac ni wnaeth dim hwynt cymaint ond a ddaeth allan o'm corph I, a bwyta cyn lleied; ac nid adnabum I erioed mo'r Ddallhuan ond yn hen wrach yn canu 'tw-hw-hw,' ac yn dychrynu plant gyda'i llais garw, fel y mae heddyw." 
Lingua-Identify-0.56/t/files/da000644 000765 000024 00000003352 11373267427 016414 0ustar00ambsstaff000000 000000 Den globale konkurrence og den demografiske udvikling i Danmark tvinger i fremtiden virksomheder til i højere grad at skaffe sig arbejdskraft fra udlandet. Men de danske virksomheder formår endnu ikke at omsætte mangfoldighedens muligheder til produktivitet eller mere innovation. Et nyt stort forskningsprojekt fra Handelshøjskolen, Aarhus Universitet har set på 6000 danske virksomheder årligt gennem ti år i et såkaldt ubalanceret panel af virksomheder. Forskerne har taget højde for brancheforskelle og ser den udenlandske arbejdskraft som en helhed. Det vil sige, at de for eksempel ikke skelner mellem højtuddannet arbejdskraft, der er hentet til Danmark, eller almindelige indvandrere på arbejdsmarkedet. Forskerne bag undersøgelsen har set på både store, mellemstore og små virksomheder inden for alle brancher, og resultaterne viser, at virksomheder med under 100 medarbejdere og en høj grad af etnisk diversitet i snit har cirka otte procent lavere produktivitet end den gennemsnitlige virksomhed. For virksomheder med over 100 ansatte har den udenlandske arbejdskraft hverken en positiv eller negativ indflydelse på produktiviteten. Ifølge postdoc, ph.d. Dario Pozzoli, der er en af forskerne bag undersøgelsen, kan den lavere produktivitet meget vel afspejle den indsats, virksomhederne gør for at integrere udenlandske ansatte og styre diversiteten. - Det kræver en stor investering i integration og træning af ikke-danske medarbejdere, kommunikation på tværs af nationaliteter og ikke mindst at undgå diskriminering. 
Men det er nødvendigt, hvis virksomhederne skal drage nytte af den nye viden, som et mangfoldigt sammensat personale kan bidrage med i forhold at forstå og imødekomme nye behov hos kunderne, siger Dario Pozzoli.Lingua-Identify-0.56/t/files/de000644 000765 000024 00000000460 10730327426 016405 0ustar00ambsstaff000000 000000 soviel nehmen darf, als man ihr giebt, wenn sie nur ihre Tugend behauptet? Das gilt auch fuer Minister und erlaubt mir, in dieser kargen Zeit unter Umstaenden auf mein Gehalt zu verzichten. Dafuer kannst du dir zuweilen ein gutes Bild kaufen, Fraenzchen. Du musst auch deine ehrbare Ergoetzung haben. Lingua-Identify-0.56/t/files/el000644 000765 000024 00000210702 11472757171 016427 0ustar00ambsstaff000000 000000 Ακρωτήρι Θήρας Από τη Βικιπαίδεια, την ελεύθερη εγκυκλοπαίδεια Μετάβαση σε: πλοήγηση, αναζήτηση Συντεταγμένες: 36°21′35″N 25°23′50″E / 36.35972, 25.39722 Το Ακρωτήρι Κατασκευασμένος χάρτης του Ακρωτηρίου την Εποχή των Μετάλλων Το Ακρωτήρι είναι χωριό της Σαντορίνης με 450 κατοίκους, με βάση την απογραφή του 2001. Βρίσκεται στο νοτιοδυτικό άκρο του νησιού σε απόσταση 15 χιλιομέτρων από τα Φηρά. Διοικητικά, ανήκει στο Τοπικό διαμέρισμα Ακρωτηρίου του Δήμου Θήρας. Κατά τους μεσαιωνικούς χρόνους αποτελούσε ένα από τα καστέλια του νησιού. Έγινε παγκοσμίως γνωστό χάρη στον προϊστορικό οικισμό που ανακαλύφθηκε στις ανασκαφές, τις οποίες άρχισε συστηματικά στην περιοχή ο Σπύρος Μαρινάτος το 1967. Γεωγραφικά, η περιοχή αποτελεί πραγματικό ακρωτήριο με απόκρημνες ακτές, που προβάλλει επί 3 μίλια δυτικά του νότιου τμήματος της Σαντορίνης. 
Πίνακας περιεχομένων [Απόκρυψη] 1 Ο Προϊστορικός Οικισμός του Ακρωτηρίου 1.1 Οι Ανασκαφές 1.2 Γενικά για τον προϊστορικό οικισμό 1.3 Τα πρόδρομα φαινόμενα και η έκρηξη 1.4 Η χρονολογία της έκρηξης 1.5 Η αρχιτεκτονική του οικισμού 1.6 Οι τοιχογραφίες 1.7 Επεμβάσεις στα κτίρια 1.8 Τα κτίρια, οι τοιχογραφίες και σημαντικά ευρήματα 1.8.1 Ξεστή 3 1.8.2 Οικία των Θρανίων 1.8.3 Πλατεία Ξεστής 3 1.8.4 Οδός Τελχίνων 1.8.5 Κτίριο Γ 1.8.6 Κτίριο Β 1.8.7 Πλατεία Μυλώνος 1.8.8 Συγκρότημα Δ 1.8.9 Άνδηρο Κλινών 1.8.10 Οικία της Άγκυρας 1.8.11 Πλατεία Τριγώνου 1.8.12 Δυτική Οικία 1.8.13 Πλατεία Κενοταφίου 1.8.14 Οικία των Γυναικών 1.8.15 Κτίριο Α και Κτίριο "Μύλων" 1.8.16 Ξεστή 2 1.8.17 Ξεστή 4 1.8.18 Οδός Κουρητών 1.8.19 Κτίριο ΙΑ 1.9 Συμπεράσματα από τα Ευρήματα 1.10 Το Στέγαστρο του Προϊστορικού Οικισμού 2 Το Ακρωτήρι κατά τους Μεσαιωνικούς χρόνους 3 Ο Μπάλος 4 Ο Φάρος του Ακρωτηρίου 5 Παραλίες 6 Τοπικές Εορτές 7 Πρόσθετες Πληροφορίες 8 Εικόνες από το Ακρωτήρι 9 Αναφορές 10 Εξωτερικές Συνδέσεις [Επεξεργασία] Ο Προϊστορικός Οικισμός του Ακρωτηρίου [Επεξεργασία] Οι Ανασκαφές Στοιχεία για την κατοίκηση της Θήρας κατά την προϊστορική εποχή άρχισαν να έρχονται στο φως από το δεύτερο ήμισυ του 19ου αιώνα, όταν λόγω της χρησιμοποίησης θηραϊκής γης για την μόνωση των τοιχωμάτων στη διώρυγα του Σουέζ από τον Γάλλο μηχανικό Φερντινάν ντε Λεσσέψ (Ferdinard de Lesseps) το 1866 αποκαλύφθηκαν προϊστορικές αρχαιότητες.[1] Οι πρώτες ανασκαφές στο Ακρωτήρι έγιναν από τον Γάλλο γεωλόγο και ηφαιστειολόγο Φερντινάν Φουκέ (Ferdinand André Fouqué). Μικρή ανασκαφική έρευνα επιχειρήθηκε το 1870 από την Γαλλική Αρχαιολογική Σχολή, με τον γεωλόγο Ανρί Γκορσί (Henri Gorceix) και τον Ανρί Μαμέ (Henri Mamet) στη θέση Φαβατάς (στη θέση του Συγκροτήματος Δ των σημερινών ανασκαφών, ονομαζόταν "φαβατάς" γιατί το μέρος έβγαζε πολύ φάβα), νότια του Ακρωτηρίου. Στην θέση αυτή περνούσε χείμαρρος, ο οποίος έφτανε στο επίπεδο των αρχαιοτήτων και είχε ήδη αρχίσει να αποκαλύπτει κάποιες από αυτές. 
Οι συστηματικές, πάντως, ανασκαφές ξεκίνησαν το 1967 από τον καθηγητή Σπύρο Μαρινάτο, με τις υποδείξεις του ντόπιου Νίκου Πελέκη και στο ίδιο σημείο που έκαναν τις ανασκαφές τους οι Γάλλοι.[2] Ο Σπύρος Μαρινάτος ξεκίνησε τις ανασκαφές στο Ακρωτήρι στην προσπάθεια του να επαληθεύσει μια παλιά δική του θεωρία, που είχε δημοσιεύσει ως Έφορος Αρχαιοτήτων Κρήτης το 1939, ότι η έκρηξη του ηφαιστείου της Θήρας προκάλεσε την κατάρρευση του πολιτισμού της Μινωικής Κρήτης.[3] Η προετοιμασία έγινε το διάστημα 1962 με 1965. Μετά τον θάνατο του καθηγητή Μαρινάτου το 1974, η ανασκαφή συνεχίζεται κάτω από την διεύθυνση του καθηγητή Χρίστου Ντούμα. [Επεξεργασία] Γενικά για τον προϊστορικό οικισμό Ο χώρος των Ανασκαφών. Από τα ευρήματα των ανασκαφών είναι πλέον γνωστό ότι η περιοχή του Ακρωτηρίου κατοικήθηκε κατά την Ύστερη Νεολιθική περίοδο (γύρω στο 4500π.Χ.) και κατά τον 18ο αιώνα π.Χ. είχε εξελιχθεί σε πόλη. Στις αρχές του 17ου αιώνα π.Χ. ισοπεδώθηκε από σεισμό, αλλά ξανακτίστηκε επάνω στα ερείπια και άκμασε κατά την Υστεροκυκλαδική Ι περίοδο, μέχρι τον ενταφιασμό της από την Μινωική έκρηξη. Η θέση ήταν ιδανική για ασφαλές αγκυροβόλιο, καθότι ήταν προστατευμένη από τους βόρειους ανέμους, ενώ ταυτόχρονα η μορφολογία του εδάφους ευνοούσε την ανάπτυξη γεωργικών δραστηριοτήτων. Πιθανολογείται ότι ήταν η πρωτεύουσα του νησιού, αλλά αυτό δεν έχει ακόμα επιβεβαιωθεί. Η έκταση των ανασκαφών είναι κοντά στα 14 στρέμματα, ένα μικρό ποσοστό της προϊστορικής πόλης, που υπολογίζεται ότι ήταν περίπου 200 στρέμματα και είχε γύρω στους 30.000 κατοίκους. Η δόμηση ήταν πυκνή και διέθετε πολυώροφα κτίρια με πλούσιες τοιχογραφίες, οργανωμένες αποθήκες, βιοτεχνικούς χώρους, άριστη πολεοδομική οργάνωση με δρόμους, πλατείες και είχε ένα πλήρως αναπτυγμένο αποχετευτικό σύστημα, το οποίο περνούσε κάτω από το λιθόστρωτο και συνδεόταν απευθείας με τα σπίτια. Τα οικοδομικά υλικά ήταν πέτρες, άργιλος, τούβλα λάσπης που ενισχύθηκαν με άχυρο, ξυλεία, γύψο μέσα και έξω. 
Το μεγάλο πλήθος από τοιχογραφίες, με τις οποίες ήταν διακοσμημένοι πολλοί από τους χώρους των κτιρίων, κατά κανόνα των άνω ορόφων, υποδηλώνουν μια εξελιγμένη και εκλεπτυσμένη αστική κοινωνία, η οποία ντυνόταν με πολυτέλεια, κομψότητα, και εντυπωσιακή πολυχρωμία.[4] [Επεξεργασία] Τα πρόδρομα φαινόμενα και η έκρηξη Το γεγονός ότι στον οικισμό δεν βρέθηκαν καθόλου ανθρώπινοι σκελετοί μαρτυρά ότι μια σειρά από προειδοποιητικούς σεισμούς εξανάγκασε τους κατοίκους να τον εγκαταλείψουν έγκαιρα. Πάντως πριν ταφεί ο οικισμός από την τέφρα της ηφαιστειακής έκρηξης, είχε χτυπηθεί από μεγάλο σεισμό. Κάποιοι κάτοικοι επέστρεψαν αργότερα στον οικισμό για να απεγκλωβίσουν όσους δεν είχαν προλάβει να φύγουν και να συλλέξουν πολύτιμα και προσωπικά αντικείμενα. Άλλα πρόδρομα της ηφαιστειακής εκρήξεως φαινόμενα, όμως, ανάγκασαν τους κατοίκους να ξαναεγκαταλείψουν την πόλη, όπως αποδεικνύει το γεγονός ότι οι εργασίες διάνοιξης των δρόμων δεν ολοκληρώθηκαν ποτέ, ενώ ένας μεγάλος αριθμός από αγγεία βρέθηκαν πάνω σε σωρούς μπαζών, όπου, προφανώς, είχαν τοποθετηθεί αρχικά για να μεταφερθούν σε πιο ασφαλείς θέσεις.[5] Ενδείξεις για το που κατέφυγαν δεν έχουμε. Ο χρόνος, πάντως, μεταξύ του μεγάλου σεισμού και της ηφαιστειακής έκρηξης δεν πρέπει να υπερέβαινε τις λίγες δεκάδες μέρες, ενώ η χρονική διάρκεια από τις πρώτες εκρήξεις μέχρι την δημιουργία της καλδέρας υπολογίζεται σε δύο με τρία εικοσιτετράωρα.[6] Τα αλλεπάλληλα κύματα της τέφρας παρέσυραν τις στέγες και τα ανώτερα τμήματα των κτηρίων του οικισμού.[7] Μετά την ηφαιστειακή έκρηξη και την απόθεση των ηφαιστειακών υλικών που οδήγησε στον ενταφιασμό του οικισμού, ακολούθησε καταρρακτώδης βροχή, η οποία διέβρωσε την ελαφρόπετρα και την τέφρα και σε πολλές περιπτώσεις έφτασε μέχρι και το προεκρηξιακό έδαφος. 
Η βροχή αυτή μετέφερε ρευστή λάσπη στα ισόγεια των κτιρίων του οικισμού, κάτι που οδήγησε τόσο στη διατήρηση του περιεχομένου τους, όσο και στην παραμονή στη θέση τους των δαπέδων των υπερκείμενων ορόφων.[5] [Επεξεργασία] Η χρονολογία της έκρηξης Η πρώτη κλασσική χρονολόγηση της Μινωικής έκρηξης βασίστηκε σε συγκριτικές μελέτες της τεχνικής των αγγείων και σε Αιγυπτιακές πηγές και είχε εκτιμηθεί ότι η έκρηξη του ηφαιστείου που κατέστρεψε την πόλη είχε συμβεί το 1500 π.Χ.. Οι απόλυτες χρονολογήσεις, όμως, που έγιναν με βάση τον ραδιενεργό άνθρακα, τη δενδροχρονολόγηση και την παγοχρονολόγηση μετατόπισαν την ημερομηνία 100 με 150 χρόνια παλαιότερα, ενώ η πλέον πρόσφατη χρονολόγηση με ραδιενεργό άνθρακα ενός κλαδιού ελιάς που θάφτηκε από την τέφρα της έκρηξης τοποθετεί την ημερομηνία μεταξύ 1627 και 1600 π.Χ.[8] με πιο πιθανό το διάστημα μεταξύ 1613 με 1614 π.Χ.. Η νέα χρονολόγηση αποδεικνύει την μη σύνδεση της έκρηξης με την καταστροφή του Μινωικού πολιτισμού.[9] Οι εκτιμήσεις, πάντως, δείχνουν ότι η έκρηξη αυτή αποτέλεσε την μεγαλύτερη έκρηξη ηφαιστείου στον κόσμο τα τελευταία 10.000 χρόνια. Σε σχέση με την εποχή του χρόνου που έγινε η έκρηξη, πιθανολογείται ότι ήταν άνοιξη, καθώς έχουν ανακαλυφθεί στο στρώμα των υλικών της έκρηξης κόκκοι γύρης από ελιές και κωνοφόρα δέντρα.[10] [Επεξεργασία] Η αρχιτεκτονική του οικισμού Τα κτίρια είχαν δύο ή τρεις ορόφους και πολλά δωμάτια. Τα πλουσιότερα ήταν κατασκευασμένα από πελεκητές πέτρες, τα οποία, για αυτόν τον λόγο, οι αρχαιολόγοι τα ονομάζουν ξεστές. Τα υπόλοιπα κτίρια ήταν κατασκευασμένα από τούβλα λάσπης ενισχυμένα με άχυρα, ξύλα και γύψο. Η θεμελίωση κατά κανόνα ήταν ρηχή και πολλές φορές υπήρχε τεχνητή επίχωση. 
Σε δύο περιπτώσεις, κάτω από την Ξεστή 3 και κάτω από τα θεμέλια του μεσοκυκλαδικού κτιρίου, πάνω στα ερείπια του οποίου χτίστηκε η Δυτική οικία, βρέθηκε στρώση από χαλαρά θραύσματα πορώδους λάβας διατομής 4 με 6 εκατοστών (αδράλια), η οποία έπαιζε τον ρόλο σεισμικής μόνωσης.[11] Τα δάπεδα των ορόφων κατασκευάζονταν από ξύλα και καλάμια, πάνω στα οποία υπήρχε πατημένο χώμα, στο οποίο συχνά τοποθετούσαν σχιστολιθικές πλάκες ή βότσαλα. Με ξύλα και καλάμια κατασκευαζόταν και η στέγη, πάνω στην οποία τοποθετούσαν, επίσης, πατημένο χώμα, το οποίο δρούσε ως μονωτικό και εξασφάλιζε δροσιά το καλοκαίρι και ζέστη το χειμώνα. Οι κάτω όροφοι χρησιμοποιούνταν ως αποθήκες, εργαστήρια ή μύλοι, ενώ οι πάνω όροφοι ήταν οι χώροι διαμονής των κατοίκων. Στα πιο πλούσια σπίτια, συχνά, οι τοίχοι των πάνω ορόφων ήταν διακοσμημένοι με τοιχογραφίες. Οι δρόμοι της πόλης ήταν λιθόστρωτοι. Η αποχέτευση των κτιρίων γινόταν με πήλινους σωλήνες που βρίσκονταν μέσα στους τοίχους των κτιρίων και κατέληγαν σε χτιστούς υπονόμους κάτω από τους λιθόστρωτους δρόμους. [Επεξεργασία] Οι τοιχογραφίες Κυκλαδικός Σκύφος που περιέχει ασβεστοκονίαμα. Το μεγάλος πλήθος από τοιχογραφίες που βρέθηκε κατά την διάρκεια των ανασκαφών είναι πολύτιμη πηγή πληροφοριών για την καθημερινή ζωή στο Ακρωτήρι, την θρησκεία και την φύση του νησιού. Έχουν φιλοτεχνηθεί κατά βάση με την τεχνική της νωπογραφίας (buon fresco), δηλαδή η απόδοση του έργου γινόταν πάνω στο νωπό ακόμα ασβεστολιθικό κονίαμα. Αυτό είχε ως αποτέλεσμα τα χρώματά τους να παραμένουν ανεξίτηλα. Συχνά, όμως, το κονίαμα στέγνωνε πριν την ολοκλήρωση της εργασίας του καλλιτέχνη, ο οποίος συνέχιζε την εργασία του πάνω σε στεγνό πλέον τοίχο. Σε αυτά τα σημεία, η προστασία της τοιχογραφίας γίνεται σήμερα με χημικά μέσα.[12] Λεπτομέρειες προσθέτονταν αργότερα. 
Υπήρχαν δύο τύποι χρωμάτων: τα ορυκτά, όπως ήταν το σκούρο κόκκινο, το οποίο ήταν υλικό με μεγάλη περιεκτικότητα σε σίδηρο και τα τεχνητά, όπως το γαλάζιο το οποίο ήταν πυρίτιο με οξείδια χαλκού και ασβεστίου. Μεγάλη έκπληξη προκαλεί στους επιστήμονες το γεγονός ότι με τη φασματοσκοπική μέθοδο ανακαλύφθηκε ότι το ιώδες χρώμα σε λεπτομέρειες της τοιχογραφικής σύνθεσης με τις κροκοσυλλέκτριες είναι πορφύρα, γεγονός που αποδεικνύει ότι το επίπεδο τεχνογνωσίας και πολιτισμού του νησιού ήταν ιδιαίτερα υψηλό.[13] Τα θέματα των τοιχογραφιών ήταν ιδιαίτερα πρωτότυπα για την εποχή τους, ενώ υπήρχε ιδιαίτερη ελευθερία στον σχεδιασμό και στην χρήση των χρωμάτων. Σημαντικές ήταν και οι λεγόμενες μικρογραφικές τοιχογραφίες, οι οποίες ήταν μακριές παραστάσεις με μορφές μικρού μεγέθους, οι οποίες ήταν τοποθετημένες στο ύψος του ματιού.[14] Σχεδόν σε όλα τα ανασκαμμένα κτίρια διαπιστώθηκε η ύπαρξη τοιχογραφιών παλαιότερων από εκείνες που κοσμούσαν τους τοίχους τους όταν έγινε η καταστροφική ηφαιστειακή έκρηξη. Αυτό δείχνει ότι οι τοιχογραφίες ήταν ένας καθιερωμένος τρόπος διακόσμησης των εσωτερικών χώρων, πιθανότατα από τις αρχές του 17ου αιώνα π.Χ.. [Επεξεργασία] Επεμβάσεις στα κτίρια Λόγω της αποσύνθεσης των οργανικών υλικών, κυρίως του ξύλου, που είχαν χρησιμοποιηθεί στην κατασκευή των κτιρίων, η απομάκρυνση των ηφαιστειακών αποθέσεων έθετε τα κτίρια σε κίνδυνο κατάρρευσης. Επιλέχθηκε η μέθοδος της έγχυσης οπλισμένου σκυροδέματος στα κενά που είχαν δημιουργηθεί. Σε πολλές περιπτώσεις ξύλινων πλαισίων θυρών και παραθύρων, το σκυρόδεμα βάφτηκε σε χρώμα καφέ, παρόμοιο με του ξύλου, ενώ ζωγραφίστηκαν σε αυτό ρόζοι, ώστε να δίνεται στον επισκέπτη όσο το δυνατόν πλησιέστερη εικόνα σε αυτή που είχε το κτίριο πριν από την ηφαιστειακή έκρηξη. 
[Επεξεργασία] Τα κτίρια, οι τοιχογραφίες και σημαντικά ευρήματα [Επεξεργασία] Ξεστή 3 Η Ξεστή 3 από τα ανατολικά (Δωμάτια 2, 4, 5) Από την κύρια (νότια) είσοδο του στεγασμένου αρχαιολογικού χώρου, το πρώτο κτιριακό συγκρότημα στην ανατολική πλευρά είναι η Ξεστή 3 (παλιότερα λεγόταν και Ξεστή Ε). Η Ξεστή 3 ήταν ένα μεγάλο διώροφο κτίριο, με 14 δωμάτια σε κάθε όροφο, κάποια από τα οποία συνδέονται μεταξύ τους με πολύθυρα. Στο συγκρότημα αυτό βρέθηκε ένας πολύ μεγάλος αριθμός από τοιχογραφίες, με πιο σημαντική την τοιχογραφική σύνθεση με τις Κροκοσυλλέκτριες (Πρώτος όροφος, Δωμάτιο 3Α, Ανατολικός τοίχος). Στον βόρειο τοίχο του ίδιου δωματίου υπήρχε η τοιχογραφία της Πότνιας θηρών, η οποία αποτελεί κοινή θεματική ενότητα με τις Κροκοσυλλέκτριες. Σε ένα από τα δωμάτια του ισογείου υπάρχει Δεξαμενή καθαρμών (άδυτο), ένας χώρος που θεωρείται ιερός. Αυτό, σε συνδυασμό με την θεματολογία των τοιχογραφιών, οδηγεί στο συμπέρασμα ότι στην Ξεστή 3 τελούνταν διάφορες τελετές[15] και είναι το μόνο μέχρι τώρα αδιαμφισβήτητο δημόσιο Ιερό του οικισμού.[16] Στο βόρειο τοίχο του δωματίου με την δεξαμενή καθαρμών βρισκόταν η τοιχογραφία των Λατρευτριών. Στον Προθάλαμο 5 βρέθηκε η τοιχογραφία των Κυνηγών. Στο δωμάτιο 3Β του ισογείου βρέθηκε η τοιχογραφία με τα Γυμνά αγόρια. Στο δωμάτιο 9 του πρώτου ορόφου υπήρχε η τοιχογραφία των Λυγαριών, ενώ στο δωμάτιο 3Β του ορόφου βρέθηκαν οι Γυναίκες με τις Ανθοδέσμες. Στον δεύτερο όροφο του κτιρίου, ο οποίος σήμερα δεν σώζεται, υπήρχε η τοιχογραφία των Πολύχρωμων Σπειρών, μια τοιχογραφία πολύ μεγάλων διαστάσεων, με την οποία αποδεικνύεται ότι οι καλλιτέχνες της Θήρας γνώριζαν την λεγόμενη Σπείρα του Αρχιμήδη, πάνω από χίλια χρόνια πριν από αυτόν.[17][18] Στο δωμάτιο 9, επίσης του δευτέρου ορόφου, υπήρχε η τοιχογραφία με τους Ρόδακες. Οι Κροκοσυλλέκτριες (Ανατολικός τοίχος). Οι Κροκοσυλλέκτριες (Απόσπασμα από τον ανατολικό τοίχο). Η Πότνια θηρών (Απόσπασμα). Ρόδακες. 
[Επεξεργασία] Οικία των Θρανίων Στις 12 Δεκεμβρίου του 1999, κατά την διάρκεια των εργασιών για την στήριξη του νέου στεγάστρου, ανακαλύφθηκε μέσα σε πήλινη λάρνακα στον χώρο της Οικίας Θρανίων, νοτιοδυτικά της Ξεστής 3, το μοναδικό χρυσό εδώλιο αίγαγρου, από τα ελάχιστα πολύτιμα αντικείμενα που έχουν βρεθεί στο χώρο των ανασκαφών, το οποίο σήμερα εκτίθεται στο Μουσείο Προϊστορικής Θήρας, στα Φηρά. Γενικά, από τις ανασκαφές που έγιναν σε μεγάλο βάθος προκειμένου να στηριχθούν τα υποστυλώματα του νέου στεγάστρου, φαίνεται ότι ο αρχικός νεολιθικός οικισμός περιοριζόταν στην περιοχή που βρίσκεται η Ξεστή 3. Το εδώλιο ήταν μέσα σε ξύλινη θήκη, της οποίας μόνο το αποτύπωμα είχε διατηρηθεί, ενώ η πήλινη λάρνακα ήταν κάτω από σωρό μεγάλου αριθμού από ζεύγη κεράτων, κατά βάση αιγοπροβάτων. Ο χρυσός αίγαγρος. [Επεξεργασία] Πλατεία Ξεστής 3 Ο ακάλυπτος χώρος ανατολικά της Ξεστής 3 έχει ονομαστεί «Πλατεία Ξεστής 3». [Επεξεργασία] Οδός Τελχίνων Από την Πλατεία Ξεστής 3 ξεκινάει το μεγαλύτερο κομμάτι δρόμου που έχει ως τώρα ανασκαφεί, η Οδός Τελχίνων, ο οποίος φαίνεται ότι οδηγούσε στο λιμάνι της πόλης. [Επεξεργασία] Κτίριο Γ Στο ξεκίνημα της οδού Τελχίνων, στην ανατολική της πλευρά, βρίσκεται το Κτίριο Γ, του οποίου μόνο μερικά δευτερεύοντα δωμάτια έχουν ανασκαφεί. Το κτίριο αυτό είχε τουλάχιστον δύο ορόφους. Στα δωμάτια Γ1 και Γ2 του κτιρίου Γ είχαν εγκατασταθεί οι λιθοξόοι, οι οποίοι επιδιόρθωναν τον τομέα αυτό από τις ζημιές που είχε υποστεί κατά τον σεισμό που έγινε πριν την μεγάλη έκρηξη του ηφαιστείου. [Επεξεργασία] Κτίριο Β Στην άλλη πλευρά της οδού Τελχίνων, σχεδόν απέναντι από το Κτίριο Γ, βρίσκεται το Κτίριο Β, το οποίο, επίσης, είχε τουλάχιστον δύο ορόφους. Είναι πολύ πιθανόν να είναι στην πραγματικότητα δύο ξεχωριστά κτίρια προσκολλημένα το ένα στο άλλο. Έχει υποστεί πολλές ζημιές από τον χείμαρρο που περνούσε από την ανατολική του πλευρά. 
Στο εσωτερικό του βρέθηκε ένας αριθμός από σημαντικές τοιχογραφίες, από τις οποίες οι πιο χαρακτηριστικές είναι οι Πυγμάχοι (Δωμάτιο Β1, Νότιος Τοίχος), οι Αντιλόπες (Δωμάτιο Β1, Δυτικός Τοίχος) και οι Κυανοπίθηκοι (Δωμάτιο Β6, Βόρειος και Δυτικός Τοίχος). Στο δωμάτιο Β6 βρέθηκαν και τα σπαράγματα της τοιχογραφίας με τα Τετράποδα. Οι Πυγμάχοι. Οι Αντιλόπες. Οι Κυανοπίθηκοι (Δυτικός τοίχος). Η τοιχογραφία με τα τετράποδα. [Επεξεργασία] Πλατεία Μυλώνος Η οδός Τελχίνων οδηγεί στα βόρεια σε μία πλατεία, η οποία έχει ονομαστεί Πλατεία Μυλώνος. [Επεξεργασία] Συγκρότημα Δ Το βόρειο κλιμακοστάσιο του Συγκροτήματος Δ από βορρά (Δωμάτια Δ4, Δ5, Δ6, Δ7) Στην βορειοανατολική πλευρά της Πλατείας Μυλώνος βρίσκεται το Συγκρότημα Δ, το οποίο αποτελείται από τέσσερα κτίρια. Το βορειότερο τμήμα είναι η πρώτη ξεστή που βρέθηκε και ονομάζεται Ξεστή 1. Στον νότιο, στον δυτικό και στον βόρειο τοίχο του Δωματίου Δ2, ενός μικρού ημιυπόγειου δωματίου, βρέθηκε η περίφημη τοιχογραφία της Άνοιξης. Ο τέταρτος τοίχος του δωματίου είχε πόρτα και διπλό παράθυρο. Πιθανότατα ο χώρος αυτός ήταν χώρος ιερός, κάτι στο οποίο συνηγορεί και το γεγονός ότι έξω από αυτό, προς στην ανατολή, βρέθηκαν σκεύη ιεράς σημασίας.[19] Στο Δωμάτιο Δ2 βρέθηκε και εντυπωσιακά μεγάλος αριθμός από κεραμικά αγγεία, στις γωνίες και κατά μήκος των τοίχων του. Μπροστά από το νότιο τοίχο του δωματίου βρισκόταν ένα κρεβάτι (σήμερα υπάρχει το εκμαγείο του), κάτω από το οποίο είχαν επίσης τοποθετηθεί αγγεία, πιθανότατα για να προστατευθούν από τους σεισμούς.[20] Στο Δωμάτιο Δ15 βρέθηκε μυλόλιθος για την παραγωγή αλευριού, ενώ το δωμάτιο Δ16 ήταν πιθανότατα εργαστήριο κεραμικής καθώς βρέθηκε εκεί μεγάλος αριθμός κεραμικών. Στο Δωμάτιο Δ18Α βρέθηκαν σπαράγματα από δύο ή ίσως τρεις επιγραφές Γραμμικής γραφής Α, οι οποίες αναφέρουν ποσότητες υλικών που πιθανότατα σχετίζονταν με εμπορικές δραστηριότητες των κατοίκων του οικισμού, ενώ στο Δ18Β βρέθηκε μεγάλος αριθμός πήλινων σφραγισμάτων από πηλό πιθανότατα κρητικής προέλευσης. 
Σημαντικό είναι το γεγονός ότι η οργανική ύλη των επίπλων αποσυντέθηκε με τους αιώνες, αλλά η ηφαιστειακή σκόνη που τα είχε καλύψει διατήρησε το σχήμα τους (αρνητικό). Οι αρχαιολόγοι γέμισαν τις τρύπες που βρέθηκαν με γύψο και έτσι έγινε δυνατόν να ανασυσταθούν τα αντικείμενα αυτά. Τέτοια παραδείγματα είναι το εκμαγείο κρεβατιού που αποκαλύφθηκε στο δωμάτιο Δ2 και το εκμαγείο τραπεζιού (μορφολογίας "Λουί κατόρζ") που αποκαλύφθηκε στο δωμάτιο Δ18. Η τοιχογραφία της Άνοιξης (Απόσπασμα). Σπαράγματα πινακίδων Γραμμικής Α από το Δωμάτιο Δ18Α. Εκμαγείο ξύλινου τραπεζιού (μορφολογίας "Λουί κατόρζ"). [Επεξεργασία] Άνδηρο Κλινών Αμέσως βορειοδυτικά της Πλατείας Μύλωνος βρίσκεται ένας ακάλυπτος χώρος στον οποίο ανακαλύφθηκαν τα αρνητικά τριών μικρών κρεβατιών, από τα οποία προέκυψαν τα γύψινα εκμαγεία τους. [Επεξεργασία] Οικία της Άγκυρας Στα βόρεια της πλατείας Μυλώνος και δυτικά του Συγκροτήματος Δ βρίσκεται η Οικία της Άγκυρας. Ονομάστηκε έτσι επειδή σε αυτή βρέθηκε μελανός λίθος με οπή, βάρους 65 κιλών, που προφανώς είχε την χρήση άγκυρας. Ο τύπος αυτός θεωρείται ότι είναι της Μεσοκυκλαδικής περιόδου. Η οικία της Άγκυρας. [Επεξεργασία] Πλατεία Τριγώνου Στην δυτική πλευρά του Συγκροτήματος Δ βρίσκεται η Πλατεία Τριγώνου ή Τριγωνική Πλατεία, η οποία ονομάζεται έτσι λόγω του σχήματος της [Επεξεργασία] Δυτική Οικία Στα δυτικά της Πλατείας Τριγώνου βρίσκεται η Δυτική Οικία. Η οικία αυτή είχε περίμετρο 49,65 μέτρα. Η ταράτσα της ήταν καλυμμένη με πατημένο χώμα το οποίο εξασφάλιζε την μόνωση της. Στο ισόγειο της υπήρχαν αποθήκες τροφίμων, εργαστήρια, μαγειρείο και χώρος μυλωνά. Στον πρώτο όροφο υπήρχε δωμάτιο με αργαλειούς (Δωμάτιο 3), μια αποθήκη σκευών και τροφίμων, ένα αποχωρητήριο (Δωμάτιο 4Α) και δύο δωμάτια με τοιχογραφίες (Δωμάτια 4 και 5). 
Στο ένα από τα δύο, στο Δωμάτιο 5, βρέθηκαν οι δύο τοιχογραφίες των Ψαράδων (Δυτικός τοίχος), η τοιχογραφία της νεαρής ιέρειας (Νοτιοανατολική είσοδος / ανατολικός παραστάτης), η τοιχογραφία με το ποτάμι (Ανατολικός τοίχος) και η διάσημη μικρογραφική ζωφόρος του στόλου που βρίσκεται και στους τέσσερις τοίχους του. Η ζωφόρος του Στόλου (Απόσπασμα: Δεύτερη και Τρίτη πόλη). Η ζωφόρος του Στόλου (Απόσπασμα: Δεύτερη και Τρίτη πόλη). Το Δωμάτιο 4 ήταν διακοσμημένο με τα Ικρία, την τοιχογραφία των οκτώ θαλαμίσκων πλοίων. Το δωμάτιο αυτό με λεπτή μεσοτοιχία χωριζόταν σε δύο δωμάτια. Το δωμάτιο 4Α ήταν λουτρό. Στην Δυτική Οικία θεωρείται ότι έμενε ο αρχηγός του στόλου. Μια μικρή σκάλα που βρέθηκε στο κτίριο υποδηλώνει ότι μπορεί να υπήρχε και τρίτος όροφος ή κάποια σοφίτα. Η είσοδος της Δυτικής Οικίας. Ο ψαράς. Απόσπασμα από την ζωφόρο του Στόλου. Απόσπασμα από την ζωφόρο του Στόλου: Η ναυμαχία. [Επεξεργασία] Πλατεία Κενοταφίου Βόρεια του Συγκροτήματος Δ υπάρχει περιοχή, η οποία είχε διαμορφωθεί σε πλατεία κατά την Υστεροκυκλαδική Ι περίοδο και σήμερα έχει ονομαστεί Πλατεία Κενοταφίου. Εκεί έχει βρεθεί τραπεζοειδής κατασκευή που θυμίζει πρωτοκυκλαδικό τάφο και στο εσωτερικό της οποίας βρέθηκαν 17 μαρμάρινα πρωτοκυκλαδικά εδώλια και θραύσματα μαρμάρινων κρατηρίσκων χωρίς, όμως, παρουσία ανθρώπινων οστών. 
Το εύρημα αυτό ερμηνεύεται ως ανακομιδή ενός κυκλαδικού νεκροταφείου, για τις ανάγκες επέκτασης της πόλης.[21] Στην ίδια περιοχή, στα βορειοανατολικά του Συγκροτήματος Δ, σε μεγαλύτερο βάθος από αυτό που βρισκόταν το κενοτάφιο, βρέθηκε τμήμα δαπέδου με έντονα ίχνη φωτιάς και μεγάλο αριθμό από οστά ζώων και κεράτων και κεραμικής, το οποίο έχει ονομαστεί Πυρά της Θυσίας και ανάγεται στην Πρωτοκυκλαδική ΙΙΙ περίοδο.[22] [Επεξεργασία] Οικία των Γυναικών Στα βορειοδυτικά του χώρου των ανασκαφών βρίσκεται η Οικία των Γυναικών, η οποία οφείλει το όνομα της στις τοιχογραφίες των γυναικών που κάλυπταν τον νότιο τοίχο (γυναικεία φιγούρα) και τον βόρειο τοίχο (γυμνόστηθη γυναικεία φιγούρα) του ανατολικού τμήματος του Δωματίου 1. Στο δυτικό τμήμα του ίδιου δωματίου βρέθηκε η τοιχογραφία με τους Πάπυρους. Η οικία αποτελεί μεγάλο διώροφο οικοδόμημα στο κέντρο του οποίου υπήρχε φωταγωγός. Γυναικεία φιγούρα (Νότιος τοίχος). Γυμνόστηθη γυναικεία φιγούρα (Βόρειος τοίχος). Πάπυροι (Μουσείο Προϊστορικής Θήρας). [Επεξεργασία] Κτίριο Α και Κτίριο "Μύλων" Εσωτερικά του Νέου Στεγάστρου (Κτίριο «Μύλων» / Τομέας Α) Στο βόρειο άκρο των ανασκαφών βρίσκεται το Κτίριο Α ή Αποθήκη των Πίθων, στα τρία μεγαλύτερα δωμάτια του οποίου βρέθηκε μεγάλος αριθμός από πίθους, τα οποία περιείχαν όσπρια, αλεύρι και κριθάρι. Στον Μύλωνα, που βρίσκεται δίπλα, βρέθηκε καλάθι, το οποίο περιείχε ψάρια και αχινούς. Στην ευρύτερη περιοχή του Τομέα Α βρέθηκαν τα σπαράγματα του Αφρικανού, των Σεβιζόντων πιθήκων και του Γαλάζιου πτηνού, τα οποία ο Σπύρος Μαρινάτος είχε θεωρήσει αρχικά ότι ανήκουν στην ίδια τοιχογραφία.[23] Ο Αφρικανός. Σεβίζοντες Πίθηκοι. Γαλάζιο πτηνό. [Επεξεργασία] Ξεστή 2 Στα ανατολικά του Συγκροτήματος Δ βρίσκεται η Ξεστή 2. Η βόρεια πρόσοψη του κτιρίου διατηρείται μέχρι και το τρίτο πάτωμα. 
[Επεξεργασία] Ξεστή 4 Το μεγαλύτερο από τα κτίρια που έχουν ανακαλυφθεί μέχρι σήμερα είναι η Ξεστή 4, το οποίο είναι ένα τριώροφο οικοδόμημα του οποίου όλες οι όψεις είναι επενδυμένες με λαξευτούς ορθογώνιους όγκους τόφου (στρώμα πορώδους πετρώματος, κυρίως ηφαιστειογενούς). Βρίσκεται στα νοτιοανατολικά του χώρου των ανασκαφών και αποτελούσε πιθανότατα δημόσιο κτίριο. Δεν έχει ανασκαφεί πλήρως. Στο Δωμάτιο 2 της Ξεστής 4, κατά την διάρκεια ανασκαφών για την υποστύλωση του νέου στεγάστρου, βρέθηκε θραύσμα ζωφόρου με παράσταση οδοντόφρακτων κρανών (μυκηναϊκός τύπος) σε φυσικό μέγεθος με λεπτομερή απεικόνιση και με όλα τα στοιχεία ενός κράνους: λοφίο, παραγναθίδα, προμετωπίδα και επάορτο. Η ανακάλυψη αυτή δημιουργεί προβληματισμούς για την σχέση του προϊστορικού οικισμού με την ηπειρωτική Ελλάδα[24] Στο κλιμακοστάσιο του κτιρίου έχει βρεθεί η τοιχογραφία των Δωροφόρων, η οποία είναι η μεγαλύτερη τοιχογραφική σύνθεση που έχει ανακαλυφθεί μέχρι τώρα στον οικισμό. Ξεστή 4. [Επεξεργασία] Οδός Κουρητών Ο δρόμος που περνάει στα νότια της Ξεστής 4 έχει ονομαστεί Οδός Κουρητών (Κρητών). [Επεξεργασία] Κτίριο ΙΑ Στα βόρεια της Ξεστής 4 βρίσκεται το Κτίριο ΙΑ ή Κτίριο των Καλών Αγγείων. [Επεξεργασία] Συμπεράσματα από τα Ευρήματα Από τα ευρήματα των ανασκαφών προκύπτει ότι η κοινωνία του Ακρωτηρίου δεν διοικούνταν από έναν μονάρχη, αλλά από μια ελίτ, η οποία δραστηριοποιούνταν σε δύο κυρίως τομείς, το θαλάσσιο εμπόριο και τη βιοτεχνία, αφού η γεωργία στη γύρω περιοχή δεν μπορούσε να καλύψει τις ανάγκες του μεγάλου πληθυσμού.[25] Τα πλοία των τοιχογραφιών φαίνονται ικανά για μακρινά ταξίδια, ενώ το λιμάνι του Ακρωτηρίου πρέπει να ήταν ένα από τα σημαντικότερα της εποχής του. Απόδειξη των εμπορικών σχέσεων των κατοίκων με άλλες περιοχές της Μεσογείου είναι η εύρεση στο Δωμάτιο Δ9 δοχείου από την Χαναάν[26] και των σφραγισμάτων από την Κρήτη στο δωμάτιο Δ18Β. 
Στα σπαράγματα των επιγραφών που βρέθηκαν στο Δωμάτιο Δ18Α αναφέρονται εξαιρετικά μεγάλες ποσότητες υφασμάτων, πράγμα που οδηγεί στο συμπέρασμα ότι ο οικισμός ήταν τόπος συγκέντρωσης και επεξεργασίας του μαλλιού που παραγόταν από τα γειτονικά νησιά, πιθανώς από την Ίο, την Σίκινο, την Φολέγανδρο και την Ανάφη.[5] Παράλληλα στο Ακρωτήρι έχουν βρεθεί ίνες μαλλιού, οι οποίες, με βάση εργαστηριακές αναλύσεις, είναι οι παλαιότερες διατηρημένες αποδείξεις χρήσης του μαλλιού στη Μεσόγειο, εκτός από την περίπτωση της Αιγύπτου, όπου επίσης υπάρχουν αντίστοιχα ευρήματα.[27] Οι κάτοικοι του οικισμού είχαν αναπτύξει σε υψηλό βαθμό την πυροτεχνολογία και η χρήση της φωτιάς εντοπίζεται τόσο στα σπίτια, όσο και στις οικονομικές και θρησκευτικές δραστηριότητες. Από τις αρχές της τρίτης χιλιετίας π.χ. εντοπίζεται η ύπαρξη μόνιμων και φορητών εστιών μαγειρικής και κινητών ή σταθερών φούρνων, λείψανα στάχτης και κάρβουνα, ταψιά, ψηστιέρες και υποδοχές για σουβλάκια, μαγκάλια και επίπεδες πλάκες για το ψήσιμο πίτας.[28] Ευρήματα αποδεικνύουν, επίσης, την καλλιέργεια σταφυλιών και την παραγωγή κρασιού.[29] [Επεξεργασία] Το Στέγαστρο του Προϊστορικού Οικισμού Η στέγαση της προϊστορικής πόλης του Ακρωτηρίου κρίθηκε αναγκαία από τον Σπύρο Μαρινάτο προκειμένου να προστατευθούν τα οικοδομήματα που έρχονταν στο φως. Η ιδέα αυτή είχε έντονα επικριθεί την εποχή εκείνη. Η προσωπικότητα όμως του Μαρινάτου επέβαλε τη λύση αυτή,[30] η οποία τέθηκε σε εφαρμογή από το 1968. Παραπλεύρως του στεγάστρου οικοδομήθηκαν εργαστήρια, αποθήκες και ξενώνες, απαραίτητα για την εξέλιξη της ανασκαφικής δραστηριότητας. Το παλαιό στέγαστρο, στα τριάντα περίπου χρόνια ζωής του, αποδείχθηκε σωτήριο για τα μνημεία, όμως ο μεταλλικός σκελετός του από DEXION, λόγω της γειτνίασης με τη θάλασσα και λόγω των οξειδίων που περιέχονται στα ηφαιστειακά υλικά των επιχώσεων, είχε έντονα διαβρωθεί, ενώ η επικάλυψη του στεγάστρου με φύλλα αμιαντοτσιμέντου ELLENIT ήταν ανθυγιεινή και αντίθετη με την υφιστάμενη κοινοτική νομοθεσία. 
Κρίθηκε, λοιπόν, απαραίτητη η αντικατάσταση του παλαιού στεγάστρου με νέο, το οποίο να καλύπτει τόσο τις λειτουργικές ανάγκες της ανασκαφής και ταυτόχρονα να μπορεί να αναδείξει τον αρχαιολογικό χώρο.[31] Η Κεντρική (Νότια) είσοδος του νέου στεγάστρου στην φάση της κατασκευής. Η προκαταρκτική μελέτη του έργου έγινε σε συνέχεια του βραβευθέντος Ερευνητικού Προγράμματος «ASPIRE» (Archaeological Sites Protection Implementing Renewable Energies) και εγκρίθηκε τον Φεβρουάριο του 1996. Ζεύγος πήλινων κρατευτών. Η κατασκευή του στεγάστρου ανατέθηκε έπειτα από διαγωνισμό, σε κοινοπραξία κατασκευαστικών εταιρειών και οι εργασίες ξεκίνησαν τον Νοέμβριο του 1999. Το βιοκλιματικό στέγαστρο κατασκευάζεται από χάλυβα, ώστε να είναι μεγάλης αντοχής, ενώ προβλέπεται η επικάλυψή του με θηραϊκή γη, ώστε να ενταχθεί πλήρως στο θηραϊκό τοπίο. Παράλληλα, με ήπια μέσα εξασφαλίζει τη δημιουργία κατάλληλων συνθηκών στο εσωτερικό για αερισμό και για επαρκή φυσικό φωτισμό. Η κατασκευή του στεγάστρου οδήγησε σε ανασκαφές σε μεγάλο βάθος προκειμένου να στηριχθούν τα υποστυλώματα του και έβγαλε στην επιφάνεια σημαντικές αρχαιολογικές ανακαλύψεις για την ιστορία της πόλης, καθώς και πολλά ευρήματα, όπως το χρυσό εδώλιο αιγάγρου και το ζεύγος πήλινων κρατευτών. Δυστυχώς, στις 23 Σεπτεμβρίου του 2005, έγινε κατάρρευση ενός μικρού τμήματος του υπό ανέγερση στεγάστρου με αποτέλεσμα τον θάνατο ενός Ουαλλού τουρίστα, του Richard-George Bennion, ενώ ο χώρος έκλεισε έπ' αόριστον για το κοινό. Σημαντικό μέρος του στεγάστρου που κατέρρευσε έπεσε σε ακάλυπτο χώρο, στην πλατεία του Τριγώνου. 
Το υποκείμενο Συγκρότημα Δ που κυρίως επλήγη άντεξε εξαιτίας του γεγονότος ότι δεν είχε ανασκαφεί πλήρως και άρα το εσωτερικό του είναι παραγεμισμένο με το υλικό της έκρηξης που συγκράτησε τους τοίχους στη θέση τους, επειδή ένα τμήμα του είχε αναστηλωθεί με τσιμέντο από την εποχή της αποκάλυψής του από τον Σπυρίδωνα Μαρινάτο[32] και επειδή κράτησε αντίσταση το παλαιό στέγαστρο, το μεγαλύτερο μέρος του οποίου δεν είχε ακόμα απομακρυνθεί από την συγκεκριμένη θέση. [Επεξεργασία] Το Ακρωτήρι κατά τους Μεσαιωνικούς χρόνους Το καστέλι και στην κορυφή ο Γουλάς του Ακρωτηρίου. Κατά τους μεσαιωνικούς χρόνους, το Ακρωτήρι αποτελούσε ένα από τα καστέλια του νησιού και ονομαζόταν La Ponta. Στο κέντρο του οικισμού υπήρχε ο Γουλάς του Ακρωτηρίου (Τουρκικά kule: "πύργος"), ο οποίος υπέστη μεγάλες ζημιές κατά την σεισμική δραστηριότητα του 1956, αν και ήταν σε πολύ καλή κατάσταση μέχρι τότε. Το 1336 το Ακρωτήρι παραχωρήθηκε από τον Δούκα της Νάξου Νικόλαο Σανούδο στην οικογένεια Gozzadini, η οποία είχε καταγωγή από την Μπολόνια. Το γεγονός ότι η καταγωγή τους ήταν από αυτήν την ιταλική πόλη και όχι από την Βενετία, η οποία ήταν σε πόλεμο με την Οθωμανική αυτοκρατορία, σε συνδυασμό με την ισχύ της άμυνας του κάστρου επέτρεψε στην οικογένεια των Γοζαδίνων να διατηρήσει στην κατοχή της το καστέλι για μεγάλη περίοδο ακόμα και κατά την διάρκεια της κατοχής της υπόλοιπης Σαντορίνης από τους Τούρκους.[33] Τελικά πέρασε στα χέρια των Τούρκων μόλις το 1617. Το καστέλι και ο Γουλάς του Ακρωτηρίου. Η είσοδος στο καστέλι του Ακρωτηρίου. [Επεξεργασία] Ο Μπάλος Ο Άγιος Νικόλαος (του Μπάλου). Ο Μπάλος ή Πάλος είναι μικρός όρμος στα βορειοδυτικά του χωριού του Ακρωτηρίου. Σύμφωνα με την παράδοση ονομάζεται έτσι γιατί "εκεί οι κοπέλες χόρευαν Μπάλο".[34] Τον 19ο αιώνα ήταν το λιμάνι με το οποίο διακινούνταν τα προϊόντα της περιοχής. Σήμερα η πρόσβαση γίνεται με δυσκολία, κυρίως λόγω της μη συντήρησης του διαμορφωμένου μονοπατιού. 
Πάνω στα βράχια, δίπλα στο λιμανάκι, υπάρχει το παρεκκλήσι του Αγίου Νικολάου. Στην θέση αυτή υπάρχουν παλιά κτίρια που πιθανότατα χρησιμοποιούνταν ως αποθήκες και υπόσκαφες κατοικίες. Κατά την περίοδο της κατασκευής της Διώρυγας του Σουέζ, από τα γύρω τοιχώματα της καλντέρας γινόταν εξόρυξη θηραϊκής γης. Εκεί είχε διενεργήσει ανασκαφές η Γαλλική Αρχαιολογική Σχολή και είχε καταγραφεί η εύρεση προϊστορικού κτιρίου, η θέση του οποίου σήμερα δεν είναι γνωστή. Στην περιοχή του Μπάλου υπάρχει και το υπόσκαφο παρεκκλήσι των Εισοδίων της Θεοτόκου, που είναι γνωστό και ως Παναγιά του (Μ)Πάλου, στο οποίο η πρόσβαση είναι πολύ πιο εύκολη μέσω ενός τσιμεντοστρωμένου μονοπατιού. Παλιότερα, σε σπηλιές στα τοιχώματα της καλντέρας ζούσαν μοναχές. Κεντρική σπηλιά αποτελούσε υπόσκαφο ναό της Αγίας Τριάδας. Η παράδοση της περιοχής θέλει να υπάρχουν στις σπηλιές αυτές σήραγγες διαφυγής, οι οποίες προστάτευαν τις μοναχές σε περίπτωση επιδρομής. Η πρόσβαση στον Όρμο του Μπάλου. Η Παναγιά του Πάλου (Εισόδια της Θεοτόκου). [Επεξεργασία] Ο Φάρος του Ακρωτηρίου Ο Φάρος του Ακρωτηρίου. Το Ακρωτήρι βρίσκεται ακριβώς στον άξονα του θαλάσσιου δρόμου Πειραιά - Αλεξάνδρεια και φέρει φάρο, ο οποίος είναι ένας από τους καλύτερους του ελληνικού δικτύου. Βρίσκεται σε απόσταση 18 χιλιομέτρων από τα Φηρά σε υψόμετρο 58 μέτρων και οι γεωγραφικές του συντεταγμένες είναι: Γεωγραφικό Πλάτος 36ο 21' 05" Βόρειο και Γεωγραφικό Μήκος 25ο 21' 05" Ανατολικό. Ο φάρος κτίσθηκε το 1892, κατά την διάρκεια της Οθωμανικής κυριαρχίας στο νησί, από την Γαλλική εταιρεία La Société Collas et Michel. Αρχικά δούλευε με πετρέλαιο και η ακτινοβολία του ήταν 23 ναυτικά μίλια. Aνακαινίστηκε το 1925. Η λειτουργία του διακόπηκε κατά την διάρκεια του Δευτέρου Παγκοσμίου Πολέμου και άρχισε να ξαναλειτουργεί το 1945. Κατά την περίοδο αυτή εξέπεμπε μία λάμψη κάθε 30 δευτερόλεπτα, η οποία ήταν ορατή μέχρι και 25 ναυτικά μίλια, ενώ λειτουργούσε με προσωπικό τεσσάρων αντρών. 
To 1983 ηλεκτροδοτήθηκε, ενώ το 1988 έγινε πλήρως αυτοματοποιημένος και έκτοτε συνεχίζει να αποδίδει τη μέγιστη φωτοβολία. Σήμερα έχει ακτινοβολία 24 ναυτικά μίλια. Στα νότια του φάρου βρίσκεται καλό αγκυροβόλιο καταφυγής για βόρειους ανέμους, το οποίο οι ντόπιοι το αποκαλούν Λιμνιονάρι (Ακρωτηριανοί) ή Λιμανάρι (Εμπορειανοί). Ο φάρος της Σαντορίνης, Ακρωτήρι. Το Λιμανάρι. Το Λιμανάρι. [Επεξεργασία] Παραλίες Η πιο γνωστή παραλία στο Ακρωτήρι είναι η Κόκκινη παραλία. Βρίσκεται σχετικά κοντά στον χώρο των ανασκαφών, αλλά η πρόσβαση γίνεται μόνο με τα πόδια. Χρωστάει το όνομα της στα κόκκινα βράχια τα οποία χαρακτηρίζουν το τοπίο. Τα Μέσα Πηγάδια είναι παραθαλάσσια τοποθεσία στην νότια πλευρά του Ακρωτηρίου. Η πρόσβαση γίνεται μέσω χωματόδρομου, η αρχή του οποίου βρίσκεται σε απόσταση 16 χιλιομέτρων από τα Φηρά. Η Άσπρη Παραλία, πρόσβαση στην οποία υπάρχει μόνο από την θάλασσα. Η Κόκκινη Παραλία. Η Κόκκινη Παραλία. Η παραλία στα Μέσα Πηγάδια. Η Άσπρη Παραλία. [Επεξεργασία] Τοπικές Εορτές Στις 12 Μαΐου γίνεται πανηγύρι στην εκκλησία του Αγίου Επιφανίου. Στις 29 Μαΐου εορτάζει η Αγία Θεοδοσία, που είναι η ενορία του Ακρωτηρίου. Εκκλησία της Αγίας Θεοδοσίας υπήρχε σε όλα τα καστέλια της Σαντορίνης και ήταν χτισμένη πάντα σε κοντινή απόσταση έξω από την είσοδο τους. Στις 6 Αυγούστου γίνεται πανηγύρι στην εκκλησία της Μεταμόρφωσης του Σωτήρα. Στις 15 Αυγούστου γίνεται πανηγύρι στην εκκλησία της Κοιμήσεως της Παναγίας. Στις 20 Σεπτεμβρίου γίνεται πανηγύρι στο παρεκκλήσι του Αγίου Ευσταθίου. [Επεξεργασία] Πρόσθετες Πληροφορίες Κατά την διάρκεια της Κρητικής Επανάστασης, από το Ακρωτήρι μεταφέρονταν όπλα προς τους επαναστατημένους Κρητικούς.[2] Κατά την απογραφή του 1928 είχε 305 κατοίκους. [Επεξεργασία] Εικόνες από το Ακρωτήρι Ο βράχος "Ινδιάνος", Ακρωτήρι. Ο Άγιος Επιφάνιος. Η Κοίμησις της Θεοτόκου. Ο Άη Γιάννης. Ο Άγιος Ανδρέας. Ο Άγιος Νικόλαος (κρίνος). Βρίσκεται στην θέση "Καπαριές". 
[Επεξεργασία] Αναφορές ↑ Ντούμας Χρ.: "Θήρα/Σαντορίνη", Santorini Guidebook 2007 ↑ 2,0 2,1 Τζαχίλη Ίρις: "Οι Αρχές της Αιγαιακής Προϊστορίας: Οι Ανασκαφές στην Θήρα και την Θηρασιά τον 19ο Αιώνα", Η Καθημερινή, Αθήνα 2006 ↑ Μαρινάτος Σπ.: "The Volcanic Destruction on Minoan Crete", Antiquity 425:39, 1939 ↑ Κατημερτζή Π.: "Άντεξε ηφαίστεια και σεισμούς, αλλά τη βροχή;", Εφημερίδα ΚΑΘΗΜΕΡΙΝΗ, 03.10.2005 ↑ 5,0 5,1 5,2 Ντούμας Χρ.: "Πρόσφατα ευρήματα από το Ακρωτήρι της Θήρας", Σύλλογος για την Μελέτη και Διάδοση της Ελληνικής Ιστορίας, 2000 ↑ Συκκά Γ.: "Σαντορίνη, η μεγάλη έκρηξη", Εφημερίδα Η ΚΑΘΗΜΕΡΙΝΗ, 28.02.2007 ↑ Ντούμας Χρ.: "Ξεθάβοντας μια νεκρή πολιτεία στο Ακρωτήρι της Θήρας", Περιοδικό ΑΛΣ, Εταιρεία Στήριξης Σπουδών Προϊστορικής Θήρας, Τεύχος 1, Αθήνα 2003 ↑ Friedrich W., Kromer B., Friedrich M., Heinemeier J., Pfeiffer T., Talamo S.: "Santorini Eruption Dated to 1627-1600 B.C." ↑ Δρ. Βουγιουκλάκης Γ.: "Η έκρηξη του 17ου π.Χ. αιώνα.", Santorini Guidebook 2007 ↑ Κοντράρου-Ρασσιά Ν.: "Άνοιξη... ηφαιστειακή" Εφημερίδα ΕΛΕΥΘΕΡΟΤΥΠΙΑ 28.02.2007 ↑ Συκκά Γ.: "Η συμβολή της Θήρας στον πολιτισμό", Εφημερίδα Η ΚΑΘΗΜΕΡΙΝΗ, 14.01.2007 ↑ Ντούμας Χρ.: "Σαντορίνη, Οδηγός του Νησιού και των Αρχαιολογικών του Θησαυρών", Εκδοτική Αθηνών, Αθήνα 2005 ↑ Κρίκκης Σ.: "Πορφύρα 35 αιώνων", Εφημερίδα ΤΑ ΝΕΑ, 14.05.2003. ↑ Ιστοσελίδα Ιδρύματος Μείζονος Ελληνισμού: "Τοιχογραφίες", 07.08.2007. ↑ Υπουργείο Πολιτισμού: "Ακρωτήρι Θήρας" ↑ Μπουλώτης Χρ.: "Πτυχές θρησκευτικής έκφρασης στο Ακρωτήρι". Περιοδικό ΑΛΣ, Εταιρεία Στήριξης Σπουδών Προϊστορικής Θήρας, Τεύχος 3, Αθήνα 2005 ↑ Papaodysseus C., Panagopoulos Th., Exarhos M., Fragoulis D., Roussopoulos G., Rousopoulos P., Galanopoulos G., Triantafillou C., Vlachopoulos A., Doumas C.: "Distinct, Late Bronxe Age (c. 1650 bc) Wall-paintings from Akrotiri, Thera, Comprising Advanced Geometrical Patterns", Archaeometry 48 (1), 97–114 (2006). 
↑ In.gr: "Οι μαθηματικοί της Θήρας «αιώνες μπροστά από την εποχή τους»", 01.03.2006 ↑ Μαρινάτος Σπ.: "Θησαυροί της Θήρας", Έκδοση Εμπορικής Τράπεζας της Ελλάδος, Αθήνα 1972 ↑ Γέροντας Α.: "Αποκάλυψη και Συντήρηση Κρεβατιών στο Ακρωτήρι Θήρας", Περιοδικό ΑΛΣ, Εταιρεία Στήριξης Σπουδών Προϊστορικής Θήρας, Τεύχος 2, Αθήνα 2004 ↑ Κατημερτζή Π.: "Αρχαία «δημοκρατία» εφοπλιστών", Εφημερίδα ΤΑ ΝΕΑ, 15.01.2003. ↑ Μπουλώτης Χρ.: "Πτυχές θρησκευτικής έκφρασης στο Ακρωτήρι", Περιοδικό ΑΛΣ, Εταιρεία Στήριξης Σπουδών Προϊστορικής Θήρας, Τεύχος 3, Αθήνα 2005 ↑ Μαρινάτος Σπ.: "Ανασκαφαί Θήρας VI (1972)", Αθήνα 1974 ↑ Ακριβάκη Ν.: "Τοιχογραφία με παράσταση οδοντόφρακτου κράνους", Περιοδικό ΑΛΣ, Εταιρεία Στήριξης Σπουδών Προϊστορικής Θήρας, Τεύχος 1, Αθήνα 2003 ↑ In.gr: "Νέα σημαντικά στοιχεία έρχονται στο φως για το Ακρωτήρι της Θήρας", 15.01.2003 ↑ Buchholz H.-G.: "Thera and the Aegean World II", Papers and Proceedings of the Second International Scientific Congress, Santorini, Greece, August 1978. ↑ Τσώλη Θ.: "Μάλλινα ΣΑΝΤΟΡΙΝΗΣ ηλικίας 4.000 ετών...", Εφημερίδα ΤΟ ΒΗΜΑ, 22.10.2006 ↑ In.gr: "Εντυπωσιακά τα νέα ευρήματα στο Ακρωτήρι της Θήρας", 06.02.2004 ↑ "Discovery of a press confirms wine-making on site", Εφημερίδα Καθημερινή, Αγγλική Έκδοση, 13.05.2006 ↑ Βαρβιτσιώτης Ι.: "Το στέγαστρο της προϊστορικής Θήρας", Εφημερίδα ΤΟ ΒΗΜΑ, 21.08.2005. ↑ "Αντικατάσταση Στεγάστρου - Συντήρηση, Διαμόρφωση και Ανάδειξη του Αρχαιολογικού Χώρου του Ακρωτηρίου Θήρας" ↑ Κατημερτζή Π.: "Tο στέγαστρο έπεσε... «στον ακάλυπτο»", Εφημερίδα ΤΑ ΝΕΑ, 15.10.2005. ↑ Δώρα Μονιούδη - Γαβαλά: "Σαντορίνη. Κοινωνία και Χώρος. 15ος - 20ος Αιώνας", Έκδοση του Ιδρύματος Λουκά και Ευάγγελου Μπελλώνια, 1997 ↑ Μηνδρινός Μ.: "Το τοπωνύμιο Μπάλ(λ)ος". Εφημερίδα ΠΑΝΘΗΡΑΪΚΗ ΚΡΑΥΓΗ, Φύλλο 44, σελ. 2. Θήρα, 1973. 
[Επεξεργασία] Εξωτερικές Συνδέσεις Πανεπιστήμιο της Οκλαχόμα, Φωτογραφίες και Αναπαραστάσεις από τον προϊστορικό οικισμό του Ακρωτηρίου: 12 3 4 5 6 7 8 9 10 The Thera Foundation Jacques Thobie.: "L'administration generale des phares de l'Empire ottoman et la societe Collas et Michel, 1860-1960", 2004. Ανακτήθηκε από "http://el.wikipedia.org/wiki/%CE%91%CE%BA%CF%81%CF%89%CF%84%CE%AE%CF%81%CE%B9_%CE%98%CE%AE%CF%81%CE%B1%CF%82" Κατηγορίες: Χωριά της Σαντορίνης | Ελληνική προϊστορία | Αρχαιολογικοί τόποι στην Ελλάδα Τελευταία τροποποίηση 17:31, 8 Μαΐου 2010. Όλα τα κείμενα είναι διαθέσιμα υπό την Creative Commons Attribution/Share-Alike License· μπορεί να ισχύουν και πρόσθετοι όροι. Δείτε τους Όρους Χρήσης για λεπτομέρειες. Lingua-Identify-0.56/t/files/en000644 000765 000024 00000000214 10730327426 016414 0ustar00ambsstaff000000 000000 this is an example of an English text; hopefully, it won't be mistaken for a Gaelic text, this time! That is not the purpose for this line. 
Lingua-Identify-0.56/t/files/eo000644 000765 000024 00000002664 11373603322 016424 0ustar00ambsstaff000000 000000 Dum la lastaj jaroj ILEI ricevis kaj plu ricevas informojn kaj proponojn pri novaj kaj novtipaj ekzamenoj, kaj petojn aŭspicii ilin. Tio estas rezulto de evoluanta praktiko de E-instruado inkluzive de novaj instrumetodoj kaj lernteknologioj. Aliflanke la tradicia sistemo de Internaciaj Ekzamenoj de ILEI/UEA ĉiam pli montras la simptomojn de maljuniĝo kaj ekstempiĝo. Por diri la veron, ĝi eĉ neniam ekvivis plenspire en ĉiuj siaj segmentoj. El la tri niveloj nur la elementa kaj meza estas aplikataj, la supera apenaŭ okazis iam antaŭ longa tempo. La nove aprobita modelo de ekzameno de instrukapablo de E-o1 rapide montriĝis mortnaskito. Neniu ĝis nun sukcesis vivigi ĝin en la proponita formo. Ekster Esperantujo dume la teorio kaj praktiko de lingvoinstruado disponigas novajn konceptojn kaj atingojn. Estas tasko de ILEI subteni ilian akceptadon, adaptadon kaj aplikadon konforme al bezonoj de la E-komunumo. La defioj de multlingveco en la disvastiĝanta EU kaj atingoj de lingvistiko, instrumetodoj kaj instruteknologioj skuis la E-didaktikon: kiel progresigi la instruadon kaj lernadon de E-o, kaj kion instrui al esperantistoj kaj per E-o pri informado kaj utiligado, pri kulturo kaj valoroj de E-o, pri interkultureco kaj rolo de E-o en la mondo de kultura kaj lingva diverseco? Tio vekas krean energion kaj organizpolitikajn ŝanceliĝojn en Esperantujo pri tio, kiel plej bone adaptiĝi al la ŝanĝiĝantaj cirkonstancoj? Lingua-Identify-0.56/t/files/es000644 000765 000024 00000002145 11373034666 016433 0ustar00ambsstaff000000 000000 En medio de estos crecientes signos de negligencia, British Petroleum (BP) ha informado hoy de que a finales de esta semana realizará un nuevo intento de sellar la fuga de crudo que ha generado la mayor marea negra en la historia de EE UU con una nueva campana metálica. 
El director de operaciones de la petrolera británica, Doug Suttles, ha confirmado en rueda de prensa que el jueves o el viernes se tratará de llevar a cabo la operación. La caja contenedora se encuentra ya en el fondo marino a la espera de ser colocada sobre la principal fuga de petróleo. Se trata del segundo intento de tapar la fuga colocando sobre ella una estructura metálica campaniforme. La semana pasada BP intentó realizar esta operación, pero el artefacto no pudo cumplir su propósito debido a la presencia de gas cristalizado, que taponó el conducto por el que tenía que ser transportado el petróleo hacia un barco en la superficie. En esta ocasión la campana es de menor tamaño, con lo que los técnicos de BP confían en que se ajuste mejor y así se dificulte la entrada de agua gélida que causa la cristalización del gas.Lingua-Identify-0.56/t/files/fi000644 000765 000024 00000003337 11373266625 016430 0ustar00ambsstaff000000 000000 Leijonien haasteena illan MM-jääkiekkopelissä on Valko-Venäjän keskialueen puolustussumpun murtaminen. On odotettavissa, että vastustaja pyrkii miehittämään keskialueen tiiviisti ja iskemään maalinsa vastahyökkäyksistä. - Siellä on keskialueella yleensä viisi äijää odottamassa. Oma pelaaminen pitää olla nopeaa. Pitäisi pystyä kääntämään rivakasti peliä hyökkäyssuuntaan kiekon saatuamme, etteivät he ehdi muodostaa puolustusryhmitystään, sanoi Jarkko Immonen. Myös päätykiekot Valko-Venäjän puolustajien taakse ovat lääkkeenä sumppupuolustuksen ohittamiseksi. - Välillä joudumme väkisinkin kohtaamaan keskialueen lihamuurin. Silloin pitää tulla kovalla vauhdilla ja heittää myös kiekkoa päätyyn, totesi päävalmentaja Jukka Jalonen. Pelkkään puolustamiseen valkovenäläiset eivät kuitenkaan tyydy. Joukkueessa on KHL-pelaajia, jotka osaavat varmasti pistää kiekon Suomen pömpeliin. - Taitavia jätkiä siellä on. He osaavat haastaa yksi vastaan yksi -tilanteissa, Immonen tuumi. 
Osalalta odotetaan häiriötä ja hämminkiä Laitayökkääjä Oskar Osala pääse illan pelissä debytoimaan MM-tasolla. Isokoinen Osala ottaa paikan Suomen neljännessä hyökkäysketjussa keskushyökkääjä Riku Hahlin ja Leo Komarovin rinnalta. - Toivon Osalan pelaavan yritteliäästi, suoraviivaisesti ja fyysisesti. Odotan hänen myös ampuvan paljon ja ajavan maalille aiheuttamaan häiriötä ja hämminkiä, Jalonen kertoi. Keskushyökkääjä Juha-Pekka Hytönen siirtyy kolmosketjuun. Tommi Santala on puolestaan Suomen ns. 13. hyökkääjä. Jori Lehterä seuraa Valko-Venäjä -ottelun katsomosta. - Yksikään pelaaja ei ole pettänyt, mutta haluamme nyt kokeilla "Oskua", Jalonen totesi. Lingua-Identify-0.56/t/files/fr000644 000765 000024 00000002511 11372773436 016434 0ustar00ambsstaff000000 000000 Les autorités, qui intensifient la pression, ont également prévu d'envoyer des blindés dans Bangkok autour du campement des manifestants antigouvernementaux. «J'ai annulé la date des élections. C'est ma décision car les manifestants refusent de se disperser». La crise en Thaïlande a de nouveau sombré dans l'impasse jeudi, avec la décision du premier ministre d'annuler les élections anticipées et d'envoyer des blindés pour isoler le quartier de Bangkok où les manifestants antigouvernementaux sont retranchés. Plus de dix jours après sa publication, le plan de sortie de crise d'Abhisit Vejjajiva semble plus fragile que jamais, après avoir suscité de vifs espoirs et recueilli l'assentiment apparent de la majorité des acteurs politiques du royaume. Mais les «chemises rouges» ont bloqué le processus en début de semaine en exigeant l'inculpation du numéro deux du gouvernement, Suthep Thaugsuban, qu'ils jugent responsable des violences du 10 avril qui ont fait 25 morts et plus de 800 blessés. «Les chemises rouges n'ont accepté que verbalement de se joindre à la feuille de route vers la réconciliation», a expliqué de son côté Korbsak Sabhavasu, son secrétaire général. 
«Mais ils n'ont pas décidé de mettre fin aux manifestations et il est donc impossible d'organiser des élections comme prévues».Lingua-Identify-0.56/t/files/fy000644 000765 000024 00000003503 11373612602 016431 0ustar00ambsstaff000000 000000 De taaldatabank Nijfrysk De taaldatabank Nijfrysk befettet in trochsneed fan it Frysk fan de 20e ieu. Ûnderskate tekstsjenres binne deryn opnommen, lykas krantenartikelen, romans, poëzy, fakliteratuer, en sa mear. Alles meiïnoar is it likernôch 25 miljoen wurden. De databank is as tekst troch te sykjen. It leit yn de bedoeling om op termyn de taaldatabank fierder út te bouwen troch de ûnderskate staveringen en wurdfoarmen oan lemma's (opsykfoarmen yn wurdboeken) te keppeljen, sa't dat foar it Frysk fan 1550-1800 dien is. De taaldatabank Njoggentjinde-ieusk Frysk Likernôch in miljoen wurden oan njoggentjinde-ieusk Frysk is ynscand en korrisjearre. Fierders binne der ek njoggentjinde-ieuwske hânskriften hânmjittich yntikt. Op dit stuit is de taaldatabank Njoggentjinde-ieusk Frysk as beta-ferzje beskikber. It leit yn de bedoeling om op termyn de taaldatabank fierder út te bouwen troch de ûnderskate staveringen en wurdfoarmen oan lemma's (opsykfoarmen) te keppeljen, sa't dat foar it Frysk fan 1550-1800 dien is. De taaldatabank Midfrysk / Ier-Nijfrysk Yn de taaldatabank Midfrysk binne alle oant no ta bekende Midfryske teksten (Frysk út it tiidrek fan likernôch 1550 oant 1800) opnommen. Dy teksten binne fierhinne folslein lemmatisearre. Dat wol sizze dat alle wurden yn al har staveringsfarianten oan in lemmafoarm (de opsykfoarm yn in wurdboek) keppele binne en dat útmakke is hokker foarm it is; bygelyks hokker persoan en tiid fan in tiidwurd, of inkeltal of meartal fan in haadwurd, ensfh. De opmaak fan de side en de sykmooglikheden sille meiertiid noch ferbettere en útwreide wurde. 
De taaldatabank Aldfrysk Op dit stuit wurde Aldfryske teksten sammele as tarieding op in taaldatabank Aldfrysk, dêr't finansjele middels foar frijmakke binne. Der fine op dit stuit eksperiminten plak mei ûnderskate foarferzjes. Lingua-Identify-0.56/t/files/ga000644 000765 000024 00000001624 11373057375 016416 0ustar00ambsstaff000000 000000 Tá taifead ag Roinn Bhéaloideas Éireann i gColáiste na hOllscoile Bhaile Átha Cliath de chainteoir dúchais Gaeilge as Cill Chainnigh chomh maith agus táthar ag súil go mbeidh sé seo ar fáil le héisteacht ag an bpobal amach anseo. Táthar ann a cheapann gur le cósta iarthair na tíre amháin a bhaineann an Ghaeilge ach léiríonn an méid atá bailithe le chéile anseo gur cuid d’oidhreacht na hÉireann uilig í. Measann muid go gcabhróidh an leathanach seo le daoine nasc a dhéanamh in athuair le hoidhreacht Ghaeilge a gceantar féin. Má tá aon tuairimí nó moltaí agat faoin leathanach seo nó má tá eolas agat faoi oidhreacht Ghaeilge do cheantair, is féidir teagmháil a dhéanamh linn ach rphost a sheoladh chuig colm@nuacht.com Tá cóipcheart iomlán ag Acadamh Ríoga na hÉireann ar thaifeadtaí Doegen agus ní féidir iad a úsáid ar bhealach ar bith gan chead ón Acadamh.Lingua-Identify-0.56/t/files/hi000644 000765 000024 00000045471 12174037041 016423 0ustar00ambsstaff000000 000000 ऊपर इन निर्माताओं ने बेचा है. 
श्री रामचरित मानस-किष्किन्धा काण्ड विश्व का सबसे लंबा यह साहित्यिक ग्रंथ और महाकाव्य हिन्दू धर्म के मुख्यतम ग्रंथों में से एक है। विवशता और बीमारी की हालत में इसे टाला जा सकता है और बाद में समय मिलने पर छूटी हूई नमाज़ें पढ़ी जा सकती हैं। यह सच है कि महादेवी का काव्य संसार छायावाद की परिधि में आता है पर उनके काव्य को उनके युग से एकदम असम्पृक्त करके देखना उनके साथ अन्याय करना होगा। तत्कालीन मुगल राज्य केवल काबुल से दिल्ली तक ही फैला हुआ था। हिन्दीराइटर का आईएमई (विकास बन्द) नेपाल पर्यटन वार्णावत (वर्तमान बरनावा) स्थित लाक्षागृह के सुरक्षित अवशेष इसका गठन लावा निर्मित सात छोटे-छोटे द्वीपों द्वारा हुआ है एवं यह पुल द्वारा प्रमुख भू-खंड के साथ जुड़ा हुआ है। वह भारतीय सेनाओं का मुख्य सेनापति भी है। हिन्दी काव्य के उत्कृष्ट रचनाओं का उत्तम संग्रह़ उनके पिता श्री विश्वनाथ दत्त पाश्चात्य सभ्यता में विश्वास रखते थे। देर रात तक कार्तिक की अँधेरी रात पूर्णिमा से भी से भी अधिक प्रकाशयुक्त दिखाई पड़ती है। विवाद का एक और मुद्दा है की उभरती हुई अर्थव्यवस्थाओं जैसे भारत और चीन से कैसी उम्मीद की जानी चाहिए की वेह अपने उत्सर्जन को कितना कम करें .हाल की रिपोर्ट के अनुसार चीन के सकल राष्ट्रीय CO 2 / उप उत्सर्जन अमरीका से जिअदा हो सकते हैं पर चीन ने कहा है की प्रति व्यक्ति उत्सर्जन अमरीका से पाँच गुना कम है इसलिए उस पर यह बंदिश नही होनी चाहिए भारत ने भी इसी बात को दोहराया है जिसे क्‍योटो प्रतिबंधों से छूट प्राप्त है और जो औद्योगिक उत्सर्जन का सबसे बड़ा स्रोत है. 
मधुकलश (1937) एनोटेशन भरत अपने स्नेही जनों के साथ राम की पादुका को साथ लेकर वापस अयोध्या आ गये। परन्तु यदि आपकी वेबसाइट कोई हिन्दी टूल प्रदान करती है तो उसका लिंक लिखा जा सकता है। कल्पना चावला फाउंडेशन परिवार वैज्ञानिक मानते हैं कि इस नदी के जल में बैक्टीरियोफेज नामक विषाणु होते हैं जो जीवाणुओं व अन्य हानिकारक सूक्ष्मजीवों को जीवित नहीं रहने देते हैं। ये मीडिया को डाउनलोड कर उसे चला सकते हैं या आईपॉड जैसे किसी बाहरी मीडिया प्लेयर के साथ सिक्रोनाईज़ भी कर सकते हैं। एक कलियुग ४३२००० वर्ष का द्वापर ८६४००० वर्ष का त्रेता युग १२९६००० वर्ष का तथा सतयुग १७२८००० वर्ष का होता है। इसके अलावा राज्यपाल एक संवैधानिक प्रमुख है जो अपने कर्तव्य मंत्रिपरिषद की सलाह सहायता से करता है परंतु उसकी संवैधानिक स्थिति उसकी मंत्रिपरिषद की तुलना मे बहुत सुरक्षित है वह राष्ट्रपति के समान असहाय नहीं है राष्ट्रपति के पास मात्र विवेकाधीन शक्ति ही है जिसके अलावा वह सदैव प्रभाव का ही प्रयोग करता है किंतु संविधान राजयपाल को प्रभाव तथा शक्ति दोनों देता है उसका पद उतना ही शोभात्मक है उतना ही कार्यातमक भी है डोमेन नाम समस्या 4 भारतीय संविधान मे आपातकाल लागू करने के उपबन्ध है [352 अनुच्छेद] इसके लागू होने पर राज्य-केन्द्र शक्ति पृथक्करण समाप्त हो जायेगा तथा वह एकात्मक संविधान बन जायेगा। यह प्रक्रिया इंटरनेट सर्विस प्रोवाडर की देख-रेख मे चलती है। इस पोजीशन मे महिला पुरुष का चेहरे से बेहतर संपर्क रहता है. इस पोजीशन में दोनों के बीच काफी निकटता रहती है और यह पोजीशन उन जोड़ो के लिये बेहतर है जो रतिक्रीड़ा के दौरान एक दूसरे को चुंबन करने में ज्यादा रुचि रखते हैं. इस पोजीशन के लिये पुरुष किसी पलंग या उस जैसे किसी अन्य जगह पर पांव नीचे करके बैठ जाता है. फिर महिला उसके चेहरे की ओर अपना चेहरा करते हूए उसके लिंग के उपर या सामने अपने योनि को ले जाते हुए अपनी टांगे सामने फैला देती है. साथ ही महिला के हाथ पुरुष के शरीर से सहारा लेने के काम आते है. इस पोजीशन में रतिक्रीड़ा के दौरान पुरुष चाहे तो अपने हाथ पीछे कर सहारे के रुप में प्रयुक्त कर सकता है वहीं दूसरी ओर हाथों को महिला के कूल्हों या कमर के पास से सहारा देकर धक्कों में मदद के साथ गति भी बढ़ा सकता है. 
बदकिस्मती से चित्र में दिखाई गई पोजीशन धक्कों के हिसाब से उतनी बेहतर नहीं कही जा सकती (जितनी रेटिंग में दिखाई गई है) . पोजीशन के संपूर्ण आनंद के लिये स्टूल या चेयर का प्रयोग करें इसमेंमहिला को पांव के सहारे के ज्यादा सही अवसर होते हैं. इसके अलावा भी इसमें अपने हिसाब से बैठने की व्यवस्था बनाकर पोजीशन को ज्यादा आनंददायी बनाया जा सकता है. भारत विश्व की दसवीं सबसे बड़ी अर्थव्यवस्था है किन्तु हाल में भारत ने बहुत प्रगति की है और ताज़ा स्थिति में भारत विश्व में तीसरे चौथे स्थान पर होने का दावा करता है। श्रेणी:व्यक्तिगत जीवन जो धम्म यह बताए कि करुणा से भी अधिक मैत्री की आवश्यकता है. एक विश्वसनीय और दैनिक कि एक आतंकवाद अनुभाग शामिल हैं खोलें सूत्रों का कहना है केन्द्र नवीनीकृत. ISRIA द्वारा. विभिन्न शब्दकोशों हेतु ब्राउजर सर्च इंजन निर्वाचन आयोग की कार्यप्रणाली/कार्य वर्जिनिया कवितावली प्रकाशक एवं मुद्रक: गीताप्रेस गोरखपुर ‎आदेशों के लिहाज़ से क़ुरआन में विचार करने वाले को पीछे की ओर यात्रा ‎करनी होगी। बाग के खाके एवं उसके वास्तु लक्षण् जैसे कि फव्वारे ईंटें संगमर्मर के पैदल पथ एवं ज्यामितीय ईंट-जडि़त क्यारियाँ जो काश्मीर के शालीमार बाग से एकरूप हैं जताते हैं कि इन दोनों का ही वास्तुकार एक ही हो सकता है अली मर्दान। सेना तथा सुरक्षा अंग अयोध्याकाण्ड शाहजहाँ की कब्र पर खुदा है; आचार्य हजारीप्रसाद द्विवेदी ने सूर की कवित्व-शक्ति के बारे में लिखा है- सूरदास जब अपने प्रिय विषय का वर्णन शुरू करते हैं तो मानो अलंकार-शास्त्र हाथ जोड़कर उनके पीछे-पीछे दौड़ा करता है। बांग्ला साहित्य के माध्यम से भारतीय सांस्कृतिक चेतना में नयी जान फूँकने वाले युगद्रष्टा थे। जया बच्चन ने समाजवादी पार्टी (Samajwadi Party) ज्वाइन कर ली और राज्यसभा सभा (Rajya Sabha) की सदस्या बन गई। इसी प्रकार जैसलमेर राज्य के अधिकांश भाग वल्लदेश में सम्मिलित थे तो जोधपुर मरुदेश के नाम से जाना जाता था। क्यों कि हर प्रभावी व्यक्ति अपनी शक्ति के आगे सब को नतमस्तक ‎देखना चाहता था। इन्हें पन्थ कहते हैं। श्रेणी:भक्तिकाल के कवि यहां से हिमालय का खूबसूरत नजारा देखते ही बनता है। १९७७ में मृणाल सेन ने प्रेमचंद की कहानी कफ़न पर आधारित ओका ऊरी कथा नाम से एक तेलुगू फ़िल्म बनाई जिसको सर्वश्रेष्ठ तेलुगू फ़िल्म का राष्ट्रीय पुरस्कार 
भी मिला। इस समय विश्व में भ्रूण शास्त्र के ‎सर्वोच्च ज्ञाता माने जाते हैं। प्रेमचंद हिन्दी साहित्य के युग प्रवर्तक हैं। सन् 543 ईसापूर्व में पाकिस्तान का अधिकांश इलाका ईरान (फारस) के हख़ामनी साम्राज्य के अधीन आ गया । रोज़ा रखने के कई उद्देश्य हैं जिन में से दो प्रमुख उद्देश्य यह हैं कि दुनिया की बाकी आकर्षणों से ध्यान हटा कर ईश्वर से निकटता अनुभव की जाए और दूसरा यह कि निर्धनों भिखारियों और भूखों की समस्याओं और परेशानियों का ज्ञान हो। तमिळ भाषा अपनी भाव पराये (1970) श्रेणी:हिन्दी नाटककार मानव जन्म में अपनी आज़ादी से किये गये कर्मों के मुताबिक आत्मा अगला शरीर धारण करती है। कंगूरे बुर्जी एवं कलश आदि बनाने वाले दूसरा जो केवल संगमर्मर पर पुष्प तराश्ता था इत्यादि सत्ताईस कारीगरों में से कुछ थे जिन्होंने सृजन इकाई गठित की थी। इस नाम से उन्हें सर्वप्रथम बंगाल के विख्यात उपन्यासकार शरतचंद्र चट्टोपाध्याय ने संबोधित किया था। श्रेणी:देवी-देवता कुरु के वंश में शान्तनु हुए। बाद में शाहजहां द्वारा इस किले का पुनरोद्धार लाल बलुआ पत्थर से करवाया गया व इसे किले से प्रासाद में बदला गया। इसलिए सुभाषबाबू ने यह शर्त स्वीकार नहीं की। इस पुत्र का नाम हुमायुं ने एक बार स्वप्न में सुनाई दिये के अनुसार जलालुद्दीन मोहम्मद रखा। दक्षिण भारतीय खान - ठेला गाड़ीः यह पोजीशन काफी मजबूत व दमदार लोगों के लिये है. इसमें पेट के बल लेटी महिला के पीछे खड़े होकर पुरुष महिला के पांव पकड़ कर उठा लेता है फिर महिला अपने हाथों के सहारे अपने शरीर को उठा लेती है. इसके बाद पुरुष अपने हाथों से पैरों को फैला कर महिला की जांघों के बीच घुस जाता है और धीरे-धीरे हाथों के जांघों के निकट लाकर जांघों के पास हाथ से महिला के शरीर को सहारा देते हुए प्रवेश क्रिया शुरू करता है. 
इडली दोसा सांबर रसम उत्तपम अवियल वत्ताकुरम संस्कृत साहित्य मैथुन ओक्लाहोमा यातायात जो इस स्रोत को आकर्षक बनाती हैं। ब्राह्मण भाग- जिसमें विधि (आज्ञाबोधक शब्द) कथा आख्यायिका एवं स्तुति द्वारा यज्ञ कराने की प्रवृत्ति उत्पन्न कराना यज्ञानुष्ठान करने की पद्धति बताना उसकी उपपत्ति और विवेचन के साथ उसके रहस्य का निरुपण करना है। महादेवी वर्मा (२६ मार्च १९०७ — ११ सितंबर १९८७) हिन्दी की सर्वाधिक प्रतिभावान कवयित्रियों में से हैं। मुंबई शहर में विद्युत आपूर्ति बेस्ट रिलायंस एनर्जी टाटा पावर और महावितरण (महाराष्ट्र राज्य विद्युत वितरण कंपनी लि.) करते हैं। Lingua-Identify-0.56/t/files/hr000644 000765 000024 00000004054 11373255123 016427 0ustar00ambsstaff000000 000000 TorabiZaštitari Mekkija Torrabija, samozvanog marokanskog iscjelitelja, fizički su nasrnuli na Nacionalovog fotografa Krasnodara Peršuna. Nemili događaj odvijao se u poslijepodnevnim satima ispred jednog od paviljona na zagrebačkom Velesajmu, gdje je Torabi održavao svoje iscjeliteljske seanse. Torabijeve gorile okružile su Peršuna dok je sjedio u svom automobilu. "Premda nisam bio na području gdje se ovaj događaj odvijao, već sam fotografirao sa javne površine, što mi nitko ne može zabraniti, najprije mi je prišao pomoćniK Mekija Torabija. Iza njega su došla i četiri zaštitara, koja su okružila automobil", priča Peršun. Zaštitari su, nastavlja, postali agresivni nakon što im je, na njihov upit, odbio pokazati osobne isprave. "Obzirom da nemaju nikakvo pravo tražiti me bilo kakve dokumente, oglušio sam se na njihov zahtjev. Nakon toga su mi zaprijetili da moram izbrisati sve slike iz fotoaparata, ističući da će zaplijeniti fotoaparat", kaže Nacionalov fotograf, na kojeg su Mekijevi zaštitari, zajedno s pomoćnikom, koji, uzgred rečeno, uopće nema hrvatsko državljanstvo, u tom trenutku i fizički nasrnuli. "Jedan mi je strgnuo naočale sa glave, dok je drugi krenuo ulaziti u automobil", prepričava šokirani fotograf. 
U tom trenutku on je počeo vikati i prijetiti da će zvati policiju, nakon čega su se Mekkijevi pobočnici povukli na područje Velesajma. Peršun je i dalje na mjestu incidenta. Čeka policijske djelatnike, kojima će prijaviti Mekijeve zaštitare, zajedno s njegovim pomoćnikom, i to za prijetnje i fizički napad. Premda nisu u potpunosti poznati razlozi koji su ponukali Mekkijeve ljude na ovakvu divljačku reakciju prema jednom predstavniku medija, motivi za to mogu se pronaći u pisanju tjednika Nacional, koji se recentno u nekoliko navrata kritički osvrnuo na djelovanje Mekkija Torabija. Među ostalim, tamo su prenesene izjave pojedinih stručnjaka koji su ga nazvali običnim šarlatanom. Zbog toga je Torabi preko svojih hrvatskih odvjetnika zatražio ni manje ni više nego milijun kuna 'odštete'.Lingua-Identify-0.56/t/files/hu000644 000765 000024 00000003026 11373554021 016427 0ustar00ambsstaff000000 000000 A kormányfő - a cseh jogrend megfogalmazását idézve - közvetetten ismét utalt arra, hogy azok a szlovák állampolgárok, akik felvennék a magyart, elveszíthetik szlovák állampolgárságukat. Beismerte, hogy ehhez a szlovák alkotmány módosítására is szükség lenne. "Végső soron az alkotmány is módosítható" - jegyezte meg. Arra a konkrét kérdésre azonban, hogy ő személyesen támogatná-e, hogy a szlovákiai magyarok elveszítsék szlovák állampolgárságukat, Fico kitérően felelt. Megjegyezte egyebek között, hogy "mi nem harcolunk a szlovákiai magyarok ellen". Szerinte, ha a szlovákiai magyarok elégedetlenek lennének helyzetükkel, akkor azt ki is mutatnák. Gyakorlatilag az egész szlovákiai politikai elit - kivéve a Magyar Koalíció Pártját és a Híd polgári pártot - elutasította csütörtökön a tervezett magyarországi törvénymódosítást. Miroslav Lajcák külügyminiszter szerint az állampolgársági törvény módosítása a gyakorlatban semmit sem kínál a szlovákiai magyaroknak, de jogi bizonytalanságba elijesztheti őket. 
"Ez a törvény semmit sem kínál állampolgárainknak, nem ad nekik semmiféle jogot, sem előnyt, amivel ma, mint szlovák állampolgárok ne rendelkeznének. Amit kínál nekik, az a jogbizonytalanság, amely a két országhoz fűződő elméleti államjogi kapcsolatból ered" - jelentette ki a kormány rendkívüli ülése után. Lajcák szerint a magyar etnikai kötődés erősítése a szlovák állampolgári kötődés rovására történne.Lingua-Identify-0.56/t/files/id000644 000765 000024 00000003257 11373065163 016421 0ustar00ambsstaff000000 000000 MEDAN | DNA - Sekretaris DPW BMPAN Sumut Isa Anshari SF meminta Komisi Pemilihan Umum Pusat memecat semua anggota KPU Medan, karena dianggap benyak melakukan kesalahan sehingga banyak pihak menyangsikan bakal diterima semua kalangan. Hal itu dikatakannya, Kamis (13/5) di Medan. Disebutkannya, berdasarkan pernyataan Ketua Desk Pilkada Nasional I Gusti Putu Artha melihat sikap ngotot Komisi Pemilihan Umum Medan tetap menggelar pemungutan suara meski sejumlah masalah belum dituntaskan, sepertinya bakal banyak pihak agar pilkada tetap diulang. Apalagi KPU melihat banyaknya masalah yang dilakukan KPU Medan itu pastilah memberi peluang untuk pilkada harus diulang. Jika dilakukan pilkada ulang, dengan demikian KPU memastikan semua anggota Komisi Pemilihan Umum Medan dipecat. Menurutnya, Isa Anshari yakin, KPU tak main-main dengan ancaman pemecatan ini. Sebab pihaknya memahami KPU sekarang ini menegakkan wibawa lembaga. “Terbukti seperti beberap waktu lalu 19 KPU di Papua dipecat anggotanya dan telah melantik yang baru,” katanya. Disebukannya, sudah diprediksi banyak pihak yang menyangsikan hasil pilkada bakal diterima semua pihak. Sebab sejumlah masalah yang mengiringi pemungutan suara Pilkada Medan diantaranya persoalan pencoretan Rudolf Pardede sebagai calon wali kota yang kemudian berujung di pengadilan. Disebutkannya, sejauh ini, Rudolf telah memenangi gugatan terhadap KPU Medan di tingkat Pengadilan Tinggi Tata Usaha Negara Medan. 
Meski KPU Medan memastikan bakal mengajukan kasasi ke Mahkamah Agung, tak pelak jika putusan hukum tetap (inkracht) pada kasus ini tetap berpihak kepada Rudolf . “Dari persoalan ini kita menilai Pilkada Medan dianggap cacat hukum,” ungkapnya.Lingua-Identify-0.56/t/files/is000644 000765 000024 00000003672 11373554665 016453 0ustar00ambsstaff000000 000000 Fasteignamat er lægst á Patreksfirði af þeim stöðum sem Fasteignskrá Íslands hefur gert samanburð á. Að beiðni Byggðastofnunar reiknaði Fasteignskrá út fasteignamat og fasteignagjöld á sams konar fasteign vítt um landið. Viðmiðunareignin er einbýlishús sem er 161,1 m2 að grunnfleti og 351m3. Stærð lóðar er 808m2. Gjöldin eru reiknuð út samkvæmt núgildandi fasteignamati sem gildir frá 31. desember 2009. Kom fram í útreikningnum að fasteignamat er mjög mismunandi eftir því hvar á landinu er. Fasteignamat húss og lóðar á höfuðborgarsvæðinu, miðað við meðaltal, er 34,2 milljónir. Af þeim þéttbýlisstöðum sem skoðaðir voru utan höfuðborgarsvæðisins er matið hæst í Keflavík, 25,8 milljónir, og á Akureyri 25,6 milljónir. Lægst er matið á Patreksfirði, eins og fyrr segir, en þar er það 7,9 milljónir en næst lægst er það í Bolungarvík eða 8,8 milljónir. Danski markmaðurinn Sören Rasmussen, sem er þessa dagana að leika með Aab í úrslitaeinvígi um danska titilinn í handbolta, gengur til liðs við Flensburg fyrir næstu leiktíð. Rasmussen leysir af hólmi Svíann Johan Sjöstrand sem mun leika með Barcelona á næsta tímabili. Flensburg heldur áfram að leita á norrænar slóðir í leikmannakaupum en alls eru þrettán af sextán leikmönnum liðsins frá Norðurlöndunum. Þar á meðal er Alexander Pettersson en hann er á förum og mun leika undir stjórn Dags Sigurðssonar hjá Füsche Berlin á næstu leiktíð. Sören Rasmussen er 33 ára og hefur leikið allan sinn feril í Danmörku. Hann hefur átt glimrandi gott tímabil og hefur átt sæti í danska landsliðinu. Samtals hefur hann leikið tólf leiki fyrir Dani. 
Næsti leikur Rasmussen er á laugardaginn eftir viku og er það mögulega síðasti leikur hans með liðinu. Aab hefur betur gegn Kolding 1-0, í einvíginu um titilinn og vinni liðið næsta leik er það danskur meistari.Lingua-Identify-0.56/t/files/it000644 000765 000024 00000002761 11373053220 016427 0ustar00ambsstaff000000 000000 Al grido di 'Viva O Papa' un mare di fedeli ha accolto Benedetto XVI, giunto sulla spianata del santuario di Fatima per celebrare la grande messa all'aperto nell'anniversario della prima apparizione della Madonna ai tre pastorelli, il 13 maggio 1917. Un anniversario che coincide con quelli dell'attentato di Ali Agca a Giovanni Paolo II (13 maggio 1981) e della beatificazione di due dei tre pastorelli ai quali nel 1917 apparì la Madonna (13 maggio 2000). Mezzo milione di pellegrini hanno preso parte alla funzione celebrata dal Pontefice. Per Fatima è un record: nel 2000 circa 400mila persone avevano assistito alle celebrazioni del 13 maggio presiedute da Giovanni Paolo II, durante l'ultima visita di papa Wojtyla, che aveva anche pronunciato la beatificazione dei piccoli veggenti Francisco e Giacinta. Benedetto XVI ha attraversato due volte con la 'papamobile' la spianata, la prima arrivando dalla Casa di Esercizi che lo ospita e la seconda dopo aver indossato i paramenti, per raggiungere l'altare. Ratzinger è apparso raggiante e più volte si è alzato dalla poltrona bianca per avvicinarsi ai finestrini per baciare e benedire bimbi che gli venivano avvicinati dai genitori. Lo stesso gesto che abitualmente compiva Giovanni Paolo II. Anche il 13 maggio del 1981, cioè esattamente 29 anni fa, il Papa polacco aveva appena baciato una bambina prima di essere colpito da Ali Agca con la pallottola che oggi è incastonata nella corona della piccola statua lignea della Vergine posta vicino all'altare. 
Lingua-Identify-0.56/t/files/la000644 000765 000024 00000003753 11373056200 016412 0ustar00ambsstaff000000 000000 Magnus es, domine, et laudabilis valde: magna virtus tua, et sapientiae tuae non est numerus. et laudare te vult homo, aliqua portio creaturae tuae, et homo circumferens mortalitem suam, circumferens testimonium peccati sui et testimonium, quia superbis resistis: et tamen laudare te vult homo, aliqua portio creaturae tuae.tu excitas, ut laudare te delectet, quia fecisti nos ad te et inquietum est cor nostrum, donec requiescat in te. da mihi, domine, scire et intellegere, utrum sit prius invocare te an laudare te, et scire te prius sit an invocare te. sed quis te invocat nesciens te? aliud enim pro alio potest invocare nesciens. an potius invocaris, ut sciaris? quomodo autem invocabunt, in quem non crediderunt? aut quomodo credent sine praedicante? et laudabunt dominum qui requirunt eum. quaerentes enim inveniunt eum et invenientes laudabunt eum. quaeram te, domine, invocans te, et invocem te credens in te: praedicatus enim es nobis. invocat te, domine, fides mea, quam dedisti mihi, quam inspirasti mihi per humanitatem filii tui, per ministerium praedicatoris tui. Et quomodo invocabo deum meum, deum et dominum meum, quoniam utique inme ipsum eum invocabo, cum invocabo eum? et quis locus est in me, quoveniat in me deus meus? quo deus veniat in me, deus, qui fecit caelum et terram? itane, domine deus meus, est quiquam in me, quod capiat te?an vero caelum et terra, quae fecisti et in quibus me fecisti, capiuntte? an quia sine te non esset quidquid est, fit, ut quidquid est capiat te? quoniam itaque et ego sum, quid peto, ut venias in me, quinon essem, nisi esses in me? non enim ego iam in inferis, et tamen etiam ibi es. nam etsi descendero in infernum, ades. non ergo essem, deus meus, non omnino essem, nisi esses in me. an potius non essem, nisi essem in te, ex quo omnia, per quem omnia, in quo omnia? etiam sic, domine, etiam sic. 
quo te invoco, cum in te sim? aut unde venias in me? quo enim recedam extra caelum et terram, ut inde in me veniat deus meus, qui dixit: caelum et terram ego impleo? Lingua-Identify-0.56/t/files/ms000644 000765 000024 00000002723 11373613115 016435 0ustar00ambsstaff000000 000000 Ahli Dewan Undangan Negeri (Adun) Bukit Selambau, S.Manikumar menafikan dakwaan sesetengah pihak yang mengatakan bahawa beliau akan meninggalkan Parti Keadilan Rakyat (PKR). Menurut beliau, berita yang digembar-gemburkan itu langsung tidak berasas dan hanya mahu menjatuhkan PKR Kedah. "Desas-desus yang didengari ini adalah satu tohmahan yang sudah lama diperkatakan.Namun begitu saya tidak akan sekali-kali meninggalkan PKR.Oleh kerana itu bagi mereka yang sentiasa cuba menimbulkan isu sensasi ini perlu sedar dan menerima hakikat bahawa Kedah adalah diperintah oleh PR."katanya. Beliau juga berkata, tiada sebab untuk beliau mengikut rentak Umno BN berhubung perkara tersebut apatah lagi sangat selesa dengan pentadbiran Pakatan Rakyat di negeri Kedah pada waktu ini. "Kenapa pula saya mahu terikut-ikut dengan rentak pembangkang seperti Umno dan Barisan Nasional (BN) yang ingin melihat Pakatan Rakyat goyang di negeri Kedah.Saya selesa dengan pentadbiran Kerajaan Negeri yang ada," katanya lagi. Manikumar yang juga Exco negeri menegaskan dia bukan seorang yang mudah melupakan jasa orang lain dan seorang yang tetap pendirian. Baginya, amanah yang diberi rakyat pada Pilihan Raya Umum lalu tidak akan sesekali dikhianatinya. 
"Sebenarnya saya adalah seorang yang mempunyai pendirian.Tidak mudah untuk saya melupakan jasa orang lain apa lagi melupakan budi rakyat yang telah memberi kepercayaan kepada saya dalam pilihan raya kecil yang lalu untuk membela nasib mereka," katanya.Lingua-Identify-0.56/t/files/nl000644 000765 000024 00000001221 11373265655 016433 0ustar00ambsstaff000000 000000 Amsterdam - Een 18-jarige overvaller is vrijdagochtend rond 09.00 uur in een supermarkt aan de Oostelijke Handelskade door personeel aangehouden. Dit heeft de politie vrijdag laten weten. De man bedreigde het personeel met een vuurwapen en maakte geld buit. Na een worsteling, waarbij een personeelslid lichtgewond raakte, werd de overvaller ontwapend en overmeesterd. De verdachte werd direct overgedragen aan gewaarschuwde politiemensen. De man is een bekende van de politie. Rechercheurs van het Bureau Districtsrecherche Noord hebben de zaak in behandeling. Tijdens het onderzoek zal worden nagegaan of de verdachte betrokken is bij andere overvallen.Lingua-Identify-0.56/t/files/no000644 000765 000024 00000002026 11373063036 016427 0ustar00ambsstaff000000 000000 Det er ikke mulig å dekke verdens økende energibehov alene med økt satsing på fornybar energi. Dette vil være tilfelle selv om det satses på energieffektivisering og energisparing. Selv de mest optimistiske anslag fra IEA viser at behovet for olje og gass vil være omtrent på dagens nivå også i 2050. Norge produserer olje og gass renest av alle. Vi har kompetanse, FOU miljøer og industri. Miljøhensyn er sentralt i norsk lovgivning og forskning. Det er fortsatt slik at 1 kwh produsert med kullkraft medfører dobbelt så store CO2-utslipp som 1kwh produsert på norsk sokkel. Derfor er det viktig at Norge fortsatt satser på, og eksporterer energi og kunnskap og at petroleumsindustrien videreutvikler sine miljøstandarder. Vi kan bli enda flinkere. Driften av norsk sokkel kan forbedres, og i tillegg har vi kompetanse som kan og bør eksporteres. 
For å oppnå dette er det behov for å satse på videre utdanning innenfor feltet, oppbygging av forskningsinfrastrukturen og forskning på både fornybar og petroleumsområdet.Lingua-Identify-0.56/t/files/pl000644 000765 000024 00000001512 11373062215 016423 0ustar00ambsstaff000000 000000 Witamy na oficjalnej stronie Dni Portugalii w Łodzi, zorganizowanych przez Centrum Języków Romańskich. Między 14 a 18 maja 2010 roku każdy będzie miał niepowtarzalną szansę poznać z nami Portugalię, jej piękny język i bogatą kulturę. Z roku na rok w Łodzi wzrasta zainteresowanie tym niezwykłym krajem na skraju Europy. Dlatego zdecydowaliśmy się na organizację szeregu spotkań, koncertów i konferencji mających przybliżyć Portugalię każdemu zainteresowanemu. Zapraszamy do zapoznania się na stronie z ofertą Dni Portugalii. Pokazy filmowe, koncerty, występ teatru portugalistycznego, wystawa fotografii, konferencje multimendialne, pokazowe lekcje języka portugalskiego a nawet warsztaty tańca! Każdy, zarówno miłośnicy Portugalii, jak i ci, którzy niewiele o niej wiedzą, znajdą tu coś dla siebie.Lingua-Identify-0.56/t/files/pt000644 000765 000024 00000001512 11373037504 016436 0ustar00ambsstaff000000 000000 Para a chefe do governo alemão, que falava na cerimónica da tribuição do Prémio Carlos Magno ao seu homólogo polaco Donald Tusk, a actual crise que a União Europeia está a viver “é a maior prova de fogo desde o desmoronamento do comunismo” no leste do continente. Merkel lembrou que os governos prometeram aos cidadãos que o euro seria uma moeda estável, “e têm de cumprir essa promessa”, sublinhou. Se a actual crise não for superada “as consequências para a Europa serão imprevisíveis”, mas se esta fase difcíl for ultrapassada “a Europa será mais forte do que nunca”, prognosticou. No discurso de agradecimento, Donald Tusk mostrou-se convicto de que a crise “não será o princípio do crepúsculo da Europa”, mas sim “paradoxalmente, uma oportunidade de a Europa se reforçar e desenvolver”. 
Lingua-Identify-0.56/t/files/pt_big000644 000765 000024 00000011022 11372616153 017256 0ustar00ambsstaff000000 000000 As armas e os barões assinalados Que, da Ocidental praia Lusitana, Por mares nunca de antes navegados Passaram ainda além da Taprobana E em perigos e guerras esforçados Mais do que prometia a força humana, E entre gente remota edificaram Novo Reino, que tanto sublimaram; E também as memórias gloriosas Daqueles Reis que foram dilatando A Fé, o Império, e as terras viciosas De África e de Ásia andaram devastando, E aqueles que por obras valerosas Se vão da lei da Morte libertando: Cantando espalharei por toda parte, Se a tanto me ajudar o engenho e arte. Cessem do sábio Grego e do Troiano As navegações grandes que fizeram; Cale-se de Alexandro e de Trajano A fama das vitórias que tiveram; Que eu canto o peito ilustre Lusitano, A quem Neptuno e Marte obedeceram. Cesse tudo o que a Musa antiga canta, Que outro valor mais alto se alevanta. E vós, Tágides minhas, pois criado Tendes em mi um novo engenho ardente Se sempre, em verso humilde, celebrado Foi de mi vosso rio alegremente, Dai-me agora um som alto e sublimado, Um estilo grandíloco e corrente, Por que de vossas águas Febo ordene Que não tenham enveja às de Hipocrene. Dai-me húa fúria grande e sonorosa, E não de agreste avena ou frauta ruda, Mas de tuba canora e belicosa, Que o peito acende e a cor ao gesto muda; Dai-me igual canto aos feitos da famosa Gente vossa, que a Marte tanto ajuda; Que se espalhe e se cante no Universo, Se tão sublime preço cabe em verso. 
E vós, ó bem nascida segurança Da Lusitana antiga liberdade, E não menos certíssima esperança De aumento da pequena Cristandade; Vós, ó novo temor da Maura lança, Maravilha fatal da nossa idade, Dada ao mundo por Deus (que todo o mande, Pera do mundo a Deus dar parte grande); Vós, tenro e novo ramo florecente, De húa árvore, de Cristo mais amada Que nenhúa nascida no Ocidente, Cesárea ou Cristianíssima chamada, (Vede-o no vosso escudo, que presente Vos amostra a vitória já passada, Na qual vos deu por armas e deixou As que Ele pera Si na Cruz tomou); Vós, poderoso Rei, cujo alto Império O Sol, logo em nascendo, vê primeiro; Vê-o também no meio do Hemisfério, E, quando dece, o deixa derradeiro; Vós, que esperamos jugo e vitupério Do torpe lsmaelita cavaleiro, Do Turco Oriental e do Gentio Que inda bebe o licor do santo Rio: Inclinai por um pouco a majestade, Que nesse tenro gesto vos contemplo, Que já se mostra qual na inteira idade, Quando subindo ireis ao eterno Templo; Os olhos da real benignidade Ponde no chão: vereis um novo exemplo De amor dos pátrios feitos valerosos, Em versos devulgado numerosos. Vereis amor da pátria, não movido De prêmio vil, mas alto e quase eterno; Que não é prêmio vil ser conhecido Por um pregão do ninho meu paterno. Ouvi: vereis o nome engrandecido Daqueles de quem sois senhor superno, E julgareis qual é mais excelente, Se ser do mundo Rei, se de tal gente. Ouvi: que não vereis com vãs façanhas, Fantásticas, fingidas, mentirosas, Louvar os vossos, como nas estranhas Musas, de engrandecer-se desejosas: As verdadeiras vossas são tamanhas, Que excedem as sonhadas, fabulosas, Que excedem Rodamonte e o vão Rugeiro, E Orlando, inda que fora verdadeiro. Por estes vos darei um Nuno fero, Que fez ao Rei e ao Reino tal serviço, Um Egas e um Dom Fuas, que de Homero A cítara para eles só cobiço; Pois polos Doze Pares dar-vos quero Os Doze de Inglaterra e o seu Magriço; Dou-vos também aquele ilustre Gama, Que para si de Eneias toma a fama. 
Pois, se a troco de Carlos, Rei de França, Ou de César, quereis igual memória, Vede o primeiro Afonso, cuja lança Escura faz qualquer estranha glória; E aquele que a seu Reino a segurança Deixou, co a grande e próspera vitória; Outro Joanne, invicto cavaleiro; O quarto e quinto Afonsos e o terceiro. Nem deixarão meus versos esquecidos Aqueles que, nos Reinos lá da Aurora, Se fizeram por armas tão subidos, Vossa bandeira sempre vencedora: Um Pacheeo fortíssimo e os temidos Almeidas, por quem sempre o Tejo chora, Albuquerque terribil, Castro forte, E outros em quem poder não teve a morte. E, enquanto eu estes canto, e a vós não posso, Sublime Rei, que não me atrevo a tanto, Tomai as rédeas vós do Reino vosso: Dareis matéria a nunca ouvido canto. Comecem a sentir o peso grosso (Que polo mundo todo faça espanto) De exércitos e feitos singulares De África as terras e do Oriente os mares. Lingua-Identify-0.56/t/files/pt_lt1000644 000765 000024 00000001421 11375535072 017222 0ustar00ambsstaff000000 000000 Para a chefe do governo alemo, que falava na cerimnica da tribuio do Prmio Carlos Magno ao seu homlogo polaco Donald Tusk, a actual crise que a Unio Europeia est a viver " a maior prova de fogo desde o desmoronamento do comunismo" no leste do continente. Merkel lembrou que os governos prometeram aos cidados que o euro seria uma moeda estvel, e tm de cumprir essa promessa, sublinhou. Se a actual crise no for superada as consequncias para a Europa sero imprevisveis, mas se esta fase difcl for ultrapassada a Europa ser mais forte do que nunca, prognosticou. No discurso de agradecimento, Donald Tusk mostrou-se convicto de que a crise no ser o princpio do crepsculo da Europa, mas sim paradoxalmente, uma oportunidade de a Europa se reforar e desenvolver. 
Lingua-Identify-0.56/t/files/ro000644 000765 000024 00000003511 11373065655 016444 0ustar00ambsstaff000000 000000 Producatorul de profile din PVC Teraplast Bistrita (TRP) spune ca exista premise ca piata de materiale de constructii sa creasca in urmatoarele luni de zile, cu conditia ca statul sa-si plateasca datoriile aferente proiectelor din domeniul public care au fost deja finalizate. Anul trecut, principalele piete pe care activeaza compania au inregistrat scaderi situate intre 20% si 35%, potrivit raportului anual al societatii. Cu toate acestea, investitiile majore derulate de Teraplast in ultimii doi ani, care s-au cifrat la 81,9 mil. lei (circa 21 mil. euro), au permis companiei sa inregistreze doar o scadere modica, de 3,4%, a afacerilor in 2009, pana la un nivel de 188,2 mil. lei (44,5 mil. euro). O parte din aceste sume, respectiv 13 mil. euro, au fost atrase de pe Bursa printr-o oferta publica initiala de listare in anul 2008, acesta fiind si ultimul IPO derulat de catre o companie la BVB. In urmatoarele luni ale anului ne asteptam la o perioada mai buna pentru principalele piete pe care activeaza compania, cu conditia ca statul sa isi plateasca datoriile catre companiile de constructii. In caz contrar, desi exista proiecte, firmele de constructii se vor confrunta cu riscul de blocaje din lipsa de lichiditati. Teraplast mizeaza in special pe sectorul de infrastructura, in timp ce pe segmentul proiectelor rezidentiale si nonrezidentiale se anticipeaza doar cresteri usoare fata de anul trecut", spune Florin Urite, directorul general al Teraplast. In primele trei luni din acest an, afacerile Teraplast si-au reluat cresterea, cu un avans de 14% fata de acelasi interval din 2009, si s-au cifrat la 34,3 mil. lei (8,35 mil. euro). 
In schimb, profitul companiei s-a diminuat cu 4%, pe fondul avansului cu 24% a cheltuielilor cu materiile prime, din cauza investitiilor puse in functiune si intensificarii concurentei pe fondul cererii scazute.Lingua-Identify-0.56/t/files/ru000644 000765 000024 00000003345 11746563257 016464 0ustar00ambsstaff000000 000000 Вчера президент Дмитрий Медведев встретился с активом "Единой России", чтобы перед лицом партии еще раз ответить на предложение ее председателя Владимира Путина занять его место. С этой инициативой избранный президент выступил во вторник, объяснив, что президент в нашей стране пока что традиционно является надпартийным. Но замену себе Путину долго искать не пришлось. Дмитрий Медведев и в президенты шел не без поддержки "Единой России", и на последних парламентских выборах возглавлял федеральный список партии. Ответ действующего главы государства фактически прозвучал еще в четверг в интервью пяти телеканалам, но вчера были соблюдены все положенные формальности. Президент сразу дал слово главе Высшего совета ЕР Борису Грызлову, который вкратце рассказал о встрече с Путиным. "Мы ждем вас в нашей партии. Мы ждем вашего официального заявления", - заявил Грызлов. Более того, на съезде партии в конце мая Медведева уже ждут в качестве члена "Единой России". Lingua-Identify-0.56/t/files/sl000644 000765 000024 00000004142 11373066472 016441 0ustar00ambsstaff000000 000000 Pa nam gre to, kar se mesta tiče, včasih kar težko iz rok. To še posebej velja za nekatere politike, ki so na žalost pripravljeni rušiti še tako pozitivno idejo, če ta prihaja iz druge politične opcije. Torej: politiki, zavedajte se, da je na prvem mestu interes Maribora in Mariborčanov in šele nato se lahko greste v vašem političnem peskovniku razne igrice. Vsi iščejo prostor pod soncem in tudi Maribor bo moral najti svoje mesto. Tradicija drugega industrijskega mesta v Jugoslaviji je nedvomno preteklost, pa čeprav je v poslovni coni Tezno danes zaposleno toliko ljudi, kot včasih v TAM-u. 
V Mariboru ni več proizvodnega podjetja, ki bi imelo 1000 zaposlenih, še tri desetletja nazaj je bilo takšnih vsaj 10, na čelu s skoraj 9000 delavci TAM-a. Danes imamo številna manjša podjetja, ki jih marsikdo izmed nas sploh ne pozna, poznajo pa jih največji svetovni gospodarski velikani, za katere delajo. Maribor je zagotovo šolski primer, kako težko se je soočiti s preteklostjo in predvsem sprejeti dejstvo, da se nekatere zadeve pač spreminjajo in da je treba poiskati nove poti. In v mestu jih v veliki meri uspešno iščemo. Turizem je le ena od panog, ki bi lahko nadomestila nekdanjo industrijsko usmerjenost. Eko turizem, športni turizem, kongresi, to je le del priložnosti, ki v Mariboru dobivajo svoj prostor in veljavo, možnosti pa so še velike. Treba se je sprehoditi samo po avstrijskih in italijanskih smučiščih in pogledati, kaj nudijo zimskim in tudi poletnim gostom. Na žalost pa je individualne iniciative še vedno zelo malo. Na Pohorju lahko na preste ene roke preštejemo turistične kmetije, ki bi ponudile recimo domače ekološke dobrote. A tudi to je danes vaba za turiste. Že v zgodovini je Maribor slovel kot mesto z izredno ugodno logistično lego. In tudi danes je tako. V dveh urah smo lahko na Dunaju ali pa na morju, do ‘hribov’ pa imamo le 10 minut. Le malo evropskih mest ima letališče tako blizu centra, a kaj, ko imamo že vrsto let letališče brez letal in nič ne kaže, da bi ga uspeli oživeti. Morda bo to uspelo Pošti, ki naj bi imela interes za nakup. Lingua-Identify-0.56/t/files/sq000644 000765 000024 00000003175 11373602157 016447 0ustar00ambsstaff000000 000000 Në një konferencë për median Rama u shpreh se "kompromisi është gati: Trasparencë dhe kushtetuetshmëri. Transparencë në funksion të garantimit të zgjedhjeve të ardhshme dhe në asnjë mënyrë për të marrë pushtetin në tavolinë por për të garantuar që pushteti të mos merret më nën tavolinë. Ne e kemi pranuar se jemi gati ta zgjidhim përmes vullnetit këtë krizë. 
Ndërkombëtarët kërkojnë zgjidhje dhe këtë e kanë deklaruar. Por i pari i qeverisë refuzon çdo zgjidhje sepse disponon një forcë kartonash të cilën e përdor abuzivisht. Kjo është tërësia e tablosë së sotme dhe unë doja ta nënvizoja që ne jemi gati për zgjidhje. Transparenca mund të nisë menjëherë duke hapur kutitë e dokumentacionit zgjedhor. Ndërsa për fletët e votimit në varësi të hetimit, mund të vendosë Komisioni i Venecias". "Kjo është një krizë e rëndë që kërkon kompromis në funksion të interesit të transparencës. Palët kanë një detyrim karshi shqiptarëve. Mjafton vullneti që kjo zgjidhje të vijë menjëherë. Ne nuk do të bëhemi pengesë, por duhet edhe vullneti i palës tjetër", shtoi ai persa i përket draft-marrëveshjes së propozuar nga ndërkombetarët dhe Presidenti. Përsa i përket protestës ai tha se "dua të falenderoj shpirtin opozitar që është mishëruar në çadrën ku dëshmohet nevoja e patjetërsueshme për vendosjen në themel të demokracisë dhe nga ana tjetër të shpreh falenderim të gjithë atyre që në protestën e djeshme treguan emancipim që duke i zhbërë zërat dhe gjithë mjegullnajat për të njollosur opoztën si një burim i mundshëm dhune".Lingua-Identify-0.56/t/files/sv000644 000765 000024 00000001744 11373251705 016453 0ustar00ambsstaff000000 000000 För att tragedin inte ska upprepas valde demonstranterna att natten till fredagen placera svartklädda vakter, beväpnade med slangbågar, vid ingångarna till lägret som ligger i Silomdistriktet, mitt i huvudstaden. Samtliga ingångar utom spärrades av efter våldsutbrottet på torsdagen. Myndigheterna replikerade snabbt genom att använda tårgas och vattenkanoner mot de rödklädda demonstranterna. Journalister rapporterade om lastbilar som anlänt till området med tungt beväpnade soldater. Krigsmakten uppgav att prickskyttar också skickats dit. De får använda skarpa kulor som varningsskott — i "självförsvar" mot "beväpnade terrorister", hette det. 
På fredagsmorgonen svensk tid kom rapporter om att armén skjutit gummikulor rakt in i folkmassan. Enligt vissa uppgifter kan det också röra sig om skarpa skott. Regimen hade förvarnat om att armén skulle kunna ta till våld, men också poängterat att ingen som frivilligt lämnade lägret skulle komma att skadas.Lingua-Identify-0.56/t/files/tr000644 000765 000024 00000002525 11375525475 016460 0ustar00ambsstaff000000 000000 Cumhurbaşkanı Abdullah Gül, TBMM'de grubu bulunan partilerin liderleriyle tek tek görüşecek. Bu kapsamda, Gül ilk görüşmesini ana muhalefet lideri, CHP Genel Başkanı Deniz Baykal ile yaptı Cumhurbaşkanlığı tarafından basına dağıtılan görüşmeye ilişkin görüntüde, Cumhurbaşkanı Gül, Baykal’ı makam odasının kapısında karşılıyor. Görüntülerde, Baykal’ın "Son gelişimde oda bu şekli almış mıydı?" sözleri üzerine Cumhurbaşkanı Gül’ün "Böyleydi. Vakit geçti epeyce" dediği duyuldu. CHP lideri Baykal, görüşme için Çankaya Köşkü'ne protokol kapısından saat 17.25'te giriş yaptı. Yaklaşık 1 saat 40 dakika süren görüşmeden sonra Baykal Köşk'ten ayrıldı. CHP lideri Köşk'ten çıkarken yaptığı açıklamada 'Yararlı bir görüşme oldu. Gündemdeki konuları değerlendirdik" dedi. Baykal-Gül buluşmasında gündemin yargı krizi ve Anayasa değişikliğine odaklanması bekleniyordu. Cumhurbaşkanlığı kaynakları, Gül'ün son gelişmelerle ilgili yüksek yargı organlarının başkanlarını kabul etmesi ve geçen hafta Başbakan Recep Tayyip Erdoğan ve Genelkurmay Başkanı Orgeneral İlker Başbuğ ile yaptığı üçlü görüşme bağlamında liderlerle bir araya geleceğini bildirdi. Buna göre Gül, yarın (perşembe) BDP ve MHP liderlerini kabul edecek.Lingua-Identify-0.56/t/files/uk000644 000765 000024 00000001605 11746562711 016444 0ustar00ambsstaff000000 000000 На вчорашню траурну дату одночасно тут було заплановано одразу три акції. Усі, в принципі, націлені на здорове й благополучне життя у здоровій країні. 
Одну проводили спортсмени (з вимогою збільшити фінансування галузі та відновити в структурі КМДА департамент фізичної культури і спорту). Дві інші стосувалися проблем будівництва і порятунку того, що ще не винищили. Здавалося б, логічно об’єднати зусилля мітингувальників і спрямувати помножені сили на те, аби разом достукатися до влади. Lingua-Identify-0.56/lib/Lingua/000755 000765 000024 00000000000 12203746720 016511 5ustar00ambsstaff000000 000000 Lingua-Identify-0.56/lib/Lingua/Identify/000755 000765 000024 00000000000 12203746720 020264 5ustar00ambsstaff000000 000000 Lingua-Identify-0.56/lib/Lingua/Identify.pm000644 000765 000024 00000064454 12203746537 020645 0ustar00ambsstaff000000 000000 package Lingua::Identify; use 5.006; use strict; use warnings; use utf8; use base 'Exporter'; our %EXPORT_TAGS = ( all => [ qw( langof langof_file confidence get_all_methods activate_all_languages deactivate_all_languages get_all_languages get_active_languages get_inactive_languages is_active is_valid_language activate_language deactivate_language set_active_languages name_of ) ], language_identification => [ qw( langof langof_file confidence get_all_methods ) ], language_manipulation => [ qw( activate_all_languages deactivate_all_languages get_all_languages get_active_languages get_inactive_languages is_active is_valid_language activate_language deactivate_language set_active_languages name_of ) ], ); our @EXPORT_OK = ( @{ $EXPORT_TAGS{'all'} } ); our @EXPORT = qw(); our $VERSION = '0.56'; # DEFAULT VALUES # our %default_methods = qw/smallwords 1.3 prefixes2 1.5 suffixes3 1.5 ngrams3 1.2/; my $default_maxsize = 1_000_000; my %default_extractfrom = qw/head 1/; =head1 NAME Lingua::Identify - Language identification =head1 SYNOPSIS use Lingua::Identify qw(:language_identification); $a = langof($textstring); # gives the most probable language or the complete way: @a = langof($textstring); # gives pairs of languages / probabilities # sorted from most to least probable %a = 
langof($textstring); # gives a hash of language / probability

or the expert way (see section OPTIONS, under HOW TO PERFORM IDENTIFICATION)

  $a = langof( { method => [qw/smallwords prefixes2 suffixes2/] }, $text);

  $a = langof( { 'max-size' => 3_000_000 }, $text);

  $a = langof( { 'extract_from' => { 'head' => 1, 'tail' => 2 } }, $text);

=head1 DESCRIPTION

B C<Lingua::Identify> identifies the language a given string or file is written in.

See section WHY LINGUA::IDENTIFY for a list of C<Lingua::Identify>'s strong points.

See section KNOWN LANGUAGES for a list of available languages and HOW TO PERFORM IDENTIFICATION to know how to really use this module.

If you're in a hurry, jump to section EXAMPLES, way down below.

Also, don't forget to read the following section, A WARNING ON THE ACCURACY OF LANGUAGE IDENTIFICATION METHODS.

=head1 A WARNING ON THE ACCURACY OF LANGUAGE IDENTIFICATION METHODS

Take a word that exists in two different languages, take a good look at it and answer this question: "What language does this word belong to?".

You can't give an answer like "Language X", right? You can only say it looks like any of a set of languages.

Similarly, it isn't always easy to identify the language of a text when the active languages are very similar to each other, even if only two of them are active.

Now that the warning that language identification is not 100% accurate is out of the way, please keep reading the documentation.

=head1 WHY LINGUA::IDENTIFY

You might be wondering why you should use Lingua::Identify instead of any other tool for language identification.
Here's a list of Lingua::Identify's strong points:

=over 6

=item * it's free and it's open-source;

=item * it's portable (it's Perl, which means it will work on lots of different platforms);

=item * Unicode support;

=item * 4 different methods of language identification and growing (see METHODS OF LANGUAGE IDENTIFICATION for more details on this one);

=item * it's a module, which means you can easily write your own application (be it CGI, TK, whatever) around it;

=item * it comes with I<langident>, which means you don't actually need to write your own application around it;

=item * it's flexible (at the moment, you can actually choose the methods to use and their relevance, the max size of input to analyze each time and which part(s) of the input to analyze);

=item * it supports big inputs (through the 'max-size' and 'extract_from' options);

=item * it's easy to deal with languages (you can activate and deactivate the ones you choose whenever you want to, which can improve your times and accuracy);

=item * it's maintained.

=back

=cut

# initialization

our (@all_languages,@active_languages,%languages,%regexen,@methods);

BEGIN {

  use Class::Factory::Util;
  for ( Lingua::Identify->subclasses() ) {
    /^[A-Z][A-Z]$/ || next;
    eval "require Lingua::Identify::$_ ;";
    if ($languages{_versions}{lc $_} < 0.02) {
      for my $k (keys %languages) {
        delete($languages{$k}{lc $_}) if exists $languages{$k}{lc $_};
      }
    }
  }

  @all_languages = @active_languages = keys %{$languages{_names}};

  @methods = qw/smallwords/;

}

=head1 HOW TO PERFORM IDENTIFICATION

=head2 langof

To identify the language a given text is written in, use the I<langof> function.
To get a single value, do:

  $language = langof($text);

To get the most probable language and also the percentage of its probability, do:

  ($language, $probability) = langof($text);

If you want a hash where each active language is mapped into its percentage, use this:

  %languages = langof($text);

=cut

sub langof {
  my %config = ();
  %config = (%config, %{+shift}) if ref($_[0]) eq 'HASH';

=head3 OPTIONS

I<langof> can also be given some configuration parameters, in this way:

  $language = langof(\%config, $text);

These parameters are detailed here:

=over 6

=item * B<extract_from>

When the size of the input exceeds 'max-size', C<Lingua::Identify> analyzes only the beginning of the file. You can specify which part of the file is analyzed with the 'extract_from' option:

  langof( { 'extract_from' => 'tail' } , $text );

Possible values are 'head' and 'tail' (for now).

You can also specify more than one part of the file, so that text is extracted from those parts:

  langof( { 'extract_from' => [ 'head', 'tail' ] } , $text );

(this will be useful when more than two possibilities exist)

You can also specify different values for each part of the file (not necessarily for all of them):

  langof( { 'extract_from' => { head => 40, tail => 60 } } , $text);

The line above, for instance, retrieves 40% of the text from the beginning and 60% from the end. Note, however, that those values are relative weights, not percentages. You'd get the same behavior with:

  langof( { 'extract_from' => { head => 80, tail => 120 } } , $text);

The percentages would be the same.

=item * B<max-size>

By default, C<Lingua::Identify> analyzes only 1,000,000 bytes. You can specify how many bytes (at the most) can be analyzed (if not enough exist, the whole input is still analyzed).

  langof( { 'max-size' => 2000 }, $text);

If you want all the text to be analyzed, set max-size to 0:

  langof( { 'max-size' => 0 }, $text);

See also C<extract_from>.

=item * B<method>

You can choose which method or methods to use, and also the relevance of each of them.
To choose a single method to use:

  langof( {method => 'smallwords' }, $text);

To choose several methods:

  langof( {method => [qw/prefixes2 suffixes2/]}, $text);

To choose several methods and give them different weight:

  langof( {method => {smallwords => 0.5, ngrams3 => 1.5} }, $text);

To see the list of available methods, see section METHODS OF LANGUAGE IDENTIFICATION.

If no method is specified, the configuration for this parameter is the following (this might change in the future):

  method => {
    smallwords => 1.3,
    prefixes2  => 1.5,
    suffixes3  => 1.5,
    ngrams3    => 1.2
  };

=item * B<mode>

By default, C<Lingua::Identify> assumes C<normal> mode, but others are available.

In C<dummy> mode, instead of actually calculating anything, C<Lingua::Identify> only does the preparation it has to and then returns a bunch of information, including the list of the active languages, the selected methods, etc. It also returns the text meant to be analysed.

Do be warned that, with I<langof_file>, the dummy mode still reads the files; it simply doesn't calculate the language.

  langof( { 'mode' => 'dummy' }, $text);

This returns something like this:

  { 'method'           => { 'smallwords' => '0.5',
                            'prefixes2'  => '1',
                          },
    'config'           => { 'mode' => 'dummy' },
    'max-size'         => 1000000,
    'active-languages' => [ 'es', 'pt' ],
    'text'             => $text,
    'mode'             => 'dummy',
  }

=back

=cut

  # select the methods
  my %methods = defined $config{'method'} ? _make_hash($config{'method'})
                                          : %default_methods;

  # select max-size
  my $maxsize = defined $config{'max-size'} ? $config{'max-size'}
                                            : $default_maxsize;

  # get the text
  my $text = join "\n", @_;
  return wantarray ? () : undef unless $text;

  # this is the support for big files; if the input is bigger than the $maxsize, we act
  if ($maxsize < length $text && $maxsize != 0) {
    # select extract_from
    my %extractfrom = defined $config{'extract_from'} ?
                      _make_hash($config{'extract_from'}) :
                      %default_extractfrom;

    # keep only the known parts ('head' and 'tail') and add up their weights
    my $total_weight = 0;
    for (keys %extractfrom) {
      if ($_ eq 'head' or $_ eq 'tail') {
        $total_weight += $extractfrom{$_};
        next;
      }
      else {
        delete $extractfrom{$_};
      }
    }
    # turn the weights into fractions of the total
    for (keys %extractfrom) {
      $extractfrom{$_} = $extractfrom{$_} / $total_weight;
    }

    $extractfrom{'head'} ||= 0;
    $extractfrom{'tail'} ||= 0;

    # cut out the middle of the text, keeping the requested head and tail
    my $head = int $maxsize * $extractfrom{'head'};
    my $tail = length($text) - $head - int $maxsize * $extractfrom{'tail'};
    substr( $text, $head, $tail, '');
  }

  # dummy mode exits here
  $config{'mode'} ||= 'normal';
  if ($config{'mode'} eq 'dummy') {
    return { 'method'           => \%methods,
             'max-size'         => $maxsize,
             'config'           => \%config,
             'active-languages' => [ sort (get_active_languages()) ],
             'text'             => $text,
             'mode'             => $config{'mode'},
           };
  }

  # use the methods
  my (%result, $total);
  for (keys %methods) {
    my %temp_result;

    if (/^smallwords$/) {
      %temp_result = _langof_by_word_method('smallwords', $text);
    }
    elsif (/^(prefixes[1-4])$/) {
      %temp_result = _langof_by_prefix_method($1, $text);
    }
    elsif (/^(suffixes[1-4])$/) {
      %temp_result = _langof_by_suffix_method($1, $text);
    }
    elsif (/^(ngrams[1-4])$/) {
      %temp_result = _langof_by_ngram_method($1, $text);
    }

    for my $l (keys %temp_result) {
      my $temp = $temp_result{$l} * $methods{$_};
      $result{$l} += $temp;
      $total += $temp;
    }
  }

  # report the results
  my @result = (
    map { ( $_, ($total ? $result{$_} / $total : 0)) }
    sort { $result{$b} <=> $result{$a} } keys %result
  );

  return wantarray ? @result : $result[0];
}

sub _make_hash {
  my %hash;
  my $temp = shift;

  for (ref($temp)) {
    if (/^HASH$/) {
      %hash = %{$temp};
    }
    elsif (/^ARRAY$/) {
      for (@{$temp}) {
        $hash{$_}++;
      }
    }
    else {
      $hash{$temp} = 1;
    }
  }

  %hash;
}

=head2 langof_file

I<langof_file> works just like I<langof>, with the exception that it receives filenames instead of text. It reads these files (if existing and readable, of course) and parses their content.

Currently, I<langof_file> assumes the files are regular text.
This may change in the future and the files might be scanned to check their filetype and then parsed to extract only their textual content (which should be pretty useful so that you can perform language identification, say, in HTML files, or PDFs).

To identify the language a file is written in:

  $language = langof_file($path);

To get the most probable language and also the percentage of its probability, do:

  ($language, $probability) = langof_file($path);

If you want a hash where each active language is mapped into its percentage, use this:

  %languages = langof_file($path);

If you pass more than one file to I<langof_file>, they will all be read and their content merged and then parsed for language identification.

=cut

sub langof_file {
  my %config = ();
  if (ref($_[0]) eq 'HASH') {%config = (%config, %{+shift})}

=head3 OPTIONS

I<langof_file> accepts all the options I<langof> does, so refer to those first (up in this document).

  $language = langof_file(\%config, $path);

I<langof_file> currently reads at most 'max-size' bytes of each file (1,000,000 by default).

You can force an input encoding with C<< { encoding => 'ISO-8859-1' } >> in the configuration hash.

=cut

  # select max-size
  my $maxsize = defined $config{'max-size'} ? $config{'max-size'}
                                            : $default_maxsize;

  my @files = @_;
  my $text = '';

  for my $file (@files) {
    #-r and -e or next;
    if (exists($config{encoding})) {
      open(FILE, "<:encoding($config{encoding})", $file) or next;
    } else {
      open(FILE, "<:utf8", $file) or next;
    }
    # read a single record of at most $maxsize from the file
    local $/ = \$maxsize;
    $text .= <FILE>;
    close(FILE);
  }

  return langof(\%config,$text);
}

=head2 confidence

After getting the results into an array, its first element is the most probable language. That alone doesn't tell you whether it is very probable or not.

You can find more about the likeliness of the results to be accurate by computing its confidence level.
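As a rough illustration of the confidence level, the sketch below applies the ratio p1 / (p1 + p2), where p1 and p2 are the probabilities of the two best candidates, to a flat list of language / probability pairs of the kind I<langof> returns in list context. C<confidence_of> is a hypothetical stand-alone helper written only for this example, and the probabilities are made up; it mirrors the idea, and is not the module's own function.

```perl
use strict;
use warnings;

# Hypothetical helper: takes (lang1, p1, lang2, p2, ...) sorted from most
# to least probable and returns p1 / (p1 + p2).
sub confidence_of {
    my @pairs = @_;
    my ($p1, $p2) = @pairs[1, 3];
    return 0 unless $p1;    # no usable result at all
    return 1 unless $p2;    # only one candidate, nothing to compare against
    return $p1 / ($p1 + $p2);
}

# Made-up probabilities, for illustration only.
my @results = (en => 0.50, pt => 0.10, es => 0.05);
printf "%.2f\n", confidence_of(@results);    # prints 0.83
```

Only the two best candidates matter here; the remaining pairs are ignored, which is why a low absolute probability can still yield a high confidence level.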
  use Lingua::Identify qw/:language_identification/;
  my @results = langof($text);
  my $confidence_level = confidence(@results);

  # $confidence_level now holds a value between 0.5 and 1; the higher that
  # value, the more accurate the results seem to be

The formula used is pretty simple: p1 / (p1 + p2) , where p1 is the probability of the most likely language and p2 is the probability of the language which came in second.

A couple of examples to illustrate this:

English 50% Portuguese 10% ...

confidence level: 50 / (50 + 10) = 0.83

Another example:

Spanish 30% Portuguese 25% ...

confidence level: 30 / (25 + 30) = 0.55

And a third one:

French 10% German 5% ...

confidence level: 10 / (10 + 5) = 0.67

As you can see, the first example is probably the most accurate one. Are there any doubts? The English language has five times the probability of the second language.

The second example is a bit more tricky. 55% confidence. The confidence level is always above 50%, for obvious reasons. 55% doesn't make anyone confident in the results, and one shouldn't be, with results such as these.

Notice the third example. The confidence level goes up to 67%, but the probability of French is a mere 10%. So what? It's twice as much as the second language. The low probability may well be caused by a great number of languages in play.

=cut

sub confidence {
  defined $_[1] and $_[1] or return 0;
  defined $_[3] and $_[3] or return 1;
  $_[1] / ($_[1] + $_[3]);
}

=head2 get_all_methods

Returns a list comprised of all the available methods for language identification.

=cut

sub get_all_methods {
  qw/smallwords
     prefixes1 prefixes2 prefixes3 prefixes4
     suffixes1 suffixes2 suffixes3 suffixes4
     ngrams1 ngrams2 ngrams3 ngrams4/
}

=head1 LANGUAGE IDENTIFICATION IN GENERAL

Language identification is based on patterns.

In order to identify the language a given text is written in, we repeat a given process for each active language (see section LANGUAGES MANIPULATION); in that process, we look for common patterns of that language.
Those patterns can be prefixes, suffixes, common words, ngrams or even sequences of words.

After repeating the process for each language, the total score for each of them is then used to compute the probability (in percentage) for each language to be the one of that text.

=cut

sub _langof_by_method {
  my ($method, $elements, $text) = @_;
  my (%result, $total);

  # weight each element found in the text by its frequency in the language
  for my $language (get_active_languages()) {
    for (keys %{$languages{$method}{$language}}) {
      if (exists $$elements{$_}) {
        $result{$language} +=
          $$elements{$_} * ${languages{$method}{$language}{$_}};
        $total +=
          $$elements{$_} * ${languages{$method}{$language}{$_}};
      }
    }
  }

  my @result = (
    map { ( $_, ($total ? $result{$_} / $total : 0)) }
    sort { $result{$b} <=> $result{$a} } keys %result
  );

  return wantarray ? @result : $result[0];
}

=head1 METHODS OF LANGUAGE IDENTIFICATION

C<Lingua::Identify> currently comprises four different ways for language identification, in a total of thirteen variations of those.

The available methods are the following: B<smallwords>, B<prefixes1>, B<prefixes2>, B<prefixes3>, B<prefixes4>, B<suffixes1>, B<suffixes2>, B<suffixes3>, B<suffixes4>, B<ngrams1>, B<ngrams2>, B<ngrams3> and B<ngrams4>.

Here's a more detailed explanation of each of those ways and those methods.

=head2 Small Word Technique - B<smallwords>

The "Small Word Technique" searches the text for the most common words of each active language. These words are usually articles, pronouns, etc, which happen to be (usually) the shortest words of the language; hence, the method name.

This is usually a good method for big texts, especially if you happen to have few languages active.

=cut

sub _langof_by_word_method {
  my ($method, $text) = @_;

  sub _words_count {
    my ($words, $text) = @_;
    for my $word (split /[\s\n]+/, $text) {
      $words->{$word}++
    }
  }

  my %words;
  _words_count(\%words, $text);
  return _langof_by_method($method, \%words, $text);
}

=head2 Prefix Analysis - B<prefixes1>, B<prefixes2>, B<prefixes3>, B<prefixes4>

This method analyses text for the common prefixes of each active language.

The methods are, respectively, for prefixes of size 1, 2, 3 and 4.
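The prefix analysis described above can be sketched in plain Perl. This is an illustration only, not the module's code: C<count_prefixes>, C<score_languages> and the tiny C<%table> of weights are hypothetical names and made-up numbers, and skipping words shorter than the prefix size is an assumption of this sketch. The real module delegates prefix extraction to Text::Affixes and scores against trained per-language tables.

```perl
use strict;
use warnings;

# Count how often each word-initial prefix of length $size occurs.
# Words shorter than $size are skipped (an assumption of this sketch).
sub count_prefixes {
    my ($size, $text) = @_;
    my %count;
    for my $word (split ' ', $text) {
        next if length($word) < $size;
        $count{ substr($word, 0, $size) }++;
    }
    return \%count;
}

# Weight every observed prefix by a per-language table, then normalise
# so that the scores of all languages add up to 1.
sub score_languages {
    my ($observed, $table) = @_;
    my (%score, $total);
    for my $lang (keys %$table) {
        for my $prefix (keys %{ $table->{$lang} }) {
            next unless exists $observed->{$prefix};
            my $s = $observed->{$prefix} * $table->{$lang}{$prefix};
            $score{$lang} += $s;
            $total        += $s;
        }
    }
    return map { $_ => $score{$_} / $total } keys %score;
}

# Hypothetical weights, for illustration only.
my %table = (
    en => { th => 0.05, an => 0.02 },
    pt => { qu => 0.04, es => 0.02 },
);

my $observed = count_prefixes(2, 'the theory and that thing');
my %result   = score_languages($observed, \%table);
print "$_: $result{$_}\n" for sort keys %result;
```

Languages with no matching prefix simply do not appear in the result, which is also how the module's own scoring behaves for languages that never match.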
=cut

sub _langof_by_prefix_method {
  use Text::Affixes;
  (my $method = shift) =~ /^prefixes(\d)$/;
  my $text = shift;

  my $prefixes = get_prefixes( {min => $1, max => $1}, $text);

  return _langof_by_method($method, $$prefixes{$1}, $text);
}

=head2 Suffix Analysis - B<suffixes1>, B<suffixes2>, B<suffixes3>, B<suffixes4>

Similar to the Prefix Analysis (see above), but analysing common suffixes instead.

The methods are, respectively, for suffixes of size 1, 2, 3 and 4.

=cut

sub _langof_by_suffix_method {
  use Text::Affixes;
  (my $method = shift) =~ /^suffixes(\d)$/;
  my $text = shift;

  my $suffixes = get_suffixes({min => $1, max => $1}, $text);

  return _langof_by_method($method, $$suffixes{$1}, $text);
}

###

# Have you seen my brother? He's a two line long comment. I think he
# might be lost... :-\ Me and my father have been looking for him for
# some time now :-/

###

=head2 Ngram Categorization - B<ngrams1>, B<ngrams2>, B<ngrams3>, B<ngrams4>

Ngrams are sequences of tokens. You can think of them as syllables, but they are also more than that, as they are made up not only of characters, but also of spaces (delimiting or separating words).

Ngrams are a very good way for identifying languages, given that the most common ones of each language are not generally very common in others.

This is usually the best method for small amounts of text or too many active languages.

The methods are, respectively, for ngrams of size 1, 2, 3 and 4.

=cut

sub _langof_by_ngram_method {
  use Text::Ngram qw(ngram_counts);

  (my $method = shift) =~ /^ngrams([1-4])$/;
  my $text = shift;

  my $ngrams = ngram_counts( {spaces => 0}, $text, $1);

  return _langof_by_method($method, $ngrams, $text);
}

=head1 LANGUAGE MANIPULATION

When trying to perform language identification, C<Lingua::Identify> works not with all available languages, but instead with the ones that are active.

By default, all available languages are active, but that can be changed by the user.

For your convenience, several methods regarding language manipulation were created.
In order to use them, load the module with the tag :language_manipulation.

These methods work with the two-letter codes for languages.

=over 6

=item B<activate_language>

Activates a language

  activate_language('en');

  # or

  activate_language($_) for get_all_languages();

=cut

sub activate_language {
  unless (grep { $_ eq $_[0] } @active_languages) {
    push @active_languages, $_[0];
  }
  return @active_languages;
}

=item B<activate_all_languages>

Activates all languages

  activate_all_languages();

=cut

sub activate_all_languages {
  @active_languages = get_all_languages();
  return @active_languages;
}

=item B<deactivate_language>

Deactivates a language

  deactivate_language('en');

=cut

sub deactivate_language {
  @active_languages = grep { ! ($_ eq $_[0]) } @active_languages;
  return @active_languages;
}

=item B<deactivate_all_languages>

Deactivates all languages

  deactivate_all_languages();

=cut

sub deactivate_all_languages {
  @active_languages = ();
  return @active_languages;
}

=item B<get_all_languages>

Returns the names of all available languages

  my @all_languages = get_all_languages();

=cut

sub get_all_languages {
  return @all_languages;
}

=item B<get_active_languages>

Returns the names of all active languages

  my @active_languages = get_active_languages();

=cut

sub get_active_languages {
  return @active_languages;
}

=item B<get_inactive_languages>

Returns the names of all inactive languages

  my @inactive_languages = get_inactive_languages();

=cut

sub get_inactive_languages {
  return grep { !
                is_active($_) } get_all_languages();
}

=item B<is_active>

Returns the name of the language if it is active, an empty list otherwise

  if (is_active('en')) {
    # YOUR CODE HERE
  }

=cut

sub is_active {
  return grep { $_ eq $_[0] } get_active_languages();
}

=item B<is_valid_language>

Returns the name of the language if it exists, an empty list otherwise

  if (is_valid_language('en')) {
    # YOUR CODE HERE
  }

=cut

sub is_valid_language {
  return grep { $_ eq $_[0] } get_all_languages();
}

=item B<set_active_languages>

Sets the active languages

  set_active_languages('en', 'pt');

  # or

  set_active_languages(get_all_languages());

=cut

sub set_active_languages {
  @active_languages = grep { is_valid_language($_) } @_;
  return @active_languages;
}

=item B<name_of>

Given the two-letter tag of a language, returns its name

  my $language_name = name_of('pt');

=cut

sub name_of {
  my $tag = shift || return undef;
  return $languages{_names}{$tag};
}

=back

=cut

1;
__END__

=head1 KNOWN LANGUAGES

Currently, C<Lingua::Identify> knows the following languages (33 total):

=over 6

=item AF - Afrikaans

=item BG - Bulgarian

=item BR - Breton

=item BS - Bosnian

=item CY - Welsh

=item DA - Danish

=item DE - German

=item EN - English

=item EO - Esperanto

=item ES - Spanish

=item FI - Finnish

=item FR - French

=item FY - Frisian

=item GA - Irish

=item HR - Croatian

=item HU - Hungarian

=item ID - Indonesian

=item IS - Icelandic

=item IT - Italian

=item LA - Latin

=item MS - Malay

=item NL - Dutch

=item NO - Norwegian

=item PL - Polish

=item PT - Portuguese

=item RO - Romanian

=item RU - Russian

=item SL - Slovene

=item SO - Somali

=item SQ - Albanian

=item SV - Swedish

=item SW - Swahili

=item TR - Turkish

=back

=head1 CONTRIBUTING WITH NEW LANGUAGES

Please do not contribute with modules you made yourself. It's easier to contribute with unprocessed text, because then new versions of Lingua::Identify won't have to drop languages in case I can't contact you by that time.
Use I<make-lingua-identify-language> to create a new module for your own personal use, if you must, but try to contribute with unprocessed text rather than those modules.

=head1 EXAMPLES

=head2 THE BASIC EXAMPLE

Check the language a given text file is written in:

  use Lingua::Identify qw/langof/;

  my $text = join "\n", <>;

  # identify the language by letting the module decide on the best way
  # to do so
  my $language = langof($text);

=head2 IDENTIFYING BETWEEN TWO LANGUAGES

Check the language a given text file is written in, supposing you happen to know it's either Portuguese or English:

  use Lingua::Identify qw/langof set_active_languages/;
  set_active_languages(qw/pt en/);

  my $text = join "\n", <>;

  # identify the language by letting the module decide on the best way
  # to do so
  my $language = langof($text);

=head1 TO DO

=over 6

=item * WordNgrams based methods;

=item * More languages (always);

=item * File recognition and treatment;

=item * Deal with different encodings;

=item * Create sets of languages and allow their activation/deactivation;

=item * There should be a way of knowing the default configuration (other than using the dummy mode, of course, or than accessing the variables directly);

=item * Add a section about other similar tools.

=back

=head1 ACKNOWLEDGMENTS

The following people and/or projects helped during the development of this tool:

  * EuroParl v5 corpus was used to train Dutch, German, English,
    Spanish, Finnish, French, Italian, Portuguese, Danish and Swedish.

=head1 SEE ALSO

langident(1), Text::ExtractWords(3), Text::Ngram(3), Text::Affixes(3).

ISO 639 Language Codes, at http://www.w3.org/WAI/ER/IG/ert/iso639.htm

=head1 AUTHOR

Alberto Simoes, C<< >>

Jose Castro, C<< >>

=head1 COPYRIGHT & LICENSE

Copyright 2008-2010 Alberto Simoes, All Rights Reserved.

Copyright 2004-2008 Jose Castro, All Rights Reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
=cut Lingua-Identify-0.56/lib/Lingua/Identify/BG.pm000644 000765 000024 00000024023 11750235531 021112 0ustar00ambsstaff000000 000000 use utf8; use strict; ${Lingua::Identify::languages{'_versions'}{'bg'}} = '0.02'; ${Lingua::Identify::languages{'_names'}{'bg'}} = 'bulgarian'; ${Lingua::Identify::languages{'_sets'}{'bg'}} = ''; =head1 NAME Lingua::Identify::BG - Meta-information on Bulgarian =head1 SYNOPSIS Nothing here is meant for public consumption. This module is to be loaded by Lingua::Identify. =head1 DESCRIPTION Automatically generated. Do not change this module yourself unless you know what you're doing. =head1 SEE ALSO Lingua::Identify(3). =head1 AUTHOR Jose Castro, C<< >> =head1 COPYRIGHT AND LICENSE Copyright (C) 2010 by Alberto Simoes This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available. =cut ${Lingua::Identify::languages{'prefixes1'}{'bg'}} = { 'с' => 0.113577142528085, 'н' => 0.104286735736874, 'п' => 0.0833497291097813, 'д' => 0.0655127239223176, 'к' => 0.0508812929789286, 'м' => 0.0500127167241719, 'о' => 0.0493408897867909, 'т' => 0.048966586207393, 'в' => 0.0406215358923541, 'б' => 0.0367777260577675, 'з' => 0.0354340721830057, 'и' => 0.0327515631973204, 'ч' => 0.0240466055943989, 'г' => 0.0233555836016642, 'р' => 0.018998306036365, 'С' => 0.0166085216448243, 'Н' => 0.0154088306852155, 'Т' => 0.0141371582680302, 'Д' => 0.0136332880649945, 'л' => 0.0124959810352853, }; ${Lingua::Identify::languages{'prefixes2'}{'bg'}} = { 'пр' => 0.0448948112045835, 'по' => 0.0386988476544778, 'на' => 0.0290808456582887, 'ка' => 0.0254902977432693, 'из' => 0.0233126369139435, 'за' => 0.0229820991094923, 'ко' => 0.0223015801003279, 'не' => 0.02189974982825, 'то' => 0.0215108818230132, 'съ' => 0.0177583055724785, 'ст' => 0.0171750035646234, 'ра' => 0.0161639467510078, 'от' => 0.0151464088039716, 'мо' => 
0.0141094274566736, 'до' => 0.0137464839851193, 'бе' => 0.0132215121780497, 'въ' => 0.012229898764696, 'об' => 0.0105577663421779, 'ня' => 0.00995502093406095, 'ми' => 0.00973466239776012, }; ${Lingua::Identify::languages{'prefixes3'}{'bg'}} = { 'пре' => 0.0187538847332662, 'про' => 0.0130669924484357, 'раз' => 0.0128598066742397, 'при' => 0.010237834980103, 'тов' => 0.00894470997563781, 'беш' => 0.00874466853848297, 'каз' => 0.00735866715248159, 'няк' => 0.00712290403012053, 'кат' => 0.00643704767416108, 'мож' => 0.00639418165191361, 'как' => 0.00630130527037743, 'нещ' => 0.0060369647998514, 'нег' => 0.00602267612576891, 'Сюз' => 0.00600838745168642, 'ста' => 0.00591551107015025, 'нап' => 0.00590122239606776, 'стр' => 0.00589407805902651, 'под' => 0.00561544891441799, 'въз' => 0.0055368612069643, 'так' => 0.00545827349951061, }; ${Lingua::Identify::languages{'prefixes4'}{'bg'}} = { 'пред' => 0.00813653152795776, 'Сюза' => 0.0074136760726029, 'Стра' => 0.00621479385396557, 'Бекъ' => 0.00570350584895847, 'какв' => 0.00563298336550922, 'всич' => 0.00557127619249112, 'мног' => 0.00543023122559261, 'Дори' => 0.0048396054267051, 'стра' => 0.00480434418498047, 'няко' => 0.00463685328678849, 'разб' => 0.00444291645730304, 'напр' => 0.00424016431738644, 'койт' => 0.00422253369652412, 'погл' => 0.00413438059221256, 'Гуен' => 0.00410793466091908, 'тряб' => 0.0040638581087633, 'коет' => 0.00404622748790099, 'малк' => 0.00390518252100248, 'добр' => 0.00386110596884669, 'може' => 0.00363190789763662, }; ${Lingua::Identify::languages{'suffixes1'}{'bg'}} = { 'а' => 0.274703547828257, 'е' => 0.182708596273173, 'о' => 0.13954247268225, 'и' => 0.120466837187651, 'т' => 0.0486752630998028, 'н' => 0.0318551115504773, 'я' => 0.0309673146784016, 'м' => 0.0232026912242479, 'л' => 0.0215662656384219, 'й' => 0.0177367418334685, 'р' => 0.0154428666721054, 'д' => 0.0150541556091966, 'у' => 0.012093233068274, 'к' => 0.0108311218393232, 'с' => 0.00923788637159818, 'ш' => 0.00793258502454638, 
'х' => 0.00780781357225467, 'з' => 0.00715516289872877, 'в' => 0.00492847236552277, 'ж' => 0.00283615108863092, }; ${Lingua::Identify::languages{'suffixes2'}{'bg'}} = { 'та' => 0.0597385426051112, 'то' => 0.0560701022107863, 'на' => 0.0452073705837746, 'ва' => 0.038512142796959, 'ше' => 0.037857527108219, 'но' => 0.0339622396930436, 'те' => 0.0316937694845388, 'ка' => 0.0209736274134903, 'ни' => 0.0191523699032335, 'ен' => 0.0177005489697905, 'ли' => 0.0151793063666237, 'ия' => 0.0151728250231708, 'ри' => 0.0148033884463572, 'ой' => 0.0141811794748816, 'ко' => 0.0139154443933138, 'ха' => 0.0122432577824732, 'ан' => 0.0121395562872272, 'ла' => 0.0118803025491124, 'во' => 0.0118349331449423, 'ие' => 0.0117441943366021, }; ${Lingua::Identify::languages{'suffixes3'}{'bg'}} = { 'ата' => 0.0415386593887087, 'ите' => 0.0254276038466485, 'еше' => 0.0211408484917766, 'аше' => 0.0193904233885372, 'ова' => 0.0176185645085235, 'ото' => 0.0164182730091594, 'ето' => 0.0163468270865782, 'ато' => 0.0121600960233199, 'ние' => 0.00795907577554549, 'ния' => 0.00723747195747539, 'ава' => 0.00694454367489247, 'ена' => 0.00686595316005316, 'зан' => 0.00627295200262921, 'ята' => 0.00624437363359673, 'аза' => 0.00615149393424117, 'ина' => 0.00612291556520869, 'ено' => 0.00595858994327194, 'ост' => 0.00582998728262578, 'кво' => 0.00577997513681894, 'ори' => 0.00573710758327022, }; ${Lingua::Identify::languages{'suffixes4'}{'bg'}} = { 'ната' => 0.0139285588349378, 'ваше' => 0.0109224584787898, 'ните' => 0.00880672802284989, 'ката' => 0.00775767833844635, 'юзан' => 0.00741387213935611, 'ение' => 0.00719348355019571, 'акво' => 0.00713177474523079, 'ного' => 0.00597693853803026, 'екър' => 0.00570365668747135, 'тмор' => 0.00565957896963927, 'ието' => 0.00528932613984978, 'риан' => 0.00484854896152897, 'гато' => 0.00478684015656405, 'оето' => 0.00471631580803272, 'ност' => 0.00460171374166931, 'маше' => 0.00456645156740365, 'ното' => 0.00453118939313798, 'тата' => 0.00451355830600515, 'ията' => 
0.00433724743467682, 'ойто' => 0.00424909199901266, }; ${Lingua::Identify::languages{'smallwords'}{'bg'}} = { '—' => 0.0424162957919908, 'да' => 0.0299809624670625, 'на' => 0.0267954326731134, 'се' => 0.0254716307138865, 'и' => 0.0239923345562742, 'е' => 0.016339498468172, 'в' => 0.014877012494169, 'не' => 0.0128135624561359, 'от' => 0.0115822165067598, 'че' => 0.0115780139608575, 'за' => 0.0114813554051044, 'с' => 0.0103508705573837, 'си' => 0.00933805699492753, 'бе' => 0.00674088362730142, 'го' => 0.00625759084853604, 'му' => 0.00618614756819681, 'беше' => 0.00494219398111376, 'ще' => 0.00394619060226685, 'това' => 0.00371084803173763, 'като' => 0.00355535383335224, }; ${Lingua::Identify::languages{'ngrams1'}{'bg'}} = { 'а' => 0.124477439748745, 'е' => 0.0935177992239025, 'о' => 0.0871530963807118, 'и' => 0.0796337842204628, 'н' => 0.0678238068033988, 'т' => 0.0677960295142956, 'с' => 0.0514213175879313, 'р' => 0.0468102875967923, 'в' => 0.0409761309754736, 'д' => 0.0355577077810668, 'к' => 0.0354178954259138, 'л' => 0.0340040314105585, 'м' => 0.0297513284488514, 'п' => 0.026922674508504, 'з' => 0.0239079127311649, 'ъ' => 0.0201774228045988, 'б' => 0.0181533843386089, 'я' => 0.0177163549900511, 'г' => 0.0159802744210981, 'у' => 0.0146201131646758, }; ${Lingua::Identify::languages{'ngrams2'}{'bg'}} = { 'на' => 0.028499302230513, 'то' => 0.0183922169334793, 'та' => 0.017288002593729, 'ра' => 0.0154002659511984, 'ва' => 0.014835237124156, 'ка' => 0.0147436108278788, 'ст' => 0.0146778278459363, 'ат' => 0.0144193947025904, 'да' => 0.0143395153673745, 'не' => 0.0131965360561223, 'ен' => 0.0125563266782883, 'ни' => 0.0125046400496192, 'но' => 0.0123765981740524, 'по' => 0.0121252132073433, 'се' => 0.0120053942045193, 'за' => 0.0116200938817128, 'от' => 0.011492052006146, 'те' => 0.0113252451590773, 'пр' => 0.0109634387583932, 'ко' => 0.0107261501449575, }; ${Lingua::Identify::languages{'ngrams3'}{'bg'}} = { 'ата' => 0.00961610253857641, 'ите' => 0.00676953708312593, 
'пре' => 0.00515648332503733, 'еше' => 0.00477227476356396, 'ова' => 0.004512506222001, 'аше' => 0.00431184668989547, 'ени' => 0.00426984818317571, 'ост' => 0.00425895968143355, 'раз' => 0.00402407914385266, 'ото' => 0.00391519412643106, 'ето' => 0.00385608511697362, 'ста' => 0.00357142857142857, 'ава' => 0.00356831757093081, 'про' => 0.00343454454952713, 'как' => 0.00341276754604281, 'кат' => 0.00336143603782977, 'лед' => 0.00332877053260329, 'нат' => 0.0032261075161772, 'ред' => 0.0032261075161772, 'стр' => 0.00315766550522648, }; ${Lingua::Identify::languages{'ngrams4'}{'bg'}} = { 'това' => 0.00349794196568374, 'ната' => 0.00333624657932387, 'пред' => 0.00311110110464557, 'беше' => 0.0030722123408375, 'стра' => 0.00294326538715811, 'какв' => 0.00263829560782114, 'ваше' => 0.00254823741794983, 'ение' => 0.0023005773958037, 'каза' => 0.0022801096253784, 'като' => 0.0022187063141025, 'може' => 0.00220233209776226, 'ните' => 0.0021368352324013, 'глед' => 0.00212660134718865, 'след' => 0.00206110448182769, 'прав' => 0.0019935608394242, 'една' => 0.0019321575281483, 'кога' => 0.0018912219872977, 'ката' => 0.00181549123672409, 'акво' => 0.00178683635812867, 'олко' => 0.00174590081727807, }; Lingua-Identify-0.56/lib/Lingua/Identify/CS.pm000644 000765 000024 00000022713 12203745465 021141 0ustar00ambsstaff000000 000000 use utf8; use strict; ${Lingua::Identify::languages{'_versions'}{'cs'}} = '0.02'; ${Lingua::Identify::languages{'_names'}{'cs'}} = 'czech'; ${Lingua::Identify::languages{'_sets'}{'cs'}} = ''; =head1 NAME Lingua::Identify::CS - Meta-information on Czech =head1 SYNOPSIS Nothing here is meant for public consumption. This module is to be loaded by Lingua::Identify. =head1 DESCRIPTION Automatically generated. Do not change this module yourself unless you know what you're doing. =head1 SEE ALSO Lingua::Identify(3). 
=head1 AUTHOR Jose Castro, C<< >> =head1 COPYRIGHT AND LICENSE Copyright (C) 2010 by Alberto Simoes This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available. =cut ${Lingua::Identify::languages{'prefixes1'}{'cs'}} = { 'p' => 0.146600762897194, 's' => 0.0992247282874476, 'v' => 0.0792768958006071, 'z' => 0.0693335105753503, 'n' => 0.0620428115123013, 'd' => 0.0488226104614939, 'j' => 0.0465298120532137, 'k' => 0.0427488291489627, 'o' => 0.042216022354109, 'm' => 0.035093919355963, 't' => 0.0344233177003713, 'r' => 0.0340510013903772, 'c' => 0.031292942687605, 'u' => 0.0295626715343377, 'l' => 0.0199137890506583, 'b' => 0.0198646151983949, 'a' => 0.0193285661715238, 'h' => 0.0186412129838423, 'P' => 0.0153206270260573, 'S' => 0.010330291579329, }; ${Lingua::Identify::languages{'prefixes2'}{'cs'}} = { 'pr' => 0.0831031005710082, 'po' => 0.0505053904055307, 'vy' => 0.0375309378296119, 'za' => 0.0276441763778285, 'ne' => 0.0218227008536694, 'je' => 0.0215755463217488, 'st' => 0.0208891993469551, 'na' => 0.0201146657786122, 'č' => 0.0186224557877674, 'ro' => 0.0160296539024308, 'ob' => 0.0144718841413817, 'do' => 0.0140715634206651, 've' => 0.0136451348268584, 'te' => 0.01296226890181, 'ž' => 0.0127394817181069, 'so' => 0.0123733913198864, 'ú' => 0.0120664787673371, 'ko' => 0.0115918956520528, 'sv' => 0.0109560238985669, 'me' => 0.00992099177427945, }; ${Lingua::Identify::languages{'prefixes3'}{'cs'}} = { 'př' => 0.0377051208919231, 'vý' => 0.0204364845817903, 'pro' => 0.0197187689577429, 'pra' => 0.014816089144023, 'zá' => 0.0135382052280362, 'ná' => 0.0102767943645138, 'kte' => 0.010238162068518, 'pod' => 0.00957537674033966, 'sta' => 0.0094474072598535, 'ži' => 0.00833069245372424, 'roz' => 0.00828602386147907, 'jej' => 0.00727252972121366, 'dě' => 0.00692001002025177, 'str' => 0.00675159735489498, 'vě' => 
0.00669183802202645, 'sve' => 0.00659525728203689, 'sou' => 0.00641054661680686, 'nej' => 0.00618599639633114, 'jed' => 0.00568196565951064, 'spo' => 0.00563850432651534, }; ${Lingua::Identify::languages{'prefixes4'}{'cs'}} = { 'při' => 0.0200191562284028, 'pře' => 0.015415378579131, 'kter' => 0.0108712237291048, 'živ' => 0.0078573873930476, 'jeji' => 0.00764389930542644, 'prá' => 0.00664441603635324, 'auto' => 0.00540580046492961, 'jedn' => 0.00530514692211724, 'zák' => 0.00508717109691846, 'prac' => 0.00481149578857884, 'výr' => 0.00425886296116312, 'čá' => 0.00413897625730379, 'prob' => 0.00412358972846623, 'svě' => 0.0040562736648019, 'výz' => 0.00380431925508685, 'pros' => 0.00374790198268246, 'tech' => 0.00368250923512283, 'spol' => 0.00364917175597478, 'roma' => 0.00338439523889509, 'druh' => 0.00337285534226692, }; ${Lingua::Identify::languages{'suffixes1'}{'cs'}} = { '́' => 0.230796912722376, 'e' => 0.104085931100367, 'h' => 0.0821622398540654, 'y' => 0.0737416058425713, 'u' => 0.071679134075254, 'o' => 0.0683625368240637, 'a' => 0.0639593190006765, 'i' => 0.0597279287963637, 'm' => 0.0533551774147184, '̌' => 0.0439527484854314, '̊' => 0.0324181441320328, 't' => 0.0226282925521913, 'k' => 0.0190890758699989, 'n' => 0.0136246333803788, 'd' => 0.0129405649346299, 'l' => 0.0128589738324908, 'r' => 0.00822989454755302, 's' => 0.00570651410391356, 'v' => 0.00448210723340242, 'j' => 0.0038850332740402, }; ${Lingua::Identify::languages{'suffixes2'}{'cs'}} = { 'í' => 0.140378499484265, 'ch' => 0.0848172070956761, 'é' => 0.06090196164882, 'ho' => 0.0388384885546313, 'ů' => 0.0348054304732578, '́m' => 0.0335662745380141, 'ě' => 0.0285406154242427, 'á' => 0.0278374988542449, 'ou' => 0.0245504869024234, 'ce' => 0.0210731925785234, 'ky' => 0.0208330190966925, 'je' => 0.0206276533658515, 'ý' => 0.0185977813539519, 'ti' => 0.0163950308454687, 'em' => 0.0150909004417104, 'mi' => 0.0139706709890158, 'ka' => 0.0132507306727062, 'ž' => 0.0124745178256294, 'ny' => 
0.0118624815373887, 'dy' => 0.00979200048266748, }; ${Lingua::Identify::languages{'suffixes3'}{'cs'}} = { '́ch' => 0.0726041511367731, 'ní' => 0.06897053329076, '́ho' => 0.0341475574784624, 'ké' => 0.022676914032344, 'uje' => 0.0172699458156355, 'né' => 0.0172536488585189, '̌í' => 0.0168830439818657, 'ím' => 0.0152780955013759, 'ně' => 0.0130550698324612, 'sti' => 0.0127925077455816, 'tí' => 0.0124454429180969, 'vé' => 0.0118527165518536, 'jí' => 0.0118273657296721, 'cí' => 0.0112201531793251, 'vá' => 0.0102079310650786, 'ech' => 0.00915225039852096, 'ví' => 0.00903575733468699, 'ým' => 0.00880578916204064, 'ém' => 0.00858004612642455, '́mi' => 0.00837060004792513, }; ${Lingua::Identify::languages{'suffixes4'}{'cs'}} = { 'ých' => 0.0425115359515253, 'ích' => 0.027651025125873, 'ého' => 0.026325304217049, 'ení' => 0.0186241991504616, '́ní' => 0.0177408125874251, 'cké' => 0.0116519841580198, 'osti' => 0.0105166593371652, '̌ní' => 0.0104448601390084, 'ího' => 0.00993906043056441, 'ské' => 0.00975058753540277, 'ním' => 0.0095012135346617, 'ové' => 0.00866206040620396, 'ší' => 0.00841909704815547, '́vá' => 0.00830498760822767, 'tví' => 0.00717030385164234, 'vní' => 0.00695170093582562, 'ách' => 0.00694785455021008, 'ými' => 0.00600164368878638, 'dní' => 0.00558302872096139, '́cí' => 0.00512082138282691, }; ${Lingua::Identify::languages{'smallwords'}{'cs'}} = { 'a' => 0.055483486941922, 'v' => 0.020574600720111, 'o' => 0.016004931988744, 'se' => 0.0122352325710903, 'na' => 0.0113123332530745, 'i' => 0.0103191882183144, 'z' => 0.00851590672662816, 'je' => 0.0085089745835284, 's' => 0.00700423738800545, 'k' => 0.00606886021241011, 'pro' => 0.00483632516927138, 'do' => 0.00416945300307371, 've' => 0.00369344584355632, 'jsou' => 0.00366941441414379, 'jeho' => 0.003175383682567, 'jejich' => 0.00310097867996283, 'V' => 0.00288423367237675, 'při' => 0.0028209200987322, 'za' => 0.00256258223254752, 'po' => 0.00243410651376516, }; ${Lingua::Identify::languages{'ngrams1'}{'cs'}} = 
{ 'e' => 0.0949524023566544, 'a' => 0.0866288484484764, 'o' => 0.0839143085621032, 'i' => 0.0814149057808624, 'n' => 0.0629910529404838, 'r' => 0.0543239253059958, 's' => 0.0526291153137958, 't' => 0.0488196743689696, 'v' => 0.0485627242644832, 'c' => 0.0433846762121235, 'u' => 0.0400820641602028, 'k' => 0.0393189769219914, 'l' => 0.0347513679153484, 'p' => 0.0346681262272961, 'd' => 0.0326615634310886, 'h' => 0.0306984277389897, 'y' => 0.0306382446902871, 'z' => 0.0299808428824283, 'm' => 0.0283120891526934, 'j' => 0.0199358078225358, }; ${Lingua::Identify::languages{'ngrams2'}{'cs'}} = { 'ni' => 0.0267634429628645, 'ch' => 0.0220670276819572, 'st' => 0.0198203742306485, 'pr' => 0.0197551920977101, 'ov' => 0.0161054186801768, 'ne' => 0.0149870977719213, 'ro' => 0.0149615361511612, 'na' => 0.014856307479032, 'po' => 0.0145623488402906, 'ra' => 0.0144786345323012, 've' => 0.0137394776653208, 'te' => 0.0129276831926805, 'je' => 0.0122755423430378, 'en' => 0.0121065161257615, 'va' => 0.0118089362574123, 'ho' => 0.0113923883457755, 'vy' => 0.0113266736790713, 'ti' => 0.0113222003954383, 'le' => 0.0108107549667294, 'ic' => 0.00969935699742978, }; ${Lingua::Identify::languages{'ngrams3'}{'cs'}} = { 'ost' => 0.00839484282276417, 'ick' => 0.00784843384564947, 'ova' => 0.00759062115926436, 'pro' => 0.0074297768265644, 'pra' => 0.00711840066861989, 'sti' => 0.00657168385546174, 'ove' => 0.00635573687098514, 'eni' => 0.00570435580305573, 'sta' => 0.00551980809501052, 'ter' => 0.00489151473033947, 'ske' => 0.004720973562271, 'cke' => 0.00455412642672386, 'str' => 0.00450825885625057, 'uje' => 0.00449532974242589, 'ech' => 0.00426876241445044, 'nos' => 0.00404204116845327, 'pod' => 0.00385472293601705, 'vni' => 0.00382255406947706, 'nic' => 0.00365031980316963, 'tic' => 0.00353195684446507, }; ${Lingua::Identify::languages{'ngrams4'}{'cs'}} = { 'icke' => 0.00544220419310788, 'nost' => 0.00533219011829132, 'osti' => 0.00509317144383869, 'kter' => 0.00485196995044128, 'tick' => 
0.00446430130584961, 'icky' => 0.00437327775585257, 'klad' => 0.00366538957206271, 'jedn' => 0.00292388597652335, 'jeji' => 0.00278898776573638, 'stav' => 0.00269578139679457, 'tech' => 0.00258467591250563, 'auto' => 0.00257398009967624, 'ivot' => 0.00244039158025614, 'prac' => 0.00240917726934589, 'utor' => 0.00228148236107667, 'prav' => 0.00220202775148694, 'stvi' => 0.00206167249333804, 'nick' => 0.00204595619693567, 'jsou' => 0.00198876634058262, 'poda' => 0.00198592867595442, }; Lingua-Identify-0.56/lib/Lingua/Identify/CY.pm000644 000765 000024 00000022626 12151700141 021132 0ustar00ambsstaff000000 000000 use utf8; use strict; ${Lingua::Identify::languages{'_versions'}{'cy'}} = '0.02'; ${Lingua::Identify::languages{'_names'}{'cy'}} = 'welsh'; ${Lingua::Identify::languages{'_sets'}{'cy'}} = ''; =head1 NAME Lingua::Identify::CY - Meta-information on Welsh =head1 SYNOPSIS Nothing here is meant for public consumption. This module is to be loaded by Lingua::Identify. =head1 DESCRIPTION Automatically generated. Do not change this module yourself unless you know what you're doing. =head1 SEE ALSO Lingua::Identify(3). =head1 AUTHOR Jose Castro, C<< >> =head1 COPYRIGHT AND LICENSE Copyright (C) 2010 by Alberto Simoes This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available. 
=cut ${Lingua::Identify::languages{'prefixes1'}{'cy'}} = { 'y' => 0.106174895303898, 'a' => 0.0862241733611544, 'g' => 0.0743429874793969, 'd' => 0.0736154526588347, 'c' => 0.0524141390291929, 'f' => 0.049131194637029, 'e' => 0.0465125211679871, 'b' => 0.0420671026603599, 'h' => 0.0405239154478011, 'm' => 0.0388847026766277, 'o' => 0.0309552509599732, 'n' => 0.0296323576760007, 'w' => 0.0294595116704945, 's' => 0.0269899602454187, 'r' => 0.0251869391683735, 'p' => 0.0250852650474875, 't' => 0.0224541647414484, 'l' => 0.0220203551590014, 'C' => 0.0157967692483232, 'M' => 0.0131961711785499, }; ${Lingua::Identify::languages{'prefixes2'}{'cy'}} = { 'dd' => 0.0284359782666245, 'gw' => 0.0240973165262659, 'cy' => 0.0226992714997718, 'ga' => 0.0216747062602939, 'gy' => 0.0215774011816843, 'rh' => 0.0189473021451477, 'oe' => 0.0186310606396664, 'll' => 0.0177095243069517, 'ma' => 0.0172115512575966, 'we' => 0.0164660226406023, 'hy' => 0.0141478722384318, 'ar' => 0.0137371875684176, 'ch' => 0.0136942588572663, 'me' => 0.0130918259441097, 'di' => 0.0130188471351525, 'fe' => 0.0126739864889036, 'ca' => 0.0117338477146901, 'bo' => 0.0116565760346177, 'da' => 0.0112072555245674, 'dr' => 0.0111128123600345, }; ${Lingua::Identify::languages{'prefixes3'}{'cy'}} = { 'oed' => 0.0192189973614776, 'wed' => 0.0134793315743184, 'ddi' => 0.0120070360598065, 'cyf' => 0.0094740545294635, 'rha' => 0.00936147757255937, 'gyf' => 0.00920492524186456, 'gwe' => 0.00867897977132806, 'mew' => 0.0077080035180299, 'wrt' => 0.00702022867194371, 'cyn' => 0.0067792436235708, 'dda' => 0.00668777484608619, 'gwa' => 0.00637291116974494, 'lla' => 0.00600351802990325, 'chw' => 0.00564467897977133, 'idd' => 0.00548988566402814, 'hyn' => 0.00537379067722076, 'rhy' => 0.00514160070360598, 'dde' => 0.0050325417766051, 'med' => 0.00497449428320141, 'gyn' => 0.00476693051890941, }; ${Lingua::Identify::languages{'prefixes4'}{'cy'}} = { 'hynn' => 0.00615113831674341, 'roed' => 0.00578349463699782, 'medd' => 
0.00554222847216478, 'Roed' => 0.00505050505050505, 'Cymr' => 0.00502063400152572, 'gwel' => 0.00473341237672448, 'hefy' => 0.00408314261817447, 'Gymr' => 0.00390391632429849, 'llaw' => 0.00383268536134778, 'aral' => 0.00366265015946545, 'bydd' => 0.00344206395161809, 'newy' => 0.00337313076166579, 'gwei' => 0.00330879311771031, 'fydd' => 0.00330879311771031, 'gyfe' => 0.00326513543074053, 'cyfa' => 0.00318241560279777, 'rhai' => 0.00315714009981526, 'alla' => 0.00304225144989476, 'chwa' => 0.00280787860405695, 'ysgo' => 0.00275273205209511, }; ${Lingua::Identify::languages{'suffixes1'}{'cy'}} = { 'n' => 0.175898223036932, 'd' => 0.161646827884304, 'r' => 0.0892954154510965, 'i' => 0.081638876559064, 'u' => 0.0689641930590804, 'l' => 0.0657972163584963, 'h' => 0.0490926566574842, 's' => 0.0413525384944663, 'o' => 0.0350750575736938, 'g' => 0.0322401390578357, 'a' => 0.0276150429134379, 'e' => 0.0259683053851028, 'm' => 0.0237794320449702, 'y' => 0.0231006328306153, 'w' => 0.0225483319723466, 't' => 0.0220265262534914, 'f' => 0.0181355723542361, 'c' => 0.0167858800727817, 'b' => 0.0106156968647608, 'p' => 0.00186020350423035, }; ${Lingua::Identify::languages{'suffixes2'}{'cy'}} = { 'dd' => 0.083487447061398, 'au' => 0.0427756776355593, 'th' => 0.0419790062060275, 'od' => 0.0289175995732015, 'ol' => 0.0279635962635109, 'yn' => 0.0273428504817931, 'an' => 0.0266062512246857, 'yd' => 0.022511331471096, 'on' => 0.0223911871262474, 'ai' => 0.0205718584756829, 'el' => 0.0196450306725652, 'ad' => 0.0194533718367353, 'wn' => 0.0189656430082428, 'ch' => 0.0187367966371026, 'di' => 0.0186423975090073, 'en' => 0.016508405098125, 'io' => 0.0162681164084278, 'nt' => 0.0147920573145737, 'id' => 0.0146418768835129, 'll' => 0.0141427057364634, }; ${Lingua::Identify::languages{'suffixes3'}{'cy'}} = { 'edd' => 0.0428521435743766, 'ydd' => 0.0278728855734516, 'eth' => 0.0242766678448896, 'odd' => 0.0197572157361343, 'iad' => 0.0135425294511767, 'edi' => 0.0133016092414882, 'dau' => 
0.0132611627829274, 'ion' => 0.0126685342379272, 'ith' => 0.0123678236112357, 'wyd' => 0.0111104663124964, 'iau' => 0.0106690723516802, 'rth' => 0.0103349494331341, 'ewn' => 0.00887535984159057, 'ael' => 0.00862212984016615, 'aid' => 0.00799960608666445, 'nol' => 0.00784133733577419, 'dol' => 0.00775868587697594, 'dai' => 0.00768306858488393, 'wyr' => 0.00747731920872659, 'ant' => 0.00709395712323684, }; ${Lingua::Identify::languages{'suffixes4'}{'cy'}} = { 'aeth' => 0.0215873789084759, 'oedd' => 0.0188783294424735, 'aith' => 0.0136532416683517, 'wydd' => 0.0119919670594291, 'ddai' => 0.00916113674380986, 'adau' => 0.00857521001452179, 'ynny' => 0.00622690759361041, 'dodd' => 0.00485974522527159, 'diad' => 0.00468511608242496, 'neud' => 0.00467132957114759, 'nnau' => 0.00463226778919505, 'ddol' => 0.00457941949596515, 'thau' => 0.00438870608996158, 'nydd' => 0.0042876050072609, 'efyd' => 0.00426692524034485, 'ymru' => 0.00425084097718792, 'rbyn' => 0.00406012757118435, 'aidd' => 0.00395902648848367, 'raeg' => 0.00378209959375747, 'lion' => 0.0037683130824801, }; ${Lingua::Identify::languages{'smallwords'}{'cy'}} = { 'yn' => 0.0507751301118083, 'y' => 0.0417227109110501, 'a' => 0.0239197471895085, 'i' => 0.0221448004952273, 'o' => 0.0178125167177434, 'ei' => 0.0144106947703078, 'ar' => 0.0144030523733464, 'yr' => 0.0123386498941528, 'ac' => 0.0102780686134399, 'oedd' => 0.00939250586553967, 'am' => 0.00727269600837607, 'wedi' => 0.00596584612797958, 'fod' => 0.0050458925937531, 'eu' => 0.00503156309945051, 'un' => 0.00479273819440729, 'gan' => 0.00464753265214102, 'fel' => 0.00431031188621999, 'mewn' => 0.00397882291802001, 'ond' => 0.00394920862979465, 'bod' => 0.00353078739615893, }; ${Lingua::Identify::languages{'ngrams1'}{'cy'}} = { 'd' => 0.102356331008686, 'a' => 0.0967461444918757, 'y' => 0.0837654418392229, 'n' => 0.0830576668708424, 'e' => 0.0757995168695146, 'i' => 0.0706973920741932, 'r' => 0.0696972895839885, 'o' => 0.0594522194835415, 'l' => 
0.0504846557678832, 'w' => 0.0435416576594529, 'h' => 0.0389875372349986, 'g' => 0.035183329079368, 'f' => 0.0319762854784548, 't' => 0.0292130003228332, 's' => 0.0265791995800316, 'u' => 0.0254055001906054, 'c' => 0.0251419225977295, 'm' => 0.024152134967529, 'b' => 0.0173851478744914, 'p' => 0.0082365254459915, }; ${Lingua::Identify::languages{'ngrams2'}{'cy'}} = { 'dd' => 0.0360070275425073, 'yn' => 0.0338395354612426, 'ed' => 0.0185952275185722, 'yd' => 0.0169525119291752, 'th' => 0.0159174475288707, 'an' => 0.0152512624627173, 'wy' => 0.0151628821037887, 'ar' => 0.0149438147223133, 'di' => 0.0144987254393156, 'od' => 0.0130498670962242, 'll' => 0.012681857077079, 'ei' => 0.0124094717085778, 'ch' => 0.011667656236915, 'ai' => 0.0115404464743916, 'ae' => 0.0113964299550883, 'da' => 0.0111304195632967, 'ia' => 0.0108177559328576, 'er' => 0.00995452613204373, 'ad' => 0.00985745262305661, 'ol' => 0.0096241864298189, }; ${Lingua::Identify::languages{'ngrams3'}{'cy'}} = { 'edd' => 0.014013116688798, 'ydd' => 0.0120713759271695, 'oed' => 0.00906834249313745, 'eth' => 0.00766757647601724, 'ddi' => 0.00744697726463993, 'wyd' => 0.00729068701064294, 'odd' => 0.00583379681253627, 'ith' => 0.00546976913863803, 'wed' => 0.00532205341229174, 'aet' => 0.00517940445228451, 'iad' => 0.00482044354472532, 'dda' => 0.00475574483608816, 'ddo' => 0.00443108203913181, 'edi' => 0.00427479178513481, 'mae' => 0.00404912580741845, 'rth' => 0.00403431525965813, 'fod' => 0.00401677645309987, 'gan' => 0.00355959822881439, 'lla' => 0.0033074291656322, 'ynn' => 0.00324389971076559, }; ${Lingua::Identify::languages{'ngrams4'}{'cy'}} = { 'oedd' => 0.0110134754371643, 'aeth' => 0.00711054829537541, 'wydd' => 0.00584474856492701, 'wedi' => 0.00405130908226038, 'eith' => 0.00333779014829623, 'aith' => 0.00321887032596888, 'wrth' => 0.00285300438610588, 'roed' => 0.00283318441571799, 'nydd' => 0.00262480688920744, 'mewn' => 0.00251820758901309, 'yddi' => 0.0022219793829454, 'ddai' => 
0.00221180156031378, 'yddo' => 0.00220644481156029, 'ddia' => 0.00215716272302823, 'wedd' => 0.0020827039153548, 'adau' => 0.00202645805344321, 'iaet' => 0.00196378409302744, 'fydd' => 0.00189950310798563, 'bydd' => 0.00188182583709913, 'rwyd' => 0.00183361509831777, }; Lingua-Identify-0.56/lib/Lingua/Identify/DA.pm000644 000765 000024 00000022605 11375521415 021114 0ustar00ambsstaff000000 000000 use utf8; use strict; ${Lingua::Identify::languages{'_versions'}{'da'}} = '0.02'; ${Lingua::Identify::languages{'_names'}{'da'}} = 'danish'; ${Lingua::Identify::languages{'_sets'}{'da'}} = ''; =head1 NAME Lingua::Identify::DA - Meta-information on Danish =head1 SYNOPSIS Nothing here is meant for public consumption. This module is to be loaded by Lingua::Identify. =head1 DESCRIPTION Automatically generated. Do not change this module yourself unless you know what you're doing. =head1 SEE ALSO Lingua::Identify(3). =head1 AUTHOR Jose Castro, C<< >> =head1 COPYRIGHT AND LICENSE Copyright (C) 2010 by Alberto Simoes This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available. 
=cut ${Lingua::Identify::languages{'prefixes1'}{'da'}} = { 'd' => 0.0880745162592566, 'a' => 0.0849201426091969, 's' => 0.0832843259674591, 'f' => 0.0813629686916779, 'o' => 0.0709948898072735, 'e' => 0.0685655566386407, 'm' => 0.0563642889814434, 'v' => 0.0459666714107587, 'h' => 0.0430414463572982, 't' => 0.0413129856540447, 'b' => 0.0358850282721032, 'p' => 0.0315500022825349, 'i' => 0.0295516654000836, 'k' => 0.0290222068263009, 'g' => 0.0225563674218724, 'u' => 0.019543868977129, 'n' => 0.0185632741037918, 'D' => 0.0183242792784326, 'r' => 0.0162668650234206, 'l' => 0.0156120908108727, }; ${Lingua::Identify::languages{'prefixes2'}{'da'}} = { 'de' => 0.0796353549273594, 'fo' => 0.0641323191814296, 'me' => 0.037457784292574, 'ti' => 0.0322878117682682, 'be' => 0.0235593612058796, 'ha' => 0.0207756621894647, 'vi' => 0.0202778767040407, 'so' => 0.0199636066049481, 'st' => 0.0188602203446304, 'De' => 0.0184977774566259, 'in' => 0.0176828544441978, 'si' => 0.0156922859880841, 'fr' => 0.0146043838384881, 're' => 0.0144724821545624, 'ik' => 0.0144208684521567, 'ko' => 0.0137498903208824, 'ud' => 0.013682792507755, 'sk' => 0.0134992771214235, 'an' => 0.0114817548429424, 'he' => 0.0108302752214658, }; ${Lingua::Identify::languages{'prefixes3'}{'da'}} = { 'for' => 0.049209568411812, 'ikk' => 0.0200365680946968, 'ska' => 0.0127005833481773, 'ind' => 0.0125631933899455, 'ove' => 0.0100813875746758, 'Eur' => 0.00941919992715137, 'bes' => 0.00886165230595466, 'til' => 0.00879934755745416, 'sam' => 0.00869390875229949, 'fre' => 0.00866754905101082, 'vær' => 0.0085692992552985, 'ogs' => 0.00832966560721968, 'Kom' => 0.00806766615198684, 'pro' => 0.00784959953223512, 'den' => 0.00779448379317699, 'det' => 0.00766907551734908, 'bet' => 0.0076642828443875, 'kon' => 0.00756523426984826, 'med' => 0.00739828949502001, 'kom' => 0.00734716765009653, }; ${Lingua::Identify::languages{'prefixes4'}{'da'}} = { 'fors' => 0.0118391589813655, 'Euro' => 0.0109947246574823, 'form' => 
0.00993016812782269, 'Komm' => 0.00951838779460321, 'denn' => 0.00843580401533265, 'dett' => 0.00816349766594558, 'frem' => 0.00795665869672473, 'forb' => 0.00740635318228396, 'Parl' => 0.00722513188356294, 'komm' => 0.0069784432046757, 'over' => 0.00667197996128885, 'unde' => 0.00659133173934495, 'medl' => 0.00618809062962541, 'fore' => 0.00590250104368287, 'mege' => 0.00566814679873999, 'vore' => 0.00549641352613002, 'arbe' => 0.00538255721279745, 'bliv' => 0.00534934912140878, 'elle' => 0.00524592963679836, 'land' => 0.00523739041329842, }; ${Lingua::Identify::languages{'suffixes1'}{'da'}} = { 'e' => 0.205400254186509, 'r' => 0.179192084571201, 't' => 0.1514295080264, 'n' => 0.0995322000926106, 'g' => 0.0856392683226109, 'd' => 0.0441100471740783, 'l' => 0.0427002883109136, 's' => 0.0420388480819129, 'm' => 0.0369779570677374, 'å' => 0.0274020759460566, 'f' => 0.0214603095558185, 'i' => 0.0170281674119998, 'a' => 0.0110537096631178, 'k' => 0.00945317657005848, 'v' => 0.00598878820746563, 'u' => 0.00443035083649574, 'o' => 0.00350066233588266, 'U' => 0.00238109525903647, 'p' => 0.00220644279504847, 'b' => 0.00165830275422457, }; ${Lingua::Identify::languages{'suffixes2'}{'da'}} = { 'er' => 0.119097623277906, 'et' => 0.0833567514805048, 'en' => 0.0753248049220548, 'de' => 0.0383342643241637, 'or' => 0.0366602013463619, 'il' => 0.0337100883083992, 're' => 0.0336125923487465, 'ke' => 0.0324506699119439, 'ne' => 0.0308012676768777, 'ed' => 0.0277444825889421, 'es' => 0.0257435213934811, 'ge' => 0.0251138121952534, 'ng' => 0.023349135325539, 'te' => 0.0231197330675326, 'om' => 0.0205773825431764, 'le' => 0.0201896927271455, 'ar' => 0.0183682387985745, 'se' => 0.0177643373543726, 'ig' => 0.0175056863084703, 'eg' => 0.0160139981257836, }; ${Lingua::Identify::languages{'suffixes3'}{'da'}} = { 'ing' => 0.0287442168089566, 'der' => 0.0237478273094423, 'kke' => 0.0230337150452431, 'nde' => 0.0215176243545831, 'rne' => 0.0206908836460304, 'ske' => 0.0196189164664383, 'ige' 
=> 0.0187706086089668, 'ter' => 0.0185006198558356, 'ger' => 0.0183632291293901, 'ere' => 0.0183368692807116, 'lse' => 0.0154660421246358, 'tte' => 0.0152711390010736, 'den' => 0.0150490772455396, 'igt' => 0.0143645199631921, 'gen' => 0.0143213856653545, 'get' => 0.0142830440672767, 'lle' => 0.0133181138489852, 'res' => 0.0129706431164051, 'det' => 0.0123667629466796, 'ver' => 0.0122780980011247, }; ${Lingua::Identify::languages{'suffixes4'}{'da'}} = { 'erne' => 0.0242818025006689, 'ning' => 0.0198565787251484, 'ende' => 0.0186572899489353, 'else' => 0.0183185667740081, 'iske' => 0.0170898650610366, 'ngen' => 0.0117575842680638, 'nger' => 0.0112926701063989, 'lige' => 0.0109378172564751, 'onen' => 0.0107338243079895, 'ette' => 0.0107129506109352, 'enne' => 0.00923281572890001, 'nder' => 0.00885519157309874, 'ligt' => 0.00829634577468931, 'ring' => 0.00772231910569491, 'eder' => 0.00767582768952843, 'ller' => 0.00697086601173861, 'eres' => 0.00693386263968773, 'mand' => 0.00674789697502177, 'nden' => 0.00650879826330841, 'slag' => 0.00644143315008758, }; ${Lingua::Identify::languages{'smallwords'}{'da'}} = { 'at' => 0.033941432121887, 'og' => 0.0298985743743289, 'i' => 0.0268720461438653, 'er' => 0.0231198767792246, 'af' => 0.0197020206514011, 'for' => 0.0195460941934024, 'til' => 0.0164051047126072, 'det' => 0.0160764065559949, 'en' => 0.0149495031517877, 'som' => 0.0134485490200821, 'de' => 0.0127354338340545, 'der' => 0.0126179490789973, 'den' => 0.0117605695246273, 'på' => 0.0111951741409143, 'med' => 0.0105814026815895, 'har' => 0.0104276358698234, 'om' => 0.0102635027561405, 'ikke' => 0.0100833882603359, 'vi' => 0.00888003337949217, 'et' => 0.00770173039494746, }; ${Lingua::Identify::languages{'ngrams1'}{'da'}} = { 'e' => 0.161330354783632, 'r' => 0.0888844634874029, 'n' => 0.0738827369491463, 't' => 0.0723510573409268, 'i' => 0.066562643183, 'd' => 0.0598662618770054, 's' => 0.0590444774546072, 'a' => 0.0544545469750646, 'o' => 0.0512290389975186, 'l' => 
0.0489655477550759, 'g' => 0.0454372467404021, 'm' => 0.0355635571648124, 'k' => 0.0311290192479972, 'f' => 0.0278643747554483, 'v' => 0.0229296309805641, 'u' => 0.0164697166178504, 'p' => 0.0149360595857149, 'h' => 0.0134212704734439, 'b' => 0.0131418769526546, 'å' => 0.0104838896389825, }; ${Lingua::Identify::languages{'ngrams2'}{'da'}} = { 'er' => 0.0426607771835151, 'de' => 0.0406810921977606, 'en' => 0.0315263460754235, 'et' => 0.0221044818371379, 're' => 0.020244216755467, 'or' => 0.0186462889247601, 'ge' => 0.0162402931911226, 'nd' => 0.0153282669049097, 'in' => 0.0151368288644622, 'te' => 0.0150833238648472, 'ti' => 0.0145607854940826, 'fo' => 0.0139029588163067, 'at' => 0.0130840068164963, 'an' => 0.0129344776350627, 'ne' => 0.0123969863461548, 'me' => 0.0122761423546289, 'le' => 0.0120254012442659, 'ig' => 0.0118213498579015, 'ed' => 0.0116479163515449, 'st' => 0.0115993956579777, }; ${Lingua::Identify::languages{'ngrams3'}{'da'}} = { 'for' => 0.0173100535560177, 'der' => 0.0115285412075723, 'det' => 0.0110231343412252, 'ing' => 0.010304245720395, 'den' => 0.00980805200004896, 'nde' => 0.00919945789848929, 'til' => 0.00756438771603347, 'ere' => 0.00735182727615049, 'lig' => 0.00665636636943742, 'ter' => 0.00625335704007416, 'and' => 0.00585429620185424, 'ger' => 0.00581217896299198, 'lle' => 0.00567740379863274, 'nin' => 0.00548669167640957, 'ion' => 0.00539943002214182, 'nge' => 0.00524596533303746, 'kke' => 0.005232803695893, 'ske' => 0.00508605144173231, 'som' => 0.00507565374838819, 'end' => 0.00498997149057778, }; ${Lingua::Identify::languages{'ngrams4'}{'da'}} = { 'ning' => 0.00709971457996074, 'else' => 0.00590022956250569, 'ikke' => 0.00538299221803775, 'erne' => 0.00505519219689046, 'inge' => 0.00494006076684707, 'ende' => 0.00457485157184295, 'ione' => 0.00379182119303151, 'ring' => 0.00344813508435598, 'tion' => 0.00329696102561948, 'komm' => 0.00325374403481684, 'iske' => 0.00322743804041523, 'enne' => 0.0030930041599343, 'euro' => 
0.00305422844091375, 'fors' => 0.00302365199287552, 'lige' => 0.00299273390854636, 'nger' => 0.00283096912479103, 'urop' => 0.00275632159522283, 'inde' => 0.00270678433303799, 'elle' => 0.00261642153408702, 'form' => 0.00256568854488392, };

# Lingua-Identify-0.56/lib/Lingua/Identify/DE.pm

use utf8;
use strict;

${Lingua::Identify::languages{'_versions'}{'de'}} = '0.02';
${Lingua::Identify::languages{'_names'}{'de'}} = 'german';
${Lingua::Identify::languages{'_sets'}{'de'}} = '';

=head1 NAME

Lingua::Identify::DE - Meta-information on German

=head1 SYNOPSIS

Nothing here is meant for public consumption. This module is to be loaded by Lingua::Identify.

=head1 DESCRIPTION

Automatically generated. Do not change this module yourself unless you know what you're doing.

=head1 SEE ALSO

Lingua::Identify(3).

=head1 AUTHOR

Jose Castro, C<< >>

=head1 COPYRIGHT AND LICENSE

Copyright (C) 2010 by Alberto Simoes

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.

=cut ${Lingua::Identify::languages{'prefixes1'}{'de'}} = { 'd' => 0.160298467638764, 'e' => 0.0538407100600081, 'i' => 0.0495813042033913, 'w' => 0.0487322502668203, 'a' => 0.0475638071543175, 's' => 0.0441044203815353, 'u' => 0.0436392519136845, 'v' => 0.0285144294993674, 'g' => 0.0276613997639259, 'z' => 0.0272792813169259, 'm' => 0.0271569150628, 'A' => 0.0254888465589019, 'b' => 0.0254504138364869, 'n' => 0.0245240526996558, 'E' => 0.0245209604116454, 'h' => 0.0211865904255695, 'S' => 0.0210845449212262, 'D' => 0.0195392844268842, 'B' => 0.0192123854086414, 'P' => 0.0181720513708555, }; ${Lingua::Identify::languages{'prefixes2'}{'de'}} = { 'de' => 0.0752946605758878, 'di' => 0.0558922422006619, 'un' => 0.0414281397953843, 'da' => 0.0336007860452383, 'ei' => 0.0302261440424116, 'wi' => 0.023378982652061, 'au' => 0.0229576843440158, 'be' => 0.0216621093001956, 'ge' => 0.0207074501711909, 'vo' => 0.0188794599816255, 'we' => 0.0166429380994103, 'Be' => 0.0151303305938698, 'si' => 0.0146343239438817, 'ha' => 0.012775126472239, 'Ko' => 0.0122725000957496, 'fü' => 0.011949552010009, 'Ve' => 0.0110719654109834, 'is' => 0.0110535247443012, 'ni' => 0.0105740674105663, 'zu' => 0.010112105068299, }; ${Lingua::Identify::languages{'prefixes3'}{'de'}} = { 'ein' => 0.0354715731299352, 'die' => 0.0190436332292137, 'Ver' => 0.016082183352134, 'nic' => 0.0145924507279596, 'ver' => 0.0131588429142773, 'wer' => 0.0117300853928598, 'Eur' => 0.0102417385664753, 'auc' => 0.0101024658885874, 'Kom' => 0.00998536597533834, 'wir' => 0.00953775328919105, 'Her' => 0.00925227894446554, 'sic' => 0.00904302347818616, 'Fra' => 0.00885663367544061, 'hab' => 0.00880189466273839, 'übe' => 0.00862312674783746, 'Ber' => 0.00743272644628786, 'Par' => 0.00716041718056668, 'sei' => 0.00686870674578648, 'sin' => 0.00675922872038204, 'all' => 0.00669271042646541, }; ${Lingua::Identify::languages{'prefixes4'}{'de'}} = { 'dies' => 0.0200506778095099, 'nich' => 0.0173805566632898, 'eine' => 
0.0159521987217435, 'werd' => 0.0137394823729897, 'Euro' => 0.0112138483017815, 'Komm' => 0.0110924791587715, 'habe' => 0.00835300421654567, 'Parl' => 0.00749433817076061, 'Präs' => 0.00683795402999221, 'unte' => 0.00597350850120667, 'Beri' => 0.00579599580904918, 'unse' => 0.00553674471445639, 'könn' => 0.00541867813316094, 'Unio' => 0.00526345773257672, 'möch' => 0.00506200146798868, 'alle' => 0.00481761190111139, 'durc' => 0.00471605812838874, 'müss' => 0.00450964802122887, 'mein' => 0.0044559813933673, 'Mitg' => 0.00423305847763464, }; ${Lingua::Identify::languages{'suffixes1'}{'de'}} = { 'n' => 0.261666272534593, 'e' => 0.155058777247603, 'r' => 0.132202505795952, 't' => 0.110329280771585, 's' => 0.074162100996528, 'h' => 0.0475491246952843, 'd' => 0.0440054721739073, 'g' => 0.0398155903914284, 'm' => 0.0349838995206323, 'ß' => 0.0180692243259789, 'u' => 0.0173531563348811, 'l' => 0.0139566955672297, 'f' => 0.0109116385359504, 'i' => 0.00764159470993721, 'o' => 0.00642339509051409, 'a' => 0.0057837960885891, 'k' => 0.00514684919033485, 'z' => 0.00475124372611725, 'b' => 0.00298759478508015, 'U' => 0.00125267696714265, }; ${Lingua::Identify::languages{'suffixes2'}{'de'}} = { 'en' => 0.19397621980485, 'er' => 0.0885347683097809, 'ie' => 0.0620540622332767, 'ch' => 0.0505673153535702, 'nd' => 0.0404305875065314, 'ng' => 0.031438184023435, 'on' => 0.0254266982222779, 'es' => 0.023188172792979, 'ht' => 0.0214064313866421, 'te' => 0.0202720371101554, 'it' => 0.0193736006260684, 'as' => 0.0161869882754039, 'em' => 0.0156285390713477, 'ne' => 0.0156242833195809, 'st' => 0.0150095636199425, 'aß' => 0.0132164735422277, 'ir' => 0.0132112720678461, 'in' => 0.0131805360828642, 'ür' => 0.012161992826694, 'he' => 0.0118196412401261, }; ${Lingua::Identify::languages{'suffixes3'}{'de'}} = { 'ung' => 0.0429281048650004, 'ten' => 0.0393590197522635, 'gen' => 0.0375145420644154, 'hen' => 0.0346307268364575, 'cht' => 0.0268315379354738, 'ich' => 0.0258614895647047, 'nen' => 
0.0257977435289113, 'den' => 0.0256175916886256, 'ion' => 0.0187073828302824, 'ine' => 0.0176936822828287, 'ren' => 0.0171171963939145, 'che' => 0.015768136266852, 'men' => 0.0156468802205059, 'eit' => 0.0151777925440696, 'sen' => 0.0147114764344071, 'ben' => 0.0143920533637467, 'len' => 0.0137275702297698, 'ber' => 0.0131496985574688, 'uch' => 0.0115789130885162, 'ent' => 0.00988617868152411, }; ${Lingua::Identify::languages{'suffixes4'}{'de'}} = { 'chen' => 0.0337718458144542, 'icht' => 0.0256023698498038, 'ngen' => 0.0178077238081399, 'rden' => 0.0174510574497034, 'lich' => 0.0162266680757189, 'rung' => 0.0121737163313563, 'ssen' => 0.0116238556954334, 'aben' => 0.0095672448184304, 'eren' => 0.00930634998216668, 'nden' => 0.00927993025191213, 'llen' => 0.00926506915364394, 'sion' => 0.009154436533203, 'igen' => 0.00904958322875524, 'sche' => 0.00864833357551419, 'iner' => 0.00846587231344368, 'chte' => 0.00815461486638221, 'inen' => 0.00741403680268424, 'tion' => 0.00718451539609781, 'tung' => 0.00713497840187052, 'mmen' => 0.00701361276601366, }; ${Lingua::Identify::languages{'smallwords'}{'de'}} = { 'die' => 0.0390075934653572, 'der' => 0.0370495988824464, 'und' => 0.0270268022937508, 'in' => 0.0163873730793688, 'den' => 0.0129440334775936, 'zu' => 0.0126779492785748, 'daß' => 0.0118740073812107, 'für' => 0.0100634969678218, 'von' => 0.00993570653013513, 'das' => 0.00905561553634128, 'des' => 0.00850244049101287, 'nicht' => 0.0082293540762305, 'wir' => 0.00802016288029143, 'eine' => 0.00756326830171322, 'auf' => 0.0071807722656238, 'ist' => 0.00695144969936424, 'es' => 0.00657595587903847, 'im' => 0.00644816544135185, 'ich' => 0.00644335141801434, 'mit' => 0.00627879934756855, }; ${Lingua::Identify::languages{'ngrams1'}{'de'}} = { 'e' => 0.16347210799744, 'n' => 0.104262137234933, 'i' => 0.0854715021584291, 'r' => 0.0758891387533222, 's' => 0.0603040250082876, 't' => 0.0594704494511644, 'a' => 0.0523778743626403, 'd' => 0.0514602922407541, 'h' => 
0.0432981648211718, 'u' => 0.0399409649699917, 'l' => 0.0325080612028661, 'g' => 0.0319147648392273, 'c' => 0.0279173795435556, 'm' => 0.0265495018162774, 'o' => 0.02647905881145, 'b' => 0.0185663258810079, 'f' => 0.0166171839803565, 'w' => 0.015253098216071, 'k' => 0.01323431545661, 'z' => 0.0121458762329531, }; ${Lingua::Identify::languages{'ngrams2'}{'de'}} = { 'en' => 0.0494212104632282, 'er' => 0.041820117871142, 'ch' => 0.0316217297825517, 'de' => 0.0263146193115747, 'ie' => 0.0216047007283277, 'ei' => 0.0210930341270354, 'in' => 0.0196976353525018, 'un' => 0.0193447015862014, 'ge' => 0.0181271674523083, 'te' => 0.0172415133625179, 'nd' => 0.0146194950304043, 'be' => 0.0141895098824512, 'es' => 0.0138760627652518, 'ic' => 0.0135104618216956, 'di' => 0.0132637576246579, 'he' => 0.0125944065064912, 're' => 0.0115614637211607, 'ng' => 0.0115366535257079, 'st' => 0.0115095719743333, 'ne' => 0.0107505022972581, }; ${Lingua::Identify::languages{'ngrams3'}{'de'}} = { 'ich' => 0.0162476608431319, 'der' => 0.0146904549769669, 'die' => 0.014520275088223, 'sch' => 0.0120244122669322, 'ein' => 0.0116484486355612, 'ung' => 0.0104647021541638, 'che' => 0.00971615018032209, 'den' => 0.00953704210803559, 'cht' => 0.00864912336670042, 'gen' => 0.00822960762048774, 'und' => 0.00801739994092156, 'ten' => 0.00706502406962067, 'hen' => 0.00624755087404195, 'ine' => 0.00602924589839792, 'eit' => 0.00601476482021306, 'nde' => 0.00596620421216456, 'ver' => 0.00544390547492547, 'ent' => 0.00506140902632823, 'ber' => 0.00503397119397796, 'ere' => 0.00497212719090275, }; ${Lingua::Identify::languages{'ngrams4'}{'de'}} = { 'eine' => 0.00763245727244077, 'icht' => 0.00731561106902212, 'chen' => 0.00648629442141337, 'sche' => 0.00632263769937972, 'lich' => 0.00587254635148768, 'isch' => 0.00515115544191841, 'dies' => 0.00435253327026585, 'iese' => 0.00388532656942129, 'iche' => 0.00367357712008301, 'nder' => 0.00362845482593545, 'rung' => 0.0035764015210881, 'ngen' => 0.00342208044611945, 
'sich' => 0.00310353685232534, 'nich' => 0.00307609570792212, 'rden' => 0.00307255947797325, 'tion' => 0.00295204476131579, 'komm' => 0.00284793815162109, 'euro' => 0.00272572604458818, 'erun' => 0.0027025283761236, 'chte' => 0.00269418287344427, };

# Lingua-Identify-0.56/lib/Lingua/Identify/EL.pm

use utf8;
use strict;

${Lingua::Identify::languages{'_versions'}{'el'}} = '0.02';
${Lingua::Identify::languages{'_names'}{'el'}} = 'greek';
${Lingua::Identify::languages{'_sets'}{'el'}} = '';

=head1 NAME

Lingua::Identify::EL - Meta-information on Greek

=head1 SYNOPSIS

Nothing here is meant for public consumption. This module is to be loaded by Lingua::Identify.

=head1 DESCRIPTION

Automatically generated. Do not change this module yourself unless you know what you're doing.

=head1 SEE ALSO

Lingua::Identify(3).

=head1 AUTHOR

Jose Castro, C<< >>

=head1 COPYRIGHT AND LICENSE

Copyright (C) 2010 by Alberto Simoes

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.

=cut ${Lingua::Identify::languages{'prefixes1'}{'el'}} = { 'τ' => 0.163417457638063, 'π' => 0.103719871599745, 'α' => 0.0822022211768178, 'κ' => 0.0784560607318446, 'σ' => 0.0759851398843291, 'ε' => 0.0748672874523803, 'μ' => 0.048422208078516, 'δ' => 0.0407505926520372, 'ο' => 0.0298683943511267, 'Ε' => 0.0236403593731263, 'ν' => 0.0221477755947275, 'γ' => 0.0215793369676509, 'έ' => 0.0150832249494977, 'υ' => 0.0147436606986376, 'ό' => 0.0131599883775632, 'χ' => 0.0125062263054487, 'ά' => 0.0115595972733394, 'q' => 0.00911519587495734, 'λ' => 0.00911173681637472, 'Σ' => 0.00805153536080287, }; ${Lingua::Identify::languages{'prefixes2'}{'el'}} = { 'κα' => 0.0643416012459748, 'τη' => 0.0546126849493823, 'το' => 0.0413538136930945, 'πρ' => 0.0382158567128996, 'στ' => 0.0359316134531602, 'απ' => 0.0320872708521878, 'πο' => 0.0282501631132532, 'πα' => 0.0247997916359733, 'δι' => 0.0234001746890325, 'τω' => 0.0222504893397596, 'αν' => 0.0212139309241681, 'συ' => 0.0197886631027298, 'επ' => 0.019332866794351, 'γι' => 0.0179977058910193, 'πε' => 0.0136219297876371, 'υπ' => 0.0132549249679035, 'μέ' => 0.013081945993728, 'τι' => 0.0116014301349104, 'αρ' => 0.0115547323890304, 'με' => 0.0114876454864984, }; ${Lingua::Identify::languages{'prefixes3'}{'el'}} = { 'προ' => 0.0333433423115362, 'παρ' => 0.0327311513760851, 'κατ' => 0.0206683906011213, 'δια' => 0.0190245279721063, 'περ' => 0.0174747796157449, 'επι' => 0.0153307668520568, 'quo' => 0.0141117629395921, 'ανα' => 0.0134359328292991, 'συν' => 0.0120851689351194, 'απο' => 0.0118368865205953, 'οπο' => 0.0110570825471804, 'στη' => 0.0102468034759538, 'καν' => 0.0101777863426745, 'του' => 0.00962475295003428, 'άρθ' => 0.00959696683144134, 'αυτ' => 0.00950733419081893, 'στο' => 0.00935495870176083, 'μετ' => 0.0091120542456741, 'καθ' => 0.00877772449615252, 'είν' => 0.00840037107913218, }; ${Lingua::Identify::languages{'prefixes4'}{'el'}} = { 'περι' => 0.0136258705993206, 'παρα' => 0.0132630668937472, 'οποί' => 
0.0117247791821161, 'παρά' => 0.0115413041652975, 'άρθρ' => 0.0110976470624821, 'προσ' => 0.0100683211206695, 'κανο' => 0.0100278944220485, 'είνα' => 0.00971070032517578, 'κατα' => 0.00943807925498777, 'πρέπ' => 0.00861503313434414, 'κράτ' => 0.00797131570245535, 'μετα' => 0.00763028021921636, 'Επιτ' => 0.00726022043953151, 'προϊ' => 0.00645894254122227, 'αναφ' => 0.00640711344042608, 'εφαρ' => 0.00638949154615537, 'κοιν' => 0.00617180932281133, 'οδηγ' => 0.00597693190381763, 'Άρθρ' => 0.0059282125490692, 'πληρ' => 0.00569498159548631, }; ${Lingua::Identify::languages{'suffixes1'}{'el'}} = { 'ς' => 0.209318186919729, 'ν' => 0.143048310705227, 'α' => 0.126989060347949, 'ι' => 0.102766079538816, 'υ' => 0.0648931388173757, 'ο' => 0.0626001113131324, 'η' => 0.0616032429302635, 'ε' => 0.0337519374758625, 'ό' => 0.0293673281567547, 'ή' => 0.025255533856977, 'ά' => 0.0210873347641733, 'ύ' => 0.0137075158235589, 'ί' => 0.0134427586318039, 't' => 0.0134053472894907, 'Κ' => 0.00649576014379769, 'Ε' => 0.00444907193970903, 'ω' => 0.00437079590040755, 'θ' => 0.00371580963037018, 'Α' => 0.00346198806175289, 'e' => 0.00334802735747574, }; ${Lingua::Identify::languages{'suffixes2'}{'el'}} = { 'αι' => 0.0742450851341104, 'ου' => 0.0738446085012654, 'ης' => 0.056064102522018, 'ων' => 0.0536691209537909, 'ση' => 0.0315050371425664, 'ην' => 0.0313586333899034, 'ια' => 0.0293017591428487, 'ών' => 0.0263946926998362, 'ος' => 0.0250448894914275, 'ις' => 0.0248525294038479, 'ες' => 0.0242294928061923, 'ας' => 0.0240043067651008, 'ει' => 0.0237804337621497, 'ής' => 0.0215738731670808, 'ία' => 0.0212081920449584, 'πό' => 0.0180214484780247, 'υς' => 0.0162278383781353, 'τα' => 0.01585427902717, 'ές' => 0.0149154567567301, 'ον' => 0.0142366160381044, }; ${Lingua::Identify::languages{'suffixes3'}{'el'}} = { 'ται' => 0.0489944388470378, 'σης' => 0.0230190857622293, 'ους' => 0.0221128437176182, 'ουν' => 0.01576287473243, 'ίας' => 0.0155083022886421, 'εις' => 0.0153370933563763, 'uot' => 
0.0141108429933417, 'τος' => 0.0133390582253791, 'ίου' => 0.0130862785452798, 'ικά' => 0.012086812788859, 'θρο' => 0.0113544688220467, 'κών' => 0.0113464013854478, 'την' => 0.0108587696621358, 'ική' => 0.0106624620382289, 'ούν' => 0.0105029060699393, 'των' => 0.0101721411693839, 'ηση' => 0.0100502332385559, 'κής' => 0.0095446738783574, 'ίες' => 0.00946668865790125, 'πει' => 0.00901311944467352, }; ${Lingua::Identify::languages{'suffixes4'}{'el'}} = { 'νται' => 0.0277146002608189, 'εται' => 0.0199802208874892, 'σεις' => 0.0133363879691579, 'ρθρο' => 0.0131124734879034, 'ικών' => 0.0112413362718654, 'έπει' => 0.0103819607859398, 'ίναι' => 0.0101881088970761, 'τικά' => 0.00920122655376954, 'ικές' => 0.00889230750092779, 'ικής' => 0.00855021593234465, 'σμού' => 0.00837398694246849, 'ματα' => 0.00723679163720878, 'ησης' => 0.00703568326052657, 'ικού' => 0.0069983641803175, 'ατος' => 0.0068812237341057, 'σεων' => 0.00672987413103558, 'τητα' => 0.00661169704370686, 'χουν' => 0.00600733527254332, 'ισμό' => 0.00586116887505779, 'φωνα' => 0.00567353683289552, }; ${Lingua::Identify::languages{'smallwords'}{'el'}} = { 'και' => 0.0242980894569425, 'της' => 0.0242756149393867, 'του' => 0.0199552809366961, 'την' => 0.0181928606297673, 'που' => 0.0178181111161054, 'των' => 0.0176388376388585, 'να' => 0.0166039645048968, 'για' => 0.0142012817792292, 'το' => 0.0129918391368113, 'με' => 0.0125413034592986, 'από' => 0.0122078443383554, 'τα' => 0.0105405487336393, 'ή' => 0.00937762311592983, 'η' => 0.00895269793516677, 'σε' => 0.00848386904569017, 'τις' => 0.00779238563345212, 'τη' => 0.0075974322602361, 'στο' => 0.00752948604436992, 'στην' => 0.00588989158919927, 'οι' => 0.00569127957359046, }; ${Lingua::Identify::languages{'ngrams1'}{'el'}} = { 'α' => 0.0891279287631483, 'ο' => 0.0809105426133775, 'τ' => 0.0769490429616098, 'ι' => 0.0651272525025264, 'ε' => 0.0619461781935302, 'ν' => 0.058346729570849, 'ρ' => 0.0473468206792417, 'σ' => 0.0441621249069379, 'π' => 0.0421776583160806, 
'η' => 0.038641108792951, 'κ' => 0.0350648184751047, 'ς' => 0.0346692212595907, 'μ' => 0.0314543148591689, 'υ' => 0.0304275347098198, 'ί' => 0.0208500031830756, 'λ' => 0.0199111864714329, 'ό' => 0.0179348205221845, 'ω' => 0.017710099193786, 'γ' => 0.0177018079488451, 'ά' => 0.0167760475654427, }; ${Lingua::Identify::languages{'ngrams2'}{'el'}} = { 'ου' => 0.0208149614584568, 'το' => 0.0196626343684152, 'τη' => 0.0183742029266245, 'τα' => 0.0180461210202031, 'αι' => 0.0148417235682521, 'πο' => 0.0144935550144987, 'κα' => 0.0141587775589667, 'ρο' => 0.0132287888755991, 'στ' => 0.012806622960123, 'ικ' => 0.0112047705556532, 'ων' => 0.0110458089845265, 'ει' => 0.0104138876770843, 'να' => 0.00993480959416776, 'ης' => 0.00986104311344881, 'ια' => 0.00968049416846533, 'ση' => 0.00967241333333179, 'αρ' => 0.00947743432560986, 'τι' => 0.00934433142691041, 'αν' => 0.00872972619475436, 'ατ' => 0.00863194808963863, }; ${Lingua::Identify::languages{'ngrams3'}{'el'}} = { 'του' => 0.00848506443855641, 'ται' => 0.00811271492031518, 'και' => 0.00743327450432324, 'της' => 0.00702901756323156, 'την' => 0.00698801435920655, 'των' => 0.00671976100329643, 'τικ' => 0.00604003183234656, 'παρ' => 0.00597491758933501, 'προ' => 0.00583992464650618, 'που' => 0.00560415622336236, 'ντα' => 0.00529966412023294, 'στο' => 0.00524494505570661, 'οπο' => 0.00484039935965701, 'για' => 0.00447425807301111, 'ετα' => 0.00424671916616808, 'από' => 0.00420456094231138, 'κατ' => 0.00410291919712261, 'δια' => 0.00409570032317455, 'επι' => 0.00371728695081695, 'σης' => 0.00370920181199511, }; ${Lingua::Identify::languages{'ngrams4'}{'el'}} = { 'νται' => 0.00496535734834234, 'οντα' => 0.00418193758792897, 'εται' => 0.00380206407804015, 'άρθρ' => 0.00305896195402199, 'quot' => 0.00291193204147982, 'ρθρο' => 0.00286810048264649, 'παρα' => 0.00281705613565072, 'περι' => 0.00279874675031528, 'οποί' => 0.00275417541833709, 'νισμ' => 0.00261324863424007, 'οποι' => 0.00252188665084908, 'ότητ' => 0.00247491105615008, 
'τους' => 0.00242183233300592, 'τροπ' => 0.0023926112937837, 'κοιν' => 0.00239224140721127, 'σεις' => 0.00238003515032098, 'επιτ' => 0.00226093167399751, 'κατά' => 0.00225556831869723, 'στην' => 0.00224613621110019, 'τητα' => 0.00219120805509387, };

# Lingua-Identify-0.56/lib/Lingua/Identify/EN.pm

use utf8;
use strict;

${Lingua::Identify::languages{'_versions'}{'en'}} = '0.02';
${Lingua::Identify::languages{'_names'}{'en'}} = 'english';
${Lingua::Identify::languages{'_sets'}{'en'}} = '';

=head1 NAME

Lingua::Identify::EN - Meta-information on English

=head1 SYNOPSIS

Nothing here is meant for public consumption. This module is to be loaded by Lingua::Identify.

=head1 DESCRIPTION

Automatically generated. Do not change this module yourself unless you know what you're doing.

=head1 SEE ALSO

Lingua::Identify(3).

=head1 AUTHOR

Jose Castro, C<< >>

=head1 COPYRIGHT AND LICENSE

Copyright (C) 2010 by Alberto Simoes

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.

=cut ${Lingua::Identify::languages{'prefixes1'}{'en'}} = { 't' => 0.170758391694659, 'a' => 0.0952308999612686, 'i' => 0.0744895997155101, 'o' => 0.0713993554869753, 'w' => 0.0530210514273427, 's' => 0.0476842809668798, 'c' => 0.0448318008460267, 'p' => 0.0400873731407881, 'b' => 0.0368787637369075, 'f' => 0.0352433601788537, 'r' => 0.0311676009532618, 'm' => 0.0292978791441595, 'd' => 0.0250694684697226, 'h' => 0.0249516201728717, 'e' => 0.0237466041082881, 'n' => 0.0202326918067178, 'l' => 0.0163762613558205, 'E' => 0.0140146435123479, 'T' => 0.0131173424450972, 'C' => 0.0124012934367458, }; ${Lingua::Identify::languages{'prefixes2'}{'en'}} = { 'th' => 0.14773642550096, 'an' => 0.0420250770380145, 'co' => 0.0309584871050066, 're' => 0.0291803293568269, 'pr' => 0.0213847889712726, 'fo' => 0.0200085968783693, 'in' => 0.0183300661017392, 'ha' => 0.0182008319824622, 'wi' => 0.0181969419922498, 'wh' => 0.0156276034569911, 'be' => 0.0143957732230795, 'Th' => 0.0133469886397158, 'de' => 0.0131989529010791, 'ar' => 0.0122294808959339, 'al' => 0.012083174041836, 'Co' => 0.0119249811065336, 'ma' => 0.0118707373541281, 'no' => 0.0116243713073457, 'Eu' => 0.0114746066841702, 'po' => 0.0101749177321098, }; ${Lingua::Identify::languages{'prefixes3'}{'en'}} = { 'tha' => 0.033426693062962, 'pro' => 0.0190548257157897, 'the' => 0.018297267296821, 'thi' => 0.0174839219096881, 'con' => 0.0166794452200581, 'Eur' => 0.0151852127343053, 'wit' => 0.0132609914631634, 'hav' => 0.0124153182431969, 'com' => 0.0110601240473302, 'Com' => 0.00998357860526858, 'whi' => 0.00908126015608908, 'wil' => 0.00896053143524134, 'int' => 0.00843212871627034, 'res' => 0.00803933124299087, 'imp' => 0.00763051289680269, 'cou' => 0.00747688416908887, 'sho' => 0.00681230403039388, 'Pre' => 0.00655310919369235, 'par' => 0.00635256219531733, 'wor' => 0.0061328473668551, }; ${Lingua::Identify::languages{'prefixes4'}{'en'}} = { 'Euro' => 0.0198695043275931, 'Comm' => 0.0130219439034349, 'whic' => 
0.0110534973226175, 'Pres' => 0.0085832898486506, 'inte' => 0.00763766355002265, 'woul' => 0.00719985257462747, 'shou' => 0.00714763311515942, 'poli' => 0.00710373849705584, 'Memb' => 0.00689750947234507, 'ther' => 0.00688048138773592, 'comp' => 0.00650813393761591, 'coun' => 0.00647521297403823, 'Stat' => 0.00640710063560164, 'cont' => 0.00616416662851114, 'cons' => 0.00611724479536594, 'Unio' => 0.00577138547686016, 'comm' => 0.00553374909609251, 'repo' => 0.00551331539456153, 'impo' => 0.00548228644038487, 'part' => 0.00541493090570868, }; ${Lingua::Identify::languages{'suffixes1'}{'en'}} = { 'e' => 0.215404337364344, 's' => 0.132237835677075, 't' => 0.108556098594789, 'n' => 0.106317375561167, 'd' => 0.0817060678819144, 'y' => 0.0592910968846893, 'r' => 0.0555547227613847, 'o' => 0.0500687916702204, 'f' => 0.0418268862141837, 'l' => 0.0381420304613134, 'g' => 0.0295515575691944, 'h' => 0.0229155090109765, 'm' => 0.0123532092584887, 'w' => 0.00823484106410419, 'k' => 0.00724565389131691, 'a' => 0.00670617557081579, 'p' => 0.00497832868548035, 'c' => 0.00488890382174988, 'u' => 0.00275356213348082, 'U' => 0.00255799908852114, }; ${Lingua::Identify::languages{'suffixes2'}{'en'}} = { 'he' => 0.0984437952904326, 'nd' => 0.0455569533233677, 'on' => 0.0413252538944018, 'ng' => 0.0365448447544843, 'es' => 0.0350476171540616, 'ed' => 0.0348029777713136, 're' => 0.028900080156137, 'nt' => 0.0282508781192332, 'at' => 0.0280336850276592, 'er' => 0.0279316799140045, 'al' => 0.0239294919907616, 'ly' => 0.019966420435255, 've' => 0.019406472872799, 'an' => 0.0184045751886824, 'ts' => 0.0180425867027258, 'or' => 0.0179098936099631, 'll' => 0.0156525982452959, 'se' => 0.0155363297047319, 'is' => 0.0153794320087292, 'en' => 0.0151929268962632, }; ${Lingua::Identify::languages{'suffixes3'}{'en'}} = { 'ion' => 0.04985545391476, 'ing' => 0.0456096182867606, 'hat' => 0.0352148880322135, 'ent' => 0.029132938054743, 'his' => 0.0174090418189491, 'ons' => 0.0131014111302631, 'ies' => 
0.0125438251428653, 'ave' => 0.0123976340502228, 'uld' => 0.0120703490991796, 'ted' => 0.0116137522618852, 'ean' => 0.011505038729196, 'ith' => 0.0113179370176731, 'ity' => 0.0110924994814651, 'ill' => 0.010091476715993, 'ate' => 0.00951472281625267, 'nce' => 0.00930273142750881, 'ere' => 0.00927040345594599, 'ive' => 0.00909102612700888, 'ich' => 0.00847393378487594, 'ers' => 0.00840727522404286, }; ${Lingua::Identify::languages{'suffixes4'}{'en'}} = { 'tion' => 0.043447396134284, 'ment' => 0.0215948509592771, 'ions' => 0.0163236609077118, 'ould' => 0.0159641755847407, 'pean' => 0.0142515118249859, 'sion' => 0.0135945996979566, 'ting' => 0.0117653870124386, 'hich' => 0.0110664718634622, 'port' => 0.00964518041811549, 'ther' => 0.00915779400128734, 'mber' => 0.00844487784499523, 'tive' => 0.00838395454289171, 'dent' => 0.00814896466334956, 'here' => 0.00805549847937708, 'ents' => 0.00732820291016613, 'ates' => 0.00726538758004698, 'ally' => 0.00695395976867309, 'onal' => 0.00681130085629404, 'ding' => 0.00676854102314064, 'nion' => 0.00676135131668122, }; ${Lingua::Identify::languages{'smallwords'}{'en'}} = { 'the' => 0.0701104399345048, 'of' => 0.0365143240841526, 'to' => 0.0329557853898738, 'and' => 0.0291434460137306, 'in' => 0.0219181203028555, 'that' => 0.016723699654538, 'is' => 0.0164339272224879, 'a' => 0.016432928007205, 'for' => 0.0114395161666371, 'I' => 0.0104373032378572, 'on' => 0.0102784280078712, 'we' => 0.00844420182017056, 'be' => 0.00829931560414554, 'this' => 0.00787215107069248, 'are' => 0.00718569017131879, 'have' => 0.0068547833767881, 'not' => 0.00631154333463453, 'with' => 0.00629389053130275, 'it' => 0.00629355745954177, 'as' => 0.00627990151734171, }; ${Lingua::Identify::languages{'ngrams1'}{'en'}} = { 'e' => 0.125890889211489, 't' => 0.0989762624225809, 'o' => 0.0802197780766555, 'i' => 0.0800499358147479, 'a' => 0.0775778139473822, 'n' => 0.0745558602611499, 's' => 0.0644019685963574, 'r' => 0.0631647956817092, 'h' => 
0.0453599617533292, 'l' => 0.0389057847268491, 'c' => 0.0327976561767857, 'd' => 0.0320229304845271, 'u' => 0.0298258271709914, 'm' => 0.0279955749316243, 'p' => 0.0247245032943548, 'f' => 0.0213395305914346, 'g' => 0.0178665232108577, 'w' => 0.0157363098292377, 'y' => 0.0143180721986317, 'b' => 0.0137603025464125, }; ${Lingua::Identify::languages{'ngrams2'}{'en'}} = { 'th' => 0.0370351591712443, 'he' => 0.0267079287024732, 'in' => 0.0232758770962344, 'on' => 0.020911717392357, 're' => 0.0204802307001167, 'an' => 0.0193096858540155, 'at' => 0.0169485820658636, 'er' => 0.0164712566377395, 'en' => 0.0159675900768821, 'ti' => 0.0155715261824188, 'es' => 0.0141409272121763, 'is' => 0.014120913116227, 'or' => 0.0128245592325778, 'nt' => 0.012530803953322, 'it' => 0.0123146517170695, 'nd' => 0.0121674512694425, 'io' => 0.0115121510311026, 'te' => 0.0112025996804201, 'al' => 0.0111979512452319, 'to' => 0.0110733473575475, }; ${Lingua::Identify::languages{'ngrams3'}{'en'}} = { 'the' => 0.0321360300483628, 'ion' => 0.0141982828884038, 'and' => 0.0114744529132272, 'ent' => 0.00983327004982701, 'ing' => 0.00980854186195337, 'tio' => 0.00970664566319795, 'hat' => 0.00711385787155594, 'tha' => 0.00707340921157701, 'ati' => 0.00687288713125596, 'for' => 0.00643827918892921, 'men' => 0.00600200740101467, 'res' => 0.0051133991091656, 'com' => 0.0048660598564434, 'pro' => 0.00473823061611284, 'thi' => 0.00473662414451084, 'ate' => 0.00473582090870984, 'con' => 0.00442410804393608, 'ons' => 0.0041405658061831, 'ope' => 0.00407114328338239, 'her' => 0.00394245343326505, }; ${Lingua::Identify::languages{'ngrams4'}{'en'}} = { 'tion' => 0.0131262017745609, 'that' => 0.00865535437089381, 'atio' => 0.00726512703095898, 'ment' => 0.00678617461517829, 'this' => 0.00475554759210813, 'euro' => 0.00423855704415438, 'port' => 0.00420825150969207, 'rope' => 0.00419778516531591, 'ther' => 0.00405555223166675, 'urop' => 0.004055317910524, 'comm' => 0.00396596344808873, 'sion' => 
0.00394612425800257, 'with' => 0.00371625521696498, 'ions' => 0.00342108868414775, 'ould' => 0.00330463107620108, 'have' => 0.00326932669069344, 'opea' => 0.00301524446490502, 'pean' => 0.00301516635785744, 'pres' => 0.00290159871067134, 'ssio' => 0.00276311491530619, };

# Lingua-Identify-0.56/lib/Lingua/Identify/ES.pm

use utf8;
use strict;

${Lingua::Identify::languages{'_versions'}{'es'}} = '0.02';
${Lingua::Identify::languages{'_names'}{'es'}} = 'spanish';
${Lingua::Identify::languages{'_sets'}{'es'}} = '';

=head1 NAME

Lingua::Identify::ES - Meta-information on Spanish

=head1 SYNOPSIS

Nothing here is meant for public consumption. This module is to be loaded by Lingua::Identify.

=head1 DESCRIPTION

Automatically generated. Do not change this module yourself unless you know what you're doing.

=head1 SEE ALSO

Lingua::Identify(3).

=head1 AUTHOR

Jose Castro, C<< >>

=head1 COPYRIGHT AND LICENSE

Copyright (C) 2010 by Alberto Simoes

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.

=cut ${Lingua::Identify::languages{'prefixes1'}{'es'}} = { 'd' => 0.123692374671329, 'e' => 0.11702320930107, 'l' => 0.0977325901599634, 'p' => 0.0838099555879987, 'c' => 0.0659256755706169, 's' => 0.0602675849675155, 'a' => 0.0496351374464286, 'q' => 0.0447000614582882, 'm' => 0.0362591624814745, 't' => 0.0304398563954673, 'n' => 0.027204089799479, 'r' => 0.0255276942830698, 'u' => 0.0239073559269604, 'h' => 0.023351285263841, 'i' => 0.0226196779454764, 'E' => 0.0188249764519009, 'C' => 0.015212767611564, 'P' => 0.0146861576458284, 'f' => 0.0143444953914467, 'v' => 0.0121234861497896, }; ${Lingua::Identify::languages{'prefixes2'}{'es'}} = { 'qu' => 0.0608421200368028, 'co' => 0.0554236219560282, 'de' => 0.0489708807941561, 'pr' => 0.0338125641876764, 'es' => 0.0332238754764647, 're' => 0.0304419453794898, 'po' => 0.0288579995945455, 'lo' => 0.0284029771942996, 'pa' => 0.0262398107285037, 'in' => 0.0225578608043593, 'la' => 0.0214033180433925, 'se' => 0.0177380764081189, 'Co' => 0.0165183713205563, 'un' => 0.0162410137253024, 'ha' => 0.0145100349982958, 'so' => 0.0138528423027141, 'di' => 0.0134936140919936, 'pe' => 0.0114574306216152, 'si' => 0.0112586019840536, 'ca' => 0.0106487494402723, }; ${Lingua::Identify::languages{'prefixes3'}{'es'}} = { 'est' => 0.0307741274530282, 'con' => 0.0278783073519444, 'par' => 0.0232532364839231, 'pro' => 0.0222330211588112, 'com' => 0.0201542258654424, 'pre' => 0.0131540994042113, 'Com' => 0.0130610296217672, 'res' => 0.0106468989290581, 'des' => 0.0104145797011252, 'tra' => 0.0101531317626564, 'deb' => 0.00982348001415229, 'per' => 0.00967002144157278, 'Eur' => 0.00956984709558338, 'int' => 0.00834715170942904, 'tod' => 0.00828321063752091, 'sob' => 0.00826544922865754, 'pue' => 0.00821003363300382, 'nue' => 0.00804165547697908, 'Pre' => 0.00777310297496494, 'Est' => 0.00746760674251499, }; ${Lingua::Identify::languages{'prefixes4'}{'es'}} = { 'Comi' => 0.0129892071878787, 'Euro' => 0.0108324232211585, 'cons' => 
0.00954465828150881, 'cont' => 0.00953091565503421, 'sobr' => 0.00924717083782326, 'inte' => 0.0086360281545997, 'Pres' => 0.00852608714280286, 'prop' => 0.00728520881112538, 'esta' => 0.00698529619806197, 'Parl' => 0.0066837668054133, 'resp' => 0.00667810807686493, 'Seño' => 0.00663849697702637, 'pres' => 0.00656574189569023, 'comp' => 0.00649945393269508, 'part' => 0.00645499249410078, 'tamb' => 0.00631999139873261, 'polí' => 0.00620034970942429, 'Cons' => 0.00603543819172904, 'pued' => 0.00584384981087721, 'deci' => 0.00566115371774424, }; ${Lingua::Identify::languages{'suffixes1'}{'es'}} = { 'e' => 0.200769749129416, 's' => 0.186129286084498, 'a' => 0.175487817559385, 'o' => 0.146703081207264, 'n' => 0.122347616119452, 'l' => 0.062030676971474, 'r' => 0.0582719987685324, 'd' => 0.0107732945115356, 'i' => 0.00580730315964782, 'y' => 0.00572460487002858, 'á' => 0.00470070178914884, 'u' => 0.00455577508357849, 'í' => 0.00302626612309599, 'z' => 0.00245147207049497, 'é' => 0.00237655233286962, 'ó' => 0.00169695252708778, 'E' => 0.00142756898961522, 't' => 0.000728727502585345, 'm' => 0.000530169728004507, 'C' => 0.000440511681338107, }; ${Lingua::Identify::languages{'suffixes2'}{'es'}} = { 'os' => 0.111041857092011, 'as' => 0.0633217532444023, 'ue' => 0.0587610140348004, 'es' => 0.0517032311177432, 'ón' => 0.0513695864198253, 'te' => 0.0433804947633378, 'do' => 0.0404339497683704, 'to' => 0.0328024539316881, 'ra' => 0.0313893377403398, 'ar' => 0.027577767343258, 'or' => 0.0261173059109698, 'ta' => 0.0201506915468693, 'el' => 0.0193965097088379, 'na' => 0.0192349788868076, 'ia' => 0.0170832769367963, 'on' => 0.015450144625786, 'er' => 0.0150819657521237, 'al' => 0.0150223664488229, 'an' => 0.0144742756595888, 'da' => 0.0141980022536335, }; ${Lingua::Identify::languages{'suffixes3'}{'es'}} = { 'ión' => 0.0642928138603183, 'nte' => 0.0371558951893709, 'mos' => 0.0218205095596609, 'nto' => 0.0217011514505207, 'ado' => 0.02145177825821, 'nes' => 0.0191875265092229, 
'ara' => 0.0167527631757498, 'dos' => 0.0166106701886782, 'tos' => 0.014050865026582, 'dad' => 0.0137041581381272, 'sta' => 0.013416419839307, 'les' => 0.0127592397741006, 'cia' => 0.0120068574075561, 'ido' => 0.0111635355292857, 'bre' => 0.0108800600200777, 'ste' => 0.010367104336749, 'ica' => 0.0101028113807957, 'ndo' => 0.00995077188462902, 'das' => 0.00957209407408302, 'ero' => 0.00949181153638753, }; ${Lingua::Identify::languages{'suffixes4'}{'es'}} = { 'ción' => 0.0448115075996786, 'ente' => 0.0338093136280078, 'ones' => 0.0197247268852878, 'sión' => 0.0177263857024832, 'ento' => 0.0169373966270717, 'idad' => 0.0135841930565728, 'emos' => 0.0123788829014103, 'ados' => 0.0115155219049559, 'ncia' => 0.0106190169002107, 'ales' => 0.0100943715006799, 'obre' => 0.00920433361950355, 'amos' => 0.00897636751369818, 'ntes' => 0.00847193187532033, 'nión' => 0.0072464119590049, 'bién' => 0.00722054346472912, 'tado' => 0.00700631999650775, 'ante' => 0.00700066126338493, 'eñor' => 0.00614700095228395, 'ores' => 0.00596592149235343, 'tica' => 0.00587214820060371, }; ${Lingua::Identify::languages{'smallwords'}{'es'}} = { 'de' => 0.0675563030380086, 'la' => 0.0440701192472395, 'que' => 0.0365505522580535, 'en' => 0.027754537033423, 'el' => 0.0245319206739399, 'y' => 0.0221169872730034, 'a' => 0.0215373105084963, 'los' => 0.0189817088793727, 'las' => 0.0134550706065622, 'del' => 0.012351752498117, 'se' => 0.011969938735895, 'una' => 0.00959094529435793, 'un' => 0.00904643558689761, 'no' => 0.00881456488109477, 'por' => 0.00835662023713417, 'para' => 0.00810581342369077, 'con' => 0.00781249698085018, 'es' => 0.00738237682158593, 'al' => 0.00558228724220325, 'ha' => 0.00512704775647702, }; ${Lingua::Identify::languages{'ngrams1'}{'es'}} = { 'e' => 0.137582751339263, 'a' => 0.110241519388447, 'o' => 0.0868936234315965, 's' => 0.0775012626707994, 'n' => 0.0728254144094035, 'r' => 0.0661694950133168, 'i' => 0.0647388531853114, 'l' => 0.0506737247872097, 'd' => 0.0490228953522778, 
't' => 0.0473240851971237, 'c' => 0.045585612006794, 'u' => 0.0395515935991409, 'p' => 0.0305046478145822, 'm' => 0.0303106455124318, 'b' => 0.0120554832721342, 'q' => 0.00989719803376087, 'g' => 0.00951550870884729, 'ó' => 0.00851130227068181, 'y' => 0.0078238610047055, 'v' => 0.00765812342859291, }; ${Lingua::Identify::languages{'ngrams2'}{'es'}} = { 'de' => 0.0312043888139478, 'en' => 0.0286472816885396, 'es' => 0.0281682201996675, 'os' => 0.0212260056354301, 'la' => 0.0208438319080878, 'nt' => 0.0170910395432659, 'ue' => 0.0168259185228659, 're' => 0.0167987438583338, 'ci' => 0.0162685018175337, 'er' => 0.015716750501627, 'ar' => 0.0156113166441844, 'co' => 0.0153944954692248, 'te' => 0.0149374233732074, 'ra' => 0.0143007872394689, 'ta' => 0.0142054358547325, 'on' => 0.01395932749715, 'as' => 0.0138690653605415, 'st' => 0.0125814855841765, 'el' => 0.0124803727864651, 'qu' => 0.0123324965201066, }; ${Lingua::Identify::languages{'ngrams3'}{'es'}} = { 'que' => 0.0138976964252045, 'ent' => 0.0136774087038489, 'ión' => 0.0113528463594757, 'est' => 0.0107764100954501, 'nte' => 0.0105425739265396, 'con' => 0.00960559841924743, 'aci' => 0.00702361557133113, 'res' => 0.00699965489093312, 'ció' => 0.00698547920043586, 'los' => 0.00686241413514551, 'men' => 0.0066426282081438, 'ado' => 0.00624633611716296, 'com' => 0.00605904137457531, 'par' => 0.00603746421735825, 'sta' => 0.00577352038721463, 'por' => 0.00508944923428063, 'las' => 0.00486564895244783, 'nto' => 0.00483516494544931, 'pre' => 0.00482111470354052, 'pro' => 0.0047635338014322, }; ${Lingua::Identify::languages{'ngrams4'}{'es'}} = { 'ción' => 0.0089756383164582, 'ente' => 0.00808833994408401, 'ment' => 0.00741671336587085, 'ació' => 0.00577359221936477, 'esta' => 0.00540620535678867, 'amen' => 0.00449008597888845, 'cion' => 0.00421029554322009, 'ento' => 0.00419102756761562, 'ones' => 0.00402231134425541, 'pres' => 0.00388063505304603, 'uest' => 0.00386525305571473, 'para' => 0.0038582906779753, 'ione' => 
0.00373053914224478, 'sión' => 0.00355065073134921, 'idad' => 0.0034146414917882, 'cons' => 0.00320836081178734, 'enci' => 0.0031923311514105, 'euro' => 0.0031678818714418, 'ncia' => 0.00314488983332553, 'enta' => 0.00310392514569585, }; Lingua-Identify-0.56/lib/Lingua/Identify/FI.pm000644 000765 000024 00000022732 11375521415 021127 0ustar00ambsstaff000000 000000 use utf8; use strict; ${Lingua::Identify::languages{'_versions'}{'fi'}} = '0.02'; ${Lingua::Identify::languages{'_names'}{'fi'}} = 'finnish'; ${Lingua::Identify::languages{'_sets'}{'fi'}} = ''; =head1 NAME Lingua::Identify::FI - Meta-information on Finnish =head1 SYNOPSIS Nothing here is meant for public consumption. This module is to be loaded by Lingua::Identify. =head1 DESCRIPTION Automatically generated. Do not change this module yourself unless you know what you're doing. =head1 SEE ALSO Lingua::Identify(3). =head1 AUTHOR Jose Castro, C<< >> =head1 COPYRIGHT AND LICENSE Copyright (C) 2010 by Alberto Simoes This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available. 
=cut ${Lingua::Identify::languages{'prefixes1'}{'fi'}} = { 't' => 0.105303142859377, 'k' => 0.09949671181443, 'j' => 0.0797555680804491, 'o' => 0.0774782296430546, 's' => 0.0745560663760544, 'e' => 0.0739088354836517, 'm' => 0.067657282821253, 'v' => 0.064410301076653, 'p' => 0.0583373985769342, 'a' => 0.0412314951214069, 'h' => 0.0351104713657474, 'l' => 0.0266495515399704, 'n' => 0.0251012501300778, 'y' => 0.0214957650287194, 'E' => 0.0162319011445039, 'r' => 0.0151293228677622, 'u' => 0.0122221974932435, 'T' => 0.011254960248835, 'i' => 0.0108621704972189, 'M' => 0.00898123090563602, }; ${Lingua::Identify::languages{'prefixes2'}{'fi'}} = { 'va' => 0.03584548769979, 'ko' => 0.0302102814334994, 'jo' => 0.029788091166419, 'ta' => 0.0264292441126801, 'et' => 0.0263151927135162, 'ka' => 0.0243783198294698, 'ol' => 0.0228949846730927, 'tä' => 0.0228402933588737, 'si' => 0.0220852864357537, 'ku' => 0.0201610859293922, 'tu' => 0.0196141727872028, 'mi' => 0.0193300447401629, 'to' => 0.0177840146626079, 'pa' => 0.016131936792848, 'se' => 0.014562562861663, 'ma' => 0.0144211658053896, 'sa' => 0.0144144961329239, 'mu' => 0.014286438421582, 'pu' => 0.0142490882557739, 'te' => 0.0135541083848454, }; ${Lingua::Identify::languages{'prefixes3'}{'fi'}} = { 'ett' => 0.0249024927459041, 'toi' => 0.0114727062909517, 'kan' => 0.0110177649062654, 'Eur' => 0.0107854396564638, 'yht' => 0.0103499165016117, 'par' => 0.00914321221905983, 'puh' => 0.00904612106988898, 'myö' => 0.00898509234755303, 'mie' => 0.0088970168050909, 'kom' => 0.00879160355741971, 'val' => 0.00855164971732605, 'sii' => 0.00854471463524242, 'tar' => 0.00851836132332462, 'hyv' => 0.00811543305426563, 'jot' => 0.00800239121630244, 'voi' => 0.00753704720849076, 'ole' => 0.00746006779736245, 'tul' => 0.00734078438552399, 'kos' => 0.00725895041693714, 'esi' => 0.00721664641622698, }; ${Lingua::Identify::languages{'prefixes4'}{'fi'}} = { 'Euro' => 0.0119457932703524, 'yhte' => 0.0103780099728438, 'komi' => 
0.009176509380259, 'kans' => 0.00908085593502409, 'puhe' => 0.00835762256861385, 'kosk' => 0.00806210785975805, 'toim' => 0.00788402136415811, 'parl' => 0.00715456663545617, 'vast' => 0.00690571214378813, 'jäse' => 0.0066856314527192, 'kaik' => 0.00632168175865469, 'tark' => 0.00566066201516146, 'asia' => 0.00547557648698335, 'käyt' => 0.00546080075154057, 'esit' => 0.00542736092922267, 'oike' => 0.00542658325893621, 'hyvä' => 0.00525160744448212, 'kesk' => 0.0051645083723983, 'jotk' => 0.00506107822429877, 'unio' => 0.00493742864875122, }; ${Lingua::Identify::languages{'suffixes1'}{'fi'}} = { 'n' => 0.339474275038026, 'a' => 0.256301602960942, 'ä' => 0.135272664673524, 'i' => 0.0777885386878124, 'e' => 0.0641376989857123, 't' => 0.0587674096509914, 's' => 0.0279992993704567, 'o' => 0.0143310041213148, 'u' => 0.0081601672843322, 'y' => 0.00519031662556588, 'ö' => 0.00304569200108586, 'U' => 0.00170582828677417, 'E' => 0.000811381979735916, 'r' => 0.00066692228007967, 'l' => 0.000650068648453107, 'd' => 0.000412312059435536, 'm' => 0.000391846935317568, 'k' => 0.000374391388275771, 'O' => 0.000359945418310147, 'K' => 0.000320820916319913, }; ${Lingua::Identify::languages{'suffixes2'}{'fi'}} = { 'en' => 0.130286657191241, 'ta' => 0.0709556418022339, 'in' => 0.0703400281594886, 'an' => 0.0695283252307551, 'tä' => 0.066063414207936, 'sa' => 0.0387142945754634, 'än' => 0.0321473044059395, 'aa' => 0.0279900781489171, 'si' => 0.0256570158278736, 'ia' => 0.0224275453423125, 'le' => 0.0223888610614033, 'la' => 0.0208354870917891, 'at' => 0.0206013804952522, 'me' => 0.0195062151633044, 'ti' => 0.0194928757560943, 'on' => 0.0179741842452263, 'et' => 0.017915490853502, 'ka' => 0.0168303300769617, 'lä' => 0.0148787748021266, 'sä' => 0.014228478700635, }; ${Lingua::Identify::languages{'suffixes3'}{'fi'}} = { 'sta' => 0.0382228417471991, 'aan' => 0.0318918975148495, 'ssa' => 0.0307538671188274, 'ttä' => 0.025750278266116, 'sen' => 0.0223597660137382, 'den' => 0.0213632092318468, 
'lla' => 0.0210636180489818, 'ten' => 0.0195296002330154, 'een' => 0.0191814641825564, 'mme' => 0.0191759161976886, 'stä' => 0.0180302573224732, 'nen' => 0.0180184678546289, 'lle' => 0.0171550627095665, 'ksi' => 0.0165475583665346, 'sti' => 0.0155066177057002, 'itä' => 0.0154809582756863, 'taa' => 0.0151716581193025, 'vat' => 0.0147264323336558, 'iin' => 0.014050271677884, 'ssä' => 0.0133373556223625, }; ${Lingua::Identify::languages{'suffixes4'}{'fi'}} = { 'ista' => 0.019487336614014, 'inen' => 0.0165376790946814, 'iden' => 0.0149450351190439, 'isen' => 0.0142692501508657, 'sten' => 0.0123258823309838, 'taan' => 0.01214002202328, 'ille' => 0.0106873565220637, 'opan' => 0.00985137396649226, 'seen' => 0.00959241378881292, 'ksen' => 0.00947343208555484, 'issa' => 0.00939411095004946, 'istä' => 0.00938244607718102, 'esti' => 0.00929534835976334, 'esta' => 0.0092836834868949, 'assa' => 0.00918725387118248, 'utta' => 0.00860634320233422, 'ttaa' => 0.00846480941153049, 'essa' => 0.00742196977709206, 'emme' => 0.00666453069883476, 'siin' => 0.00657898829779954, }; ${Lingua::Identify::languages{'smallwords'}{'fi'}} = { 'ja' => 0.0357749475523437, 'on' => 0.0300220488164362, 'että' => 0.0202748078900293, 'ei' => 0.00882190380535234, 'Euroopan' => 0.00728086388570274, 'myös' => 0.00539274825719261, 'ole' => 0.00465818391446723, 'ovat' => 0.00455120852474995, 'joka' => 0.00442105513392724, 'se' => 0.00410250619565799, 'jotka' => 0.00367341602134753, 'Arvoisa' => 0.00362289986509215, 'sen' => 0.00344936201066188, 'komission' => 0.00295965244884496, 'mutta' => 0.00286991198302657, 'kuin' => 0.00269875135947891, 'parlamentin' => 0.00256681504549426, 'tai' => 0.00252521350504864, 'tämän' => 0.00250025258078128, 'voi' => 0.00235108134289773, }; ${Lingua::Identify::languages{'ngrams1'}{'fi'}} = { 'a' => 0.111288500707088, 't' => 0.107034427208969, 'i' => 0.10611965239025, 'e' => 0.0907145323663325, 'n' => 0.0831806888188069, 's' => 0.0820332608223127, 'o' => 0.0551003396798811, 'l' 
=> 0.052176727161174, 'ä' => 0.0485228551709782, 'k' => 0.0457594940362923, 'u' => 0.0448486981954633, 'm' => 0.0363883769530735, 'v' => 0.0241836814782948, 'r' => 0.0214171995766368, 'y' => 0.0192065262727229, 'j' => 0.0177265025361693, 'h' => 0.017368316506938, 'p' => 0.0159116985226757, 'd' => 0.0109952422347126, 'ö' => 0.00600888076657272, }; ${Lingua::Identify::languages{'ngrams2'}{'fi'}} = { 'is' => 0.0260976014670293, 'en' => 0.0260117688453055, 'ta' => 0.0250039306318151, 'st' => 0.0214999506260624, 'tä' => 0.0188220087039839, 'in' => 0.0181186655278522, 'tt' => 0.0180762425079198, 'si' => 0.0180589324807592, 'an' => 0.0167997849610305, 'se' => 0.0156288916730428, 'te' => 0.0154898733201986, 'it' => 0.0151439418447664, 'll' => 0.0130776806752023, 'aa' => 0.0129117555443882, 'va' => 0.0118918989700713, 'et' => 0.0118363813181935, 'mi' => 0.0117584413513409, 'on' => 0.0109510586337258, 'ai' => 0.0107268354839771, 'li' => 0.0106983142993291, }; ${Lingua::Identify::languages{'ngrams3'}{'fi'}} = { 'ist' => 0.00946716675948203, 'sta' => 0.00910187754021449, 'ise' => 0.00703018389245278, 'ett' => 0.00698623536261015, 'ttä' => 0.0069059265382934, 'tta' => 0.0066080883000793, 'sen' => 0.00580120593922751, 'lis' => 0.00568232358512082, 'ssa' => 0.00536372309180121, 'aan' => 0.00533958828764302, 'mis' => 0.00533695348369562, 'itt' => 0.00524241671806291, 'est' => 0.00510055887353489, 'ksi' => 0.00501445348053386, 'ais' => 0.0047945000470049, 'isi' => 0.00460637504516054, 'taa' => 0.00448812504400123, 'lli' => 0.00445018386715867, 'lla' => 0.00414391425631289, 'ste' => 0.0039695956271529, }; ${Lingua::Identify::languages{'ngrams4'}{'fi'}} = { 'että' => 0.00484500417696513, 'ista' => 0.00431277595570139, 'llis' => 0.00401536902867348, 'utta' => 0.00290472438914545, 'ukse' => 0.00269180806638683, 'inen' => 0.00267979162489076, 'ises' => 0.00265863768100704, 'sest' => 0.00258078115381371, 'iden' => 0.00255286796158844, 'isen' => 0.00251544175317879, 'tava' => 
0.00238250986912844, 'tämä' => 0.00235372047804408, 'euro' => 0.00227148295655531, 'uroo' => 0.00216884251877632, 'roop' => 0.00216671460726139, 'miss' => 0.00208685533981871, 'asta' => 0.0020600686889837, 'mise' => 0.00205105635786164, 'iste' => 0.00204379642445776, 'taan' => 0.00204004128649024, }; Lingua-Identify-0.56/lib/Lingua/Identify/FR.pm000644 000765 000024 00000022623 11375521415 021137 0ustar00ambsstaff000000 000000 use utf8; use strict; ${Lingua::Identify::languages{'_versions'}{'fr'}} = '0.02'; ${Lingua::Identify::languages{'_names'}{'fr'}} = 'french'; ${Lingua::Identify::languages{'_sets'}{'fr'}} = ''; =head1 NAME Lingua::Identify::FR - Meta-information on French =head1 SYNOPSIS Nothing here is meant for public consumption. This module is to be loaded by Lingua::Identify. =head1 DESCRIPTION Automatically generated. Do not change this module yourself unless you know what you're doing. =head1 SEE ALSO Lingua::Identify(3). =head1 AUTHOR Jose Castro, C<< >> =head1 COPYRIGHT AND LICENSE Copyright (C) 2010 by Alberto Simoes This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available. 
=cut ${Lingua::Identify::languages{'prefixes1'}{'fr'}} = { 'd' => 0.139677666922487, 'p' => 0.0917476278518105, 'l' => 0.0912330810018838, 'e' => 0.0827639886974116, 'c' => 0.0731982139119514, 'a' => 0.0665544547547617, 's' => 0.0558135072908672, 'q' => 0.0388038093909161, 'r' => 0.032390288146236, 'm' => 0.031925451754692, 'n' => 0.0299248238331124, 'i' => 0.0284749354636154, 't' => 0.0255324251726784, 'u' => 0.0240768680666992, 'f' => 0.020607601339566, 'é' => 0.0175281692597502, 'v' => 0.0167070745831298, 'o' => 0.0166778587874137, 'C' => 0.0128335833391474, 'b' => 0.00912230516988767, }; ${Lingua::Identify::languages{'prefixes2'}{'fr'}} = { 'co' => 0.0468163073537997, 'qu' => 0.0432582610172093, 'de' => 0.037579574871901, 'pa' => 0.0336813655827195, 'pr' => 0.0335807703847737, 'le' => 0.0299192214745744, 'po' => 0.0291452780730379, 'no' => 0.0233706485308449, 'es' => 0.0201748612018858, 'so' => 0.0200969435341128, 'in' => 0.0181088800556355, 'ce' => 0.0172988851958757, 'au' => 0.017148864611656, 'dé' => 0.0170808320211378, 're' => 0.0170790875957399, 'un' => 0.0167993980569427, 'su' => 0.0167964906812795, 'da' => 0.013801893748212, 'ré' => 0.0131983225605374, 'ma' => 0.0129837582365953, }; ${Lingua::Identify::languages{'prefixes3'}{'fr'}} = { 'con' => 0.0294696509020054, 'pro' => 0.0221957770556213, 'pou' => 0.0212222461972359, 'com' => 0.0202828615704506, 'nou' => 0.0174828705792436, 'dan' => 0.0166408390382968, 'par' => 0.0124938881878014, 'tou' => 0.0106732401795075, 'pré' => 0.0092318332966443, 'Com' => 0.0092282007188145, 'eur' => 0.00907054684100134, 'plu' => 0.00850531773068504, 'fai' => 0.00845446164106789, 'int' => 0.00845373512550193, 'tra' => 0.00837817750664217, 'rap' => 0.00779769156944072, 'sou' => 0.00692587289028961, 'vou' => 0.00680091221294462, 'ave' => 0.00654009312476524, 'aut' => 0.00653864009363333, }; ${Lingua::Identify::languages{'prefixes4'}{'fr'}} = { 'comm' => 0.0166803665231573, 'Comm' => 0.0112944691463859, 'euro' => 
0.0110222805784866, 'cons' => 0.0108947769636235, 'cont' => 0.00911786488414804, 'rapp' => 0.00906179946484653, 'conc' => 0.00774335783030444, 'cett' => 0.00750734050066419, 'comp' => 0.0074015396287565, 'prop' => 0.00726137608050271, 'poli' => 0.00682551265948125, 'part' => 0.00681556556896001, 'Prés' => 0.00589229471239796, 'entr' => 0.0058190479549234, 'Unio' => 0.00573947123075351, 'inte' => 0.00568340581145199, 'Mons' => 0.00562191470641162, 'Parl' => 0.00542478145789984, 'impo' => 0.00540217443398794, 'prés' => 0.00537866312911956, }; ${Lingua::Identify::languages{'suffixes1'}{'fr'}} = { 'e' => 0.302164383573595, 's' => 0.225803524729861, 't' => 0.133664655555609, 'n' => 0.0852906805046695, 'r' => 0.0654248699306544, 'a' => 0.0445522678763287, 'u' => 0.0342775856721533, 'i' => 0.0258389456432931, 'é' => 0.0230070174413101, 'l' => 0.0206880817786882, 'x' => 0.0112282550707088, 'd' => 0.00553560826912566, 'c' => 0.00527126007282226, 'z' => 0.00217018527493303, 'à' => 0.00195469351094973, 'f' => 0.00164236131861766, 'm' => 0.00128597109356836, 'o' => 0.00126895858588546, 'p' => 0.0012384233156854, 'E' => 0.00112195307077945, }; ${Lingua::Identify::languages{'suffixes2'}{'fr'}} = { 'es' => 0.127388020635124, 'nt' => 0.0780893859487561, 'on' => 0.0582856857332598, 're' => 0.0518376450997971, 'ns' => 0.0492114832991797, 'ue' => 0.0412249301772136, 'ur' => 0.0349244658927236, 'er' => 0.0322431082135534, 'us' => 0.0294757611656905, 'ne' => 0.0291364517649027, 'le' => 0.0267066710900837, 'te' => 0.0237045963060594, 'is' => 0.020932601184213, 'it' => 0.0206217612365736, 'ts' => 0.0191274054509126, 'té' => 0.0186341285994249, 'me' => 0.0178956658453133, 'st' => 0.0168998159943712, 'ui' => 0.0149110213387266, 'ce' => 0.014004065902717, }; ${Lingua::Identify::languages{'suffixes3'}{'fr'}} = { 'ent' => 0.0647887283014427, 'ion' => 0.0613378821539673, 'ons' => 0.0317928271801719, 'ous' => 0.0263281398704225, 'tre' => 0.0201827277522133, 'ans' => 0.0194271740693976, 'our' => 
0.0181129465576538, 'res' => 0.0173021793364786, 'que' => 0.0170595303652666, 'ant' => 0.0169847014909108, 'ire' => 0.0159756014279965, 'eur' => 0.0137518035211708, 'ait' => 0.0129570191663628, 'ité' => 0.0129446687696244, 'les' => 0.0125981311670253, 'ais' => 0.0117328769014162, 'lle' => 0.0112461259711408, 'nce' => 0.0111378783761989, 'urs' => 0.00997548809494404, 'nts' => 0.00978078772283385, }; ${Lingua::Identify::languages{'suffixes4'}{'fr'}} = { 'tion' => 0.0497842387588936, 'ment' => 0.0492886932428571, 'ions' => 0.0234081956717535, 'sion' => 0.0188524323327172, 'ique' => 0.0164524728444674, 'aire' => 0.0139702238635912, 'eurs' => 0.00924716869888557, 'ette' => 0.00907716403280005, 'ques' => 0.00829857883333394, 'ence' => 0.00815298973099475, 'elle' => 0.00811862708572214, 'dent' => 0.00793505821755533, 'ents' => 0.00786452436673262, 'enne' => 0.00714019597559167, 'port' => 0.00685534773188456, 'ieur' => 0.00684178352980327, 'vons' => 0.00680199520369814, 'ient' => 0.0067703453988418, 'omme' => 0.00672784423232042, 'nion' => 0.00666544890274648, }; ${Lingua::Identify::languages{'smallwords'}{'fr'}} = { 'de' => 0.0548889164996241, 'la' => 0.0336530571656713, 'le' => 0.0231614960853302, 'et' => 0.0227153895311683, 'à' => 0.020405558351967, 'des' => 0.0193085263599359, 'les' => 0.0187852416756814, 'que' => 0.0166993118848808, 'en' => 0.0129205518999602, 'du' => 0.010745994476251, 'dans' => 0.0094148837599828, 'pour' => 0.00902390444160139, 'qui' => 0.0083199720462509, 'nous' => 0.00814059650213448, 'une' => 0.00807359570787823, 'ce' => 0.00750112056708115, 'un' => 0.00724668717117134, 'pas' => 0.00710462852512169, 'est' => 0.00709233091098605, 'au' => 0.00677004860950029, }; ${Lingua::Identify::languages{'ngrams1'}{'fr'}} = { 'e' => 0.145653930304067, 's' => 0.0820071187990941, 'n' => 0.0784593683145175, 'i' => 0.0731375389327145, 't' => 0.0720664768819875, 'a' => 0.0675077832844263, 'r' => 0.0674565437019873, 'o' => 0.0631043562117119, 'u' => 
0.0571598314910165, 'l' => 0.0515154941491566, 'd' => 0.0399454498028865, 'c' => 0.0358443281203643, 'p' => 0.0328639602934782, 'm' => 0.0318483738479036, 'é' => 0.0244188787801819, 'v' => 0.0125584224921076, 'q' => 0.0116290228163588, 'f' => 0.00950099163662485, 'g' => 0.00855277424459088, 'b' => 0.00767852432609359, }; ${Lingua::Identify::languages{'ngrams2'}{'fr'}} = { 'es' => 0.0295571603224842, 'on' => 0.0281920193792864, 'en' => 0.0268050632195757, 'de' => 0.0236974702644772, 'nt' => 0.0224344409215772, 'le' => 0.0221838219788204, 're' => 0.0191259021693031, 'ti' => 0.0158675486568945, 'me' => 0.0147244517953872, 'qu' => 0.0145762517094987, 'ou' => 0.0138932203530439, 'ur' => 0.0138126167126886, 'ns' => 0.0135289164791633, 'er' => 0.012973089342939, 'io' => 0.012676791590019, 'co' => 0.0126203587998845, 'la' => 0.0119428580617006, 'te' => 0.011878026925303, 'an' => 0.0118382884088381, 'is' => 0.011799471662085, }; ${Lingua::Identify::languages{'ngrams3'}{'fr'}} = { 'ent' => 0.0175142522383181, 'ion' => 0.0158966295467364, 'que' => 0.0123769684578298, 'tio' => 0.0107756745738248, 'men' => 0.00983422512714386, 'ons' => 0.00942734336457489, 'les' => 0.0088123362267445, 'eme' => 0.00737901491248428, 'ati' => 0.00733538219715616, 'con' => 0.00717784935684575, 'des' => 0.00667687618668271, 'eur' => 0.00665613057049909, 'com' => 0.00576045532210714, 'est' => 0.00562728185047684, 'our' => 0.00555259763221583, 'par' => 0.00553439502704827, 'omm' => 0.0051034215811693, 'ous' => 0.00492969377596715, 'tre' => 0.00484630978317753, 'ant' => 0.0047689487112154, }; ${Lingua::Identify::languages{'ngrams4'}{'fr'}} = { 'tion' => 0.01399184009266, 'ment' => 0.0117471440585099, 'emen' => 0.00929259997903258, 'atio' => 0.00726870511933608, 'comm' => 0.00556140784915142, 'ique' => 0.00496020276154464, 'pour' => 0.00485710426040213, 'ions' => 0.00450069291581164, 'nous' => 0.00445131690514642, 'sion' => 0.0042501618194434, 'elle' => 0.00415436540438518, 'dans' => 
0.00412202759458331, 'aire' => 0.00384976409915468, 'leme' => 0.00370007133442668, 'port' => 0.00345649460575776, 'cons' => 0.00303714623348838, 'ssio' => 0.0029594311744484, 'euro' => 0.00287024140870454, 'miss' => 0.00284624884014186, 'urop' => 0.00278261637569302, }; Lingua-Identify-0.56/lib/Lingua/Identify/HI.pm000644 000765 000024 00000026026 12106516231 021122 0ustar00ambsstaff000000 000000 use utf8; use strict; ${Lingua::Identify::languages{'_versions'}{'hi'}} = '0.02'; ${Lingua::Identify::languages{'_names'}{'hi'}} = 'hindi'; ${Lingua::Identify::languages{'_sets'}{'hi'}} = ''; =head1 NAME Lingua::Identify::HI - Meta-information on Hindi =head1 SYNOPSIS Nothing here is meant for public consumption. This module is to be loaded by Lingua::Identify. =head1 DESCRIPTION Automatically generated. Do not change this module yourself unless you know what you're doing. =head1 SEE ALSO Lingua::Identify(3). =head1 AUTHOR Jose Castro, C<< >> =head1 COPYRIGHT AND LICENSE Copyright (C) 2010 by Alberto Simoes This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available. 
=cut ${Lingua::Identify::languages{'prefixes1'}{'hi'}} = { 'क' => 0.163667320710738, 'स' => 0.0891882417747509, 'ह' => 0.0783775458020538, 'म' => 0.0668839731887456, 'प' => 0.064014373794533, 'व' => 0.0368631985638345, 'ज' => 0.0367493857903994, 'अ' => 0.0365885304039443, 'न' => 0.0320284319483078, 'ब' => 0.030612601046774, 'द' => 0.0280829224692212, 'र' => 0.0247777995286634, 'उ' => 0.0221828682943411, 'ल' => 0.0219719352875745, 'भ' => 0.0215060616683132, 'ग' => 0.0200340831318847, 'इ' => 0.019939997905845, 'य' => 0.0182737789027538, 'औ' => 0.0173830042626677, 'आ' => 0.0170825385407988, 'त' => 0.0167926953444506, 'श' => 0.0150718462101105 }; ${Lingua::Identify::languages{'prefixes2'}{'hi'}} = { 'के' => 0.044878752194513, 'है' => 0.044821871646738, 'मे' => 0.0341667614675796, 'का' => 0.0263433801803575, 'की' => 0.0230181741036702, 'प्' => 0.0213686382181938, 'से' => 0.0197344754537377, 'और' => 0.0167812989057412, 'को' => 0.0167474780394966, 'कर' => 0.0154038672623239, 'पर' => 0.0142877786762513, 'कि' => 0.0140648684214572, 'वि' => 0.0135160480010331, 'इस' => 0.0129503171474866, 'जा' => 0.0116697361664971, 'हो' => 0.0116190048671301, 'सं' => 0.0115313780773145, 'एक' => 0.010470632726915, 'रा' => 0.0104275879880582, 'स्' => 0.0103860805613034, 'सा' => 0.00991873768228678, 'ने' => 0.00925308154210852 }; ${Lingua::Identify::languages{'prefixes3'}{'hi'}} = { 'में' => 0.0412033734737549, 'प्र' => 0.0287773255570008, 'हैं' => 0.0158855368606554, 'है।' => 0.0150778332564092, 'राज' => 0.0079721394704821, 'किय' => 0.00760290353711241, 'जात' => 0.00725464691813872, 'भार' => 0.0072021986321487, 'अपन' => 0.0068056895900642, 'होत' => 0.0064322577938153, 'लिए' => 0.00603155288885159, 'कार' => 0.00578819284185793, 'गया' => 0.00563714177820669, 'नही' => 0.00560357487517308, 'करन' => 0.00560147694373348, 'द्व' => 0.00506230856375614, 'अनु' => 0.00496999958041371, 'सकत' => 0.00481475265388327, 'स्थ' => 0.00478538161372886, 'है.' 
=> 0.00477908781941006, 'करत' => 0.00456719674401041, 'विश' => 0.0045357277724164 }; ${Lingua::Identify::languages{'prefixes4'}{'hi'}} = { 'हैं।' => 0.00930812237085419, 'किया' => 0.00922972110778281, 'भारत' => 0.0088944881208569, 'नहीं' => 0.00562596649832924, 'करने' => 0.00562326300649919, 'द्वा' => 0.00554486174342781, 'जाता' => 0.00551512333329729, 'हिन्' => 0.00524747764212257, 'अपने' => 0.00482573291663513, 'प्रत' => 0.00474462816173369, 'प्रा' => 0.00472029673526327, 'राज्' => 0.00469866880062288, 'अधिक' => 0.00462297102938155, 'दिया' => 0.00429314502611573, 'निर्' => 0.00426070312415516, 'उन्ह' => 0.00423907518951478, 'होता' => 0.00401468536762082, 'प्रक' => 0.00396331902284991, 'विश्' => 0.00365512095422448, 'कार्' => 0.00365241746239443, 'सकता' => 0.00347669049344133, 'प्रद' => 0.00330096352448823 }; ${Lingua::Identify::languages{'suffixes1'}{'hi'}} = { 'े' => 0.1339637043478, 'ा' => 0.115626190291922, 'ी' => 0.0959426505022178, 'र' => 0.0784519034806981, 'ं' => 0.0754502812693008, 'न' => 0.0412927917059321, '।' => 0.0340285017534755, 'त' => 0.0327887012748548, 'क' => 0.0323349676847599, 'ो' => 0.0293485205098205, 'य' => 0.0254288085927126, 'ि' => 0.0200826432485504, 'ल' => 0.0195181318923119, 'ै' => 0.0172661564819409, 'म' => 0.0168382204538246, 'स' => 0.016804835373617, 'द' => 0.0128638784054679, 'ह' => 0.0126559804059929, 'ण' => 0.0124283548591225, ')' => 0.0111733793440439, 'ग' => 0.0101096092883363, 'व' => 0.00972416336230248 }; ${Lingua::Identify::languages{'suffixes2'}{'hi'}} = { 'ेक' => 0.0499503448191045, 'ंे' => 0.0359069987670757, 'ीक' => 0.026592424740886, 'ेन' => 0.0254317541038547, 'ेस' => 0.0243617848808429, 'ाक' => 0.0239128897470507, 'ंो' => 0.0229197861291404, 'ाय' => 0.0202248780142847, 'ात' => 0.0191072521161101, 'ैह' => 0.0174484923580215, 'रऔ' => 0.0169227316191279, 'ोक' => 0.0151594346381014, 'रा' => 0.0136928388927663, 'य्' => 0.0125629144977755, 'ना' => 0.0115436765741307, '।ै' => 0.0110855575677263, 'ति' => 0.0108764831218504, 
'ान' => 0.0107319757842598, 'कि' => 0.0100079017842044, 'रप' => 0.00971734979692107, 'कए' => 0.00965739462494197, 'ीभ' => 0.00923463379688418 }; ${Lingua::Identify::languages{'suffixes3'}{'hi'}} = { 'ंेम' => 0.0443188016615617, '।ैह' => 0.0151281836109596, 'ायि' => 0.0126148617463181, '।ंै' => 0.00724835312381991, 'ंोय' => 0.00695464272227584, 'ंैह' => 0.00672387026391977, 'राक' => 0.00641547434229849, 'र्त' => 0.00627281500440566, 'एिल' => 0.00620568119833844, 'तरा' => 0.00578609491041833, 'ाता' => 0.00561406453237108, 'ारा' => 0.00505391683799773, 'ंीह' => 0.00485461335123568, 'ेकस' => 0.00469936642470524, 'ेनर' => 0.0045630008811312, '।ाय' => 0.00438467670876516, 'पूर' => 0.00426719254814753, 'ायग' => 0.00424411530231192, 'ंोन' => 0.00421684219359711, 'ंोर' => 0.00400495111819746, 'दाब' => 0.00392732765493224, 'ाथत' => 0.00384131246590861 }; ${Lingua::Identify::languages{'suffixes4'}{'hi'}} = { '।ंैह' => 0.00934056427281477, 'ायिक' => 0.00783742281530825, 'तराभ' => 0.00723995112086771, 'ंोयि' => 0.00590983314048425, 'ेनरक' => 0.00559082110453862, 'ाताज' => 0.0055043093659771, 'ंीहन' => 0.00546105349669633, 'ाराव' => 0.00513933796892066, 'ेनपअ' => 0.00485547132676565, 'य्जा' => 0.0042174472548744, 'ातोह' => 0.00394980156369967, 'राकर' => 0.00370648729899538, 'ीद्न' => 0.00345506255880095, 'ातकस' => 0.00341180668952018, '।ायि' => 0.00330637050814832, 'ेयिल' => 0.00329285304899809, 'ेकसइ' => 0.00318741686762623, 'णराक' => 0.00301439339050317, 'ंेमस' => 0.00297924799671255, 'ेनोह' => 0.00297384101305246, 'ायिद' => 0.00295762006207217, 'यीतर' => 0.00290625371730127 }; ${Lingua::Identify::languages{'smallwords'}{'hi'}} = { 'के' => 0.0422222592322868, 'में' => 0.0293816173364563, 'की' => 0.0217659554173398, 'का' => 0.0177476717725415, 'है' => 0.017101173235869, 'से' => 0.017101173235869, 'और' => 0.016463758987768, 'को' => 0.0141578637386995, 'है।' => 0.0108814636605746, 'एक' => 0.00930231149722702, 'पर' => 0.00854831554579163, 'ने' => 0.00717507392339825, 'भी' => 
0.00706909055673063, 'यह' => 0.00589267518671998, 'हैं।' => 0.00521286759195193, 'इस' => 0.00503572396480747, 'हैं' => 0.00484646795290099, 'कर' => 0.00462238883480373, 'कि' => 0.0044346468709925, 'ही' => 0.00427718586908632, 'किया' => 0.00422873633003826, 'लिए' => 0.00420299751241898 }; ${Lingua::Identify::languages{'ngrams1'}{'hi'}} = { 'ा' => 0.0824104389086596, 'क' => 0.0639486841527174, 'र' => 0.0630070735029217, '्' => 0.0562158077413119, 'े' => 0.0530018891964325, 'न' => 0.039349764948816, 'स' => 0.0388583981371644, 'ि' => 0.0387332718246123, 'त' => 0.0358012389613813, 'ी' => 0.0336449189402926, 'ं' => 0.0318618689864241, 'ह' => 0.0317089758797944, 'म' => 0.0314407978559817, 'य' => 0.0270223628135846, 'प' => 0.0245089407319538, 'ल' => 0.022878432406309, 'व' => 0.0225051623390888, 'ो' => 0.0224338122226616, 'द' => 0.0190803567505821, 'ज' => 0.0156632836870085, 'ु' => 0.0139452572382584, 'ब' => 0.0136626685997979 }; ${Lingua::Identify::languages{'ngrams2'}{'hi'}} = { '्र' => 0.0169823817966579, 'के' => 0.0161251662741955, 'है' => 0.0133993672889484, 'का' => 0.0131212525409991, 'ार' => 0.0128865932224168, 'ें' => 0.0117352530569751, 'मे' => 0.0116908827764634, 'र्' => 0.0107906692502063, 'या' => 0.0107247999677972, '्य' => 0.0102275783707233, 'रा' => 0.00948426181575996, 'ने' => 0.00888869372064471, 'की' => 0.00837820678197451, 'ान' => 0.00833292165031828, 'स्' => 0.00817373755116305, 'से' => 0.00816824844429563, 'ता' => 0.00813073954736825, 'प्' => 0.00782472183950949, 'ों' => 0.00758045658390923, 'िय' => 0.00747524870228365, 'त्' => 0.00724745076728566, '्त' => 0.00679185489728966 }; ${Lingua::Identify::languages{'ngrams3'}{'hi'}} = { 'में' => 0.0139588008298701, 'प्र' => 0.00975996019957517, 'िया' => 0.00603387985116517, 'हैं' => 0.00494119148614734, 'है।' => 0.00469569475574703, 'कार' => 0.00468201987156028, 'त्र' => 0.00463773929419364, 'क्ष' => 0.00353723670964053, '्या' => 0.00349881679692537, 'राज' => 0.00298828778728653, 'मान' => 0.002906238482166, 
'्रा' => 0.00283721287627095, 'स्त' => 0.0027532100162666, 'ारत' => 0.00269785929455831, 'भार' => 0.00269199862990684, 'स्थ' => 0.0026529275322304, 'ार्' => 0.00253310949935598, 'किय' => 0.00248557299718297, 'न्द' => 0.00246668863330603, 'यों' => 0.00243673412508742, 'जात' => 0.00233775401097377, '्वा' => 0.00228956632383949 }; ${Lingua::Identify::languages{'ngrams4'}{'hi'}} = { 'भारत' => 0.00371198543534384, 'किया' => 0.00337298701985454, 'हैं।' => 0.00326250564210455, 'द्वा' => 0.0022162375519593, 'स्था' => 0.00218035471132255, 'िन्द' => 0.00210386760364947, 'हिन्' => 0.0020859261833311, 'ियों' => 0.00207742761581187, '्वार' => 0.00204909905741443, 'राज्' => 0.00204437763101486, 'ाज्य' => 0.00198016623198068, 'नहीं' => 0.0019688348086217, 'करने' => 0.00196789052334179, '्ट्र' => 0.00196505766750204, 'जाता' => 0.00193767339438452, 'पूर्' => 0.00187535056591017, 'वारा' => 0.00187535056591017, 'प्रत' => 0.00174126205616231, 'प्रा' => 0.00172426492112385, 'अपने' => 0.00171010064192513, 'दिया' => 0.00163550210481189, 'अधिक' => 0.00161472782865377 }; Lingua-Identify-0.56/lib/Lingua/Identify/HR.pm000644 000765 000024 00000022651 11375521415 021142 0ustar00ambsstaff000000 000000 use utf8; use strict; ${Lingua::Identify::languages{'_versions'}{'hr'}} = '0.02'; ${Lingua::Identify::languages{'_names'}{'hr'}} = 'croatian'; ${Lingua::Identify::languages{'_sets'}{'hr'}} = ''; =head1 NAME Lingua::Identify::HR - Meta-information on Croatian =head1 SYNOPSIS Nothing here is meant for public consumption. This module is to be loaded by Lingua::Identify. =head1 DESCRIPTION Automatically generated. Do not change this module yourself unless you know what you're doing. =head1 SEE ALSO Lingua::Identify(3). 
=head1 AUTHOR Jose Castro, C<< >> =head1 COPYRIGHT AND LICENSE Copyright (C) 2010 by Alberto Simoes This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available. =cut ${Lingua::Identify::languages{'prefixes1'}{'hr'}} = { 'p' => 0.115274889399014, 's' => 0.0966301724115771, 'n' => 0.0604599869926557, 'k' => 0.0600743937832092, 'o' => 0.0518201284796574, 'j' => 0.0489739364696628, 'z' => 0.0456758292181969, 'i' => 0.0395058237427733, 'd' => 0.0373567842554581, 't' => 0.0317173550362329, 'u' => 0.0275457506343008, 'm' => 0.0271637562948091, 'b' => 0.0271421630750801, 'r' => 0.0263529823064129, 'v' => 0.0251062309292025, 'S' => 0.0227782761927041, 'g' => 0.0194236152705193, 'B' => 0.0194231011462401, 'M' => 0.0141127114657426, 'P' => 0.0139337962165594, }; ${Lingua::Identify::languages{'prefixes2'}{'hr'}} = { 'pr' => 0.0629905188387514, 'po' => 0.0578521745438628, 'ka' => 0.0328422458216371, 'ko' => 0.0299951924081854, 'na' => 0.024019515625903, 'iz' => 0.0224456748272771, 'za' => 0.0183721321977088, 'st' => 0.0169175590221039, 'ra' => 0.0161669107349468, 'od' => 0.0127075348857797, 'do' => 0.0123595685217086, 'bi' => 0.0118812684665012, 'su' => 0.0114183379503429, 'ne' => 0.0109092988170373, 'tr' => 0.0108717971417576, 'sv' => 0.0104752630342913, 'go' => 0.0102422608223072, 're' => 0.0093250067318581, 'sa' => 0.00840037526266542, 'mi' => 0.00837086574769119, }; ${Lingua::Identify::languages{'prefixes3'}{'hr'}} = { 'pre' => 0.0216399743168189, 'pro' => 0.0206961239429039, 'kak' => 0.0197143889173024, 'koj' => 0.017588276137307, 'pri' => 0.0174230206739157, 'pos' => 0.014697938487083, 'god' => 0.00837512075732725, 'raz' => 0.00827387729161319, 'str' => 0.00825428178212014, 'vla' => 0.00787870118350346, 'pot' => 0.00758738127570686, 'Srb' => 0.00725295124702556, 'zem' => 0.00668598783902681, 'pol' => 0.00648742000949729, 'sta' 
=> 0.00545734939381291, 'pod' => 0.00529209393042157, 'izj' => 0.00519019728105774, 'međ' => 0.00514120850732513, 'kaz' => 0.00511965344688278, 'nov' => 0.00493806839224723, }; ${Lingua::Identify::languages{'prefixes4'}{'hr'}} = { 'pred' => 0.0125687757139468, 'godi' => 0.0095640585990721, 'stra' => 0.00794642097676064, 'post' => 0.00753828085710917, 'Srbi' => 0.00683839889690246, 'vlad' => 0.00680034561335178, 'poli' => 0.00628998392808377, 'izja' => 0.00590497423568861, 'zeml' => 0.0057885759565924, 'među' => 0.00526179912940056, 'prij' => 0.00478650615642436, 'posl' => 0.00475516815820615, 'Koso' => 0.00471935330309963, 'mini' => 0.00460817802370645, 'kaza' => 0.00443581903350629, 'Hrva' => 0.00420749933220218, 'Turs' => 0.00418660733339004, 'svoj' => 0.00418586119057532, 'prot' => 0.00410900848065923, 'Alba' => 0.00406647834022023, }; ${Lingua::Identify::languages{'suffixes1'}{'hr'}} = { 'a' => 0.227803468831737, 'e' => 0.207188174796896, 'i' => 0.121896159049002, 'o' => 0.091537330862471, 'u' => 0.0879014230748878, 'm' => 0.0458493263783389, 'h' => 0.0216895745468987, 'g' => 0.0216140512397382, 'k' => 0.0212780495874732, 'n' => 0.0200799519527913, 't' => 0.0189491574013612, 'j' => 0.0151164780039652, 'r' => 0.0132987809921707, 'd' => 0.0123868499023078, 's' => 0.00765713857088325, 'z' => 0.00610043366818758, 'U' => 0.0059406530659637, 'ć' => 0.00573411994025952, 'v' => 0.00553015563452679, 'A' => 0.00440655377901675, }; ${Lingua::Identify::languages{'suffixes2'}{'hr'}} = { 'je' => 0.0523132266407542, 'ja' => 0.0344885970465688, 'ti' => 0.0331475783140217, 'ne' => 0.0330690557166905, 'na' => 0.0284233398605242, 'ka' => 0.0275332126048399, 'ko' => 0.0271804743746411, 'om' => 0.0271694321343914, 'ju' => 0.0266259085309894, 'ma' => 0.0251634251556956, 'ih' => 0.0243021304162188, 'og' => 0.02055513022482, 'ke' => 0.0202490147867866, 'va' => 0.0194423177907667, 'ni' => 0.0178835215421838, 'ji' => 0.0176786266397727, 'la' => 0.0174031840913218, 'im' => 
0.0170124114780407, 'ao' => 0.0163584654721417, 'io' => 0.0156125007974951, }; ${Lingua::Identify::languages{'suffixes3'}{'hr'}} = { 'ako' => 0.0207575069355697, 'ije' => 0.0201716410955887, 'nje' => 0.0190592128543788, 'iti' => 0.0139812744504494, 'ima' => 0.013727768541937, 'nja' => 0.0129594305827182, 'ija' => 0.0127854303833022, 'ine' => 0.011971474394274, 'ske' => 0.0116782156312134, 'ska' => 0.00998252829458299, 'nih' => 0.00984241577445403, 'nju' => 0.00971729203554818, 'sti' => 0.00940187594372299, 'kih' => 0.00870066165693808, 'nog' => 0.00868502118957485, 'nik' => 0.00863809978748515, 'ski' => 0.00855207721698737, 'ati' => 0.00826338025690771, 'kom' => 0.00822623414692003, 'ada' => 0.0080978519773135, }; ${Lingua::Identify::languages{'suffixes4'}{'hr'}} = { 'anje' => 0.0126879412460797, 'anja' => 0.00817482089669985, 'nika' => 0.00767394193971888, 'skih' => 0.00764491626464866, 'osti' => 0.00701230539773362, 'cije' => 0.00667888225846545, 'dine' => 0.00665060083147395, 'enje' => 0.00598896428948869, 'avio' => 0.00551562251142049, 'cija' => 0.00515689493747572, 'anju' => 0.00510033208349273, 'skog' => 0.00492692228115013, 'nici' => 0.00475648947112243, 'nost' => 0.00468057616709262, 'jući' => 0.00458382391685856, 'ođer' => 0.00456670621104792, 'azao' => 0.00443497430111384, 'nije' => 0.00438138843944574, 'akon' => 0.00436501498171382, 'skoj' => 0.00387753249014988, }; ${Lingua::Identify::languages{'smallwords'}{'hr'}} = { 'u' => 0.0323639427562264, 'je' => 0.0298665808012594, 'i' => 0.0223775326276523, 'za' => 0.0131319394633691, 'na' => 0.0115918299773806, 'kako' => 0.0112481911497624, 'se' => 0.0109896076783723, 'su' => 0.00912598406958743, 'će' => 0.00725248796409753, 's' => 0.00571579588081466, 'o' => 0.00559808534317748, 'a' => 0.00514888674309751, 'od' => 0.0044714815845661, 'EU' => 0.00429111866399299, 'bi' => 0.00366535425745725, 'koji' => 0.00358029890122909, 'da' => 0.00329817332231158, 'godine' => 0.00320286575796663, 'U' => 0.00305325946174389, 
'BiH' => 0.00271265832541951, }; ${Lingua::Identify::languages{'ngrams1'}{'hr'}} = { 'a' => 0.112766591329527, 'i' => 0.0935458195987913, 'o' => 0.0872861570882225, 'e' => 0.0862383978996539, 'n' => 0.0661005229966739, 'r' => 0.0565823172593762, 's' => 0.0520452405164226, 'j' => 0.0518124411420162, 'u' => 0.0497069595348678, 't' => 0.044762767393248, 'k' => 0.0410112467856433, 'v' => 0.032918268951157, 'l' => 0.0313609269983607, 'p' => 0.0311889600283312, 'd' => 0.0311287756389144, 'm' => 0.0281019787622817, 'z' => 0.0206366844184965, 'b' => 0.0169396780694079, 'g' => 0.0166148605706711, 'c' => 0.0105943965822395, }; ${Lingua::Identify::languages{'ngrams2'}{'hr'}} = { 'je' => 0.0323645215540824, 'ra' => 0.0179687811113385, 'an' => 0.0169387904431461, 'na' => 0.0167560580733868, 'ko' => 0.0164783794159587, 'ni' => 0.0159063476497557, 'st' => 0.0153780617970861, 'ij' => 0.0133154321654875, 'po' => 0.0129411397752559, 'pr' => 0.0127292761581436, 're' => 0.0121662682126695, 'ka' => 0.0120883887163792, 'ti' => 0.0116448483105544, 'nj' => 0.0108808641838474, 'ta' => 0.0107864083212182, 'ja' => 0.0107546287786514, 'en' => 0.0103312938844586, 'ov' => 0.0101847548826227, 'no' => 0.010143853434319, 'os' => 0.00961390013651471, }; ${Lingua::Identify::languages{'ngrams3'}{'hr'}} = { 'ije' => 0.0086172983396165, 'ako' => 0.00616274670232133, 'sta' => 0.00615632149183204, 'anj' => 0.00602054345885083, 'pre' => 0.00564472926042069, 'nje' => 0.00552980285393302, 'jed' => 0.00471901402841665, 'ost' => 0.00457802308881205, 'pro' => 0.00420439103734054, 'koj' => 0.00408619141041493, 'cij' => 0.00405285305410258, 'ran' => 0.00401951469779022, 'ija' => 0.00395574751444369, 'red' => 0.00383827526983763, 'iti' => 0.00381499903561227, 'pri' => 0.00381111966324138, 'kak' => 0.00373595682355535, 'ist' => 0.00356793150774111, 'nik' => 0.00347070473769563, 'sti' => 0.0034018458781123, }; ${Lingua::Identify::languages{'ngrams4'}{'hr'}} = { 'kako' => 0.00453101179854985, 'jedn' => 
0.00333151522751756, 'pred' => 0.00331444215628357, 'anje' => 0.00296512409705361, 'vanj' => 0.00253557771193649, 'stav' => 0.00246562344661492, 'stra' => 0.00238584838810566, 'reds' => 0.00235789689980222, 'edni' => 0.0022426158966913, 'nost' => 0.00223657233165272, 'godi' => 0.00216540935332343, 'među' => 0.002043782606922, 'rije' => 0.00201129844483963, 'acij' => 0.00194738774455664, 'sjed' => 0.00190387407627886, 'post' => 0.00185341030820671, 'tran' => 0.00185038852568742, 'odin' => 0.00182893386980046, 'koji' => 0.00182001961136855, 'vlad' => 0.0017976584207258, }; Lingua-Identify-0.56/lib/Lingua/Identify/HU.pm000644 000765 000024 00000022750 11375521416 021146 0ustar00ambsstaff000000 000000 use utf8; use strict; ${Lingua::Identify::languages{'_versions'}{'hu'}} = '0.02'; ${Lingua::Identify::languages{'_names'}{'hu'}} = 'hungarian'; ${Lingua::Identify::languages{'_sets'}{'hu'}} = ''; =head1 NAME Lingua::Identify::HU - Meta-information on Hungarian =head1 SYNOPSIS Nothing here is meant for public consumption. This module is to be loaded by Lingua::Identify. =head1 DESCRIPTION Automatically generated. Do not change this module yourself unless you know what you're doing. =head1 SEE ALSO Lingua::Identify(3). =head1 AUTHOR Jose Castro, C<< >> =head1 COPYRIGHT AND LICENSE Copyright (C) 2010 by Alberto Simoes This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available. 
=cut ${Lingua::Identify::languages{'prefixes1'}{'hu'}} = { 'k' => 0.0845069231415429, 'a' => 0.0769779820475856, 'm' => 0.0648146321029222, 'e' => 0.0604807268918057, 'v' => 0.0555035296492226, 's' => 0.047722571431632, 't' => 0.0463576196277927, 'h' => 0.0419137740607288, 'b' => 0.0381667812113623, 'é' => 0.0367866068966995, 'f' => 0.036354174829973, 'n' => 0.0323061145443198, 'i' => 0.0317225849627528, 'g' => 0.023174299440319, 'p' => 0.0208931780035846, 'r' => 0.020110628187918, 'l' => 0.0200378984139835, 'A' => 0.0174946115130668, 'o' => 0.0162102826369223, 'E' => 0.0150229267926903, }; ${Lingua::Identify::languages{'prefixes2'}{'hu'}} = { 'sz' => 0.0376488568737724, 'ke' => 0.0322747387696841, 'be' => 0.0302013808755257, 'me' => 0.0260930014427248, 'al' => 0.0225980036981804, 'va' => 0.0211111920434172, 'el' => 0.0197917824744073, 'gy' => 0.0173874540442938, 'fe' => 0.0170743738075796, 'ne' => 0.0162143615655034, 'ha' => 0.0156437888075732, 'in' => 0.015260425252413, 'eg' => 0.0150578815074366, 'kö' => 0.0125986043010835, 'vi' => 0.012201184082234, 'ta' => 0.0119040773269848, 'ad' => 0.0118599905181414, 'fo' => 0.0116069705717356, 'te' => 0.0111539626373879, 'ko' => 0.0104441011210828, }; ${Lingua::Identify::languages{'prefixes3'}{'hu'}} = { 'bet' => 0.0215999529608711, 'kez' => 0.019494951940537, 'vag' => 0.0150642810449604, 'sze' => 0.0144437795940108, 'alk' => 0.0117556317251255, 'egy' => 0.0116774637608475, 'meg' => 0.0116684709861961, 'viz' => 0.00991833868864593, 'kel' => 0.00976615327146766, 'gyó' => 0.00897963136541448, 'ese' => 0.00872645017138153, 'hat' => 0.00860608534143144, 'ada' => 0.00842277108892125, 'fel' => 0.00835912991446488, 'elő' => 0.00783754898468115, 'mel' => 0.00756569048944905, 'tar' => 0.00707593014689351, 'jel' => 0.00692236122592271, 'ala' => 0.0068241324566531, 'min' => 0.00632883809892744, }; ${Lingua::Identify::languages{'prefixes4'}{'hu'}} = { 'bete' => 0.0240621774656241, 'keze' => 0.0201884742812589, 'alka' => 
0.0127027732246043, 'vizs' => 0.0101461136374019, 'gyóg' => 0.0100508776806405, 'eset' => 0.00880661603417345, 'tart' => 0.00787283909226971, 'hatá' => 0.00715585945031045, 'mell' => 0.0065031447222633, 'jele' => 0.00637616344658153, 'inje' => 0.00602077072988682, 'köve' => 0.00576990528280821, 'adag' => 0.0055755619889294, 'gyak' => 0.00554613949822265, 'dózi' => 0.00549503727752145, 'csök' => 0.00529372549900157, 'szer' => 0.00524030045008668, 'tüne' => 0.00518610112510056, 'össz' => 0.00447144431135499, 'rend' => 0.00430265212782679, }; ${Lingua::Identify::languages{'suffixes1'}{'hu'}} = { 't' => 0.130864471084059, 's' => 0.12073009980058, 'n' => 0.099162977466865, 'k' => 0.0819827621291393, 'l' => 0.0746339439543097, 'a' => 0.0692765023220479, 'i' => 0.054014180997644, 'z' => 0.0430539791354837, 'e' => 0.0391520568325186, 'g' => 0.0363648079731936, 'y' => 0.032992525131025, 'ó' => 0.0246584758140734, 'm' => 0.0246273870517, 'r' => 0.0208882568135262, 'ő' => 0.0198555446525068, 'd' => 0.0131528074848174, 'b' => 0.0126491695343695, 'A' => 0.00787506613427632, 'I' => 0.00705827955919526, 'K' => 0.00645967957095247, }; ${Lingua::Identify::languages{'suffixes2'}{'hu'}} = { 'en' => 0.0368453078662337, 'ek' => 0.0360715589631252, 'an' => 0.0299590707326915, 'tt' => 0.0276269351729085, 'gy' => 0.026356142272439, 'ás' => 0.020932213703298, 'lt' => 0.0181049557079995, 'el' => 0.0176290488909386, 'ak' => 0.0169167899668851, 'al' => 0.0168008557355418, 'nt' => 0.0166580196383621, 'em' => 0.0162916418464928, 'és' => 0.0155889907316667, 'es' => 0.0148786533694587, 'os' => 0.0137974545710754, 'on' => 0.0137718337464691, 'ok' => 0.0136795987778867, 'et' => 0.0136783177366564, 'is' => 0.0136462917058986, 'át' => 0.013319626192169, }; ${Lingua::Identify::languages{'suffixes3'}{'hu'}} = { 'ban' => 0.0207501845824278, 'ben' => 0.0197061239769974, 'agy' => 0.0158889940344346, 'nak' => 0.0111019830912097, 'nek' => 0.00984578268148872, 'ott' => 0.00968702446193789, 'ell' => 
0.00963641594653522, 'ett' => 0.00878716346203841, 'gek' => 0.00874903375865284, 'int' => 0.00828246484268031, 'tek' => 0.00823670919861762, 'lés' => 0.00823185632727764, 'ása' => 0.00786927751144758, 'olt' => 0.00709212482971621, 'nél' => 0.00690702245146262, 'zer' => 0.0067808477966231, 'ség' => 0.00650492739757842, 'lis' => 0.00621514165184808, 'ató' => 0.006161066799774, 'ogy' => 0.00597111155018042, }; ${Lingua::Identify::languages{'suffixes4'}{'hu'}} = { 'elés' => 0.00789610228689585, 'szer' => 0.00708294934444384, 'egek' => 0.00676388361845315, 'ében' => 0.00597086589171901, 'ható' => 0.00569826604815414, 'knél' => 0.00562546949902034, 'ának' => 0.00540553098887142, 'ában' => 0.00530020832203954, 'ciós' => 0.00527852424357416, 'ikai' => 0.00525064471411866, 'ális' => 0.00485491028212535, 'tban' => 0.00483864722327631, 'ások' => 0.00478985804672919, 'eket' => 0.00464813710533041, 'ával' => 0.00459547577191447, 'kori' => 0.00449170196783012, 'ását' => 0.00434765773231005, 'kben' => 0.00411610275155467, 'atok' => 0.00411145616331209, 'etén' => 0.00387990118255671, }; ${Lingua::Identify::languages{'smallwords'}{'hu'}} = { 'a' => 0.0427610675210298, 'A' => 0.0185903083371674, 'és' => 0.0164667145044332, 'az' => 0.0126650673478884, 'vagy' => 0.00807010434379607, 'nem' => 0.00703871911208329, 'kell' => 0.00506811657638314, 'Az' => 0.00418337910012012, 'mg' => 0.00335162885356158, 'kezelés' => 0.00318483102484392, 'hogy' => 0.00317102449316036, 'betegek' => 0.00314453087830811, 'is' => 0.00234225944264151, 'volt' => 0.00231875102382895, 'lásd' => 0.00220755247135051, 'meg' => 0.00211799659015982, 'ha' => 0.00207620384560417, '•' => 0.00194448207035286, 'egy' => 0.00182358163074543, 'Ha' => 0.00177954998916001, }; ${Lingua::Identify::languages{'ngrams1'}{'hu'}} = { 'e' => 0.102935518398245, 'a' => 0.0910578247866597, 't' => 0.0771737727088755, 'l' => 0.0654526744063217, 's' => 0.0653487066310314, 'n' => 0.0565200024382829, 'i' => 0.0505995942000672, 'k' => 
0.0472857254819111, 'o' => 0.0435147923124425, 'r' => 0.042062348104254, 'z' => 0.0396899443170404, 'g' => 0.0344440771875543, 'á' => 0.0324389302904775, 'é' => 0.031368811863382, 'm' => 0.0307272860132296, 'b' => 0.0205046715455041, 'v' => 0.0198902545544587, 'y' => 0.0195517724309718, 'd' => 0.0182327522723547, 'h' => 0.0144749191940188, }; ${Lingua::Identify::languages{'ngrams2'}{'hu'}} = { 'el' => 0.0201514506713158, 'sz' => 0.0174162839971244, 'et' => 0.0146963448502417, 'en' => 0.0146927828555496, 'al' => 0.0130133023582275, 'te' => 0.0130082265157913, 'gy' => 0.0126514927473781, 'és' => 0.0126294083802871, 'at' => 0.0116965219704278, 'eg' => 0.0108532197270746, 'in' => 0.0106491174312177, 'er' => 0.0103475945805319, 'ze' => 0.0102203423201569, 'ás' => 0.010056312464586, 'ek' => 0.00987580838356414, 'ke' => 0.00984900437350613, 'ta' => 0.0095585237063659, 'le' => 0.00851628405945931, 'be' => 0.00841450006113273, 'an' => 0.00821413785970247, }; ${Lingua::Identify::languages{'ngrams3'}{'hu'}} = { 'sze' => 0.00712113004356113, 'ete' => 0.00578693572143545, 'kez' => 0.0043043096379576, 'zer' => 0.00425335229538513, 'eze' => 0.00385362260188614, 'bet' => 0.00383850455211879, 'teg' => 0.00380425106872983, 'hat' => 0.00376471155395368, 'ell' => 0.00373510977818544, 'ség' => 0.00358181486795705, 'tás' => 0.00344120643305791, 'agy' => 0.00342524261826861, 'ben' => 0.00330461538201303, 'zel' => 0.00328008819637649, 'int' => 0.00324467178608234, 'ban' => 0.00323219675200859, 'ció' => 0.00321750158475221, 'elé' => 0.00306188082071346, 'egy' => 0.00301663239203915, 'lés' => 0.00290361704098112, }; ${Lingua::Identify::languages{'ngrams4'}{'hu'}} = { 'szer' => 0.00506216118109664, 'bete' => 0.00446802091637175, 'eteg' => 0.00442825113527526, 'keze' => 0.0035923258001262, 'ezel' => 0.00343312002038645, 'alma' => 0.00318398893944126, 'elés' => 0.00308291796710051, 'lmaz' => 0.00296221541491914, 'vagy' => 0.00288634885798667, 'ysze' => 0.00267154137792412, 'gysz' => 
0.00266204222638834, 'zelé' => 0.00262936514510524, 'alka' => 0.00242646326830088, 'lkal' => 0.00242152370950227, 'kalm' => 0.0023687084269633, 'tege' => 0.00234983677924554, 'yógy' => 0.00231070027491811, 'gyóg' => 0.00231070027491811, 'atás' => 0.00222672777534177, 'ógys' => 0.0022186218326979, }; Lingua-Identify-0.56/lib/Lingua/Identify/ID.pm000644 000765 000024 00000022670 11375521416 021127 0ustar00ambsstaff000000 000000 use utf8; use strict; ${Lingua::Identify::languages{'_versions'}{'id'}} = '0.02'; ${Lingua::Identify::languages{'_names'}{'id'}} = 'indonesian'; ${Lingua::Identify::languages{'_sets'}{'id'}} = ''; =head1 NAME Lingua::Identify::ID - Meta-information on Indonesian =head1 SYNOPSIS Nothing here is meant for public consumption. This module is to be loaded by Lingua::Identify. =head1 DESCRIPTION Automatically generated. Do not change this module yourself unless you know what you're doing. =head1 SEE ALSO Lingua::Identify(3). =head1 AUTHOR Jose Castro, C<< >> =head1 COPYRIGHT AND LICENSE Copyright (C) 2010 by Alberto Simoes This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available. 
=cut ${Lingua::Identify::languages{'prefixes1'}{'id'}} = { 'd' => 0.124127989638618, 'm' => 0.122378261022979, 'p' => 0.108345146941919, 's' => 0.0934423575465608, 'k' => 0.0917175360998279, 't' => 0.0739691026569722, 'b' => 0.0644836224986457, 'a' => 0.0594222782172279, 'i' => 0.0401545074711129, 'y' => 0.0278638574147604, 'j' => 0.0273999613938882, 'h' => 0.0261400737667319, 'l' => 0.0234013229857884, 'r' => 0.0206739879905934, 'u' => 0.0204685038471198, 'n' => 0.0176249353970306, 'g' => 0.0116233454894155, 'o' => 0.0115569263723331, 'w' => 0.00993484699733701, 'c' => 0.00877873924062193, }; ${Lingua::Identify::languages{'prefixes2'}{'id'}} = { 'me' => 0.0907796863201201, 'pe' => 0.0575976669299471, 'se' => 0.0517172541293083, 'da' => 0.0452227613203371, 'ke' => 0.0387825158619253, 'te' => 0.0354929565240034, 'di' => 0.0342908352356071, 'be' => 0.0289713400397525, 'ka' => 0.0285633999635458, 'ya' => 0.0275598239781969, 'pa' => 0.023658354525965, 'ma' => 0.0229846024320172, 'ba' => 0.0218237091300461, 'sa' => 0.0196451355315806, 'ti' => 0.018562358414415, 'de' => 0.0162134481351931, 'in' => 0.016079999652817, 'ha' => 0.0160420265074254, 'it' => 0.0153324711621084, 'ja' => 0.0134511730447085, }; ${Lingua::Identify::languages{'prefixes3'}{'id'}} = { 'men' => 0.0579752838051444, 'yan' => 0.0283805144417301, 'ter' => 0.0241653494275997, 'per' => 0.0237677827274034, 'ber' => 0.0211021698519902, 'pen' => 0.0184688892082196, 'mem' => 0.0162966422378694, 'pem' => 0.0124934138046654, 'den' => 0.0110013411888681, 'tid' => 0.0102553048809695, 'unt' => 0.0102457249604828, 'dar' => 0.0101187910140346, 'seb' => 0.0091943286870719, 'mas' => 0.00905541984001533, 'dal' => 0.00899315035685204, 'kat' => 0.00870455525219141, 'pad' => 0.00861713847775063, 'mel' => 0.00826627388992671, 'mer' => 0.00817646213536428, 'aka' => 0.00795372898404943, }; ${Lingua::Identify::languages{'prefixes4'}{'id'}} = { 'meng' => 0.0252578089971604, 'deng' => 0.0128428295544791, 'tida' => 
0.0125058243294329, 'untu' => 0.0123739527196322, 'dala' => 0.0109218997717156, 'meny' => 0.00869766528641048, 'ters' => 0.00860975087987668, 'memb' => 0.00809691684176286, 'pert' => 0.00806468155936713, 'peng' => 0.00770276725246966, 'mela' => 0.00734817914611667, 'menj' => 0.00723828613794943, 'jaka' => 0.00709029688695086, 'sela' => 0.0064177516769673, 'kare' => 0.00610272505355453, 'seba' => 0.00601774112723852, 'kepa' => 0.00598990156516948, 'indo' => 0.00568952734284567, 'menu' => 0.00568220114230119, 'mere' => 0.00556937765391615, }; ${Lingua::Identify::languages{'suffixes1'}{'id'}} = { 'n' => 0.208423928629156, 'a' => 0.199195983043689, 'i' => 0.13314145380031, 'g' => 0.0599289837163103, 't' => 0.052941023114184, 'h' => 0.048204518554438, 'k' => 0.0463962625204002, 'r' => 0.0443919556591427, 'u' => 0.0432458925787462, 's' => 0.0380797447167952, 'l' => 0.0367126030440019, 'm' => 0.0287808871299932, 'e' => 0.0129560397732902, 'p' => 0.0125034543530062, 'o' => 0.0118277046932274, 'd' => 0.00697857519018494, 'b' => 0.00529128668783598, 'f' => 0.00358001324385907, 'y' => 0.00315662688294827, 'c' => 0.00117839061041676, }; ${Lingua::Identify::languages{'suffixes2'}{'id'}} = { 'an' => 0.178856354705584, 'ng' => 0.0604172686187177, 'ya' => 0.0370460827232038, 'ah' => 0.0338394273596066, 'at' => 0.031146316598612, 'ta' => 0.0222394251789501, 'ak' => 0.0220791469271824, 'tu' => 0.0217433258282406, 'ri' => 0.021528531164307, 'da' => 0.0203902284782834, 'ra' => 0.020323718455441, 'si' => 0.019746934814726, 'ar' => 0.0194558171737602, 'ia' => 0.0163309364284118, 'al' => 0.0154423189101079, 'am' => 0.0154379575971346, 'ai' => 0.0147717670404675, 'ga' => 0.0138973237893268, 'na' => 0.0137283229116125, 'uk' => 0.0131559005838708, }; ${Lingua::Identify::languages{'suffixes3'}{'id'}} = { 'kan' => 0.0661704970989343, 'ang' => 0.0519187413172329, 'nya' => 0.0340039994808366, 'gan' => 0.0266455796603326, 'ara' => 0.0177476962125109, 'lah' => 0.0143899589957073, 'ari' => 
0.0136821181865815, 'ada' => 0.0132663067774856, 'tan' => 0.0112401274834517, 'dak' => 0.0109925634653194, 'tuk' => 0.0108399390463738, 'lam' => 0.0106548669163137, 'asi' => 0.00953481999932701, 'aan' => 0.00850250207905705, 'ian' => 0.00846404553255106, 'rta' => 0.00805063765761175, 'ung' => 0.00792325034731069, 'pat' => 0.00778504713330481, 'nan' => 0.0077538011892687, 'ama' => 0.007381253394992, }; ${Lingua::Identify::languages{'suffixes4'}{'id'}} = { 'ngan' => 0.0323941216118769, 'akan' => 0.0195892310001678, 'ntuk' => 0.0131948798097441, 'anya' => 0.0129432264027102, 'idak' => 0.0124413912459934, 'alam' => 0.0119115945996062, 'ukan' => 0.0116952609689981, 'ikan' => 0.00897269486950814, 'atan' => 0.00819860310284236, 'ebut' => 0.00810000206032029, 'rang' => 0.0080146459339579, 'arta' => 0.00751281077724111, 'elah' => 0.00730972206279268, 'tara' => 0.0067769821017033, 'nnya' => 0.00676373718554362, 'alah' => 0.00635314478459351, 'rena' => 0.00599847536298429, 'jadi' => 0.00571738880892884, 'esia' => 0.0053995108210965, 'edia' => 0.00528177823301045, }; ${Lingua::Identify::languages{'smallwords'}{'id'}} = { 'yang' => 0.0240621642402856, 'dan' => 0.019671763849213, 'di' => 0.0194151224902359, 'itu' => 0.0138515044581252, 'dengan' => 0.00886125581134823, 'tidak' => 0.00858730134482109, 'untuk' => 0.00854961987544747, 'dari' => 0.00783570879353101, 'dalam' => 0.00754240438327146, 'pada' => 0.00691811409337875, 'akan' => 0.00664110437257807, 'ini' => 0.00549640243809291, 'tersebut' => 0.00504829847797416, 'jakarta' => 0.00486600163965313, 'ke' => 0.0046979626546086, 'juga' => 0.004097095980813, 'karena' => 0.00407570920089824, 'katanya' => 0.00372333654135032, 'ada' => 0.00366324987397076, 'indonesia' => 0.00364288151214718, }; ${Lingua::Identify::languages{'ngrams1'}{'id'}} = { 'a' => 0.191077880129202, 'n' => 0.101021140601234, 'e' => 0.0831982636337224, 'i' => 0.0766395501854289, 't' => 0.0548510587390836, 'r' => 0.0533023430690616, 'k' => 0.0496805838018902, 'u' 
=> 0.0476917949857297, 'm' => 0.0460543810777092, 's' => 0.0452313161177858, 'd' => 0.0405832891836002, 'g' => 0.0358663886657666, 'p' => 0.0339555312494659, 'l' => 0.0334467554218721, 'b' => 0.0240406405414181, 'h' => 0.020855365474339, 'o' => 0.0199637687351529, 'y' => 0.015363765317109, 'j' => 0.0106494283321655, 'w' => 0.00575940389315195, }; ${Lingua::Identify::languages{'ngrams2'}{'id'}} = { 'an' => 0.0603791917653776, 'ng' => 0.0281082360349895, 'ka' => 0.0243674592546658, 'en' => 0.0240073439768314, 'er' => 0.0228453119603517, 'da' => 0.0199744213741827, 'me' => 0.0197494260982646, 'ta' => 0.0192894039099225, 'ar' => 0.0192380173637392, 'la' => 0.0163155355041112, 'ak' => 0.0159386325896752, 'di' => 0.0155496507659769, 'ya' => 0.015406546400311, 'at' => 0.0153488133085832, 'ga' => 0.0152587333073057, 'ra' => 0.0143818863857795, 'in' => 0.0137541925586958, 'se' => 0.01279954927243, 'pe' => 0.0127854230904115, 'al' => 0.0116811241656596, }; ${Lingua::Identify::languages{'ngrams3'}{'id'}} = { 'ang' => 0.0175417254200468, 'kan' => 0.0149755767012467, 'men' => 0.0138467977080635, 'nga' => 0.00947424555304177, 'nya' => 0.00909730579887892, 'eng' => 0.00886216476143769, 'aka' => 0.00790221805327194, 'ata' => 0.00726437886168785, 'ara' => 0.00703229822820463, 'dan' => 0.00696598947578086, 'ter' => 0.00694099617679036, 'gan' => 0.00690580153127312, 'per' => 0.00678389544027864, 'yan' => 0.0064278684464956, 'tan' => 0.00631565363470151, 'ber' => 0.00607107635172305, 'ala' => 0.00597620382902441, 'ada' => 0.00571657955991901, 'ela' => 0.0055949285025877, 'ing' => 0.00450976026580628, }; ${Lingua::Identify::languages{'ngrams4'}{'id'}} = { 'yang' => 0.00807657971827776, 'ngan' => 0.0079659462047873, 'akan' => 0.00686427633852389, 'enga' => 0.00660568716241366, 'meng' => 0.0057602738379399, 'anya' => 0.00368356282567324, 'atan' => 0.00366456851763422, 'pada' => 0.00346596136691038, 'ntuk' => 0.00322036829630054, 'anga' => 0.00316338537218347, 'untu' => 
0.00308874107392485, 'dala' => 0.00306508149724466, 'alam' => 0.00304475425530816, 'deng' => 0.00299543570110157, 'idak' => 0.00293278780792023, 'tida' => 0.00292379050411227, 'kata' => 0.00281615609189113, 'ukan' => 0.00277083633937697, 'dari' => 0.00269485910722087, 'pert' => 0.00256623098611449, }; Lingua-Identify-0.56/lib/Lingua/Identify/IT.pm000644 000765 000024 00000022610 11375521416 021141 0ustar00ambsstaff000000 000000 use utf8; use strict; ${Lingua::Identify::languages{'_versions'}{'it'}} = '0.02'; ${Lingua::Identify::languages{'_names'}{'it'}} = 'italian'; ${Lingua::Identify::languages{'_sets'}{'it'}} = ''; =head1 NAME Lingua::Identify::IT - Meta-information on Italian =head1 SYNOPSIS Nothing here is meant for public consumption. This module is to be loaded by Lingua::Identify. =head1 DESCRIPTION Automatically generated. Do not change this module yourself unless you know what you're doing. =head1 SEE ALSO Lingua::Identify(3). =head1 AUTHOR Jose Castro, C<< >> =head1 COPYRIGHT AND LICENSE Copyright (C) 2010 by Alberto Simoes This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available. 
=cut ${Lingua::Identify::languages{'prefixes1'}{'it'}} = { 'd' => 0.1351138547109, 'c' => 0.0989621545857807, 'p' => 0.0913318776989373, 's' => 0.0851115634059623, 'a' => 0.0783866991368492, 'i' => 0.0657609145963531, 'l' => 0.0466648552242447, 'r' => 0.0385573388645589, 'n' => 0.0364183246996655, 'm' => 0.0353754337856848, 'e' => 0.0343482811278405, 'u' => 0.0282844236164568, 't' => 0.0250136436794006, 'q' => 0.0219645879979133, 'f' => 0.0210114977218874, 'g' => 0.0181443577657416, 'v' => 0.0175657454077852, 'o' => 0.0175226966483532, 'C' => 0.0148657087006172, 'P' => 0.0115884483051518, }; ${Lingua::Identify::languages{'prefixes2'}{'it'}} = { 'de' => 0.0669933027010953, 'co' => 0.0565046859650142, 'ch' => 0.0367524786890019, 'pr' => 0.0341918297354654, 'pe' => 0.0290361114700484, 'qu' => 0.0267750889145568, 'ri' => 0.0242104900838333, 'in' => 0.0238431515054393, 'al' => 0.0216690262480624, 'no' => 0.0202098287615386, 'di' => 0.0198989170001021, 'so' => 0.0189368397709748, 'po' => 0.0184628545085309, 'ne' => 0.0167384509823062, 'pa' => 0.0167271656189147, 're' => 0.0159749961488697, 'Co' => 0.0144401867276227, 'es' => 0.0143098407804506, 'su' => 0.0136925314029343, 'st' => 0.0136377973904854, }; ${Lingua::Identify::languages{'prefixes3'}{'it'}} = { 'del' => 0.0353390354828556, 'con' => 0.0260104038901974, 'pro' => 0.0233551596131511, 'que' => 0.0189150538918256, 'com' => 0.0187820881589945, 'all' => 0.0120639266680416, 'eur' => 0.0116216120873992, 'par' => 0.0115537724277915, 'pre' => 0.0115130686320268, 'qua' => 0.0107071334758871, 'int' => 0.0103964278348837, 'Com' => 0.00964408600983404, 'per' => 0.00906066493720761, 'sta' => 0.00827983045512271, 'imp' => 0.00816857341336604, 'pos' => 0.00793520498431547, 'ris' => 0.00784972701320974, 'tut' => 0.00737077901637921, 'son' => 0.00733685918657535, 'anc' => 0.00711027472348555, }; ${Lingua::Identify::languages{'prefixes4'}{'it'}} = { 'dell' => 0.0268840361011816, 'ques' => 0.0174909648466943, 'euro' => 
0.0120054173004797, 'inte' => 0.0103280560657234, 'cont' => 0.0102188607593017, 'Comm' => 0.0092838759480655, 'prop' => 0.00781883892024041, 'tutt' => 0.00759589850296268, 'part' => 0.00736234187533839, 'rela' => 0.00708252890263267, 'comp' => 0.00705371347454915, 'Pres' => 0.00679209971958039, 'cons' => 0.00656840100156362, 'stat' => 0.00641901575597276, 'esse' => 0.00619152553426079, 'poli' => 0.00602242446945489, 'poss' => 0.00591019596007698, 'Unio' => 0.00586848941942979, 'Stat' => 0.0057888678418306, 'pres' => 0.00571531267014373, }; ${Lingua::Identify::languages{'suffixes1'}{'it'}} = { 'e' => 0.25106195796619, 'o' => 0.196356421523459, 'i' => 0.195523894633026, 'a' => 0.174697294519121, 'l' => 0.0592182211335886, 'n' => 0.0534173886066982, 'r' => 0.0224328491589071, 'à' => 0.0163981867768059, 'd' => 0.00519935731517023, 'ù' => 0.00417143201107566, 'é' => 0.0038135102723076, 'ò' => 0.00353661868138483, 'u' => 0.00211882260880034, 't' => 0.00166829498678051, 'ì' => 0.00162060295690585, 's' => 0.00132241201283518, 'E' => 0.000838546272844714, 'm' => 0.000538966240525261, 'g' => 0.00046488153392385, 'h' => 0.000460251239761262, }; ${Lingua::Identify::languages{'suffixes2'}{'it'}} = { 'to' => 0.0767054367674178, 're' => 0.0647189073648144, 'ne' => 0.0590636490241265, 'ti' => 0.0413349614819463, 'he' => 0.0412649902153084, 'te' => 0.0398610506072829, 'le' => 0.0378177767645738, 'no' => 0.0359872381438203, 'la' => 0.0291678610686644, 'li' => 0.0265168529181404, 'ta' => 0.0260795325016534, 'ni' => 0.0228574685298585, 'er' => 0.0219224492651888, 'on' => 0.0213231792234995, 'io' => 0.0206482950710885, 'ia' => 0.0189023991115906, 'el' => 0.0188905491390149, 'ro' => 0.0179831798103553, 'll' => 0.0175757664675119, 'mo' => 0.0167507826624744, }; ${Lingua::Identify::languages{'suffixes3'}{'it'}} = { 'one' => 0.0600624395379854, 'lla' => 0.0320474117169788, 'are' => 0.0291900091312058, 'nte' => 0.028649327777725, 'nto' => 0.0258448400546516, 'ato' => 0.0208769485219793, 
'ere' => 0.0208545614646834, 'ale' => 0.0182379893437607, 'oni' => 0.0155332257850056, 'lle' => 0.0145658335515506, 'tto' => 0.0145597279904699, 'ono' => 0.0144783205093937, 'ati' => 0.0142388468358947, 'nti' => 0.013879975523484, 'ell' => 0.0132728113937911, 'amo' => 0.0131900471213636, 'ità' => 0.0129295431819199, 'sto' => 0.0119126280641437, 'nza' => 0.0110497087647364, 'ali' => 0.0106399577766531, }; ${Lingua::Identify::languages{'suffixes4'}{'it'}} = { 'ione' => 0.0648962339694981, 'ente' => 0.0264897598912898, 'ella' => 0.0221302823309593, 'ento' => 0.0188157447737834, 'ioni' => 0.0169889955229856, 'iamo' => 0.0144031860812172, 'elle' => 0.0107041894662768, 'esto' => 0.0105070309751156, 'enti' => 0.0103841860690843, 'enza' => 0.00921715946178765, 'tato' => 0.00871137210177018, 'tati' => 0.00867042379975977, 'tare' => 0.00788785625022749, 'anno' => 0.00678831851105908, 'opea' => 0.00647210662331202, 'anto' => 0.00617636888657017, 'nche' => 0.00608385605610221, 'esso' => 0.00591323813105883, 'ario' => 0.0057699190740224, 'etto' => 0.00561067567731525, }; ${Lingua::Identify::languages{'smallwords'}{'it'}} = { 'di' => 0.0404712161287194, 'e' => 0.024265952957028, 'che' => 0.0242165839626788, 'la' => 0.0203452683969407, 'in' => 0.0170558953132659, 'il' => 0.0164822538125534, 'per' => 0.0143847268933446, 'a' => 0.0122985592117738, 'è' => 0.0107440912304064, 'del' => 0.0103504499568786, 'un' => 0.0101844303298635, 'della' => 0.00993452710183032, 'non' => 0.00924292428718599, 'le' => 0.0085565641976053, 'i' => 0.0084324863710993, 'una' => 0.00804627229141163, 'dei' => 0.00627073606986106, 'si' => 0.00605185229845436, 'con' => 0.00532136593958808, 'delle' => 0.00502253061096097, }; ${Lingua::Identify::languages{'ngrams1'}{'it'}} = { 'e' => 0.12136047454886, 'i' => 0.11609569440686, 'a' => 0.0971568588496762, 'o' => 0.094846763072041, 'n' => 0.0739149144632905, 't' => 0.0675850153568822, 'r' => 0.067518949924832, 'l' => 0.0608219683940779, 's' => 0.0517923206889739, 
'c' => 0.040250462438432, 'd' => 0.0363400628381788, 'p' => 0.0312179150643056, 'u' => 0.0306216020483262, 'm' => 0.0283023370470257, 'g' => 0.0158270204795786, 'v' => 0.0142759326609079, 'z' => 0.011803925634618, 'h' => 0.00953058512107725, 'b' => 0.00925072787570169, 'f' => 0.00908113642260936, }; ${Lingua::Identify::languages{'ngrams2'}{'it'}} = { 'on' => 0.0260491501799113, 're' => 0.0223789666457896, 'er' => 0.0186172539881369, 'en' => 0.0170285425032851, 'co' => 0.016882997764924, 'io' => 0.0164447325277235, 'di' => 0.015895365901583, 'nt' => 0.0158536309238723, 'de' => 0.0156131031327905, 'to' => 0.015478303951909, 'ti' => 0.0152737545898881, 'ne' => 0.014905815188692, 'la' => 0.0147395468751919, 'ri' => 0.0146032126146704, 'te' => 0.0142329705940144, 'ta' => 0.0140828206167336, 'in' => 0.0139399622677424, 'si' => 0.0134852908668902, 'al' => 0.0127867336882437, 'es' => 0.0127537294759852, }; ${Lingua::Identify::languages{'ngrams3'}{'it'}} = { 'ion' => 0.0143770673667643, 'ent' => 0.0119098720367062, 'one' => 0.0109244219178725, 'zio' => 0.00929757604442525, 'del' => 0.00925074205253176, 'che' => 0.00907090920510614, 'ell' => 0.00870652380564547, 'con' => 0.00844016304038046, 'men' => 0.00750408829284533, 'nte' => 0.00743583410310909, 'per' => 0.00714212325472282, 'azi' => 0.00626958299231451, 'lla' => 0.00608926607262125, 'are' => 0.00587687936519731, 'com' => 0.00528933665035787, 'sta' => 0.00505916028709842, 'est' => 0.0049691228453187, 'pro' => 0.00483987554986071, 'ere' => 0.00471389574220925, 'nto' => 0.00468823991202471, }; ${Lingua::Identify::languages{'ngrams4'}{'it'}} = { 'ione' => 0.0132327931708034, 'zion' => 0.0114075165716104, 'ment' => 0.00857670483037391, 'dell' => 0.00795030588942872, 'azio' => 0.00743621507144199, 'ente' => 0.00559815167635664, 'amen' => 0.00524767022267974, 'ella' => 0.00457027815594684, 'ques' => 0.00395127398852978, 'uest' => 0.0039389493659829, 'ento' => 0.00385483381710044, 'sion' => 0.0038448200612811, 'ioni' => 
0.0035229933550257, 'pres' => 0.00344935373530809, 'euro' => 0.0033424376347139, 'urop' => 0.003059587547263, 'iamo' => 0.00294512261535885, 'enti' => 0.00267582961270952, 'stat' => 0.00266843483918139, 'comm' => 0.00256567829869678, };

Lingua-Identify-0.56/lib/Lingua/Identify/LA.pm

use utf8;
use strict;

${Lingua::Identify::languages{'_versions'}{'la'}} = '0.02';
${Lingua::Identify::languages{'_names'}{'la'}} = 'latin';
${Lingua::Identify::languages{'_sets'}{'la'}} = '';

=head1 NAME

Lingua::Identify::LA - Meta-information on Latin

=head1 SYNOPSIS

Nothing here is meant for public consumption. This module is to be
loaded by Lingua::Identify.

=head1 DESCRIPTION

Automatically generated. Do not change this module yourself unless you
know what you're doing.

=head1 SEE ALSO

Lingua::Identify(3).

=head1 AUTHOR

Jose Castro, C<< >>

=head1 COPYRIGHT AND LICENSE

Copyright (C) 2010 by Alberto Simoes

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.8.4 or,
at your option, any later version of Perl 5 you may have available.

=cut ${Lingua::Identify::languages{'prefixes1'}{'la'}} = { 'e' => 0.117626695130584, 'i' => 0.0861085283015518, 's' => 0.0859420942726722, 'a' => 0.0700576305564001, 'p' => 0.0688903732338576, 'd' => 0.0636155240785657, 'c' => 0.0625991669422074, 'n' => 0.0545149113794274, 'q' => 0.054033362255869, 'm' => 0.0356257586617816, 'u' => 0.0311697649285776, 't' => 0.0286976648196188, 'f' => 0.0283270717153135, 'v' => 0.0255021314651299, 'h' => 0.0224841277414458, 'o' => 0.0213967587527656, 'r' => 0.0201296410128953, 'l' => 0.0160020770966804, 'C' => 0.011279788917269, 'S' => 0.0100748065481804, }; ${Lingua::Identify::languages{'prefixes2'}{'la'}} = { 'qu' => 0.0627570716461095, 'no' => 0.034137825215779, 'co' => 0.0302593474151647, 'pr' => 0.0298493221689562, 'de' => 0.0286192464303305, 'in' => 0.0276599420807105, 'su' => 0.022827317480743, 'es' => 0.0209602842841707, 'di' => 0.0206147284162968, 're' => 0.0192660290215354, 'se' => 0.0186625956403228, 'pe' => 0.0146242337814385, 'si' => 0.014487558699369, 'ma' => 0.013915070619757, 'po' => 0.0135772510772833, 'cu' => 0.013355476415812, 'ca' => 0.0133167947888112, 'pa' => 0.0129454511696035, 'au' => 0.01150391587004, 'al' => 0.0114884432192397, }; ${Lingua::Identify::languages{'prefixes3'}{'la'}} = { 'qua' => 0.0226904722624481, 'con' => 0.0193206606419445, 'quo' => 0.0174295333589965, 'pro' => 0.0160883300032384, 'qui' => 0.0147990639072706, 'ill' => 0.0114231420209093, 'ali' => 0.0110565260694493, 'pra' => 0.0102285850457353, 'ips' => 0.00889654708876383, 'eni' => 0.0084230014847946, 'Chr' => 0.00817859085048791, 'int' => 0.00801972393818855, 'per' => 0.00795251101375421, 'omn' => 0.00787307755760453, 'pos' => 0.00675184377272255, 'fac' => 0.00622330577603432, 'dic' => 0.00605221833201963, 'car' => 0.00597278487586995, 'hom' => 0.00596667461001228, 'deu' => 0.00585668982457427, }; ${Lingua::Identify::languages{'prefixes4'}{'la'}} = { 'prae' => 0.0118987163472053, 'Chri' => 0.00982663459079165, 'inte' => 
0.00813264288653221, 'cons' => 0.00724301923440471, 'prop' => 0.0065572676692231, 'etia' => 0.00608650848663896, 'omni' => 0.00590487699099627, 'anim' => 0.00588634316491028, 'homi' => 0.00576031314752555, 'domi' => 0.00502266686930317, 'cont' => 0.00474836624323052, 'aute' => 0.00468164446932096, 'habe' => 0.00462604299106299, 'fili' => 0.0046112159301942, 'carn' => 0.00459638886932541, 'temp' => 0.00439622354759672, 'spir' => 0.00438881001716232, 'nost' => 0.00417752439978204, 'mort' => 0.0040774417389177, 'faci' => 0.003895810243275, }; ${Lingua::Identify::languages{'suffixes1'}{'la'}} = { 's' => 0.173730178655937, 'm' => 0.164285254053141, 't' => 0.161323578095596, 'e' => 0.120108957681139, 'i' => 0.0785855053401261, 'a' => 0.0651435401914758, 'o' => 0.0627462045889314, 'n' => 0.0486776883374844, 'r' => 0.0402859028223683, 'd' => 0.0316030598800666, 'c' => 0.0207472843897681, 'x' => 0.00727643567964131, 'l' => 0.00606110428520962, 'u' => 0.00522348100239289, 'b' => 0.00434808690841467, 'I' => 0.00147083982287711, 'f' => 0.00120644414474664, 'X' => 0.000995371964726506, 'h' => 0.00062877291521786, 'g' => 0.000562118542579924, }; ${Lingua::Identify::languages{'suffixes2'}{'la'}} = { 'um' => 0.088399586990191, 'us' => 0.0740887971089313, 'is' => 0.065126484254001, 'am' => 0.0485802787816211, 'em' => 0.0430614352090862, 'it' => 0.0375967991739804, 'ae' => 0.0263035622096025, 'ur' => 0.0245302013422819, 'nt' => 0.0221011874032008, 'on' => 0.0219385647909138, 'es' => 0.0219101703665462, 're' => 0.020665978316985, 'ia' => 0.0198760970573051, 'st' => 0.0198502839442437, 'ue' => 0.0181621063500258, 'os' => 0.0164816726897264, 'at' => 0.0155085183273103, 'et' => 0.0152762003097574, 'ta' => 0.0152581311306144, 'er' => 0.0152452245740836, }; ${Lingua::Identify::languages{'suffixes3'}{'la'}} = { 'tur' => 0.0256964684941478, 'rum' => 0.0212465325671084, 'que' => 0.0208183600380462, 'tum' => 0.0182952004917867, 'ius' => 0.0173776879295106, 'tis' => 0.0173715711790954, 'bus' 
=> 0.0164204164895358, 'tus' => 0.0150533227717443, 'iam' => 0.0138574980655777, 'tem' => 0.0135853026721024, 'nem' => 0.0130562037611898, 'unt' => 0.0126800236106566, 'nis' => 0.0125943891048442, 'ris' => 0.012114224197253, 'ere' => 0.0120714069443467, 'ium' => 0.0116340592896618, 'uam' => 0.0112181202614299, 'mus' => 0.0108572319869346, 'uod' => 0.010468818335571, 'sse' => 0.00915371699630854, }; ${Lingua::Identify::languages{'suffixes4'}{'la'}} = { 'ibus' => 0.0185515744159724, 'orum' => 0.0133338281409459, 'tiam' => 0.0103315829514037, 'itur' => 0.00931104225038502, 'onem' => 0.00859851928821925, 'atur' => 0.00810494869463567, 'atis' => 0.00780064201287737, 'ntur' => 0.00773384298517433, 'ione' => 0.00716976230679309, 'erit' => 0.00690627725307554, 'imus' => 0.00661681479969569, 'ndum' => 0.0064535282875327, 'ique' => 0.00596737980813835, 'etur' => 0.00581893752435381, 'onis' => 0.00580409329597536, 'utem' => 0.00563338466962314, 'ntes' => 0.0055665856419201, 'oris' => 0.005039615534485, 'tate' => 0.00488004007941662, 'atem' => 0.00488004007941662, }; ${Lingua::Identify::languages{'smallwords'}{'la'}} = { 'et' => 0.0440819036054981, 'in' => 0.0209212580064409, 'non' => 0.0157996567392752, 'de' => 0.00923904937366472, 'ut' => 0.00814666518214456, 'est' => 0.00758996939223525, 'qui' => 0.00713620980498841, 'si' => 0.00705007951296471, 'ad' => 0.00690302779487546, 'quod' => 0.00640515269220185, 'cum' => 0.00606693374059657, 'a' => 0.00550813721185741, 'enim' => 0.00533167515015031, 'ex' => 0.00474136753896345, 'quae' => 0.00448927887938188, 'quam' => 0.00442625671448648, 'sed' => 0.00404392224745443, 'per' => 0.00384645279744886, 'nec' => 0.00373091216180731, 'etiam' => 0.00334647695594541, }; ${Lingua::Identify::languages{'ngrams1'}{'la'}} = { 'i' => 0.118506109784777, 'e' => 0.116374487131667, 't' => 0.0864145540301524, 'a' => 0.0823461511441558, 'u' => 0.0807470460220125, 's' => 0.0768738736885494, 'n' => 0.0678963733304974, 'r' => 0.0607500812166861, 'o' => 
0.0575623505448019, 'm' => 0.0541743436003386, 'c' => 0.0387949889013566, 'd' => 0.0343074031192641, 'l' => 0.0274219359185792, 'p' => 0.0268148969838626, 'q' => 0.0156506591845107, 'b' => 0.0131076162815294, 'g' => 0.00977161906676691, 'v' => 0.00915332429502558, 'f' => 0.00880905093499133, 'h' => 0.00837550714338112, }; ${Lingua::Identify::languages{'ngrams2'}{'la'}} = { 'er' => 0.0216199207645401, 'qu' => 0.0188950814196715, 'in' => 0.0188710220695004, 'is' => 0.0188356406721899, 'ti' => 0.0185157928405035, 'um' => 0.0183770977630465, 'et' => 0.0182992586889636, 'it' => 0.0179963939279861, 'us' => 0.015872566585431, 'at' => 0.01554422721839, 'ri' => 0.0146606357895571, 'en' => 0.0145497740779844, 'te' => 0.0140832113854506, 'tu' => 0.0140648130588492, 'nt' => 0.0140124485908297, 're' => 0.0139369682765674, 'on' => 0.0137982731991105, 'de' => 0.013651086586299, 'es' => 0.0135038999734875, 'st' => 0.0126127605132284, }; ${Lingua::Identify::languages{'ngrams3'}{'la'}} = { 'qui' => 0.00750578404860638, 'est' => 0.00672099858259358, 'tur' => 0.00658920255013341, 'qua' => 0.00650113883753503, 'ent' => 0.00644422691442723, 'ter' => 0.0057343255577668, 'eri' => 0.00551386673983343, 'ati' => 0.00537188646850135, 'que' => 0.00527004407978213, 'ita' => 0.00517898500280965, 'per' => 0.00513405453719823, 'non' => 0.00508792592583717, 'ant' => 0.00475843584468676, 'quo' => 0.00465180087296899, 'tia' => 0.00462604073935178, 'ere' => 0.00462304537497768, 'tio' => 0.0044810651036456, 'rum' => 0.00443074298216081, 'tat' => 0.00442535132628744, 'ris' => 0.00436544403880554, }; ${Lingua::Identify::languages{'ngrams4'}{'la'}} = { 'tion' => 0.00408024401872433, 'ibus' => 0.00398738705979754, 'ione' => 0.00353870847674792, 'omin' => 0.0034169800263901, 'enti' => 0.00341307847349402, 'atio' => 0.00307364337153474, 'itat' => 0.00291758125569139, 'erit' => 0.00289885380179019, 'quod' => 0.00289651287005254, 'orum' => 0.00284111081892815, 'prae' => 0.00269519274061462, 'quam' => 
0.00265305596933692, 'ntia' => 0.00261638137211373, 'atur' => 0.00248997105828062, 'quae' => 0.00248997105828062, 'rist' => 0.00234639391170474, 'enim' => 0.00225431726335716, 'tate' => 0.00222856701424301, 'tiam' => 0.00222856701424301, 'chri' => 0.00217472558427705, };

Lingua-Identify-0.56/lib/Lingua/Identify/NL.pm

use utf8;
use strict;

${Lingua::Identify::languages{'_versions'}{'nl'}} = '0.02';
${Lingua::Identify::languages{'_names'}{'nl'}} = 'dutch';
${Lingua::Identify::languages{'_sets'}{'nl'}} = '';

=head1 NAME

Lingua::Identify::NL - Meta-information on Dutch

=head1 SYNOPSIS

Nothing here is meant for public consumption. This module is to be
loaded by Lingua::Identify.

=head1 DESCRIPTION

Automatically generated. Do not change this module yourself unless you
know what you're doing.

=head1 SEE ALSO

Lingua::Identify(3).

=head1 AUTHOR

Jose Castro, C<< >>

=head1 COPYRIGHT AND LICENSE

Copyright (C) 2010 by Alberto Simoes

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.8.4 or,
at your option, any later version of Perl 5 you may have available.

=cut ${Lingua::Identify::languages{'prefixes1'}{'nl'}} = { 'd' => 0.139214911353547, 'v' => 0.0962097071027891, 'e' => 0.0677926860385065, 'h' => 0.0611730331315269, 'o' => 0.0605082230313152, 'i' => 0.0508691330727488, 'm' => 0.0472041430930432, 'w' => 0.0458767112076304, 'b' => 0.0425767321716882, 'g' => 0.0394242455674585, 't' => 0.0392618725936741, 'a' => 0.0383090802380712, 'z' => 0.035350040680776, 'n' => 0.0267407717661847, 's' => 0.024632986748002, 'k' => 0.016894229544616, 'p' => 0.0162823766676602, 'l' => 0.014459947926856, 'r' => 0.0138870470570883, 'E' => 0.0120536767412313, }; ${Lingua::Identify::languages{'prefixes2'}{'nl'}} = { 'he' => 0.0587293298136506, 'va' => 0.0540563939647771, 'ge' => 0.0373741122442058, 'be' => 0.0359177488294224, 'da' => 0.0329327828452411, 'vo' => 0.0306354674397175, 'ee' => 0.0286783929379885, 've' => 0.026656799496642, 'di' => 0.0228810629192008, 'me' => 0.0217528086417881, 'on' => 0.0195757080126457, 'zi' => 0.0180024384850512, 'aa' => 0.0168306201372985, 'mo' => 0.0145873462367536, 'st' => 0.014281294856517, 'de' => 0.0138688147981081, 'we' => 0.0138445512652605, 'ni' => 0.0129798871856011, 'wo' => 0.0128855902738526, 'wi' => 0.0128409233156559, }; ${Lingua::Identify::languages{'prefixes3'}{'nl'}} = { 'voo' => 0.0336157232771447, 'ver' => 0.0317753408552715, 'nie' => 0.0171741106848781, 'wor' => 0.0167324798182308, 'zij' => 0.0159504886112537, 'ove' => 0.0137804059044524, 'Eur' => 0.0137765987418089, 'moe' => 0.0136212665059536, 'hee' => 0.0120321568185522, 'aan' => 0.0111321435696262, 'maa' => 0.0091067330432783, 'bes' => 0.00893464929179161, 'uit' => 0.0089338878592629, 'ond' => 0.00881967297995757, 'waa' => 0.00877703275835025, 'dez' => 0.00850215561548876, 'heb' => 0.00830189886044008, 'bel' => 0.00817550106067551, 'ste' => 0.00808108342711644, 'pro' => 0.00772321013862641, }; ${Lingua::Identify::languages{'prefixes4'}{'nl'}} = { 'word' => 0.0208805718360983, 'voor' => 0.0177624726354812, 'Euro' => 
0.0171925077785405, 'onde' => 0.0106002006810937, 'hebb' => 0.0100751074293057, 'heef' => 0.0098469305435288, 'moet' => 0.00931610900172135, 'Comm' => 0.00882347606004394, 'waar' => 0.00876523844484564, 'vers' => 0.00849982767394191, 'kunn' => 0.00701810998889666, 'bela' => 0.00699328739881214, 'over' => 0.00695891765869511, 'Parl' => 0.00589822929008347, 'betr' => 0.00583235395485917, 'Voor' => 0.00555930546392944, 'alle' => 0.00539604919837355, 'land' => 0.00524043065284367, 'ande' => 0.00500366133203747, 'comm' => 0.00476211843621501, }; ${Lingua::Identify::languages{'suffixes1'}{'nl'}} = { 'n' => 0.284426590855065, 'e' => 0.212059578182684, 't' => 0.1496310181989, 'r' => 0.071050612036551, 's' => 0.0567089808992763, 'g' => 0.0407694574039109, 'd' => 0.0390634336999438, 'k' => 0.0350144182424147, 'l' => 0.0263446483082698, 'p' => 0.0164587290670629, 'j' => 0.0147487653545417, 'm' => 0.0130694461529975, 'a' => 0.00815275325608874, 'f' => 0.0067566768917185, 'u' => 0.00547179632432224, 'h' => 0.00434758055019155, 'o' => 0.00398816421430319, 'w' => 0.00300447541193884, 'i' => 0.00179401722834407, 'b' => 0.00123891380093063, }; ${Lingua::Identify::languages{'suffixes2'}{'nl'}} = { 'en' => 0.203399261274715, 'an' => 0.0728330412475171, 'et' => 0.0712680200772695, 'er' => 0.0416886787978785, 'at' => 0.037371360002294, 'ie' => 0.0366666041687576, 'ng' => 0.0309243330462106, 'or' => 0.0212022318370966, 'ij' => 0.0185718051344606, 'de' => 0.0181725537858844, 'ar' => 0.0160092070449114, 'jn' => 0.0153380897918489, 'el' => 0.0152118072520368, 'it' => 0.0150204533598324, 'le' => 0.0119758861008694, 'nt' => 0.0116389488439909, 'jk' => 0.0115887666993494, 'te' => 0.0109391560797047, 'ze' => 0.0105426619918231, 'ls' => 0.0103998358878435, }; ${Lingua::Identify::languages{'suffixes3'}{'nl'}} = { 'ing' => 0.0383178929270579, 'ten' => 0.0376379414449845, 'den' => 0.0375473320089635, 'gen' => 0.0334089934814506, 'oor' => 0.028044762584242, 'aar' => 0.0218764681203263, 'ijn' => 
0.0211782424662822, 'len' => 0.0181835625345972, 'ken' => 0.0161170582542531, 'ijk' => 0.0160013218317725, 'eer' => 0.0156594425311556, 'ren' => 0.0152627407146267, 'nen' => 0.0147891350910549, 'tie' => 0.0145622307890864, 'iet' => 0.0143787276455481, 'eid' => 0.0132251705399028, 'men' => 0.0128277072995419, 'ent' => 0.0112348086427696, 'ter' => 0.010734553185074, 'eft' => 0.0100035863062492, }; ${Lingua::Identify::languages{'suffixes4'}{'nl'}} = { 'ngen' => 0.0184660623332235, 'lijk' => 0.0173910552922138, 'rden' => 0.0167351673373527, 'eren' => 0.0130165595001122, 'llen' => 0.0129564125697537, 'eten' => 0.0116828251872431, 'heid' => 0.0111959214652938, 'nnen' => 0.0108808661157972, 'eeft' => 0.0105285769522691, 'ijke' => 0.0101610123778564, 'bben' => 0.0100827258970724, 'ssie' => 0.0100655410598271, 'nden' => 0.00985454944698239, 'ment' => 0.00977435353983779, 'pese' => 0.00954426766338723, 'ende' => 0.0091623823912701, 'eden' => 0.00907072992596199, 'aten' => 0.00881486679364352, 'ring' => 0.00782482922567988, 'elen' => 0.00781814623341782, }; ${Lingua::Identify::languages{'smallwords'}{'nl'}} = { 'de' => 0.0715636306167488, 'van' => 0.0391846555891462, 'het' => 0.0292177526882981, 'en' => 0.0252711092239006, 'een' => 0.0192213259419518, 'dat' => 0.0183620952031132, 'in' => 0.0178627948748341, 'te' => 0.0142135607364115, 'is' => 0.012204202545737, 'voor' => 0.0109776604349644, 'op' => 0.010504844732794, 'die' => 0.00984055820908353, 'met' => 0.00847811957417931, 'niet' => 0.00741916870402909, 'zijn' => 0.00738964485853085, 'om' => 0.00730454671562415, 'aan' => 0.00656297718458004, 'ik' => 0.00583313035689553, 'ook' => 0.00548231760450464, 'worden' => 0.00504423583821454, }; ${Lingua::Identify::languages{'ngrams1'}{'nl'}} = { 'e' => 0.192288205591257, 'n' => 0.100867369540107, 'i' => 0.0718293741978683, 'a' => 0.0702511291166097, 't' => 0.0679412672438682, 'r' => 0.0624315716276347, 'o' => 0.0600369816248521, 'd' => 0.0576705527057536, 's' => 0.0387130825538495, 
'l' => 0.0361203038229936, 'g' => 0.032204607169616, 'v' => 0.0276689585435381, 'm' => 0.0256460948453871, 'h' => 0.022836434956998, 'k' => 0.0212814533796525, 'u' => 0.018390167161744, 'b' => 0.0151804566323801, 'p' => 0.0149943486010757, 'w' => 0.0147696313159087, 'c' => 0.014271547453181, }; ${Lingua::Identify::languages{'ngrams2'}{'nl'}} = { 'en' => 0.0568600335719896, 'de' => 0.0360278616776171, 'er' => 0.0303496933248675, 'an' => 0.0235537310040066, 'te' => 0.0214265272960003, 'ge' => 0.0191468423475562, 'in' => 0.017825438712536, 'et' => 0.0171547422419142, 'ie' => 0.015833941299544, 'he' => 0.0157389167584015, 'ee' => 0.0155979871270877, 'aa' => 0.0151214581385255, 'el' => 0.0149464763724894, 'ij' => 0.0148414069538477, 'or' => 0.0131743590840799, 'at' => 0.0117277962754197, 'st' => 0.0114956591564131, 'va' => 0.0113359456041758, 're' => 0.0113294164338013, 've' => 0.0110718657747217, }; ${Lingua::Identify::languages{'ngrams3'}{'nl'}} = { 'van' => 0.0128230063508211, 'ing' => 0.0101819365950328, 'het' => 0.0101214501011193, 'oor' => 0.0100139040720707, 'ver' => 0.00979072976694189, 'een' => 0.00872217849407923, 'gen' => 0.00804209651402257, 'ten' => 0.00782257225594026, 'nde' => 0.00749133048647045, 'den' => 0.00747177666300706, 'aar' => 0.00728301708717381, 'voo' => 0.00719619811099637, 'dat' => 0.00666524662455381, 'men' => 0.00637141783731062, 'ste' => 0.00602479372671628, 'aan' => 0.0057782851922545, 'der' => 0.00549397259909682, 'ijk' => 0.00543296466989105, 'lij' => 0.00536231018777667, 'and' => 0.00508946917105085, }; ${Lingua::Identify::languages{'ngrams4'}{'nl'}} = { 'voor' => 0.00942255181379214, 'lijk' => 0.00593281897245617, 'elij' => 0.0044179041288002, 'nder' => 0.00437454280692859, 'word' => 0.00381494275537301, 'orde' => 0.00368315165110183, 'ngen' => 0.0036128175384597, 'over' => 0.00360274542038716, 'zijn' => 0.00359711186282117, 'ande' => 0.00335401531815516, 'niet' => 0.00332635967192209, 'euro' => 0.00331867754796846, 'eren' => 
0.00321215209581145, 'inge' => 0.00319627570630728, 'rden' => 0.00316930291553675, 'urop' => 0.00304758392933812, 'onde' => 0.00296717769862345, 'emen' => 0.00291391497254494, 'moet' => 0.00285228726705026, 'ment' => 0.0028232659098921, };

Lingua-Identify-0.56/lib/Lingua/Identify/Nothing.pm

use strict;

1;

=head1 NAME

Lingua::Identify::Nothing - Module for tests

=head1 SYNOPSIS

Nothing here is meant for public consumption. This module is to be
loaded by Lingua::Identify.

=head1 DESCRIPTION

This module serves to exercise part of I<Lingua::Identify>'s code.

=head1 SEE ALSO

Lingua::Identify(3).

=head1 AUTHOR

Jose Castro, C<< >>

=head1 COPYRIGHT AND LICENSE

Copyright (C) 2004 by Jose Alves de Castro

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.8.4 or,
at your option, any later version of Perl 5 you may have available.

=cut

Lingua-Identify-0.56/lib/Lingua/Identify/PL.pm

use utf8;
use strict;

${Lingua::Identify::languages{'_versions'}{'pl'}} = '0.02';
${Lingua::Identify::languages{'_names'}{'pl'}} = 'polish';
${Lingua::Identify::languages{'_sets'}{'pl'}} = '';

=head1 NAME

Lingua::Identify::PL - Meta-information on Polish

=head1 SYNOPSIS

Nothing here is meant for public consumption. This module is to be
loaded by Lingua::Identify.

=head1 DESCRIPTION

Automatically generated. Do not change this module yourself unless you
know what you're doing.

=head1 SEE ALSO

Lingua::Identify(3).

=head1 AUTHOR

Jose Castro, C<< >>

=head1 COPYRIGHT AND LICENSE

Copyright (C) 2010 by Alberto Simoes

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.8.4 or,
at your option, any later version of Perl 5 you may have available.

=cut ${Lingua::Identify::languages{'prefixes1'}{'pl'}} = { 'p' => 0.122687536683372, 'n' => 0.0625317452255181, 's' => 0.061888375321685, 'd' => 0.0601762494920764, 'w' => 0.0570482866043614, 'z' => 0.0551478057700122, 'o' => 0.0472629972007766, 'l' => 0.0463162840308818, 'm' => 0.0447297282044336, 'k' => 0.0335328344394781, 'c' => 0.0319815510858278, 't' => 0.0317431091697142, 'P' => 0.0300641394645356, 'r' => 0.028122037112285, 'b' => 0.0248621551763059, 'j' => 0.0209617251343176, 'i' => 0.0162133448462685, 'N' => 0.0161265745631857, 'u' => 0.0160800148990925, 'a' => 0.0145886947491986, }; ${Lingua::Identify::languages{'prefixes2'}{'pl'}} = { 'pr' => 0.0472917205421669, 'po' => 0.047002922052281, 'za' => 0.0305274714724387, 'le' => 0.0282317944683862, 'ni' => 0.025835463835151, 'wy' => 0.0244696715130099, 'st' => 0.0229033461911892, 'pa' => 0.0198140540339652, 'na' => 0.0195159644398471, 'je' => 0.0160782558739135, 'do' => 0.0160155409203458, 'lu' => 0.0144174709923983, 'cz' => 0.0130648410679195, 'da' => 0.0130299994270486, 'mo' => 0.0113676660281613, 'in' => 0.0111810696848303, 'ob' => 0.0107753581333552, 'ko' => 0.0107691640638671, 'si' => 0.0106313460177553, 'mi' => 0.0095876453089989, }; ${Lingua::Identify::languages{'prefixes3'}{'pl'}} = { 'prz' => 0.0339219148518737, 'pod' => 0.0172164359256511, 'pac' => 0.0165067260366285, 'nie' => 0.0151009545256801, 'lek' => 0.0141754314131928, 'lec' => 0.0136124403954585, 'sto' => 0.0118398717062893, 'daw' => 0.0111506342179116, 'pro' => 0.0102353473057468, 'wys' => 0.0098199281759944, 'jes' => 0.00918784280608372, 'pow' => 0.009124719570762, 'nal' => 0.00905306616850492, 'roz' => 0.00810962970545334, 'bad' => 0.00763705845723401, 'dzi' => 0.00751507707005826, 'zwi' => 0.0073717702655441, 'któ' => 0.00688725678361526, 'moż' => 0.0065255777055557, 'pre' => 0.00592164188653172, }; ${Lingua::Identify::languages{'prefixes4'}{'pl'}} = { 'prze' => 0.0257740177641884, 'pacj' => 0.0181271630759996, 'lecz' => 
0.0148341290013975, 'stos' => 0.0117286787533179, 'przy' => 0.0103180483778689, 'nale' => 0.00972528348605783, 'poda' => 0.00967932544855983, 'dawk' => 0.00955270636565715, 'wyst' => 0.00894118309119388, 'bada' => 0.00837561785422861, 'któr' => 0.00756619364278412, 'prep' => 0.00630844408595091, 'wstr' => 0.00526829176788377, 'leka' => 0.0052551609000272, 'niep' => 0.0052476575469663, 'stęż' => 0.00522796124518144, 'dzia' => 0.0051641827441638, 'odpo' => 0.00515949314850074, 'prod' => 0.0050657012352395, 'szcz' => 0.00504037741865896, }; ${Lingua::Identify::languages{'suffixes1'}{'pl'}} = { 'e' => 0.131996829465064, 'a' => 0.129745382308019, 'o' => 0.0750529482160089, 'i' => 0.0748417205097158, 'y' => 0.0704808220784524, 'u' => 0.0609868784501114, 'h' => 0.0430650199853907, 'm' => 0.0400068101507982, 'ć' => 0.0348349103589176, 'ą' => 0.0274610147124689, 'w' => 0.0256503905946448, 't' => 0.0250470847044296, 'j' => 0.0235345812958219, 'k' => 0.022703799414214, 'ę' => 0.0207476471776729, 'z' => 0.0161726952514034, 'b' => 0.0152126335361779, 'n' => 0.0149427707140909, 'A' => 0.0128735869290317, 'l' => 0.0123868448232257, }; ${Lingua::Identify::languages{'suffixes2'}{'pl'}} = { 'ie' => 0.0682766998198835, 'ia' => 0.0508429480765084, 'ch' => 0.0467240060628344, 'ne' => 0.0280142355600442, 'ów' => 0.0258469429072749, 'ej' => 0.0240659465111553, 'em' => 0.0215658805770552, 'go' => 0.0204263841894682, 'ci' => 0.0196879222691259, 'ny' => 0.019514166523163, 'ki' => 0.0164656840158676, 'ym' => 0.0162562462863588, 'ać' => 0.0160832662356904, 'na' => 0.0150190122916676, 'iu' => 0.0148599947562998, 'ek' => 0.0147816495315576, 'ku' => 0.0137057601581177, 'ub' => 0.0130696900166464, 'ży' => 0.0120504263997034, 'mi' => 0.0119472589255379, }; ${Lingua::Identify::languages{'suffixes3'}{'pl'}} = { 'nia' => 0.0496637276121029, 'nie' => 0.0446386535091354, 'ych' => 0.0386651297795239, 'ego' => 0.0223090732450905, 'ści' => 0.0164517559562209, 'niu' => 0.0150658232585482, 'tów' => 
0.0143395330039109, 'eży' => 0.0118641649242821, 'wać' => 0.0114232640167611, 'iem' => 0.011098569549982, 'ość' => 0.010541462201719, 'ane' => 0.00957934122910527, 'est' => 0.00946826154310191, 'nym' => 0.00850272273399577, 'ach' => 0.00840018763922343, 'nej' => 0.0083907885888693, 'eku' => 0.00761493970509198, 'cji' => 0.00737398223237699, 'ami' => 0.00693137240660975, 'ano' => 0.00615637798195553, }; ${Lingua::Identify::languages{'suffixes4'}{'pl'}} = { 'enia' => 0.026359262327146, 'ania' => 0.0256425650632598, 'nych' => 0.0217513867247779, 'enie' => 0.0210656462798815, 'ości' => 0.0166594590436594, 'anie' => 0.0143958589157046, 'ntów' => 0.0130966106035548, 'leży' => 0.0126988623838064, 'nego' => 0.00980862119266305, 'ować' => 0.00784333222951948, 'aniu' => 0.00767260068236334, 'niem' => 0.00728235714600643, 'eniu' => 0.00686584721777935, 'ność' => 0.00603188927590124, 'cych' => 0.00572513534227454, 'awki' => 0.00561631743309809, 'czne' => 0.00545590482560523, 'wych' => 0.00511256556044506, 'wego' => 0.00510037044993391, 'rzez' => 0.00475890735562162, }; ${Lingua::Identify::languages{'smallwords'}{'pl'}} = { 'w' => 0.022492418366537, 'i' => 0.014643000940113, 'z' => 0.0145478419959852, 'do' => 0.00903053597916641, 'na' => 0.00882778526403977, 'lub' => 0.00805264632719949, 'u' => 0.00739466287433568, 'pacjentów' => 0.00654301423367402, 'się' => 0.00598927525226684, 'nie' => 0.00542453800093533, 'jest' => 0.00507546247725032, 'należy' => 0.00479524568700454, 'mg' => 0.00364090553059958, 'W' => 0.00360360704998666, 'leku' => 0.00343289477333522, 'po' => 0.0033343885296652, 'o' => 0.00326313886798155, 'leczenia' => 0.00300348406063777, 'dawki' => 0.00240192651434222, 'może' => 0.00234741335036949, }; ${Lingua::Identify::languages{'ngrams1'}{'pl'}} = { 'a' => 0.08847144317907, 'e' => 0.0817378629416753, 'i' => 0.0788816639289244, 'o' => 0.0788332570514376, 'n' => 0.0701638848171394, 'z' => 0.0526224305102848, 'w' => 0.0465904118246504, 'r' => 0.0463012266264107, 'c' 
=> 0.0431403831331888, 't' => 0.0405071842666247, 'y' => 0.037475908882604, 'p' => 0.0354103555354714, 's' => 0.0351515091786107, 'k' => 0.0326329022415863, 'd' => 0.0317744289751781, 'u' => 0.0275338512386122, 'l' => 0.0272970343589311, 'm' => 0.0244949428339898, 'j' => 0.0179239749215659, 'b' => 0.0150004314105948, }; ${Lingua::Identify::languages{'ngrams2'}{'pl'}} = { 'ni' => 0.0313628981983346, 'ie' => 0.0246100511220746, 'an' => 0.0158815605310811, 'cz' => 0.0155631183665973, 'st' => 0.0144339507776134, 'ow' => 0.0142469745138903, 'en' => 0.0135100344689657, 'po' => 0.0131851960486673, 'ze' => 0.0129470497384274, 'ia' => 0.0124885752591454, 'wa' => 0.0123205593251585, 'rz' => 0.011741355517376, 'na' => 0.0112749999371796, 'ch' => 0.0106173277768145, 'le' => 0.0106152718374456, 'ro' => 0.0102722726193878, 'za' => 0.00988998211561187, 'pr' => 0.00972767712431649, 'od' => 0.00972082399308656, 'ne' => 0.00872860480984731, }; ${Lingua::Identify::languages{'ngrams3'}{'pl'}} = { 'nie' => 0.0167214213787286, 'eni' => 0.0100474328874378, 'nia' => 0.00991430467420379, 'ani' => 0.009157640428463, 'owa' => 0.00778765981858045, 'wan' => 0.00699379597999125, 'rze' => 0.00690236401368618, 'ych' => 0.00670832657699703, 'prz' => 0.00628032686587298, 'cze' => 0.00529269811508437, 'zen' => 0.00506841046003496, 'czn' => 0.00454012173668956, 'ego' => 0.00438519156427694, 'rzy' => 0.00421677142962259, 'acj' => 0.00420982205513443, 'ści' => 0.00409209147557022, 'sto' => 0.00397449715825088, 'ecz' => 0.00381125498890144, 'pod' => 0.00371995928484123, 'wie' => 0.00368671129709394, }; ${Lingua::Identify::languages{'ngrams4'}{'pl'}} = { 'zeni' => 0.00612851226347246, 'owan' => 0.00606750489794906, 'ania' => 0.00569104481313394, 'prze' => 0.00549926014102787, 'enia' => 0.00542519157800759, 'wani' => 0.00529606758756376, 'enie' => 0.00481148062726814, 'nych' => 0.00396217212674321, 'ości' => 0.00383966140085476, 'acje' => 0.00383850407955757, 'pacj' => 0.00355413370367615, 'cjen' => 
0.00355413370367615, 'lecz' => 0.00332118145971282, 'anie' => 0.00326083542064496, 'jent' => 0.00318395622018864, 'czen' => 0.00303929105803966, 'stos' => 0.00303333911993982, 'osow' => 0.0027274756342534, 'sowa' => 0.00272119303292579, 'toso' => 0.00270631318767618, };

Lingua-Identify-0.56/lib/Lingua/Identify/PT.pm

use utf8;
use strict;

${Lingua::Identify::languages{'_versions'}{'pt'}} = '0.02';
${Lingua::Identify::languages{'_names'}{'pt'}} = 'portuguese';
${Lingua::Identify::languages{'_sets'}{'pt'}} = '';

=head1 NAME

Lingua::Identify::PT - Meta-information on Portuguese

=head1 SYNOPSIS

Nothing here is meant for public consumption. This module is to be loaded by Lingua::Identify.

=head1 DESCRIPTION

Automatically generated. Do not change this module yourself unless you know what you're doing.

=head1 SEE ALSO

Lingua::Identify(3).

=head1 AUTHOR

Jose Castro, C<< >>

=head1 COPYRIGHT AND LICENSE

Copyright (C) 2010 by Alberto Simoes

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.
=cut ${Lingua::Identify::languages{'prefixes1'}{'pt'}} = { 'd' => 0.152189963061362, 'p' => 0.086118381212766, 'c' => 0.0704642210218402, 'a' => 0.0693132622484047, 'e' => 0.0605035016932544, 's' => 0.0577541201639895, 'n' => 0.0439850900399591, 'm' => 0.0399051152182884, 'q' => 0.0388733228580943, 't' => 0.032869872341505, 'f' => 0.0285474351641798, 'o' => 0.0273230297647614, 'r' => 0.024734699873194, 'u' => 0.0228410157815857, 'i' => 0.0211077933949361, 'v' => 0.0175522687780753, 'P' => 0.0149247083593764, 'C' => 0.0143866897015737, 'l' => 0.0139106139813469, 'A' => 0.0131625792685247, }; ${Lingua::Identify::languages{'prefixes2'}{'pt'}} = { 'co' => 0.0520762382129872, 'qu' => 0.0484397382366963, 'pa' => 0.0287490447433222, 'pr' => 0.0279638871622692, 'po' => 0.0272331318855214, 'de' => 0.0255308543771616, 're' => 0.0254194766865909, 'se' => 0.0246335839392635, 'es' => 0.0232371356010517, 'pe' => 0.0177924941728883, 'ma' => 0.0172984624364954, 'in' => 0.0162640834884228, 'di' => 0.0159439185759241, 'te' => 0.01494776827412, 'do' => 0.0138608249374282, 'ca' => 0.0129396615956122, 'um' => 0.0119685069471375, 'en' => 0.0111980526915724, 'me' => 0.0106996099575331, 'nã' => 0.0105250079673645, }; ${Lingua::Identify::languages{'prefixes3'}{'pt'}} = { 'con' => 0.0262230277588871, 'par' => 0.0243294288315679, 'est' => 0.0173064362190245, 'com' => 0.0156401229584076, 'pro' => 0.0145108673481053, 'des' => 0.0120891766744834, 'mai' => 0.00952941107907793, 'pel' => 0.00941823386933002, 'pre' => 0.0092586407779177, 'qua' => 0.0081840771820881, 'ent' => 0.00710547892945313, 'res' => 0.00652045369267481, 'tra' => 0.00605422668405454, 'per' => 0.00595022219751618, 'pri' => 0.00577718025008596, 'for' => 0.00575745526125972, 'dis' => 0.00559158603703905, 'int' => 0.00550775483452752, 'rec' => 0.00523294987656192, 'por' => 0.00517332661488259, }; ${Lingua::Identify::languages{'prefixes4'}{'pt'}} = { 'cont' => 0.0109587909732807, 'cons' => 0.00778954939584209, 'part' => 
0.00590514961245659, 'entr' => 0.00578753797549078, 'inte' => 0.00575303856198081, 'esta' => 0.00537093142143856, 'pres' => 0.00467780684092007, 'comp' => 0.00457639947393622, 'segu' => 0.0045047870549837, 'Port' => 0.00447394667017933, 'pass' => 0.0044551288082648, 'sobr' => 0.00422460999981182, 'outr' => 0.00416920185084126, 'prim' => 0.00398311410524203, 'cent' => 0.00395959177784887, 'muit' => 0.00387543411762, 'port' => 0.00368777821686122, 'dest' => 0.00367157394687927, 'conc' => 0.00365171064819171, 'aind' => 0.00348182717257443, }; ${Lingua::Identify::languages{'suffixes1'}{'pt'}} = { 'o' => 0.216256820777837, 's' => 0.193733666381006, 'a' => 0.185662191584141, 'e' => 0.169346843846771, 'm' => 0.0729633148538241, 'r' => 0.0567325453510262, 'l' => 0.0261588396927128, 'u' => 0.0211389593871231, 'á' => 0.0110054849132469, 'i' => 0.00965518013690475, 'z' => 0.00533756440617828, 'n' => 0.00432056270754084, 'é' => 0.00298499281527053, 't' => 0.0019995237685468, 'd' => 0.00169274348217138, 'ó' => 0.00155453026957765, 'y' => 0.00141661175466536, 'P' => 0.00134558961343916, 'g' => 0.00118556877241913, 'A' => 0.00114136412020365, }; ${Lingua::Identify::languages{'suffixes2'}{'pt'}} = { 'os' => 0.0749514314665271, 'as' => 0.059924599282024, 'ão' => 0.0560652208121865, 'es' => 0.0427811202602479, 'ue' => 0.0405004948769573, 'do' => 0.0394805924405155, 'ra' => 0.0380786861715348, 'to' => 0.0324416479035012, 'te' => 0.03082723132152, 'ia' => 0.0265979819608893, 'ar' => 0.0241342594567089, 'ma' => 0.0216153872173208, 'is' => 0.021573473418563, 'da' => 0.021241104347711, 'al' => 0.0210922000626501, 'or' => 0.0204370212083822, 'er' => 0.0195042553535686, 'em' => 0.0192443162682895, 'ta' => 0.0190726167593427, 'de' => 0.0177868925991997, }; ${Lingua::Identify::languages{'suffixes3'}{'pt'}} = { 'ção' => 0.0310741960879959, 'nte' => 0.0242200291597996, 'ara' => 0.0199388648480947, 'ado' => 0.0192614285432474, 'nto' => 0.0176362090825782, 'ais' => 0.0164306324950638, 'dos' => 
0.0145027859624079, 'ões' => 0.0142458892035214, 'ndo' => 0.0139437104679377, 'tos' => 0.0132003687118578, 'ada' => 0.0119804453594492, 'res' => 0.0111111210741423, 'ade' => 0.0110062103558536, 'sta' => 0.0100508054811815, 'ica' => 0.00994948145411636, 'tes' => 0.00977463025696849, 'ram' => 0.00969975807767697, 'ria' => 0.00914606262004207, 'das' => 0.0083417471131619, 'cia' => 0.0081000937919755, }; ${Lingua::Identify::languages{'suffixes4'}{'pt'}} = { 'ação' => 0.0232592543029228, 'ente' => 0.0212838251083898, 'ento' => 0.0134500643491225, 'dade' => 0.0119947600682069, 'ções' => 0.0110679445186152, 'ados' => 0.00986930489356519, 'ntes' => 0.00853318501496601, 'ando' => 0.00801044640831531, 'eira' => 0.00767223452981231, 'ores' => 0.00756977776290877, 'ncia' => 0.00732252240196299, 'ntos' => 0.00697071931968707, 'eiro' => 0.00601672136254954, 'ante' => 0.00574071537823797, 'adas' => 0.00526815967782574, 'tado' => 0.00500104024982724, 'ição' => 0.00488081037029757, 'stas' => 0.00479194480716696, 'ista' => 0.00476476239962112, 'ário' => 0.00472869343576222, }; ${Lingua::Identify::languages{'smallwords'}{'pt'}} = { 'de' => 0.0430133932745471, 'a' => 0.029736597657219, 'que' => 0.022692889220427, 'o' => 0.0207708742295662, 'e' => 0.0199284859924512, 'do' => 0.0156198920778782, 'da' => 0.015303574055704, 'em' => 0.0101417776126596, 'um' => 0.00835471596601699, 'para' => 0.0083206960077917, 'os' => 0.00812491227469381, 'uma' => 0.00727463861679806, 'com' => 0.00709327393884203, 'não' => 0.00641918311096071, 'no' => 0.00629391756941592, 'dos' => 0.00607673055134186, 'por' => 0.00597106591288052, 'na' => 0.00578744825755857, 'é' => 0.00540601918950942, 'se' => 0.00521203783830426, }; ${Lingua::Identify::languages{'ngrams1'}{'pt'}} = { 'a' => 0.123692471695767, 'e' => 0.115792320843729, 'o' => 0.103691513568575, 's' => 0.0790681216774899, 'r' => 0.0689517187625363, 'i' => 0.0655977966466184, 'd' => 0.0541838415028691, 'n' => 0.0531845381588463, 't' => 0.0494705764332719, 
'm' => 0.0424123904499732, 'u' => 0.038617666233213, 'c' => 0.0376233972964491, 'p' => 0.0292979715154297, 'l' => 0.0280546848975095, 'v' => 0.0134649196672882, 'g' => 0.0123998510663352, 'f' => 0.0103359030714442, 'b' => 0.00982907074486864, 'q' => 0.00940893621066843, 'ã' => 0.00841328943612837, }; ${Lingua::Identify::languages{'ngrams2'}{'pt'}} = { 'de' => 0.0251124618700597, 'es' => 0.0220355085577785, 'os' => 0.0190615342193951, 'do' => 0.0179899154968218, 'ra' => 0.0178542700785927, 'nt' => 0.0165629495993281, 'as' => 0.016162188099362, 're' => 0.0159152643057282, 'en' => 0.0154345629704474, 'co' => 0.0152963945341602, 'ar' => 0.0152355101247055, 'er' => 0.0148407905890364, 'te' => 0.0146870192779162, 'da' => 0.0146228815030707, 'or' => 0.0128594246708806, 'ta' => 0.0124956453040293, 'qu' => 0.011772003887838, 'an' => 0.0117099907597783, 'se' => 0.0115244825372957, 'ma' => 0.0112886467440751, }; ${Lingua::Identify::languages{'ngrams3'}{'pt'}} = { 'que' => 0.0115054080550109, 'ent' => 0.0113316484016254, 'nte' => 0.00881650312296134, 'est' => 0.0067832238653229, 'con' => 0.00656542453647385, 'ado' => 0.00648402824913844, 'com' => 0.00646492154169024, 'res' => 0.00617026698153607, 'ção' => 0.00594235737699693, 'par' => 0.00589326256368831, 'ara' => 0.00549793364814561, 'men' => 0.00541568055778562, 'dos' => 0.00517560435029741, 'por' => 0.00503517433456822, 'sta' => 0.00500604303173239, 'nto' => 0.00489988513698652, 'ica' => 0.00433062520745343, 'tra' => 0.00391601822385761, 'ida' => 0.00389494086945286, 'ant' => 0.00389297022249632, }; ${Lingua::Identify::languages{'ngrams4'}{'pt'}} = { 'ment' => 0.00613569444123265, 'ente' => 0.0058691436895115, 'ação' => 0.0049731935517061, 'para' => 0.00489812142045856, 'ento' => 0.00378041803782228, 'dade' => 0.00333032039378014, 'esta' => 0.00323167650703673, 'cont' => 0.00319112415042534, 'amen' => 0.00290290078938569, 'idad' => 0.00288145160902926, 'port' => 0.00258261537229242, 'ante' => 0.00258116308403912, 'pres' => 
0.00253536014681964, 'ncia' => 0.00238599788569173, 'ções' => 0.00236622442255064, 'ados' => 0.00211654255746397, 'enta' => 0.00206828190166199, 'mais' => 0.00206224931968674, 'eira' => 0.00196997315836164, 'cons' => 0.00196047742747468, };

Lingua-Identify-0.56/lib/Lingua/Identify/RO.pm

use utf8;
use strict;

${Lingua::Identify::languages{'_versions'}{'ro'}} = '0.02';
${Lingua::Identify::languages{'_names'}{'ro'}} = 'romanian';
${Lingua::Identify::languages{'_sets'}{'ro'}} = '';

=head1 NAME

Lingua::Identify::RO - Meta-information on Romanian

=head1 SYNOPSIS

Nothing here is meant for public consumption. This module is to be loaded by Lingua::Identify.

=head1 DESCRIPTION

Automatically generated. Do not change this module yourself unless you know what you're doing.

=head1 SEE ALSO

Lingua::Identify(3).

=head1 AUTHOR

Jose Castro, C<< >>

=head1 COPYRIGHT AND LICENSE

Copyright (C) 2010 by Alberto Simoes

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.
=cut ${Lingua::Identify::languages{'prefixes1'}{'ro'}} = { 'd' => 0.113234863963634, 'a' => 0.0925965969783178, 'c' => 0.0912987426630064, 'p' => 0.0794932518075709, 's' => 0.0636316953790753, 'î' => 0.0420777802574475, 'l' => 0.0376269416355564, 'm' => 0.030885032641361, 'i' => 0.0301002532874281, 'f' => 0.0276111463801998, 'ş' => 0.0265819630750697, 'r' => 0.0249071026648264, 'e' => 0.0244841624939603, 'v' => 0.0201572592090239, 't' => 0.0199106885561317, 'u' => 0.0196697513276465, 'n' => 0.0180273914428283, 'S' => 0.0179194896984168, 'A' => 0.0176755190875586, 'P' => 0.0164097319573333, }; ${Lingua::Identify::languages{'prefixes2'}{'ro'}} = { 'pr' => 0.0392673872285755, 'di' => 0.0306722416270733, 'co' => 0.028654890453407, 'de' => 0.0285755983822119, 're' => 0.022859099713227, 'pe' => 0.0225218211205349, 'in' => 0.0218805895882616, 'ac' => 0.0208302569350397, 'ca' => 0.020531475217493, 'în' => 0.0200890484434334, 'ma' => 0.0154337994519654, 'fo' => 0.0123034863804376, 'tr' => 0.0118518663227612, 'ce' => 0.011770850510888, 'al' => 0.0110939950045995, 'pa' => 0.0110233216367952, 'po' => 0.0107854454232099, 'su' => 0.010618817157655, 'lu' => 0.0100936508310441, 'as' => 0.00975235017676961, }; ${Lingua::Identify::languages{'prefixes3'}{'ro'}} = { 'pen' => 0.0190812425419362, 'pro' => 0.0173210761400229, 'con' => 0.0168948105810133, 'dec' => 0.0139679559774719, 'pre' => 0.0138362126843456, 'ace' => 0.0123851363163055, 'car' => 0.0121830877080974, 'pri' => 0.011523104480032, 'int' => 0.0093544071931838, 'par' => 0.00907255255163957, 'fos' => 0.00901744838576463, 'est' => 0.00848224125743906, 'com' => 0.00801100563202578, 'înt' => 0.00700202935341912, 'înc' => 0.00696656000526973, 'tre' => 0.00686775253542501, 'sta' => 0.00564976045522375, 'mar' => 0.00518422526076305, 'Min' => 0.00514812253139671, 'dis' => 0.00509681865282349, }; ${Lingua::Identify::languages{'prefixes4'}{'ro'}} = { 'pent' => 0.0218511417840626, 'decl' => 0.0118419043128463, 'aces' => 
0.00909809364097226, 'inte' => 0.00844642933927616, 'part' => 0.00703189522364911, 'Mini' => 0.00590723680991419, 'poli' => 0.00573598032107255, 'guve' => 0.00563144713957181, 'Serb' => 0.00544758735934356, 'cons' => 0.00511100534231281, 'cont' => 0.00458611532456441, 'Koso' => 0.0044830648832268, 'afir' => 0.00439410047343893, 'asup' => 0.00437630759148136, 'priv' => 0.0042228439845973, 'Euro' => 0.00418799959076372, 'disc' => 0.00418503411043745, 'asem' => 0.00417613766945867, 'Turc' => 0.00412350039366751, 'acor' => 0.00398486418841476, }; ${Lingua::Identify::languages{'suffixes1'}{'ro'}} = { 'e' => 0.227849749900479, 'i' => 0.155672672516745, 'a' => 0.139453415719058, 'ă' => 0.0808820984128633, 'n' => 0.0704510445332918, 't' => 0.0698366131851775, 'l' => 0.0641293248178341, 'u' => 0.0512470360177925, 'r' => 0.0392392301434827, 's' => 0.013398497672084, 'c' => 0.0120882877269502, 'd' => 0.011608858196168, 'o' => 0.00741471519808921, 'm' => 0.00709019159873306, 'E' => 0.00619407376637762, 'A' => 0.0048038146667359, 'v' => 0.00372033854301885, 'H' => 0.00297479966076466, 'p' => 0.00296008792426052, 'P' => 0.00267710334562196, }; ${Lingua::Identify::languages{'suffixes2'}{'ro'}} = { 'ul' => 0.0541373201622728, 're' => 0.0459967616465315, 'le' => 0.0459234490812733, 'te' => 0.0446771354718838, 'ea' => 0.042287031293583, 'at' => 0.0404925917080052, 'ia' => 0.0341802652883904, 'ii' => 0.0330616759137867, 'or' => 0.0329139052744381, 'ui' => 0.0290586952998054, 'ei' => 0.0270889928629072, 'in' => 0.0253002808214902, 'ru' => 0.0238913049579341, 'ie' => 0.0238741223254517, 'ri' => 0.0190022732622774, 'tă' => 0.0187359424588003, 'ne' => 0.0152072025013331, 'ta' => 0.0127729962329942, 'st' => 0.0121836319388482, 'ni' => 0.0109029530644939, }; ${Lingua::Identify::languages{'suffixes3'}{'ro'}} = { 'are' => 0.0308410754797732, 'rea' => 0.0253467711493474, 'lor' => 0.024832942596387, 'lui' => 0.0246221087675311, 'tru' => 0.0227189231567505, 'ile' => 0.0201743987132824, 'ele' => 
0.0175787439101217, 'ate' => 0.0175673816079678, 'tul' => 0.0143922493949574, 'iei' => 0.0142458019449736, 'rii' => 0.0141643721128706, 'rul' => 0.0127308283244518, 'ste' => 0.0111167501795875, 'rat' => 0.010749369076611, 'ale' => 0.00953297150713363, 'ţia' => 0.00943134202675696, 'tea' => 0.00924133908518318, 'nia' => 0.00909867906925071, 'ţii' => 0.00907153579188302, 'ost' => 0.00795171779071397, }; ${Lingua::Identify::languages{'suffixes4'}{'ro'}} = { 'ului' => 0.0277711108075075, 'ntru' => 0.0224029894284417, 'ilor' => 0.0197148670154825, 'area' => 0.018949786021025, 'rile' => 0.00960117568430809, 'elor' => 0.0093131625686339, 'arat' => 0.00844912322161132, 'tate' => 0.00777856960870834, 'tele' => 0.00747061712348747, 'ării' => 0.00729337828307258, 'erea' => 0.00657556097939229, 'iile' => 0.00630453325259119, 'ează' => 0.00613689484936545, 'ţiei' => 0.00607116877937826, 'ntre' => 0.00541538506984318, 'aţia' => 0.00517906661595667, 'atea' => 0.00478027922502317, 'rmat' => 0.00465916601740633, 'rare' => 0.00463848815269126, 'upra' => 0.00440660066981512, }; ${Lingua::Identify::languages{'smallwords'}{'ro'}} = { 'de' => 0.0432881167885008, 'a' => 0.0331486595169223, 'în' => 0.0182858865522393, 'şi' => 0.0181296845642699, 'la' => 0.0132621298363229, 'din' => 0.0125443526487175, 'pentru' => 0.0100687049488003, 'că' => 0.0092176262834967, 'să' => 0.00771063598818607, 'au' => 0.00768671008193256, 'cu' => 0.00766756935692975, 'o' => 0.00663943898535017, 'care' => 0.00628704456610198, 'al' => 0.00587141739461236, 'un' => 0.00574905347405868, 'va' => 0.00535940300078716, 'este' => 0.0043281964412607, 'se' => 0.00432136046804541, 'fost' => 0.00418156481579274, 'pe' => 0.00406774586175816, }; ${Lingua::Identify::languages{'ngrams1'}{'ro'}} = { 'e' => 0.115288210940408, 'i' => 0.111130635891002, 'a' => 0.109388368850512, 'r' => 0.0806649112410775, 'n' => 0.0661937409069729, 't' => 0.0650184945281867, 'u' => 0.0567355264799251, 'l' => 0.0501939983633773, 'o' => 
0.0477119953104436, 'c' => 0.0467820994712594, 's' => 0.0392115728458384, 'd' => 0.0351481378134658, 'p' => 0.031199144558989, 'm' => 0.0265042495721028, 'ă' => 0.0213884314100122, 'v' => 0.0130792966477507, 'b' => 0.012191733739549, 'ţ' => 0.0114094392210019, 'f' => 0.0107259466053162, 'g' => 0.0105239064888864, }; ${Lingua::Identify::languages{'ngrams2'}{'ro'}} = { 're' => 0.0239704362439386, 'de' => 0.0209827568401516, 'in' => 0.0190401341770312, 'ri' => 0.0182948495303463, 'ar' => 0.0181898279608505, 'te' => 0.0162955704942085, 'at' => 0.0160432423548673, 'er' => 0.0148190040767715, 'ul' => 0.0146865295005128, 'nt' => 0.0137319387083066, 'or' => 0.0134682792943619, 'st' => 0.0123750416055845, 'en' => 0.0119934632364165, 'al' => 0.011156238654331, 'ni' => 0.0111501584582023, 'tr' => 0.0110516777057541, 'le' => 0.0109982456791685, 'ra' => 0.0109542103193273, 'an' => 0.0108429243053353, 'pr' => 0.0108154712985724, }; ${Lingua::Identify::languages{'ngrams3'}{'ro'}} = { 'are' => 0.00952786705650192, 'ent' => 0.00786501993262935, 'din' => 0.00617256062304975, 'ntr' => 0.00612562589392517, 'tru' => 0.00569431797167062, 'rea' => 0.00549124626336352, 'ate' => 0.00530514596583713, 'lui' => 0.00514725332347041, 'est' => 0.00508053812247038, 'lor' => 0.00490895130726681, 'tat' => 0.0046797787396211, 'ulu' => 0.00443995514865785, 'pre' => 0.00439477608271749, 'aţi' => 0.00435252312208414, 'nte' => 0.00428276477156481, 'pro' => 0.00421394277474373, 'con' => 0.00420001451348232, 'pen' => 0.00417414774256828, 'ele' => 0.00410520870153492, 'pri' => 0.00405921032610858, }; ${Lingua::Identify::languages{'ngrams4'}{'ro'}} = { 'ului' => 0.00556723302516876, 'entr' => 0.00491027422660607, 'ntru' => 0.00474022398676315, 'pent' => 0.00441718737197504, 'ilor' => 0.00395219705351184, 'area' => 0.00377861635886249, 'inte' => 0.00341983388915931, 'care' => 0.00333583848522306, 'este' => 0.00309253130814677, 'tate' => 0.00293763260351477, 'clar' => 0.00245572552243744, 'ment' => 
0.00242792319083682, 'ecla' => 0.00239335415419055, 'decl' => 0.00237996784638285, 'mini' => 0.0023215682397932, 'itat' => 0.00230582829544788, 'fost' => 0.00214887015884546, 'cest' => 0.00213813169214258, 'aces' => 0.00213268890764933, 'part' => 0.00207355378964168, };

Lingua-Identify-0.56/lib/Lingua/Identify/RU.pm

use utf8;
use strict;

${Lingua::Identify::languages{'_versions'}{'ru'}} = '0.02';
${Lingua::Identify::languages{'_names'}{'ru'}} = 'russian';
${Lingua::Identify::languages{'_sets'}{'ru'}} = '';

=head1 NAME

Lingua::Identify::RU - Meta-information on Russian

=head1 SYNOPSIS

Nothing here is meant for public consumption. This module is to be loaded by Lingua::Identify.

=head1 DESCRIPTION

Automatically generated. Do not change this module yourself unless you know what you're doing.

=head1 SEE ALSO

Lingua::Identify(3).

=head1 AUTHOR

Jose Castro, C<< >>

=head1 COPYRIGHT AND LICENSE

Copyright (C) 2010 by Alberto Simoes

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.
=cut ${Lingua::Identify::languages{'prefixes1'}{'ru'}} = { 'п' => 0.105443715787223, 'н' => 0.0933222626049689, 'с' => 0.0822896520178391, 'в' => 0.0637594403647374, 'о' => 0.0625065255977447, 'т' => 0.0456617826192879, 'к' => 0.0428427243935545, 'д' => 0.0359417494071028, 'з' => 0.0341667868205299, 'м' => 0.0322028926421089, 'Г' => 0.0316907885905206, 'б' => 0.0316211822145766, 'ч' => 0.0303384361436079, 'у' => 0.0242429063645087, 'р' => 0.0242130750605327, 'г' => 0.0214536794427512, 'и' => 0.0199372548239704, 'е' => 0.0180628545574775, 'Д' => 0.0139511064927833, 'л' => 0.0136030746130631, }; ${Lingua::Identify::languages{'prefixes2'}{'ru'}} = { 'по' => 0.0536131070828483, 'пр' => 0.0394428228169676, 'Га' => 0.0251979882898744, 'чт' => 0.0219292460674737, 'за' => 0.0216310450226231, 'на' => 0.0208396653266735, 'не' => 0.0198361041180417, 'ко' => 0.0190275205156584, 'ст' => 0.0162748954862684, 'ра' => 0.0155695353224872, 'ка' => 0.0155580660515314, 'вс' => 0.0148355019813166, 'во' => 0.0127767678447519, 'от' => 0.0125875248739814, 'эт' => 0.0115724943943938, 'го' => 0.0111366620980737, 'ск' => 0.0110850503787727, 'бы' => 0.0110391732949495, 'то' => 0.0110219693885158, 'об' => 0.0100126735444062, }; ${Lingua::Identify::languages{'prefixes3'}{'ru'}} = { 'Гар' => 0.0293250602684717, 'про' => 0.0229993889213454, 'при' => 0.0116709307869484, 'под' => 0.00946164642048927, 'Гер' => 0.00919304041848815, 'ска' => 0.00869611931478609, 'пер' => 0.00864911326443589, 'пос' => 0.00815219216073383, 'раз' => 0.00734637415473049, 'сто' => 0.00721207115372993, 'был' => 0.0069166045515287, 'пре' => 0.00621151379627578, 'гол' => 0.00600334414472491, 'пол' => 0.00600334414472491, 'кот' => 0.00594962294432469, 'зна' => 0.0057347381427238, 'ста' => 0.00531839883962207, 'рас' => 0.00517738068857149, 'бол' => 0.00496249588697059, 'Дум' => 0.00461330808436915, }; ${Lingua::Identify::languages{'prefixes4'}{'ru'}} = { 'Гарр' => 0.0336338418862691, 'Герм' => 0.0105486207427955, 'сказ' => 
0.00896131915549391, 'пере' => 0.00662659885960857, 'голо' => 0.00659577746956388, 'кото' => 0.00642625982431808, 'Думб' => 0.00483125288950532, 'Сири' => 0.0047233780243489, 'Кхем' => 0.00461550315919248, 'отве' => 0.00409924487594391, 'очен' => 0.00375250423794113, 'боль' => 0.00369856680536292, 'когд' => 0.00366774541531823, 'Уэсл' => 0.00365233472029589, 'оста' => 0.00365233472029589, 'чтоб' => 0.00352134381260595, 'двер' => 0.00349052242256126, 'глаз' => 0.00345970103251657, 'спро' => 0.00344429033749422, 'Огри' => 0.00342887964247188, }; ${Lingua::Identify::languages{'suffixes1'}{'ru'}} = { 'о' => 0.14413066176105, 'а' => 0.112623676229304, 'е' => 0.0989260677173967, 'и' => 0.0976035399990056, 'ь' => 0.0711330979963208, 'я' => 0.0600009943817432, 'л' => 0.0563615572018098, 'м' => 0.0511460249589818, 'й' => 0.0419430219261174, 'у' => 0.0360711977328096, 'т' => 0.0342862825038532, 'ы' => 0.0317307214239547, 'н' => 0.0271068463183016, 'к' => 0.0208869885148909, 'х' => 0.0171878884303684, 'ю' => 0.0144732262715657, 'с' => 0.0132948839059315, 'з' => 0.0106299408342863, 'р' => 0.0106050812907075, 'в' => 0.0104956992989609, }; ${Lingua::Identify::languages{'suffixes2'}{'ru'}} = { 'то' => 0.0372753443668353, 'ть' => 0.0347062129397057, 'но' => 0.0334560552363257, 'ся' => 0.0315292066659785, 'ри' => 0.0293500326876097, 'ли' => 0.0279221002649417, 'го' => 0.0277328562089254, 'ла' => 0.0271307160306919, 'сь' => 0.025926435674225, 'ал' => 0.0231107135074379, 'ом' => 0.0219637798346122, 'ой' => 0.0212641502941885, 'на' => 0.0192226083565587, 'ил' => 0.0167509662916194, 'ак' => 0.015305829863859, 'ет' => 0.0138262854259138, 'ми' => 0.0127940451203707, 'ло' => 0.0122377822890502, 'да' => 0.0119395795341155, 'ем' => 0.0113489086926103, }; ${Lingua::Identify::languages{'suffixes3'}{'ru'}} = { 'рри' => 0.0293789787531226, 'лся' => 0.0172983426898386, 'ать' => 0.0168081334443579, 'ого' => 0.0128864594805125, 'ала' => 0.010139944666792, 'его' => 0.0100190711542077, 'ись' => 
0.00985119127561847, 'она' => 0.00893120954094926, 'ить' => 0.00889091837008783, 'сли' => 0.0079776518305622, 'ние' => 0.00782991753740364, 'ной' => 0.00782991753740364, 'ила' => 0.00767546804910151, 'нно' => 0.00766203765881437, 'ами' => 0.00765532246367079, 'тся' => 0.00757474012194794, 'ась' => 0.00748072738993795, 'али' => 0.00747401219479438, 'ься' => 0.00745386660936367, 'гда' => 0.00700394853474442, }; ${Lingua::Identify::languages{'suffixes4'}{'ru'}} = { 'арри' => 0.0336186902349379, 'лись' => 0.00893827198545219, 'лась' => 0.00844512594487552, 'ться' => 0.00791345286987879, 'иона' => 0.00779787176661864, 'льно' => 0.00741260142241811, 'ался' => 0.00714291218147774, 'огда' => 0.00680387427858128, 'лось' => 0.00678846346481326, 'лько' => 0.00657271207206097, 'ился' => 0.00651877422387289, 'азал' => 0.00587152004561601, 'енно' => 0.00554789295648757, 'ного' => 0.00554018754960356, 'ение' => 0.00544001726011142, 'вать' => 0.00462324413040631, 'ридж' => 0.0046155387235223, 'чень' => 0.00418403593801771, 'ется' => 0.0041763305311337, 'улся' => 0.00409927646229359, }; ${Lingua::Identify::languages{'smallwords'}{'ru'}} = { '—' => 0.0664472487446221, 'и' => 0.0251752452587538, 'в' => 0.0189692152142921, 'не' => 0.0180892873006942, 'на' => 0.0150687655203631, 'что' => 0.011887487678894, 'с' => 0.0111725462490957, 'Гарри' => 0.0104787569326051, 'он' => 0.0064556249814919, 'к' => 0.0056729967891092, 'как' => 0.00522034156432569, 'его' => 0.00490729028737261, 'за' => 0.00429387900145104, 'я' => 0.00423042266152811, 'это' => 0.00411620124966685, 'по' => 0.00399774941514407, 'у' => 0.00376930659142155, 'от' => 0.00374392405545238, 'из' => 0.00360432010762195, 'а' => 0.00356201588100667, }; ${Lingua::Identify::languages{'ngrams1'}{'ru'}} = { 'о' => 0.111596141182221, 'а' => 0.0811936944582109, 'е' => 0.074256274652238, 'и' => 0.0690644539338353, 'н' => 0.0646618442759286, 'т' => 0.0576183477145127, 'р' => 0.0545803094386248, 'с' => 0.0541823094522027, 'л' => 
0.053496629305019, 'в' => 0.0409617512678294, 'м' => 0.0323432270415957, 'к' => 0.0318484850542095, 'д' => 0.0308140245351292, 'у' => 0.0296369968567336, 'п' => 0.0287849883570153, 'г' => 0.0214639950033605, 'ь' => 0.0198295643554946, 'я' => 0.0196004385637377, 'з' => 0.0183979524640357, 'ы' => 0.0170418672224523, }; ${Lingua::Identify::languages{'ngrams2'}{'ru'}} = { 'то' => 0.0174108367124354, 'на' => 0.0153425540778453, 'но' => 0.0142696847962386, 'по' => 0.0141545476050418, 'ст' => 0.0139033391878851, 'ро' => 0.0130586508851957, 'не' => 0.0126221762603859, 'ри' => 0.0126033356290991, 'ал' => 0.0125227395952614, 'он' => 0.0115995486622105, 'ни' => 0.0108815112698376, 'ли' => 0.0104921382232447, 'ко' => 0.0104439899432897, 'ер' => 0.0100274026515048, 'пр' => 0.0100075153184799, 'ра' => 0.00990598524987911, 'ла' => 0.00964116971012642, 'от' => 0.00956685388671756, 'го' => 0.00931459876782271, 'ол' => 0.00922144231312711, }; ${Lingua::Identify::languages{'ngrams3'}{'ru'}} = { 'про' => 0.00648723037821627, 'рри' => 0.00586276982065652, 'гар' => 0.0058415567231598, 'арр' => 0.00580841125832117, 'что' => 0.00574609778442455, 'ать' => 0.00408484708671251, 'его' => 0.00407689217515124, 'ост' => 0.00387404193033884, 'ста' => 0.00367384332271353, 'она' => 0.00361815894178463, 'лся' => 0.00341928615275286, 'вер' => 0.00331587230245634, 'нул' => 0.0032734461074629, 'аза' => 0.00325753628434036, 'оль' => 0.003246929735592, 'ени' => 0.00315147079685675, 'сто' => 0.00313953842951484, 'ого' => 0.0031116962390504, 'при' => 0.00288763289674127, 'льн' => 0.00287570052939936, }; ${Lingua::Identify::languages{'ngrams4'}{'ru'}} = { 'арри' => 0.00754994205617792, 'гарр' => 0.00754649301914905, 'каза' => 0.00322312510347111, 'азал' => 0.00245916340157828, 'герм' => 0.00236776392031345, 'ерми' => 0.00236431488328459, 'рмио' => 0.002353967772198, 'мион' => 0.002353967772198, 'сказ' => 0.00234017162408256, 'льно' => 0.00220565917995696, 'лись' => 0.00200389051376856, 'прос' => 
0.00193835881022019, 'лась' => 0.00189352132884499, 'тель' => 0.00188662325478726, 'енно' => 0.00183661221786877, 'ться' => 0.00178315214392142, 'иона' => 0.00177452955134926, 'отор' => 0.0017262430329452, 'кото' => 0.00165036421831025, 'пере' => 0.00162966999613708, };

Lingua-Identify-0.56/lib/Lingua/Identify/SL.pm

use utf8;
use strict;

${Lingua::Identify::languages{'_versions'}{'sl'}} = '0.02';
${Lingua::Identify::languages{'_names'}{'sl'}} = 'slovene';
${Lingua::Identify::languages{'_sets'}{'sl'}} = '';

=head1 NAME

Lingua::Identify::SL - Meta-information on Slovene

=head1 SYNOPSIS

Nothing here is meant for public consumption. This module is to be loaded by Lingua::Identify.

=head1 DESCRIPTION

Automatically generated. Do not change this module yourself unless you know what you're doing.

=head1 SEE ALSO

Lingua::Identify(3).

=head1 AUTHOR

Jose Castro, C<< >>

=head1 COPYRIGHT AND LICENSE

Copyright (C) 2010 by Alberto Simoes

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.
=cut ${Lingua::Identify::languages{'prefixes1'}{'sl'}} = { 'p' => 0.110917025633493, 's' => 0.0687880934123531, 'z' => 0.0647850590362826, 'n' => 0.0583901679641175, 'i' => 0.0514118220483271, 'k' => 0.049676865655927, 'o' => 0.0480949634390554, 'd' => 0.0411582194971503, 'm' => 0.0401916156100878, 't' => 0.0374993002137109, 'b' => 0.0344798186770264, 'v' => 0.0328671001832028, 'P' => 0.029953421197387, 'a' => 0.0286940626792801, 'j' => 0.0259154617074375, 'u' => 0.025247262102194, 'r' => 0.0233469250234846, 'l' => 0.0187537589437824, 'Z' => 0.0152104006989132, 'N' => 0.0135714883696803, }; ${Lingua::Identify::languages{'prefixes2'}{'sl'}} = { 'pr' => 0.0628870818277725, 'po' => 0.0502317965550948, 'zd' => 0.0328408622488871, 'bo' => 0.0215632087833079, 'ko' => 0.0196075940603681, 'al' => 0.0195148627354089, 'od' => 0.019192208536236, 'na' => 0.0182267865229628, 'in' => 0.0171051185785939, 'te' => 0.0160380731955024, 'za' => 0.0142450558642718, 'up' => 0.0137001005436215, 'ne' => 0.0125358778131413, 'ra' => 0.0123211984444003, 'Pr' => 0.0111684083430246, 'st' => 0.0110934611077837, 'me' => 0.0106806161678971, 'do' => 0.0105504112253174, 'mo' => 0.0104195711366764, 'ka' => 0.0101839319479104, }; ${Lingua::Identify::languages{'prefixes3'}{'sl'}} = { 'zdr' => 0.036468309784491, 'pre' => 0.0256857908240799, 'bol' => 0.0220866701065176, 'pri' => 0.0152008425278663, 'upo' => 0.0142014009146234, 'odm' => 0.0121120448971225, 'pov' => 0.00955901582567024, 'raz' => 0.00915683599686173, 'uči' => 0.00823443761971741, 'lah' => 0.00722580736363701, 'pog' => 0.00710564818807031, 'pro' => 0.00677132295252299, 'inj' => 0.00674941157344906, 'bil' => 0.00661794329900551, 'pos' => 0.00659320464521236, 'kon' => 0.00652817732667039, 'pod' => 0.00646173637076881, 'tre' => 0.0054015083510627, 'ZDR' => 0.00519158320316089, 'gle' => 0.00512514224725931, }; ${Lingua::Identify::languages{'prefixes4'}{'sl'}} = { 'zdra' => 0.0395788481617341, 'boln' => 0.0170798414443716, 'upor' => 
0.0145804651334695, 'odme' => 0.013145980208266, 'učin' => 0.00896803323877899, 'lahk' => 0.00786926244529246, 'pove' => 0.00628925006390888, 'bole' => 0.00587730726466901, 'ZDRA' => 0.00565555114096606, 'prim' => 0.00549616392705455, 'Zdra' => 0.00504495181424229, 'drug' => 0.00498258290445083, 'treb' => 0.004864004977193, 'kate' => 0.00447362180109092, 'štud' => 0.00422260618884382, 'glej' => 0.00418795679451523, 'pogl' => 0.00409324845001709, 'potr' => 0.00398314037470625, 'pres' => 0.00395003095345893, 'inje' => 0.00394002112843068, }; ${Lingua::Identify::languages{'suffixes1'}{'sl'}} = { 'a' => 0.16645962157578, 'e' => 0.16090799993413, 'i' => 0.156150422804259, 'o' => 0.129484318778767, 'n' => 0.0450423421791504, 'h' => 0.0442961564746276, 'm' => 0.0386199990119472, 't' => 0.0266892615128735, 'u' => 0.0221709784192802, 'v' => 0.0211561658611292, 'A' => 0.0175832228635417, 'l' => 0.0169981103490296, 'd' => 0.0149653975677434, 'k' => 0.014034981185828, 'j' => 0.0123537476019135, 'r' => 0.0115869774641625, 'g' => 0.0105706210735194, 'I' => 0.00970195799128867, 's' => 0.00703267161241982, 'E' => 0.00651034161925386, }; ${Lingua::Identify::languages{'suffixes2'}{'sl'}} = { 'je' => 0.0447367365659608, 'ih' => 0.0427117556476342, 'no' => 0.037054413484425, 'li' => 0.033213251943657, 'ni' => 0.0305593237926027, 'ja' => 0.0288042554518839, 'ti' => 0.0286011844324101, 'te' => 0.0279531396116442, 'na' => 0.0274177705603041, 'ne' => 0.0251827161735563, 'ri' => 0.0226039052021193, 'jo' => 0.0198360281091032, 'om' => 0.0194267031388159, 'la' => 0.0187302777616863, 'lo' => 0.0177811276361833, 'ov' => 0.0169446532801062, 'ko' => 0.0169045483452258, 'ga' => 0.0157236808181916, 'em' => 0.0150839116189089, 'ka' => 0.0134408824613481, }; ${Lingua::Identify::languages{'suffixes3'}{'sl'}} = { 'nje' => 0.0258047701572323, 'nih' => 0.0172197052805727, 'ega' => 0.0160157281901321, 'ila' => 0.0149923476632575, 'nja' => 0.0145199637107141, 'sti' => 0.0139229327182191, 'ilo' => 
0.0129009686349804, 'kih' => 0.0125836852605583, 'ije' => 0.0114703605628097, 'ost' => 0.011150952522934, 'ina' => 0.0102394710432886, 'jte' => 0.00980249818164048, 'ijo' => 0.0097026389053157, 'ija' => 0.00967076892350992, 'kov' => 0.00858081554575219, 'ati' => 0.00825220062224369, 'ite' => 0.00819200176772166, 'ali' => 0.00758859677886553, 'hko' => 0.00731876426624324, 'ili' => 0.00729326828079862, }; ${Lingua::Identify::languages{'suffixes4'}{'sl'}} = { 'anje' => 0.0174249765443378, 'osti' => 0.0126360158805233, 'vila' => 0.0114959705560186, 'nega' => 0.0103828857628367, 'anja' => 0.00974738752451483, 'vilo' => 0.00902099378059057, 'ikih' => 0.00883997307028071, 'nost' => 0.00873675275035934, 'ahko' => 0.00794950523573518, 'cije' => 0.00743186303433847, 'enje' => 0.00699279152422519, 'njem' => 0.0054968671862603, 'enja' => 0.00528657504194288, 'ikov' => 0.0050747422958356, 'ejte' => 0.00465955011346532, 'erek' => 0.00444232526109349, 'reba' => 0.00440226961455684, 'enih' => 0.00421046469171788, 'čnih' => 0.00387230259884116, 'rabo' => 0.00386691049257661, }; ${Lingua::Identify::languages{'smallwords'}{'sl'}} = { 'in' => 0.0155350157275239, 'je' => 0.0134788257215537, 'v' => 0.0133083494027731, 'z' => 0.011350761165928, 'na' => 0.0102137708025353, 'za' => 0.00942062250583574, 'pri' => 0.00896011973794322, 'so' => 0.00827315796184003, 'ali' => 0.00822764945301301, 's' => 0.00801997173415952, 'se' => 0.00756271957404036, 'ki' => 0.00714736413633338, 'zdravila' => 0.00468412580141026, 'bolnikih' => 0.00374614486947546, 'lahko' => 0.00367282560525414, 'po' => 0.00362009352359743, 'do' => 0.00349657042820979, 'mg' => 0.003434086523233, 'ne' => 0.00315959075570491, 'da' => 0.00315128364695077, }; ${Lingua::Identify::languages{'ngrams1'}{'sl'}} = { 'i' => 0.101115470232767, 'a' => 0.0980768603003265, 'e' => 0.0940033766263422, 'o' => 0.0903964745949793, 'n' => 0.0755313104986257, 'r' => 0.058611193167373, 't' => 0.0470616291393945, 'l' => 0.0457850624084103, 'v' => 
0.0393998166478067, 's' => 0.0380983248247645, 'p' => 0.0374853283671829, 'j' => 0.0358372973610117, 'k' => 0.034844153047784, 'd' => 0.0326297596272364, 'm' => 0.0309228732423991, 'z' => 0.0264258238104741, 'u' => 0.0202495468055439, 'b' => 0.0193269163817836, 'h' => 0.013883437083359, 'g' => 0.0130546375706687, }; ${Lingua::Identify::languages{'ngrams2'}{'sl'}} = { 'ra' => 0.0210247767064382, 'je' => 0.0190846204742305, 'ni' => 0.0179959310316738, 'in' => 0.017704530285119, 'po' => 0.0162215984646702, 'na' => 0.0145717787664635, 'pr' => 0.0142146870813593, 'an' => 0.0133894386191707, 'en' => 0.0131831748769214, 'te' => 0.0129734282572233, 're' => 0.0126284298965943, 'av' => 0.0125864418740164, 'no' => 0.0125262654914277, 'st' => 0.0120196035692141, 'nj' => 0.0119188903629845, 'li' => 0.0117300410079796, 'ne' => 0.0113289296217856, 'vi' => 0.0112746547815409, 'ko' => 0.0111315665663501, 'ri' => 0.0109857694464769, }; ${Lingua::Identify::languages{'ngrams3'}{'sl'}} = { 'rav' => 0.00909422847661127, 'dra' => 0.00819700848813725, 'anj' => 0.00803434342182472, 'zdr' => 0.00796570233889723, 'avi' => 0.00778230194545036, 'nje' => 0.00731659126475485, 'pri' => 0.006797850580756, 'ost' => 0.00664162061596792, 'pre' => 0.00647442714487892, 'vil' => 0.00592410679599155, 'ali' => 0.0051051805427317, 'jen' => 0.00460538765766595, 'bol' => 0.00459835671340775, 'nik' => 0.0044451059622884, 'ora' => 0.00441376463449338, 'por' => 0.0043516778216371, 'nos' => 0.00385593666716082, 'ila' => 0.00382054360877633, 'lni' => 0.00378515055039185, 'lje' => 0.00376048266121479, }; ${Lingua::Identify::languages{'ngrams4'}{'sl'}} = { 'zdra' => 0.00963555851217508, 'drav' => 0.00961399982811716, 'ravi' => 0.00660502349603247, 'avil' => 0.00640336913834779, 'anje' => 0.0046735413937, 'ljen' => 0.00418986425068614, 'nost' => 0.00404129998571554, 'pora' => 0.00379168855533058, 'orab' => 0.0036850683967584, 'upor' => 0.00366922936357298, 'olni' => 0.00364459086750678, 'lnik' => 
0.00357976815761833, 'boln' => 0.00356627564786779, 'jenj' => 0.00321620368292719, 'avlj' => 0.0031777793616811, 'vila' => 0.00286583840255724, 'vlje' => 0.00283342704761301, 'osti' => 0.00273413977477481, 'dmer' => 0.00271052788271137, 'odme' => 0.0027081813592765, }; Lingua-Identify-0.56/lib/Lingua/Identify/SQ.pm000644 000765 000024 00000022741 11375521416 021155 0ustar00ambsstaff000000 000000 use utf8; use strict; ${Lingua::Identify::languages{'_versions'}{'sq'}} = '0.02'; ${Lingua::Identify::languages{'_names'}{'sq'}} = 'albanian'; ${Lingua::Identify::languages{'_sets'}{'sq'}} = ''; =head1 NAME Lingua::Identify::SQ - Meta-information on Albanian =head1 SYNOPSIS Nothing here is meant for public consumption. This module is to be loaded by Lingua::Identify. =head1 DESCRIPTION Automatically generated. Do not change this module yourself unless you know what you're doing. =head1 SEE ALSO Lingua::Identify(3). =head1 AUTHOR Jose Castro, C<< >> =head1 COPYRIGHT AND LICENSE Copyright (C) 2010 by Alberto Simoes This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available. 
=cut ${Lingua::Identify::languages{'prefixes1'}{'sq'}} = { 't' => 0.128146844741877, 'n' => 0.0941279410774896, 'p' => 0.0917188296266382, 's' => 0.064459957115797, 'd' => 0.0628652609109528, 'm' => 0.0602503822078634, 'k' => 0.0482466514745051, 'B' => 0.0299328852607186, 'S' => 0.0251891808732608, 'v' => 0.0246944740915953, 'a' => 0.02305362448438, 'r' => 0.0206829742021711, 'K' => 0.0200507687426082, 'b' => 0.0180012692185652, 'A' => 0.0176243497658676, 'M' => 0.0174536783300161, 'P' => 0.0172974298324055, 'q' => 0.0168368573379102, 'g' => 0.0165080143460159, 'l' => 0.0159368659916732, }; ${Lingua::Identify::languages{'prefixes2'}{'sq'}} = { 'pë' => 0.0495515941789772, 'dh' => 0.0307369310835515, 'nj' => 0.029345197456116, 'sh' => 0.0262141094030175, 'pa' => 0.0226097318818289, 'pr' => 0.0225228266822721, 'ma' => 0.0153528351102008, 'ng' => 0.0145319248438832, 'kr' => 0.0145094170224153, 've' => 0.0144337657335924, 'ko' => 0.014092397107995, 'gj' => 0.0135078189670911, 'mb' => 0.0133358842197664, 'th' => 0.0131664503414937, 'nd' => 0.0127206704329755, 'mi' => 0.0112314029125121, 'Ko' => 0.010706845628856, 'Ma' => 0.00923758506080863, 'kë' => 0.00848919999699896, 'ku' => 0.00817784180002551, }; ${Lingua::Identify::languages{'prefixes3'}{'sq'}} = { 'për' => 0.0251484076629181, 'par' => 0.0137337618705484, 'mar' => 0.0105017143740902, 'ven' => 0.0103282874352558, 'pre' => 0.00981176044859768, 'sht' => 0.00953322627410616, 'pro' => 0.00952571861441636, 'Kos' => 0.0072696668776319, 'Ser' => 0.00669458014539334, 'kry' => 0.00667806329407578, 'shk' => 0.00611123498749599, 'zgj' => 0.00591978966540613, 'tje' => 0.00586423298370162, 'ësh' => 0.00582969774912855, 'kon' => 0.00570281830037095, 'ndë' => 0.00568404915114646, 'kom' => 0.00566903383176686, 'Tur' => 0.00551587757409497, 'kër' => 0.00548434540339782, 'duk' => 0.00542803795572433, }; ${Lingua::Identify::languages{'prefixes4'}{'sq'}} = { 'vend' => 0.0107310449864009, 'krye' => 0.00723608724872804, 'Koso' => 
0.00720712586362425, 'Serb' => 0.0071739054512993, 'ësht' => 0.00661171385810795, 'ndër' => 0.00630080487096425, 'përf' => 0.00602737532336664, 'poli' => 0.00591408519928414, 'bash' => 0.00581527576775354, 'marr' => 0.00575905660843441, 'kërk' => 0.00563213759724424, 'qeve' => 0.00559380635225392, 'Turq' => 0.00526756486711409, 'tjer' => 0.00524541792556413, 'gjit' => 0.0051951614043546, 'përp' => 0.00519175418257768, 'zgje' => 0.00476755507135148, 'mini' => 0.00443620275354628, 'luft' => 0.00438850164866944, 'Kroa' => 0.00432717165668493, }; ${Lingua::Identify::languages{'suffixes1'}{'sq'}} = { 'ë' => 0.265154866831068, 'e' => 0.139949525454636, 'n' => 0.0893735116356907, 't' => 0.0860875687196316, 'i' => 0.0828223042091616, 'a' => 0.071715115048878, 'r' => 0.0694659777711949, 's' => 0.0385873475087811, 'o' => 0.0198781993821485, 'k' => 0.0128759141778843, 'm' => 0.0125469832610711, 'j' => 0.0121463992644258, 'u' => 0.00902059376839416, 'l' => 0.00790636433234718, 'E' => 0.00768322990924584, 'h' => 0.00744759226416038, 'd' => 0.00667239250114453, 'ç' => 0.005944801160299, 'A' => 0.00583131037613539, 'H' => 0.00455934798582712, }; ${Lingua::Identify::languages{'suffixes2'}{'sq'}} = { 'it' => 0.0444436538135043, 'in' => 0.0392947738462614, 'jë' => 0.035162686880187, 'ër' => 0.0350403471452397, 'et' => 0.0312590506436568, 'të' => 0.0294882453985279, 'he' => 0.0293939938680329, 'në' => 0.0279165543138489, 'ën' => 0.0273716431474768, 'ar' => 0.0261650987206759, 've' => 0.0254385504988465, 'së' => 0.0209313299577553, 'ës' => 0.0202984090840998, 'ri' => 0.0181412349821734, 'en' => 0.0179851894018835, 're' => 0.0175763499815242, 'ia' => 0.0154460157194076, 'te' => 0.0147013662102646, 'an' => 0.0146339545195794, 'ga' => 0.0130853581807832, }; ${Lingua::Identify::languages{'suffixes3'}{'sq'}} = { 'uar' => 0.0222132138837645, 'isë' => 0.0202635163791075, 'min' => 0.0168047140418683, 'mit' => 0.0141701496966055, 'eve' => 0.0117828567489356, 'inë' => 0.0116322459885836, 'tit' 
=> 0.00921198352123462, 'jen' => 0.00875265816752412, 'ore' => 0.00862677454693136, 'tën' => 0.00857432303835104, 'anë' => 0.00814347136072701, 'het' => 0.0079396597845292, 'htë' => 0.00790294372852298, 'tin' => 0.00756350753728178, 'are' => 0.00749532057612737, 'tet' => 0.0074121474696643, 'ave' => 0.00731099098883083, 'jnë' => 0.00725329432939248, 'imi' => 0.00718061152464547, 'ojë' => 0.00710568079810215, }; ${Lingua::Identify::languages{'suffixes4'}{'sq'}} = { 'imin' => 0.0175548344215506, 'imit' => 0.014315681038161, 'shtë' => 0.00859552825143514, 'shme' => 0.00679533752301235, 'imet' => 0.00664149686114053, 'tare' => 0.0066244978929779, 'risë' => 0.00650295527061507, 'stri' => 0.0064902060444931, 'ojnë' => 0.00640351130686367, 'tuar' => 0.0062564702322569, 'shëm' => 0.00563600789432081, 'shte' => 0.00543372017318549, 'shtu' => 0.00516768632144029, 'enti' => 0.00503424442136362, 'meve' => 0.00457357238415628, 'ndit' => 0.00433133708783877, 'rtën' => 0.00419534534253771, 'jera' => 0.00415029807690673, 'onit' => 0.00410185101764323, 'urën' => 0.00405255400997159, }; ${Lingua::Identify::languages{'smallwords'}{'sq'}} = { 'të' => 0.0673613487311092, 'e' => 0.0398306957513544, 'në' => 0.0287589107499287, 'i' => 0.0184951525520388, 'për' => 0.0162432278300542, 'dhe' => 0.0155018534359852, 'një' => 0.0137286142001711, 'se' => 0.0111113487311092, 'me' => 0.00905082691759338, 'nga' => 0.00708761049329912, 'do' => 0.00674009124607927, 'së' => 0.00656686626746507, 'që' => 0.00552822925577417, 'u' => 0.00528371827773025, 'tha' => 0.0039364128885087, 'BE' => 0.00353257770173938, 'më' => 0.00314977188480183, 'BiH' => 0.00277445109780439, 'është' => 0.00276375819788993, 'ka' => 0.00267785856857713, }; ${Lingua::Identify::languages{'ngrams1'}{'sq'}} = { 'e' => 0.0911379402530753, 'i' => 0.0873983143926153, 't' => 0.0849105506684087, 'ë' => 0.0809993489137199, 'r' => 0.0798437705199305, 'a' => 0.0760746523419589, 'n' => 0.0694182362796101, 's' => 0.0560064314070282, 'o' => 
0.0427426953167761, 'h' => 0.0348407492922278, 'm' => 0.0344157129516206, 'k' => 0.0322940876751085, 'u' => 0.0311534289242955, 'd' => 0.0299315795582155, 'p' => 0.0291247911900549, 'j' => 0.0288152085982943, 'l' => 0.0237905851154355, 'b' => 0.0169862737549775, 'v' => 0.0165312246441969, 'g' => 0.0149386394985751, }; ${Lingua::Identify::languages{'ngrams2'}{'sq'}} = { 'të' => 0.0305515134929533, 'sh' => 0.0202911062647158, 'ar' => 0.0185144586565213, 'në' => 0.0175812623777293, 'ër' => 0.0154902374558484, 'it' => 0.0150256372485386, 'ri' => 0.0149897824673144, 'in' => 0.014684908830579, 're' => 0.0124625603726478, 'an' => 0.0122526155090936, 'et' => 0.0121399753379583, 'is' => 0.0114660782450683, 'mi' => 0.0112966320048849, 'je' => 0.0112630451465694, 'en' => 0.0108107565207643, 'ti' => 0.0106939044927985, 'ra' => 0.0105502693752434, 'er' => 0.0104456209324533, 'pë' => 0.0101380473874932, 'te' => 0.00988274406576372, }; ${Lingua::Identify::languages{'ngrams3'}{'sq'}} = { 'për' => 0.0125841514203984, 'sht' => 0.00936941639648566, 'imi' => 0.00684687585610321, 'një' => 0.00659297836244612, 'dhe' => 0.00646554215471368, 'min' => 0.00559981158939692, 'uar' => 0.00486109940818068, 'tar' => 0.00463254988152611, 'par' => 0.00423088209671698, 'tet' => 0.00421347277872074, 'isë' => 0.00398506252661014, 'ash' => 0.0039801879175712, 'ist' => 0.00371779467673195, 'shk' => 0.00367754433352465, 'end' => 0.00352016409883869, 'nis' => 0.00345233739592536, 'eve' => 0.00335916272600951, 'ësh' => 0.00325568173983989, 'rit' => 0.00321222808212129, 'ani' => 0.00310178336875317, }; ${Lingua::Identify::languages{'ngrams4'}{'sq'}} = { 'imin' => 0.0037823619808978, 'shte' => 0.00317121059247802, 'imit' => 0.00309179138887873, 'nist' => 0.00298673573806794, 'istr' => 0.00290355174152151, 'inis' => 0.00289512577635409, 'mini' => 0.00289369156951708, 'vend' => 0.00277160471251683, 'ësht' => 0.00272427588689558, 'ndër' => 0.00266188788948576, 'serb' => 0.00253711189466611, 'stri' => 
0.00250573862010658, 'shtë' => 0.00242865000261743, 'krye' => 0.00232646276548065, 'bash' => 0.00220383808091652, 'isht' => 0.00213535470444941, 'asht' => 0.00212710801513662, 'ashk' => 0.00209609329228633, 'gjit' => 0.00205252925961223, 'rimi' => 0.00198010181434336, }; Lingua-Identify-0.56/lib/Lingua/Identify/SV.pm000644 000765 000024 00000022642 11375521416 021162 0ustar00ambsstaff000000 000000 use utf8; use strict; ${Lingua::Identify::languages{'_versions'}{'sv'}} = '0.02'; ${Lingua::Identify::languages{'_names'}{'sv'}} = 'swedish'; ${Lingua::Identify::languages{'_sets'}{'sv'}} = ''; =head1 NAME Lingua::Identify::SV - Meta-information on Swedish =head1 SYNOPSIS Nothing here is meant for public consumption. This module is to be loaded by Lingua::Identify. =head1 DESCRIPTION Automatically generated. Do not change this module yourself unless you know what you're doing. =head1 SEE ALSO Lingua::Identify(3). =head1 AUTHOR Jose Castro, C<< >> =head1 COPYRIGHT AND LICENSE Copyright (C) 2010 by Alberto Simoes This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available. 
=cut ${Lingua::Identify::languages{'prefixes1'}{'sv'}} = { 's' => 0.0929930824278701, 'a' => 0.0914711255452625, 'f' => 0.0803805600093441, 'd' => 0.0760017737623735, 'o' => 0.0620361760555226, 'm' => 0.0560535202963821, 'e' => 0.0456025811579715, 'v' => 0.0447510943173366, 't' => 0.0411054766682999, 'k' => 0.0383861796202389, 'h' => 0.0347506746415185, 'b' => 0.0330234305514894, 'p' => 0.0309953345195496, 'i' => 0.0306853811743541, 'u' => 0.0278988348686895, 'g' => 0.0253797686928919, 'ä' => 0.0252346518738525, 'r' => 0.0224187788242706, 'n' => 0.0200549421378286, 'l' => 0.0178129631286984, }; ${Lingua::Identify::languages{'prefixes2'}{'sv'}} = { 'de' => 0.0569816870456325, 'fö' => 0.0503969999936234, 'at' => 0.0480098408000172, 'oc' => 0.0401689677420477, 'in' => 0.0310852415245124, 'so' => 0.0277700002840476, 'me' => 0.0261474508467227, 'ti' => 0.0229794506055721, 'ko' => 0.0221249890583706, 'ha' => 0.0192265441899803, 'be' => 0.0180880350456766, 'fr' => 0.0178358703421266, 'vi' => 0.0164492543170887, 'st' => 0.0157715979068591, 'ut' => 0.0143873006377158, 'sk' => 0.0139896340017727, 'an' => 0.0138678993173003, 'De' => 0.0106761318282289, 'et' => 0.0106256988875189, 'va' => 0.0100744146735511, }; ${Lingua::Identify::languages{'prefixes3'}{'sv'}} = { 'för' => 0.0362339606357084, 'til' => 0.0297575299884699, 'int' => 0.0219494022534338, 'kom' => 0.0195860613457221, 'frå' => 0.0135647676778574, 'vil' => 0.0120518586445684, 'all' => 0.0113211109387959, 'Eur' => 0.00956883356820649, 'med' => 0.00935222207987485, 'det' => 0.00882207178353011, 'upp' => 0.00879088647198042, 'ska' => 0.00866361668700736, 'sam' => 0.0085194899768723, 'kon' => 0.00813599492943691, 'var' => 0.00801209652949625, 'fra' => 0.00770108626025744, 'pro' => 0.00723667797204448, 'bet' => 0.00683042607564039, 'mås' => 0.00678828376273541, 'tal' => 0.00675794129744382, }; ${Lingua::Identify::languages{'prefixes4'}{'sv'}} = { 'komm' => 0.0220479867947887, 'förs' => 0.013132029752382, 'Euro' => 
0.0114813783345903, 'dett' => 0.0102352584834742, 'till' => 0.00983377614971327, 'före' => 0.00927881575924883, 'måst' => 0.0081851575538647, 'fråg' => 0.00817194421123459, 'skal' => 0.00734153644748103, 'skul' => 0.00704474444378942, 'rätt' => 0.0063962742439427, 'unde' => 0.00637086396965403, 'arbe' => 0.00635765062702393, 'ocks' => 0.00616961459728781, 'fram' => 0.00613912226814141, 'unio' => 0.00607203914401934, 'talm' => 0.00581082152433186, 'geno' => 0.00579760818170176, 'denn' => 0.00579455894878712, 'myck' => 0.00573865634535205, }; ${Lingua::Identify::languages{'suffixes1'}{'sv'}} = { 't' => 0.171861421057371, 'r' => 0.164542260325181, 'a' => 0.13690609277513, 'n' => 0.126084924319605, 'e' => 0.0768984847733736, 'm' => 0.0525513615996228, 's' => 0.0441977352030942, 'g' => 0.0423857762741632, 'l' => 0.0330366347542019, 'h' => 0.0326167782742988, 'd' => 0.0285993081979254, 'å' => 0.0252910414767617, 'v' => 0.0220976029735956, 'i' => 0.017514085426124, 'k' => 0.00796867364808743, 'u' => 0.00372255883808007, 'p' => 0.00366084499404613, 'o' => 0.00243870854170187, 'U' => 0.00149580194105215, 'y' => 0.000638890041105467, }; ${Lingua::Identify::languages{'suffixes2'}{'sv'}} = { 'en' => 0.076024198634711, 'et' => 0.0685119787415597, 'tt' => 0.0673201333753803, 'er' => 0.0628147723899978, 'ar' => 0.0380793435111672, 'ch' => 0.0371628515248201, 'om' => 0.0351576992283149, 'ör' => 0.0344614899458336, 'na' => 0.0292477461601247, 'an' => 0.0288048619870891, 'de' => 0.0265626159382652, 'll' => 0.0265562393336796, 'ra' => 0.025694238331973, 'te' => 0.0209906229131112, 'ag' => 0.0200242774727313, 'ta' => 0.0199031219856051, 'ka' => 0.0196799408251094, 'ga' => 0.0165229418639163, 'ng' => 0.0160470152853009, 'la' => 0.0136859325146546, }; ${Lingua::Identify::languages{'suffixes3'}{'sv'}} = { 'ill' => 0.0272307916635411, 'rna' => 0.0237177936551952, 'nde' => 0.0233705395334489, 'gen' => 0.0203885369054432, 'ing' => 0.0202528380860229, 'ska' => 0.0186252951027897, 'nte' => 
0.0176231466786239, 'ter' => 0.0174048485778174, 'tta' => 0.0143722749612078, 'der' => 0.0140494634839534, 'nen' => 0.0136946237139165, 'igt' => 0.0132833130066054, 'ler' => 0.0128273312593609, 'iga' => 0.0125213767783077, 'lla' => 0.0122659932858584, 'ten' => 0.0121328230158683, 'det' => 0.0115428281488237, 'tet' => 0.0114324148237053, 'ade' => 0.0103324958215721, 'mer' => 0.00988494257242819, }; ${Lingua::Identify::languages{'suffixes4'}{'sv'}} = { 'ande' => 0.0238897477570328, 'erna' => 0.0205711625618868, 'iska' => 0.0182039390030828, 'ning' => 0.0159800295369329, 'onen' => 0.0147003668230925, 'ller' => 0.0139563532356968, 'ngen' => 0.0139421234676319, 'etta' => 0.0129663679431785, 'liga' => 0.0104527810557065, 'mmer' => 0.0101295620382313, 'ligt' => 0.00870556881973222, 'åste' => 0.00820549411344988, 'nder' => 0.00818110022533854, 'ngar' => 0.00811604985704165, 'kall' => 0.00733849467349289, 'ndet' => 0.00732934696545114, 'ulle' => 0.00683638714320127, 'enna' => 0.00645523264146168, 'igen' => 0.00641559257328076, 'ckså' => 0.00623670406046432, }; ${Lingua::Identify::languages{'smallwords'}{'sv'}} = { 'att' => 0.0401237669403926, 'och' => 0.0306583179754749, 'i' => 0.0240577722145853, 'som' => 0.021654824286262, 'för' => 0.0201406646497384, 'det' => 0.0182216234093763, 'av' => 0.0175245441179484, 'är' => 0.0163942902913294, 'en' => 0.0143176866918831, 'till' => 0.012266449168822, 'de' => 0.012060593702774, 'om' => 0.0120479106645814, 'har' => 0.0113259531059297, 'den' => 0.0112859527547071, 'på' => 0.0110464384565328, 'med' => 0.0100830153630617, 'inte' => 0.00992155053068759, 'vi' => 0.00897276171205406, 'ett' => 0.00868739335272212, 'jag' => 0.00589956399617167, }; ${Lingua::Identify::languages{'ngrams1'}{'sv'}} = { 'e' => 0.101382757172588, 't' => 0.0957430388177884, 'a' => 0.0946380779045734, 'r' => 0.0858728833506106, 'n' => 0.0822436644495907, 's' => 0.0593526588815034, 'i' => 0.0592347939993719, 'l' => 0.0504092361711763, 'd' => 0.0439291716119838, 'o' 
=> 0.0421977740528421, 'm' => 0.0384678602676946, 'g' => 0.0328151749873192, 'k' => 0.0306581045609953, 'f' => 0.0226362278372885, 'v' => 0.0225895469052152, 'ä' => 0.0211856312871704, 'ö' => 0.0171433487362524, 'u' => 0.0170330851553205, 'h' => 0.0165805841662571, 'p' => 0.0164533294414671, }; ${Lingua::Identify::languages{'ngrams2'}{'sv'}} = { 'de' => 0.0287472709601738, 'en' => 0.0267185161457485, 'er' => 0.0249499562151046, 'et' => 0.0218521065399656, 'tt' => 0.018478428422902, 'an' => 0.0175954057017664, 'in' => 0.0165393205852417, 'ar' => 0.0162487331898253, 'at' => 0.0159135409571892, 'te' => 0.015106499515141, 'om' => 0.0148090246080951, 'ör' => 0.0147408054452886, 'll' => 0.0141107621195606, 'ra' => 0.0134695675845278, 'fö' => 0.0127871573052993, 'ti' => 0.0124940554215102, 'st' => 0.0120543479330359, 'nd' => 0.0119915950493004, 'ta' => 0.0113003293979768, 'me' => 0.0099571553046412, }; ${Lingua::Identify::languages{'ngrams3'}{'sv'}} = { 'för' => 0.0159007917342437, 'att' => 0.0129199860143547, 'det' => 0.0109028899994407, 'och' => 0.00892079993763083, 'ing' => 0.00862987396880621, 'nde' => 0.00827660672094774, 'and' => 0.00804355912080108, 'ter' => 0.00748555586036613, 'som' => 0.00703800963123948, 'den' => 0.00657470374991859, 'ill' => 0.00647777491562659, 'ion' => 0.00579620482471292, 'ska' => 0.00567947182571666, 'lig' => 0.0052790650872241, 'til' => 0.00523875942807246, 'med' => 0.00510319852601228, 'nin' => 0.00500361983869647, 'gen' => 0.00500166731541577, 'nte' => 0.00487154558535184, 'rna' => 0.00480069688345207, }; ${Lingua::Identify::languages{'ngrams4'}{'sv'}} = { 'till' => 0.00686923477562042, 'ande' => 0.00682809723445534, 'ning' => 0.00655941766872085, 'ione' => 0.00518002449152897, 'komm' => 0.00470014772784872, 'erna' => 0.00461034300628743, 'inte' => 0.00453504661397633, 'tion' => 0.0035572954123561, 'onen' => 0.00339825473981607, 'iska' => 0.00338686845610073, 'förs' => 0.00324453990965902, 'euro' => 0.00304987118807423, 'urop' => 
0.00283793939117908, 'ller' => 0.00281700332112185, 'ring' => 0.00268330631233531, 'nder' => 0.00268330631233531, 'ngen' => 0.00265759534910713, 'inge' => 0.00259552173788481, 'sion' => 0.00257844231231181, 'miss' => 0.0023841408902017, }; Lingua-Identify-0.56/lib/Lingua/Identify/TR.pm000644 000765 000024 00000022745 11375521416 021163 0ustar00ambsstaff000000 000000 use utf8; use strict; ${Lingua::Identify::languages{'_versions'}{'tr'}} = '0.02'; ${Lingua::Identify::languages{'_names'}{'tr'}} = 'turkish'; ${Lingua::Identify::languages{'_sets'}{'tr'}} = ''; =head1 NAME Lingua::Identify::TR - Meta-information on Turkish =head1 SYNOPSIS Nothing here is meant for public consumption. This module is to be loaded by Lingua::Identify. =head1 DESCRIPTION Automatically generated. Do not change this module yourself unless you know what you're doing. =head1 SEE ALSO Lingua::Identify(3). =head1 AUTHOR Jose Castro, C<< >> =head1 COPYRIGHT AND LICENSE Copyright (C) 2010 by Alberto Simoes This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available. 
=cut ${Lingua::Identify::languages{'prefixes1'}{'tr'}} = { 'b' => 0.0779128807120889, 'd' => 0.0649962171734486, 'y' => 0.0595192515425341, 's' => 0.0592514924856465, 'k' => 0.0542716745117568, 'g' => 0.0511152873738676, 'i' => 0.0489440200060225, 'a' => 0.0488839618998047, 'v' => 0.0364736215621948, 'o' => 0.0351089679264689, 't' => 0.0327199899235844, 'B' => 0.0316272660465667, 'e' => 0.0306730094699956, 'A' => 0.028572644033102, 'h' => 0.0231106929287419, 'S' => 0.0228279193452999, 'm' => 0.0194671678181974, 'K' => 0.0180983434806509, 'n' => 0.017668761193121, 'ü' => 0.0171432527637156, }; ${Lingua::Identify::languages{'prefixes2'}{'tr'}} = { 'bi' => 0.0327210398462044, 'ka' => 0.0286337788435585, 'ol' => 0.0277313130830911, 'ya' => 0.027678982922006, 'ge' => 0.0220254893788134, 'de' => 0.0192180221402485, 'ba' => 0.0181190887574616, 'sa' => 0.0142227869391263, 'ha' => 0.0141153723979516, 'ta' => 0.0140171385867919, 'gö' => 0.0135094442169664, 'ye' => 0.0132395307545275, 'ko' => 0.012987060679117, 'so' => 0.0126877688806303, 'il' => 0.0121249901307152, 'gü' => 0.0119780984504764, 'da' => 0.0112821991153449, 'al' => 0.0108387698556238, 'Ba' => 0.0106294492112835, 'te' => 0.0100749331183818, }; ${Lingua::Identify::languages{'prefixes3'}{'tr'}} = { 'ola' => 0.0142185969575413, 'içi' => 0.0106807125352061, 'baş' => 0.0103758976797493, 'gör' => 0.0102062174102117, 'yap' => 0.00959455560026173, 'kon' => 0.00906925799935786, 'kar' => 0.00884267895680164, 'Sır' => 0.00865877399400937, 'ger' => 0.00802882329273199, 'ver' => 0.00777887511125742, 'gün' => 0.00775652202185726, 'ülk' => 0.00731758862999947, 'old' => 0.00723528861902614, 'son' => 0.00708288119129774, 'ara' => 0.00657485643220308, 'tar' => 0.00640416011314727, 'gel' => 0.00637875887519254, 'sür' => 0.0062111107046913, 'Kos' => 0.00597437116695319, 'olm' => 0.00595608227562578, }; ${Lingua::Identify::languages{'prefixes4'}{'tr'}} = { 'konu' => 0.00850693801178258, 'oldu' => 0.0071973767090296, 'ülke' => 
0.00703874726740524, 'söyl' => 0.00669833080143762, 'olma' => 0.0063185464448479, 'Koso' => 0.00566781837044722, 'Sırb' => 0.00563423987550484, 'bulu' => 0.00547445255474453, 'karş' => 0.00539919041090815, 'olar' => 0.00537950646559709, 'gere' => 0.00503214272481381, 'beli' => 0.00492561784430694, 'üzer' => 0.00472183111638075, 'tara' => 0.0046002538071066, 'yapı' => 0.0044381507280744, 'görü' => 0.00438257252954908, 'başl' => 0.00429457371521731, 'topl' => 0.00424478491237171, 'sonr' => 0.00421236429656527, 'aras' => 0.00403984030530957, }; ${Lingua::Identify::languages{'suffixes1'}{'tr'}} = { 'n' => 0.168950932812645, 'i' => 0.12965166157122, 'a' => 0.126068313717186, 'e' => 0.12555223825845, 'ı' => 0.086969136353139, 'r' => 0.0823011129388205, 'k' => 0.0650521869843935, 'u' => 0.0346337490005728, 't' => 0.0249717158967082, 'l' => 0.0213150002876349, 'm' => 0.0184836719227438, 'ü' => 0.0140490913653656, 'z' => 0.0117013232041366, 'ç' => 0.0109759828986433, 's' => 0.0106233174397655, 'ş' => 0.0101164129504092, 'p' => 0.00795790024819978, 'y' => 0.00569934060730159, 'B' => 0.00534750887291287, 'o' => 0.00427700662894341, }; ${Lingua::Identify::languages{'suffixes2'}{'tr'}} = { 'an' => 0.0535280675499215, 'in' => 0.0424286940446322, 'da' => 0.0320996009214021, 'en' => 0.0294412166028434, 'ir' => 0.0277126707454847, 'ın' => 0.0266205232037795, 'nı' => 0.0231505934826944, 'de' => 0.0225563771879631, 'ak' => 0.022524282172044, 'er' => 0.0222812770515134, 'ki' => 0.0215761037017474, 'ar' => 0.0209222823774521, 'ni' => 0.0192340845401059, 'ri' => 0.0181052569802075, 'na' => 0.0179356118960635, 'si' => 0.016998437431225, 'le' => 0.0161529630118697, 'di' => 0.0157265578003727, 'ya' => 0.0154074416420911, 'ek' => 0.0150149654474229, }; ${Lingua::Identify::languages{'suffixes3'}{'tr'}} = { 'nda' => 0.0243576967805179, 'eri' => 0.0172197540586014, 'ini' => 0.016004050990524, 'nin' => 0.015743253253883, 'ını' => 0.0151881701334838, 'yor' => 0.0142464569054573, 'lar' => 
0.0139328907551769, 'arı' => 0.0133717189716654, 'ası' => 0.0133321426614358, 'dan' => 0.0132063102904495, 'ler' => 0.0125527937830691, 'nde' => 0.0124756707169807, 'tan' => 0.0124269614120827, 'ine' => 0.0123681043353311, 'nın' => 0.0116922627298724, 'esi' => 0.0116100657778572, 'den' => 0.0107891110348901, 'ına' => 0.00987175245931251, 'anı' => 0.0094485903730118, 'lan' => 0.00919591085385382, }; ${Lingua::Identify::languages{'suffixes4'}{'tr'}} = { 'ında' => 0.0199314113400226, 'leri' => 0.0141663564066572, 'ları' => 0.0135720446866913, 'inin' => 0.0132182326510696, 'stan' => 0.0122492964227984, 'inde' => 0.0116295472362193, 'ının' => 0.0101877053786367, 'ndan' => 0.00986395580355801, 'ması' => 0.00860595745468084, 'kanı' => 0.0079642395469356, 'arak' => 0.00763702122640964, 'iyor' => 0.00747167769342303, 'ğini' => 0.00720805303943039, 'arın' => 0.00682880353719536, 'erin' => 0.00646921025916153, 'mesi' => 0.0062784292595616, 'acak' => 0.00604717956307682, 'daki' => 0.00601827335101623, 'ğını' => 0.00600324212074472, 'deki' => 0.00569914876986724, }; ${Lingua::Identify::languages{'smallwords'}{'tr'}} = { 've' => 0.0209892287964058, 'bir' => 0.0170742121414594, 'da' => 0.00769892512320313, 'de' => 0.00618942819302771, 'için' => 0.00608145674516565, 'nin' => 0.00443944940170497, 'bu' => 0.00442192156925983, 'günü' => 0.003790218487937, 'nın' => 0.00362685908954829, 'ın' => 0.00334220709063923, 'ile' => 0.00327349798745428, 'olarak' => 0.00324545345554206, 'AB' => 0.00312205751512827, 'olan' => 0.00305194618534771, 'in' => 0.00285703668855776, 'daha' => 0.00258991252209383, 'söyledi' => 0.00245950544870199, 'Bu' => 0.00230245606999354, 'yeni' => 0.00213068331203117, 'yer' => 0.00212787885883995, }; ${Lingua::Identify::languages{'ngrams1'}{'tr'}} = { 'a' => 0.121427424625993, 'e' => 0.0917064686875973, 'i' => 0.0843054613927372, 'n' => 0.0776476111558781, 'r' => 0.0687311214645825, 'l' => 0.0675120141610733, 'ı' => 0.0462950069737554, 'k' => 0.0450095735369127, 'd' 
=> 0.0411771151251738, 's' => 0.0384914774089132, 't' => 0.0373360330479154, 'm' => 0.0321137694828261, 'u' => 0.0305177572686721, 'y' => 0.0300093414588554, 'o' => 0.0280379461573834, 'b' => 0.0234386156074649, 'ü' => 0.0191995767302053, 'ş' => 0.0141116140737207, 'v' => 0.013706555431527, 'g' => 0.0132957899518571, }; ${Lingua::Identify::languages{'ngrams2'}{'tr'}} = { 'ar' => 0.0227680522182232, 'la' => 0.0218107041487479, 'an' => 0.021333452733541, 'er' => 0.0193927001947641, 'in' => 0.0179405800274401, 'le' => 0.0178983506897908, 'ın' => 0.0153876518421725, 'da' => 0.013804950176869, 'de' => 0.0134843366594677, 'ma' => 0.0117902214224937, 'en' => 0.0114687094085467, 'nd' => 0.011093886599553, 'ra' => 0.0110104761702244, 'ya' => 0.01067997919082, 'il' => 0.0105529916790235, 'ak' => 0.0101910473205186, 'ri' => 0.010166338665511, 'ka' => 0.00978103339681711, 'li' => 0.00972772260177046, 'ir' => 0.00967635854923956, }; ${Lingua::Identify::languages{'ngrams3'}{'tr'}} = { 'lar' => 0.0110083798617546, 'ler' => 0.00973701676337423, 'eri' => 0.00848372472698094, 'nda' => 0.00720378243755627, 'arı' => 0.00719648099836959, 'ara' => 0.00690570118276013, 'bir' => 0.00636156142737295, 'ası' => 0.00575627211879734, 'ınd' => 0.0051560938176524, 'ini' => 0.00513199906833636, 'lan' => 0.00503215188745854, 'ile' => 0.00489214679105398, 'ını' => 0.00467292107947398, 'rin' => 0.00440002978987188, 'esi' => 0.004260207229447, 'nin' => 0.00408479015298705, 'nde' => 0.00391284126014079, 'ele' => 0.00388819890288575, 'rın' => 0.00388381803937374, 'sın' => 0.00362881527577901, }; ${Lingua::Identify::languages{'ngrams4'}{'tr'}} = { 'ları' => 0.00715596815356012, 'leri' => 0.00711471746927736, 'ında' => 0.00630269433060623, 'erin' => 0.00516203314367135, 'arın' => 0.00473471251919524, 'inde' => 0.00355599130864035, 'asın' => 0.00355006579598095, 'ması' => 0.00334244494856883, 'iler' => 0.00299329551033022, 'stan' => 0.00274670302196588, 'inin' => 0.00264300655042634, 'eler' => 
0.00262249516045149, 'ista' => 0.00243789265067781, 'için' => 0.00235174481278343, 'mesi' => 0.00234627510879013, 'anla' => 0.0023296380924772, 'lara' => 0.00231550802382785, 'esin' => 0.00222411838627323, 'sınd' => 0.00214161701770771, 'lama' => 0.00206025517080746, }; Lingua-Identify-0.56/lib/Lingua/Identify/UK.pm000644 000765 000024 00000024023 11746562315 021151 0ustar00ambsstaff000000 000000 use utf8; use strict; ${Lingua::Identify::languages{'_versions'}{'uk'}} = '0.02'; ${Lingua::Identify::languages{'_names'}{'uk'}} = 'ukranian'; ${Lingua::Identify::languages{'_sets'}{'uk'}} = ''; =head1 NAME Lingua::Identify::UK - Meta-information on Ukrainian =head1 SYNOPSIS Nothing here is meant for public consumption. This module is to be loaded by Lingua::Identify. =head1 DESCRIPTION Automatically generated. Do not change this module yourself unless you know what you're doing. =head1 SEE ALSO Lingua::Identify(3). =head1 AUTHOR Jose Castro, C<< >> =head1 COPYRIGHT AND LICENSE Copyright (C) 2010 by Alberto Simoes This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available. 
=cut ${Lingua::Identify::languages{'prefixes1'}{'uk'}} = { 'п' => 0.101843243261128, 'н' => 0.0925757430377946, 'в' => 0.0848732294640996, 'з' => 0.0618814911378082, 'с' => 0.0607499363086829, 'т' => 0.049788412481513, 'д' => 0.0467907847762863, 'м' => 0.0434490585265303, 'к' => 0.0362560754899268, 'б' => 0.0347076320395449, 'щ' => 0.027713167394016, 'р' => 0.0265187484077171, 'я' => 0.0240306512395819, 'о' => 0.0230909975218288, 'Г' => 0.022591392904291, 'ч' => 0.0179559884726988, 'г' => 0.0171519889888466, 'Н' => 0.0146870522996701, 'ц' => 0.0144918425484468, 'В' => 0.0127680411859489, }; ${Lingua::Identify::languages{'prefixes2'}{'uk'}} = { 'по' => 0.0452016018474538, 'пр' => 0.033388556136103, 'ві' => 0.0310637919915351, 'за' => 0.0293762140327707, 'на' => 0.0270746202949593, 'ви' => 0.0180613320666844, 'бу' => 0.0171113453896683, 'Га' => 0.016763789288321, 'ро' => 0.0142034593417287, 'ст' => 0.013898382319435, 'пі' => 0.013805700692409, 'як' => 0.0135392410147093, 'ко' => 0.0133770481674139, 'мо' => 0.0131800997099837, 'во' => 0.013099003286336, 'не' => 0.012438646693776, 'до' => 0.0123807206768848, 'пе' => 0.0111719977910879, 'та' => 0.0110291136160895, 'йо' => 0.0107124513904175, }; ${Lingua::Identify::languages{'prefixes3'}{'uk'}} = { 'Гар' => 0.0188274925483695, 'про' => 0.0175796751265454, 'від' => 0.0146695708919029, 'пер' => 0.0117286019153777, 'при' => 0.0110143036032382, 'роз' => 0.0109922573590363, 'бул' => 0.00991640064198663, 'пов' => 0.0097444399372123, 'йог' => 0.00933878904389848, 'вон' => 0.00858039824335526, 'під' => 0.00817915659888181, 'мен' => 0.00669764898851832, 'кол' => 0.00647718654649994, 'зна' => 0.00640222931621369, 'Гер' => 0.00583784546464664, 'ска' => 0.00570556799943562, 'гол' => 0.00508827316178416, 'сво' => 0.00506622691758232, 'так' => 0.00488985696396762, 'зап' => 0.00488103846628688, }; ${Lingua::Identify::languages{'prefixes4'}{'uk'}} = { 'Гарр' => 0.022594854231958, 'пере' => 0.0108720235210494, 'Герм' => 
0.00708647001173898, 'сказ' => 0.00583179864947821, 'голо' => 0.00581564408258215, 'пове' => 0.00463097584353763, 'відп' => 0.00459328185411349, 'капі' => 0.00408710542470356, 'ньог' => 0.00387709605505476, 'проф' => 0.0038609414881587, 'Наут' => 0.00367785639667001, 'вели' => 0.0035970835621897, 'Дамб' => 0.00352169558334141, 'Сірі' => 0.00344630760449312, 'знов' => 0.00317167996726008, 'прос' => 0.00312321626657189, 'пові' => 0.00306936771025169, 'поба' => 0.00298321002013936, 'Амбр' => 0.00297244030887532, 'поча' => 0.00291859175255511, }; ${Lingua::Identify::languages{'suffixes1'}{'uk'}} = { 'и' => 0.126357373990034, 'о' => 0.1254077912109, 'а' => 0.10324313951257, 'е' => 0.0893765840165697, 'і' => 0.0822629848000582, 'в' => 0.0676486742236251, 'я' => 0.0624607097717693, 'у' => 0.0497190955472178, 'м' => 0.0361966397342492, 'ь' => 0.03525698290751, 'й' => 0.0285238785328119, 'н' => 0.0235873715416328, 'ю' => 0.0235013466208749, 'х' => 0.023054678763094, 'к' => 0.0199412383618208, 'д' => 0.0169072055797087, 'ї' => 0.0148558420847147, 'с' => 0.0120335629536987, 'ж' => 0.00834772596430628, 'р' => 0.00823192318636306, }; ${Lingua::Identify::languages{'suffixes2'}{'uk'}} = { 'ся' => 0.0459936126418716, 'го' => 0.0325121935207319, 'ти' => 0.030635376077915, 'ли' => 0.0276502322851218, 'ав' => 0.0241862297209103, 'ла' => 0.023444770978069, 'ів' => 0.0220390887780992, 'ні' => 0.0215216123638245, 'на' => 0.0204750742424184, 'но' => 0.0202201977995667, 'рі' => 0.0189882949924502, 'ий' => 0.0184283391710337, 'ть' => 0.0175053775067677, 'ми' => 0.0165360746710742, 'их' => 0.015238521871102, 'ки' => 0.0146051925282585, 'ін' => 0.0141610896354108, 'ом' => 0.0140375131782706, 'ою' => 0.0137092632139919, 'ив' => 0.013593410285423, }; ${Lingua::Identify::languages{'suffixes3'}{'uk'}} = { 'ого' => 0.036535358078256, 'ррі' => 0.0179017081580641, 'вся' => 0.0177076994982231, 'ати' => 0.0141802693192942, 'ися' => 0.0140700371262026, 'ому' => 0.0125311957106449, 'она' => 
0.0110055821582582, 'ння' => 0.0101149060380786, 'ала' => 0.00984593948693528, 'али' => 0.00903463054578164, 'ний' => 0.00884062188594054, 'ами' => 0.00816159157649673, 'ити' => 0.00802049436933958, 'оли' => 0.00746492411615828, 'нув' => 0.0068388052593984, 'ими' => 0.00681675882078009, 'ься' => 0.00672857306630687, 'уло' => 0.00633614645890103, 'ючи' => 0.00616859352540191, 'ені' => 0.00586435267246929, }; ${Lingua::Identify::languages{'suffixes4'}{'uk'}} = { 'аррі' => 0.021808900185241, 'лися' => 0.00892818679188386, 'ився' => 0.00822814802050575, 'ться' => 0.0081742988842459, 'тися' => 0.00799659673458838, 'ього' => 0.00785120406668677, 'ався' => 0.00727501830870633, 'лася' => 0.00712962564080472, 'ного' => 0.00686576487313144, 'лося' => 0.00633804333778486, 'ання' => 0.00584801619782019, 'вати' => 0.00537414379873347, 'іона' => 0.00505643389480033, 'аючи' => 0.00498643001766252, 'азав' => 0.00496489036315857, 'ення' => 0.00494873562228062, 'ував' => 0.00490565631327274, 'увся' => 0.00430793090078835, 'льки' => 0.00393637186059536, 'ість' => 0.0036617412656701, }; ${Lingua::Identify::languages{'smallwords'}{'uk'}} = { '—' => 0.053529413446392, 'не' => 0.0178021607311837, 'на' => 0.0175877457552982, 'і' => 0.0171760690015981, 'що' => 0.0153835598031956, 'з' => 0.0139455500315905, 'в' => 0.0105492168135647, 'й' => 0.00813919248461215, 'до' => 0.00773609232994748, 'у' => 0.0069556218177244, 'Гаррі' => 0.00661255785630766, 'я' => 0.00645817907367013, 'він' => 0.00599218385941239, 'його' => 0.00565197876434079, 'а' => 0.00491153238094966, 'як' => 0.00478574226176352, 'за' => 0.00467138760795794, 'це' => 0.0042339810571516, 'Я' => 0.00392808235822167, 'було' => 0.003279119697875, }; ${Lingua::Identify::languages{'ngrams1'}{'uk'}} = { 'о' => 0.0945653580021905, 'а' => 0.0830555882953249, 'и' => 0.0648161190820471, 'н' => 0.0647258531424908, 'в' => 0.0609766544683077, 'і' => 0.0578846148317863, 'е' => 0.0501987862968255, 'р' => 0.0498359977117297, 'т' => 
0.0452652065048971, 'с' => 0.0404799618238319, 'л' => 0.0398124538249473, 'у' => 0.0343832737788944, 'д' => 0.0339365436194342, 'к' => 0.0329459180534754, 'м' => 0.0318868743549866, 'п' => 0.0298331804945079, 'я' => 0.0247391918036227, 'з' => 0.0240285631329755, 'г' => 0.0189725206332414, 'б' => 0.0173787806048968, }; ${Lingua::Identify::languages{'ngrams2'}{'uk'}} = { 'на' => 0.0177609276669143, 'ро' => 0.0125856381267914, 'ві' => 0.0122504860061537, 'ли' => 0.0120213586540405, 'не' => 0.0118605425621846, 'по' => 0.0118448878983756, 'ти' => 0.011177429959611, 'ов' => 0.0110550389516499, 'ер' => 0.0105156646258678, 'ст' => 0.00984464880896471, 'ал' => 0.00983610990143254, 'ав' => 0.00970944943970535, 'го' => 0.00955361437724324, 'ся' => 0.00930669763443798, 'ва' => 0.00912168797124096, 'ні' => 0.00902277895899332, 'но' => 0.00887548280406338, 'за' => 0.00884417347644542, 'ви' => 0.00879365160688008, 'та' => 0.00844284482243342, }; ${Lingua::Identify::languages{'ngrams3'}{'uk'}} = { 'ого' => 0.00829123953396453, 'про' => 0.00554713478114246, 'від' => 0.00527517197115018, 'ере' => 0.00443118071747415, 'гар' => 0.00427072265957871, 'ати' => 0.0040740028936843, 'ува' => 0.00402414304518571, 'пер' => 0.00388816164018958, 'ррі' => 0.00382289056579143, 'арр' => 0.00381745130959158, 'вся' => 0.00369234841699514, 'али' => 0.00341585289350299, 'пов' => 0.00329709579980636, 'він' => 0.0032780584031069, 'ися' => 0.0031021891193119, 'роз' => 0.00297255351321558, 'ала' => 0.00275588980792173, 'при' => 0.00274501129552204, 'ові' => 0.00271872155722279, 'ому' => 0.00263894579962505, }; ${Lingua::Identify::languages{'ngrams4'}{'uk'}} = { 'аррі' => 0.00498613973985358, 'гарр' => 0.00497903191413747, 'пере' => 0.00327078446703154, 'його' => 0.00273888217594238, 'коли' => 0.00212405525149857, 'тися' => 0.0020115146776601, 'лися' => 0.00196886772336342, 'каза' => 0.00190134337906035, 'пові' => 0.00188357381477006, 'ього' => 0.00187291207619589, 'ився' => 0.00181131091998958, 'ться' => 
0.00180894164475087, 'вати' => 0.00167152368090601, 'голо' => 0.00161347643755775, 'ався' => 0.00160163006136423, 'ермі' => 0.00158030658421589, 'герм' => 0.00157438339611913, 'лася' => 0.00157082948326107, 'рміо' => 0.00155661383182884, 'було' => 0.00155424455659014, };