HTTP-OAI-3.27/0000755000076400007640000000000011616526067011124 5ustar tdb2tdb2HTTP-OAI-3.27/CHANGES0000644000076400007640000001145311616525761012123 0ustar tdb2tdb23.26 - Added 'delay' option to delay between requests - Added --skip-identify option to oai_browser.pl - Fixed POD for RT #64077 - Hopefully fixes #69337 (no test case given) - Bumped XML::SAX::Base to 1.04 which should fix #68907 3.25 - Added unit test for OAI_DC metadata() parsing - Metadata::OAI_DC now parses dc when passed as a DOM to ->metadata 3.24 - Fixed CPAN bugs #60760 and #60856 3.23 - Changed license to BSD - Added dependency for XML::SAX rt #43287 - Fixed some POD issues rt #51853 3.22 - use XML::SAX::Parser rather than ParserFactory - test the namespace on NamespaceURI rather than the XMLNS attribute, which is reported differently between XML::LibXML::SAX and XML::LibXML::SAX::Parser - added debug class HTTP::OAI::Debug - added check in 01parse test case for file open - added 000xml_sax.pl test case for basic XML::SAX function - added "trace" and "saxtrace" options to oai_browser.pl 3.21 - added test case for bad characters - fix bad chars as they arrive, to avoid buffering an entire response if it contains bad chars - some code cleanup 3.20 - use strict UTF-8 (suggested by Dennis Vierkant) 3.19 3.18 - broke apart Metadata from Encapsulation - now uses XML::SAX in Response, rt.cpan.org ticket #27191 - shifted all use's into HTTP::OAI - added use strict(), use warnings() to every module - shifted $VERSION in HTTP::OAI - href attribute lookup in METS should be namespaced too - added mets test case - fixed test cases for changed XML output 3.17 - Removed next() call from oai_browser for List* - Fixed missing HTTP:: in ListIdentifiers documentation - Fixed missing namespace initialization in SAX - Fixed bug in request where '?' wasn't being removed 3.16 - Fixed recursion bug in onRecord - No longer need to call next() for onRecord (Harvester will do it for you) - Added PartialList module (ListIdentifiers, ListRecords, ListSets) 3.15 - Several bug fixes, particularly catching and throwing parsing errors - Added onRecord argument to ListRecords, ListIdentifiers, ListSets - INCOMPATIBLE: Changed error handling, so now only need to check after the next() loop for both initial errors and flow errors - Added is_deleted method to Record and Header 3.14 - By default oai-lib now fixes bad UTF-8 characters (replacing with '?') To disable this set $HTTP::OAI::UserAgent::IGNORE_BAD_CHARS to 0 - Fixed bug where a partially downloaded utf-8 character could cause a harvest to fail 3.13 3.12 - BUG: Static repository was re-fetching the source for every request - API change: when parsing a static repository records are now pre-parsed to allow for quick GetRecord access, this means you *must* supply the handlers argument to the Harvester constructor (otherwise the first request's handlers are always used) - Added resumptionToken option to ListIdentifiers in the oai_browser.pl tool 2005-09-05 - Added -oldstyle_urls to gateway & doc examples [thanks to Patrick Hochstenbach] 2005-04-13 - Improved display of errors in the event of XML parsing failure (bad chars etc.) - In the event of bad XML resumption token will not sleep(60) and retry, but simply fail 2005-04-08 - Fixed bug where Header wasn't getting end_element on , resulting in deleted items in ListRecords never being flagged as such 2005-04-06 - UserAgent now uses a callback with LWP to parse XML responses, resulting in a much reduced memory footprint - Compression support is removed for the moment 2005-03-30 - CGI has an annoying habit of return the URL with semi-colons, ::Response now checks for this and acts appropriately - Header was blanking setSpecs when given a dom (corrected POD for setSpec) - Fixed Header missing the record status - Tests added to getrecord.t for parsing Header - oai_static_gateway.pl now throws an error if given a resumption token (which should never occur because it doesn't use Flow-Control) 2005-03-07 - Fixed undef warning when trying to set the repository to an Identify w/o a base URL - Changed back to XML::LibXML::SAX::Parser due to unreliability in XML::LibXML::SAX :-( 2005-03-05 - 80network.t && added Prereqs (jaybonci@debian.org RT #11508 & #11509) 2005-02-25 - Added new script 'gateway.pl' that acts as a static repository broker - Library now uses temporary files to harvest (due to memory leak in LibXML's parse_string), provides for a smaller memory footprint - Numerous bug fixes 2005-02-23 - Added support for harvesting from Static repositories - Fixed bug where the error message for an unsupported namespace wasn't getting displayed 2004-10-08 - Moved to namespace HTTP::OAI - Changed all arguments from -style to 'argument' - Now uses (in a round-about way) XML::SAX, dropping the requirement for XML::Parser and XML::SAX::PerlSAX - Fixed some bugs with the DOM construction code (which also effected toString) HTTP-OAI-3.27/README0000644000076400007640000000225110640714230011767 0ustar tdb2tdb2Description ----------- OAI-PERL are a set of Perl modules that provide an API to the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). OAI-PMH is a XML-over-HTTP protocol for transferring metadata between a repository (the HTTP server) and service provider (the HTTP client). Synopsis -------- use HTTP::OAI::Harvester; my $h = HTTP::OAI::Harvester->new( baseURL=>'http://eprints.ecs.soton.ac.uk/perl/oai2', debug=>1, ); my $r = $h->ListIdentifiers( metadataPrefix=>'oai_dc' ); die unless $r->is_success(); while(my $rec = $r->next) { # Check we didn't fail on a resumption token die unless $rec->is_success(); print $rec->identifier(); } Installation ------------ perl Makefile.pl make make test make install To disable the network tests use: export SKIP_HTTP_OAI_NETTESTS=1 Use the following to view the root documentation: man HTTP::OAI::Harvester man HTTP::OAI::Repository Utility Scripts --------------- bin/oai_browser - Interactively browse an OAI repository Related Modules --------------- Net::OAI::Harvester by Ed Summers. Author ------ Copyright 2004 © Tim Brody This module is released under the same terms as Perl. HTTP-OAI-3.27/MANIFEST.SKIP0000644000076400007640000000010311220376362013004 0ustar tdb2tdb2.svn Makefile$ Makefile.old MANIFEST.bak ^blib .tar.gz$ pm_to_blib HTTP-OAI-3.27/MANIFEST0000644000076400007640000000215411357310376012253 0ustar tdb2tdb2bin/oai_browser.pl bin/oai_static_gateway.pl CHANGES LICENSE examples/badbytes.xml examples/getrecord.xml examples/identify.xml examples/mets.xml examples/repository.xml lib/HTTP/OAI.pm lib/HTTP/OAI/Debug.pm lib/HTTP/OAI/Encapsulation.pm lib/HTTP/OAI/Error.pm lib/HTTP/OAI/GetRecord.pm lib/HTTP/OAI/Harvester.pm lib/HTTP/OAI/Header.pm lib/HTTP/OAI/Headers.pm lib/HTTP/OAI/Identify.pm lib/HTTP/OAI/ListIdentifiers.pm lib/HTTP/OAI/ListMetadataFormats.pm lib/HTTP/OAI/ListRecords.pm lib/HTTP/OAI/ListSets.pm lib/HTTP/OAI/Metadata.pm lib/HTTP/OAI/Metadata/METS.pm lib/HTTP/OAI/Metadata/OAI_DC.pm lib/HTTP/OAI/Metadata/OAI_Eprints.pm lib/HTTP/OAI/Metadata/OAI_Identifier.pm lib/HTTP/OAI/MetadataFormat.pm lib/HTTP/OAI/PartialList.pm lib/HTTP/OAI/Record.pm lib/HTTP/OAI/Repository.pm lib/HTTP/OAI/Response.pm lib/HTTP/OAI/ResumptionToken.pm lib/HTTP/OAI/SAXHandler.pm lib/HTTP/OAI/Set.pm lib/HTTP/OAI/UserAgent.pm Makefile.PL MANIFEST MANIFEST.SKIP META.yml README t/000xml_sax.t t/00static.t t/01parse.t t/02token.t t/03badbytes.t t/50mets.t t/80network.t t/error.t t/getrecord.t t/identify.t t/listidentifiers.t t/listmetadataformats.t HTTP-OAI-3.27/t/0000755000076400007640000000000011616526067011367 5ustar tdb2tdb2HTTP-OAI-3.27/t/01parse.t0000644000076400007640000000146311446624007013025 0ustar tdb2tdb2use Test::More tests => 5; use IO::File; use HTTP::OAI; use HTTP::OAI::Metadata::OAI_DC; ok(1); my $fh; my $r = HTTP::OAI::GetRecord->new(handlers=>{ metadata=>'HTTP::OAI::Metadata::OAI_DC' }); $fh = IO::File->new('examples/getrecord.xml','r') or BAIL_OUT( "Failed to open examples/getrecord.xml: $!" ); $r->parse_file($fh); $fh->close(); my $rec = $r->next; ok($rec); ok($rec->metadata->dc->{creator}->[0] eq 'Aspinwall, Paul S.'); my $dom = $rec->metadata->dom; my $md = HTTP::OAI::Metadata::OAI_DC->new; $md->metadata( $dom ); ok($md->dc->{creator}->[0] eq 'Aspinwall, Paul S.'); $r = HTTP::OAI::Identify->new(); $fh = IO::File->new('examples/identify.xml','r') or BAIL_OUT( "Failed to open examples/identify.xml: $!" ); $r->parse_file($fh); $fh->close(); ok($r->repositoryName eq 'citebase.eprints.org'); HTTP-OAI-3.27/t/80network.t0000644000076400007640000000246211616525361013415 0ustar tdb2tdb2#!/usr/bin/perl -w use Test::More tests => 6; use strict; use warnings; use HTTP::OAI; my @repos = qw( http://eprints.ecs.soton.ac.uk/cgi/oai2 http://www.citebase.org/oai2 http://memory.loc.gov/cgi-bin/oai2_0 ); @repos = qw( http://eprints.ecs.soton.ac.uk/cgi/oai2 ); my $h = HTTP::OAI::Harvester->new(baseURL=>$repos[int(rand(@repos))]); my $r; my $dotest = defined($ENV{"HTTP_OAI_NETTESTS"}); SKIP : { skip "Skipping flakey net tests (set HTTP_OAI_NETTESTS env. variable to enable)", 6 unless $dotest; #$r = $h->GetRecord(identifier=>'oai:eprints.ecs.soton.ac.uk:23',metadataPrefix=>'oai_dc'); #ok($r->is_success()); $r = $h->Identify(); ok($r->is_success(), "Identify: ".$r->message); $r = $h->ListIdentifiers(metadataPrefix=>'oai_dc'); ok($r->is_success(), "ListIdentifiers: ".$r->message); $r = $h->ListMetadataFormats(); ok($r->is_success(), "ListMetadataFormats: ".$r->message); $r = $h->ListRecords(metadataPrefix=>'oai_dc'); ok($r->is_success(), "ListRecords: ".$r->message); $r = $h->ListSets(); ok($r->is_success(), "ListSets: ".$r->message); $r = $h->ListIdentifiers(metadataPrefix => 'oai_dc'); my $ok = 0; while(1) { last if $r->is_error; my $uri = $r->request->uri; my $rec = $r->next; $ok = 1, last if $uri ne $r->request->uri; } ok($ok, "Auto-resumption RT #69337"); } HTTP-OAI-3.27/t/listmetadataformats.t0000644000076400007640000000051110640714227015612 0ustar tdb2tdb2print "1..1\n"; use strict; use HTTP::OAI; my $r = new HTTP::OAI::ListMetadataFormats(); my $mf = new HTTP::OAI::MetadataFormat( metadataPrefix=>'oai_dc', schema=>'http://www.openarchives.org/OAI/2.0/oai_dc.xsd', metadataNamespace=>'http://www.openarchives.org/OAI/2.0/oai_dc/', ); $r->metadataFormat($mf); print "ok 1\n"; HTTP-OAI-3.27/t/identify.t0000644000076400007640000000142110640714227013356 0ustar tdb2tdb2use Test; BEGIN { plan tests => 8; } use warnings; use strict; use HTTP::OAI; ok(1); my $r = new HTTP::OAI::Identify( baseURL=>'http://citebase.eprints.org/cgi-bin/oai2', adminEmail=>'tdb01r@ecs.soton.ac.uk', repositoryName=>'oai:citebase.eprints.org', granularity=>'YYYY-MM-DD', deletedRecord=>'transient', ); ok($r->baseURL,'http://citebase.eprints.org/cgi-bin/oai2'); ok($r->adminEmail,'tdb01r@ecs.soton.ac.uk'); ok($r->repositoryName,'oai:citebase.eprints.org'); ok($r->granularity,'YYYY-MM-DD'); ok($r->deletedRecord,'transient'); $r = HTTP::OAI::Identify->new(); open my $fh, "; close $fh; $r->parse_string($xml); ok($r->adminEmail,'mailto:tdb01r@ecs.soton.ac.uk'); my $xml_out = $r->toDOM->toString; ok($xml_out); HTTP-OAI-3.27/t/02token.t0000644000076400007640000000035010640714227013025 0ustar tdb2tdb2use Test::More tests => 3; use strict; use warnings; use HTTP::OAI; use_ok( 'HTTP::OAI::ResumptionToken' ); my $rt = HTTP::OAI::ResumptionToken->new; $rt->resumptionToken(''); ok(!$rt); $rt->resumptionToken('token'); ok($rt); HTTP-OAI-3.27/t/getrecord.t0000644000076400007640000001145510640714227013531 0ustar tdb2tdb2use Test::More tests => 8; use_ok( 'HTTP::OAI' ); use_ok( 'HTTP::OAI::Metadata::OAI_DC' ); use XML::LibXML; my $expected = < 0000-00-00T00:00:00Zhttp://localhost/path/script
oai:arXiv.org:acc-phys/94110012004-06-22T17:51:18Za:aa:b
Symplectic Computation of Lyapunov Exponents Habib, Salman Ryne, Robert D. Accelerator Physics A recently developed method for the calculation of Lyapunov exponents of dynamical systems is described. The method is applicable whenever the linearized dynamics is Hamiltonian. By utilizing the exponential representation of symplectic matrices, this approach avoids the renormalization and reorthogonalization procedures necessary in usual techniques. It is also easily extendible to damped systems. The method is illustrated by considering two examples of physical interest: a model system that describes the beam halo in charged particle beams and the driven van der Pol oscillator. Comment: 12 pages, uuencoded PostScript (figures included) 1994-10-31 text http://arXiv.org/abs/acc-phys/9411001
EOF my $r = new HTTP::OAI::GetRecord( requestURL=>'http://localhost/path/script', responseDate=>'0000-00-00T00:00:00Z' ); my $rec = new HTTP::OAI::Record(); my $str_header = <
oai:arXiv.org:acc-phys/9411001 2004-06-22T17:51:18Z a:a a:b
EOF $rec->header->dom(XML::LibXML->new()->parse_string($str_header)); ok($rec->identifier eq 'oai:arXiv.org:acc-phys/9411001', 'header/identifier'); ok($rec->datestamp eq '2004-06-22T17:51:18Z', 'header/datestamp'); ok($rec->status eq 'deleted', 'header/status'); my @sets = $rec->header->setSpec; ok($sets[0] eq 'a:a', 'header/setSpec'); my $str = < Symplectic Computation of Lyapunov Exponents Habib, Salman Ryne, Robert D. Accelerator Physics A recently developed method for the calculation of Lyapunov exponents of dynamical systems is described. The method is applicable whenever the linearized dynamics is Hamiltonian. By utilizing the exponential representation of symplectic matrices, this approach avoids the renormalization and reorthogonalization procedures necessary in usual techniques. It is also easily extendible to damped systems. The method is illustrated by considering two examples of physical interest: a model system that describes the beam halo in charged particle beams and the driven van der Pol oscillator. Comment: 12 pages, uuencoded PostScript (figures included) 1994-10-31 text http://arXiv.org/abs/acc-phys/9411001 EOF $rec->metadata(new HTTP::OAI::Metadata()); my $parser = XML::LibXML::SAX::Parser->new(Handler=>$rec->metadata); $parser->parse_string($str); $r->record($rec); #warn $r->toDOM->toString; { # hopefully if we can re-parse our own output we're ok, because we can't # compare against the ever changing XML output my $str = $r->toDOM->toString; my $_r = HTTP::OAI::GetRecord->new(handlers=>{ metadata=>'HTTP::OAI::Metadata::OAI_DC' }); $_r->parse_string($str); is($_r->record->metadata->dc->{creator}->[1], 'Ryne, Robert D.', 'toDOM'); } SKIP: { eval { require XML::SAX::Writer }; skip "XML::SAX::Writer not installed", 1 if $@; my $output; my $w = XML::SAX::Writer->new(Output=>\$output); $r->set_handler($w); $r->generate; # SAX::Writer behaves differently :-( # ok($output eq $expected, 'XML::SAX::Writer'); ok(1); } HTTP-OAI-3.27/t/50mets.t0000644000076400007640000000101210655641046012660 0ustar tdb2tdb2use Test::More tests => 3; use IO::File; use HTTP::OAI; use HTTP::OAI::Metadata::METS; ok(1); my $fh; my $r = HTTP::OAI::GetRecord->new(handlers=>{ metadata=>'HTTP::OAI::Metadata::METS' }); $fh = IO::File->new('examples/mets.xml','r') or die "Unable to open examples/mets.xml: $!"; $r->parse_file($fh); $fh->close(); my $rec = $r->record; my @files = $rec->metadata->files; is(scalar(@files), 4, 'file_count'); is($files[1]->{ url }, "http://dspace.mit.edu/bitstream/1721.1/8338/2/50500372-MIT.pdf", 'file_url'); HTTP-OAI-3.27/t/error.t0000644000076400007640000000304410640714227012677 0ustar tdb2tdb2use Test::More tests => 5; use strict; use warnings; use_ok( 'HTTP::OAI' ); my $expected = < 0000-00-00T00:00:00Zhttp://localhost/path/scriptYou didn't supply a verb argument EOF my $r = HTTP::OAI::Response->new( requestURL=>'http://localhost/path/script?', responseDate=>'0000-00-00T00:00:00Z', ); $r->errors(HTTP::OAI::Error->new(code=>'badVerb',message=>'You didn\'t supply a verb argument')); is($r->toDOM->toString, $expected, 'badVerb'); $r = HTTP::OAI::Identify->new(); $r->parse_string("\n"); is($r->code, 600, 'Chunk xml'); $r = HTTP::OAI::Identify->new(); $r->parse_string($expected); ok($r->is_error, 'Parse_string'); my $err_noid = < 0000-00-00T00:00:00Zhttp://localhost/path/script?Requested identifier does not exist EOF $r = HTTP::OAI::GetRecord->new(); $r->parse_string($err_noid); ok($r->is_error); HTTP-OAI-3.27/t/00static.t0000644000076400007640000000365611220371271013177 0ustar tdb2tdb2use Test::More tests => 20; use strict; use HTTP::OAI; ok(1); # This test harness checks that the library correctly supports # transparent gateway to static repositories my $fn = "file:".$ENV{PWD}."/examples/repository.xml"; my $repo = HTTP::OAI::Harvester->new(baseURL=>$fn); ok($repo); # Identify my $id = $repo->Identify; if( !$id->is_success ) { BAIL_OUT( "Error parsing static repository: " . $id->message ); } ok($id->is_success); ok($id->repositoryName && $id->repositoryName eq 'Demo repository'); ok($repo->Identify->version eq '2.0s'); # Removed this test, as paths screw up too much #ok($repo->Identify->baseURL && $repo->Identify->baseURL eq 'file:///examples/repository.xml'); # ListMetadataFormats my $lmdf = $repo->ListMetadataFormats; ok($lmdf->is_success); ok(my $mdf = $lmdf->next); ok($mdf && $mdf->metadataPrefix && $mdf->metadataPrefix eq 'oai_dc'); # ListRecords my $lr = $repo->ListRecords(metadataPrefix=>'oai_rfc1807'); ok($lr->is_success); my $rec = $lr->next; ok($rec && $rec->identifier && $rec->identifier eq 'oai:arXiv:cs/0112017'); # ListIdentifiers my $li = $repo->ListIdentifiers(metadataPrefix=>'oai_dc'); ok($li->is_success); my @recs = $li->identifier; ok(@recs && $recs[-1]->identifier eq 'oai:perseus:Perseus:text:1999.02.0084'); # ListSets my $ls = $repo->ListSets(); ok($ls->is_success); my @errs = $ls->errors; ok(@errs && $errs[-1]->code eq 'noSetHierarchy'); # GetRecord my $gr = $repo->GetRecord(metadataPrefix=>'oai_dc',identifier=>'oai:perseus:Perseus:text:1999.02.0084'); ok($gr->is_success); $rec = $gr->next; ok($rec && $rec->identifier eq 'oai:perseus:Perseus:text:1999.02.0084'); # Errors $gr = $repo->GetRecord(metadataPrefix=>'oai_dc',identifier=>'invalid'); ok($gr->is_error); @errs = $gr->errors; ok(@errs && $errs[0]->code eq 'idDoesNotExist'); $lr = $repo->ListRecords(metadataPrefix=>'invalid'); ok($lr->is_error); @errs = $lr->errors; ok(@errs && $errs[0]->code eq 'cannotDisseminateFormat'); HTTP-OAI-3.27/t/000xml_sax.t0000644000076400007640000000326711220375503013443 0ustar tdb2tdb2use Test::More tests => 12; BEGIN { use_ok( "XML::SAX" ) } BEGIN { use_ok( "XML::SAX::ParserFactory" ) } BEGIN { use_ok( "XML::SAX::Base" ) } use Data::Dumper; use strict; my %EXPECTED = ( root_name => 0, root_ns => 0, element_name => 0, element_ns => 0, element_local_name => 0, element_prefix => 0, ns_name => 0, ns_prefix => 0, ); { package MyHandler; our @ISA = qw( XML::SAX::Base ); sub start_element { my( $self, $hash ) = @_; # print STDERR Data::Dumper::Dumper( $self, $hash ); if( $hash->{Name} eq "root" ) { $EXPECTED{"root_name"} = 1; if( $hash->{NamespaceURI} eq "NAMESPACE1" ) { $EXPECTED{"root_ns"} = 1; } } if( $hash->{Name} eq "x:element" ) { $EXPECTED{"element_name"} = 1; if( $hash->{LocalName} eq "element" ) { $EXPECTED{"element_local_name"} = 1; } if( $hash->{NamespaceURI} eq "NAMESPACE2" ) { $EXPECTED{"element_ns"} = 1; } if( $hash->{Prefix} eq "x" ) { $EXPECTED{"element_prefix"} = 1; } my $namespace_attr = $hash->{"Attributes"}->{"{http://www.w3.org/2000/xmlns/}x"}; if( defined $namespace_attr ) { if( $namespace_attr->{Name} eq "xmlns:x" ) { $EXPECTED{"ns_name"} = 1; } if( $namespace_attr->{Prefix} eq "xmlns" ) { $EXPECTED{"ns_prefix"} = 1; } if( $namespace_attr->{Value} eq "NAMESPACE2" ) { $EXPECTED{"ns_value"} = 1; } } } } } my $handler = MyHandler->new; my $parser = XML::SAX::ParserFactory->parser( Handler => $handler ); $parser->parse_string( join "", ); foreach my $test (sort keys %EXPECTED) { ok($EXPECTED{$test}, "parsed $test"); } __DATA__ content HTTP-OAI-3.27/t/listidentifiers.t0000644000076400007640000000270111600640511014735 0ustar tdb2tdb2use Test::More tests => 3; use strict; use HTTP::OAI; use URI; my $r = new HTTP::OAI::ListIdentifiers(); my $str = < 2004-10-08T17:11:44Zhttp://eprints.ecs.soton.ac.uk/perl/oai2
oai:eprints.ecs.soton.ac.uk:100092004-10-077374617475733D707562747970653D696E70726F63656564696E677366756C6C746578743D46414C5345
oai:eprints.ecs.soton.ac.uk:100102004-10-087374617475733D707562747970653D61727469636C6566756C6C746578743D46414C5345
EOF chomp($str); $r->parse_string($str); ok(1); my $ha = HTTP::OAI::Harvester->new(baseURL=>'http://domain.invalid/'); $r = $ha->ListRecords(metadataPrefix=>'oai_dc', from=>'2005-01-01'); my $uri = URI->new($r->request->uri); my %args = $uri->query_form; ok($args{metadataPrefix} eq 'oai_dc' && $args{'from'} eq '2005-01-01','Request arguments'); ok(1); HTTP-OAI-3.27/t/03badbytes.t0000644000076400007640000000067711144245602013513 0ustar tdb2tdb2use Test::More tests => 3; use strict; use warnings; use HTTP::OAI; my $ha = HTTP::OAI::Harvester->new( baseURL => 'file:///' ); ok(defined $ha); my $r = "HTTP::OAI::GetRecord"->new( harvestAgent => $ha, resume => $ha->resume, ); $HTTP::OAI::UserAgent::SILENT_BAD_CHARS = 1; $r = $ha->request( HTTP::Request->new( GET => 'file:examples/badbytes.xml' ), undef, # arg undef, # size undef, # previous $r ); ok($r->is_success); ok(1); HTTP-OAI-3.27/examples/0000755000076400007640000000000011616526067012742 5ustar tdb2tdb2HTTP-OAI-3.27/examples/badbytes.xml0000644000076400007640000000405511144246166015260 0ustar tdb2tdb2 2005-02-25T16:37:50Zhttp://citebase.eprints.org/cgi-bin/oai2
oai:arXiv.org:hep-th/00010012004-06-22T19:46:16Z
http://arXiv.org/abs/hep-th/0001001Aspinwall, Paul S.1999-12-312000-01-17textCompactification, Geometry and Duality: N=2 These are notes based on lectures given at TASI99. We review the geometry of  Â the moduli space of N=2 theories in four dimensions from the point of view of  superstring compactification. The cases of a type IIA or type IIB string  compactified on a Calabi-Yau threefold and the heterotic string compactified on K3xT2 are each considered in detail. We pay specific attention to the differences between N=2 theories and N>2 theories. The moduli spaces of vector multiplets and the moduli spaces of hypermultiplets are reviewed. In the case of hypermultiplets this review is limited by the poor state of our current understanding. Some peculiarities such as ``mixed instantons'' and the non-existence of a universal hypermultiplet are discussed. Comment: 82 pages, 8 figures, LaTeX2e, TASI99, refs added and some typos fixed
HTTP-OAI-3.27/examples/repository.xml0000644000076400007640000001321210640714230015665 0ustar tdb2tdb2 Demo repository file:examples/repository.xml 2.0 jondoe@oai.org 2002-09-19 no YYYY-MM-DD oai_dc http://www.openarchives.org/OAI/2.0/oai_dc.xsd http://www.openarchives.org/OAI/2.0/oai_dc/ oai_rfc1807 http://www.openarchives.org/OAI/1.1/rfc1807.xsd http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1807.txt oai:arXiv:cs/0112017 2001-12-14 Using Structural Metadata to Localize Experience of Digital Content Dushay, Naomi Digital Libraries With the increasing technical sophistication of both information consumers and providers, there is increasing demand for more meaningful experiences of digital information. We present a framework that separates digital object experience, or rendering, from digital object storage and manipulation, so the rendering can be tailored to particular communities of users. Comment: 23 pages including 2 appendices, 8 figures 2001-12-14 oai:perseus:Perseus:text:1999.02.0084 2002-05-01 Germany and its Tribes Tacitus text Complete Works of Tacitus. Tacitus. Alfred John Church. William Jackson Brodribb. Lisa Cerrato. edited for Perseus. New York: Random House, Inc. Random House, Inc. reprinted 1942. http://www.perseus.tufts.edu/cgi-bin/ptext? doc=Perseus:text:1999.02.0083 oai:arXiv:cs/0112017 2001-12-14 v2 cs/0112017 December 23, 2001 Using Structural Metadata to Localize Experience of Digital Content Naomi Dushay December 14, 2001 Los Alamos arXiv Metadata may be used without restrictions as long as the oai identifier remains attached to it. HTTP-OAI-3.27/examples/identify.xml0000644000076400007640000000374310640714230015271 0ustar tdb2tdb2 2005-02-25T16:36:30Zhttp://citebase.eprints.org/cgi-bin/oai2citebase.eprints.orghttp://citebase.eprints.org/cgi-bin/oai22.0mailto:tdb01r@ecs.soton.ac.uk0001-01-01transientYYYY-MM-DDoaicitebase.eprints.org:oai:arXiv.org:hep-th/0001001http://citebase.eprints.org/help/Identify/policy.phpNo commercial harvesting without prior permission.http://citebase.eprints.org/help/Identify/policy.phpNo commercial harvesting without prior permission.Repositories of scholarly literature, preferably peer-reviewed with journal reference. HTTP-OAI-3.27/examples/mets.xml0000644000076400007640000002337510655641533014444 0ustar tdb2tdb2 2007-06-01T15:44:53Z http://celestial.eprints.org/oai/DSpace%20at%20MIT
oai:dspace.mit.edu:1721.1/8338 2006-10-23T15:52:20Z hdl_1721.1_7680 hdl_1721.1_7843
DSpace at MIT advisorDaniel J. Kleitman. authorYang, Xiaochun, 1971- otherMassachusetts Institute of Technology. Dept. of Mathematics. 2005-08-23T19:18:06Z 2005-08-23T19:18:06Z 2002 2002 http://hdl.handle.net/1721.1/8338 Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Mathematics, 2002. Includes bibliographical references (p. 89-91). Geometry is the synthetic tool we use to unify all existing analytical cone-beam reconstruction methods. These reconstructions are based on formulae derived by Tuy [Tuy, 1983], Smith [Smith, 1985] and Grangeat [Grangeat, 1991] which explicitly link the cone-beam data to some intermediate functions in the Radon transform domain. However, the essential step towards final reconstruction, that is, differential-backprojection, has not yet achieved desired efficiency. A new inversion formula is obtained directly from the 3D Radon inverse [Radon, 1917, Helgason, 1999]. It incorporates the cone-beam scanning geometry and allows the theoretical work mentioned above to be reduced to exact and frugal implementations. Extensions can be easily carried out to 2D fan-beam reconstruction as well as other scanning modalities such as parallel scans by allowing more abstract geometric description on the embedding subspace of the Radon manifold. The new approach provides a canonical inverse procedure for computerized tomography in general, with applications ranging from diagnostic medical imaging to industrial testing, such as X-ray CT, Emission CT, Ultrasound CT, etc. It also suggests a principled frame for approaching other 3D reconstruction problems related to the Radon transform. The idea is simple: as was spelled out by Helgason on the opening page of his book, The Radon Transform [Helgason, 1999] - a remarkable duality characterizes the Radon transform and its inverse. Our study shows that the dual space, the so-called Radon space, can be geometrically decomposed according to the specified scanning modality. (cont.) In cone-beam X-ray reconstruction, for example, each cone-beam projection is seen as a 2D projective subspace embedded in the Radon manifold. Besides the duality in the space relation, the symbiosis played between algebra and geometry, integration and differentiation is another striking feature in the tomographic reconstruction. Simply put, * Geometry and algebra: the two play fundamentally different roles during the inverse. Algebraic transforms carry cone-beam data into the Radon domain, whereas, the geometric decomposition of the dual space determines how the differential-backprojection operator should be systematically performed. The reason that different algorithms in cone-beam X-ray reconstruction share structural similarity is that the dual space decomposition is intrinsic to the specified scanning geometry. The differences in the algorithms lie in the appearance of algebra on the projection submanifold. The algebraic transforms initiate diverse reconstruction methods varying in terms of computational cost and stability. Equipped with this viewpoint, we are able to simplify mathematical analysis and develop algorithms that are easy to implement. Integration and differentiation: forward projection is the integral along straight lines (or planes) in the Euclidean space. During the reconstruction, differentiation is performed over the parallel planes in the projective Radon space, a manifold with clear differential structure. It is important to learn about this differential structure to ensure that correct differentiation can be carried out with respect to the parameters governing the scanning process during the reconstruction ... Made available in DSpace on 2005-08-23T19:18:06Z (GMT). No. of bitstreams: 2 50500372.pdf: 5942218 bytes, checksum: b1eba8f820c75fe04e1de950af6c3548 (MD5) 50500372-MIT.pdf: 5941980 bytes, checksum: ed337b0618cf18bd701502cc9e79e823 (MD5) Previous issue date: 2002 by Xiaochun Yang. 91 p. 5942218 bytes 5941980 bytes application/pdf application/pdf eng Massachusetts Institute of Technology M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See <a href="https://dspace.mit.edu/handle/1721.1/7582">https://dspace.mit.edu/handle/1721.1/7582</a> for inquiries about permission. Mathematics. Geometry of cone-beam reconstruction Thesis
http://dspace.mit.edu/dspace-oai/request oai:dspace.mit.edu:1721.1/8338 2006-10-23T15:52:20Z http://www.loc.gov/METS/ HTTP-OAI-3.27/examples/getrecord.xml0000644000076400007640000000402510640714230015426 0ustar tdb2tdb2 2005-02-25T16:37:50Zhttp://citebase.eprints.org/cgi-bin/oai2
oai:arXiv.org:hep-th/00010012004-06-22T19:46:16Z
http://arXiv.org/abs/hep-th/0001001Aspinwall, Paul S.1999-12-312000-01-17textCompactification, Geometry and Duality: N=2 These are notes based on lectures given at TASI99. We review the geometry of the moduli space of N=2 theories in four dimensions from the point of view of superstring compactification. The cases of a type IIA or type IIB string compactified on a Calabi-Yau threefold and the heterotic string compactified on K3xT2 are each considered in detail. We pay specific attention to the differences between N=2 theories and N>2 theories. The moduli spaces of vector multiplets and the moduli spaces of hypermultiplets are reviewed. In the case of hypermultiplets this review is limited by the poor state of our current understanding. Some peculiarities such as ``mixed instantons'' and the non-existence of a universal hypermultiplet are discussed. Comment: 82 pages, 8 figures, LaTeX2e, TASI99, refs added and some typos fixed
HTTP-OAI-3.27/META.yml0000664000076400007640000000135411616526067012402 0ustar tdb2tdb2--- #YAML:1.0 name: HTTP-OAI version: 3.27 abstract: ~ author: [] license: unknown distribution_type: module configure_requires: ExtUtils::MakeMaker: 0 build_requires: ExtUtils::MakeMaker: 0 requires: CGI: 0 Encode: 2.12 HTTP::Request: 0 HTTP::Response: 0 LWP::UserAgent: 0 Test::More: 0 URI: 0 XML::LibXML: 1.6 XML::LibXML::SAX: 0 XML::SAX: 0 XML::SAX::Base: 1.04 no_index: directory: - t - inc generated_by: ExtUtils::MakeMaker version 6.55_02 meta-spec: url: http://module-build.sourceforge.net/META-spec-v1.4.html version: 1.4 HTTP-OAI-3.27/LICENSE0000644000076400007640000000276511357310366012136 0ustar tdb2tdb2Copyright (c) 2010, Timothy D Brody All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of University of Southampton nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. HTTP-OAI-3.27/Makefile.PL0000644000076400007640000000123711616525316013075 0ustar tdb2tdb2require 5.005; use ExtUtils::MakeMaker; # n.b. I put 0.1 as version requirements as I have # very recent versions, whereas these modules will # most likely work with older versions of the req. # modules. TDB WriteMakefile( NAME => 'HTTP-OAI', VERSION_FROM => 'lib/HTTP/OAI.pm', EXE_FILES => [ qw( bin/oai_browser.pl ) ], PREREQ_PM => { 'Encode' => 2.12, 'XML::LibXML' => 1.60, 'XML::LibXML::SAX' => 0, 'XML::SAX::Base' => 1.04, 'XML::SAX' => 0, 'URI' => 0, 'HTTP::Request' => 0, 'HTTP::Response' => 0, 'LWP::UserAgent' => 0, 'Test::More' => 0, 'CGI' => 0, }, dist => { COMPRESS => 'gzip -9f', SUFFIX => 'gz', }, ); HTTP-OAI-3.27/lib/0000755000076400007640000000000011616526067011672 5ustar tdb2tdb2HTTP-OAI-3.27/lib/HTTP/0000755000076400007640000000000011616526067012451 5ustar tdb2tdb2HTTP-OAI-3.27/lib/HTTP/OAI/0000755000076400007640000000000011616526067013061 5ustar tdb2tdb2HTTP-OAI-3.27/lib/HTTP/OAI/Encapsulation.pm0000644000076400007640000000457611600640154016223 0ustar tdb2tdb2package HTTP::OAI::Encapsulation; use strict; use warnings; use HTTP::OAI::SAXHandler qw( :SAX ); use vars qw(@ISA); @ISA = qw(XML::SAX::Base); sub new { my $class = shift; my %args = @_ > 1 ? @_ : (dom => shift); my $self = bless {}, ref($class) || $class; $self->version($args{version}); $self->dom($args{dom}); $self; } sub dom { shift->_elem('dom',@_) } # Pseudo HTTP::Response sub code { 200 } sub message { 'OK' } sub is_info { 0 } sub is_success { 1 } sub is_redirect { 0 } sub is_error { 0 } sub version { shift->_elem('version',@_) } sub _elem { my $self = shift; my $name = shift; return @_ ? $self->{_elem}->{$name} = shift : $self->{_elem}->{$name}; } sub _attr { my $self = shift; my $name = shift or return $self->{_attr}; return $self->{_attr}->{$name} unless @_; if( defined(my $value = shift) ) { return $self->{_attr}->{$name} = $value; } else { delete $self->{_attr}->{$name}; return undef; } } package HTTP::OAI::Encapsulation::DOM; use strict; use warnings; use XML::LibXML qw( :all ); use vars qw(@ISA); @ISA = qw(HTTP::OAI::Encapsulation); sub toString { defined($_[0]->dom) ? $_[0]->dom->toString : undef } sub generate { my $self = shift; unless( $self->dom ) { Carp::confess("Can't generate() without a dom."); } unless( $self->dom->nodeType == XML_DOCUMENT_NODE ) { Carp::confess( "Can only generate() from a DOM of type XML_DOCUMENT_NODE" ); } return unless defined($self->get_handler); my $driver = XML::LibXML::SAX::Parser->new( Handler=>HTTP::OAI::FilterDOMFragment->new( Handler=>$self->get_handler )); $driver->generate($self->dom); } sub start_document { my ($self) = @_; HTTP::OAI::Debug::sax( ref($self) ); my $builder = XML::LibXML::SAX::Builder->new() or die "Unable to create XML::LibXML::SAX::Builder: $!"; $self->{OLDHandler} = $self->get_handler(); $self->set_handler($builder); $self->SUPER::start_document(); $self->SUPER::xml_decl({'Version'=>'1.0','Encoding'=>'UTF-8'}); } sub end_document { my ($self) = @_; $self->SUPER::end_document(); $self->dom($self->get_handler->result()); $self->set_handler($self->{OLDHandler}); HTTP::OAI::Debug::sax( ref($self) . " <" . $self->dom->documentElement->nodeName . " />" ); } 1; __END__ =head1 NAME HTTP::OAI::Encapsulation - Base class for data objects that contain DOM trees =head1 DESCRIPTION This class shouldn't be used directly, use L. =cut HTTP-OAI-3.27/lib/HTTP/OAI/Metadata.pm0000644000076400007640000000121510640714230015121 0ustar tdb2tdb2package HTTP::OAI::Metadata; use vars qw(@ISA); @ISA = qw(HTTP::OAI::Encapsulation::DOM); 1; __END__ =head1 NAME HTTP::OAI::Metadata - Base class for data objects that contain DOM trees =head1 SYNOPSIS use HTTP::OAI::Metadata; $xml = XML::LibXML::Document->new(); $xml = XML::LibXML->new->parse( ... ); $md = new HTTP::OAI::Metadata(dom=>$xml); print $md->dom->toString; my $dom = $md->dom(); # Return internal DOM tree =head1 METHODS =over 4 =item $md->dom( [$dom] ) Return and optionally set the XML DOM object that contains the actual metadata. If you intend to use the generate() method $dom must be a XML_DOCUMENT_NODE. =back HTTP-OAI-3.27/lib/HTTP/OAI/Metadata/0000755000076400007640000000000011616526067014601 5ustar tdb2tdb2HTTP-OAI-3.27/lib/HTTP/OAI/Metadata/METS.pm0000644000076400007640000000201210640714227015673 0ustar tdb2tdb2package HTTP::OAI::Metadata::METS; use strict; use warnings; use HTTP::OAI::Metadata; use vars qw(@ISA); @ISA = qw(HTTP::OAI::Metadata); use XML::LibXML; use XML::LibXML::XPathContext; sub new { my $class = shift; my $self = $class->SUPER::new(@_); my %args = @_; $self; } sub _xc { my $xc = XML::LibXML::XPathContext->new( @_ ); $xc->registerNs( 'mets', 'http://www.loc.gov/METS/' ); $xc->registerNs( 'xlink', 'http://www.w3.org/1999/xlink' ); return $xc; } sub files { my $self = shift; my $dom = $self->dom; my $xc = _xc($dom); my @files; foreach my $file ($xc->findnodes( '//mets:file' )) { my $f = {}; foreach my $attr ($file->attributes) { $f->{ $attr->nodeName } = $attr->nodeValue; } $file = _xc($file); foreach my $locat ($file->findnodes( 'mets:FLocat' )) { $f->{ url } = $locat->getAttribute( 'xlink:href' ); } push @files, $f; } return @files; } 1; __END__ =head1 NAME HTTP::OAI::Metadata::METS - METS accessor utility =head1 DESCRIPTION =head1 SYNOPSIS =head1 NOTE HTTP-OAI-3.27/lib/HTTP/OAI/Metadata/OAI_Identifier.pm0000644000076400007640000000176010640714227017706 0ustar tdb2tdb2package HTTP::OAI::Metadata::OAI_Identifier; use strict; use warnings; use Carp; use XML::LibXML; use HTTP::OAI::Metadata; use vars qw( @ISA ); @ISA = qw( HTTP::OAI::Metadata ); sub new { my $self = shift->SUPER::new(@_); my %args = @_; my $dom = XML::LibXML->createDocument(); $dom->setDocumentElement(my $root = $dom->createElementNS('http://www.openarchives.org/OAI/2.0/oai-identifier','oai-identifier')); # $root->setAttribute('xmlns','http://www.openarchives.org/OAI/2.0/oai-identifier'); $root->setAttribute('xmlns:xsi','http://www.w3.org/2001/XMLSchema-instance'); $root->setAttribute('xsi:schemaLocation','http://www.openarchives.org/OAI/2.0/oai-identifier http://www.openarchives.org/OAI/2.0/oai-identifier.xsd'); for(qw( scheme repositoryIdentifier delimiter sampleIdentifier )) { Carp::croak "Required argument $_ is undefined" unless defined($args{$_}); $root->appendChild($dom->createElement($_))->appendChild($dom->createTextNode($args{$_})); } $self->dom($dom); $self; } 1; HTTP-OAI-3.27/lib/HTTP/OAI/Metadata/OAI_Eprints.pm0000644000076400007640000000273610640714227017254 0ustar tdb2tdb2package HTTP::OAI::Metadata::OAI_Eprints; use strict; use warnings; use Carp; use XML::LibXML; use HTTP::OAI::Metadata; use vars qw( @ISA ); @ISA = qw( HTTP::OAI::Metadata ); sub new { my $self = shift->SUPER::new(@_); my %args = @_; my $dom = XML::LibXML->createDocument(); $dom->setDocumentElement(my $root = $dom->createElementNS('http://www.openarchives.org/OAI/1.1/eprints','eprints')); # $root->setAttribute('xmlns','http://www.openarchives.org/OAI/2.0/oai-identifier'); $root->setAttribute('xmlns:xsi','http://www.w3.org/2001/XMLSchema-instance'); $root->setAttribute('xsi:schemaLocation','http://www.openarchives.org/OAI/1.1/eprints http://www.openarchives.org/OAI/1.1/eprints.xsd'); for(qw( content metadataPolicy dataPolicy submissionPolicy )) { Carp::croak "Required argument $_ undefined" if !defined($args{$_}) && $_ =~ /metadataPolicy|dataPolicy/; next unless defined($args{$_}); my $node = $root->appendChild($dom->createElement($_)); $args{$_}->{'URL'} ||= []; $args{$_}->{'text'} ||= []; foreach my $value (@{$args{$_}->{'URL'}}) { $node->appendChild($dom->createElement('URL'))->appendChild($dom->createTextNode($value)); } foreach my $value (@{$args{$_}->{'text'}}) { $node->appendChild($dom->createElement('text'))->appendChild($dom->createTextNode($value)); } } $args{'comment'} ||= []; for(@{$args{'comment'}}) { $root->appendChild($dom->createElement('comment'))->appendChild($dom->createTextNode($_)); } $self->dom($dom); $self; } 1; HTTP-OAI-3.27/lib/HTTP/OAI/Metadata/OAI_DC.pm0000644000076400007640000000653411446623552016123 0ustar tdb2tdb2package HTTP::OAI::Metadata::OAI_DC; use XML::LibXML; use HTTP::OAI::Metadata; @ISA = qw(HTTP::OAI::Metadata); use strict; our $OAI_DC_SCHEMA = 'http://www.openarchives.org/OAI/2.0/oai_dc/'; our $DC_SCHEMA = 'http://purl.org/dc/elements/1.1/'; our @DC_TERMS = qw( contributor coverage creator date description format identifier language publisher relation rights source subject title type ); sub new { my( $class, %self ) = @_; my $self = $class->SUPER::new( %self ); if( exists $self{dc} && ref($self{dc}) eq 'HASH' ) { my ($dom,$dc) =_oai_dc_dom(); foreach my $term (@DC_TERMS) { foreach my $value (@{$self{dc}->{$term}||[]}) { $dc->appendChild($dom->createElementNS($DC_SCHEMA, $term))->appendText( $value ); } } $self->dom($dom); } $self; } sub dc { my( $self ) = @_; my $dom = $self->dom; my $metadata = $dom->documentElement; return $self->{dc} if defined $self->{dc}; my %dc = map { $_ => [] } @DC_TERMS; $self->_dc( $metadata, \%dc ); return \%dc; } sub _dc { my( $self, $node, $dc ) = @_; my $ns = $node->getNamespaceURI; $ns =~ s/\/?$/\//; if( $ns eq $DC_SCHEMA ) { push @{$dc->{lc($node->localName)}}, $node->textContent; } elsif( $node->hasChildNodes ) { for($node->childNodes) { next if $_->nodeType != XML_ELEMENT_NODE; $self->_dc( $_, $dc ); } } } sub _oai_dc_dom { my $dom = XML::LibXML->createDocument(); $dom->setDocumentElement(my $dc = $dom->createElement('oai_dc:dc')); $dc->setAttribute('xmlns:oai_dc','http://www.openarchives.org/OAI/2.0/oai_dc/'); $dc->setAttribute('xmlns:dc','http://purl.org/dc/elements/1.1/'); $dc->setAttribute('xmlns:xsi','http://www.w3.org/2001/XMLSchema-instance'); $dc->setAttribute('xsi:schemaLocation','http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd'); return ($dom,$dc); } sub metadata { my( $self, $md ) = @_; return $self->dom if @_ == 1; delete $self->{dc}; $self->dom( $md ); return if !defined $md; my $dc = $self->dc; my ($dom,$metadata) = _oai_dc_dom(); foreach my $term (@DC_TERMS) { foreach my $value (@{$dc->{$term}}) { $metadata->appendChild( $dom->createElementNS( $DC_SCHEMA, $term ) )->appendText( $value ); } } $self->dom($dom) } sub toString { my $self = shift; my $str = "Open Archives Initiative Dublin Core (".ref($self).")\n"; foreach my $term ( @DC_TERMS ) { for(@{$self->{dc}->{$term}}) { $str .= sprintf("%s:\t%s\n", $term, $_||''); } } $str; } sub end_element { my ($self,$hash) = @_; my $elem = lc($hash->{LocalName}); if( exists($self->{dc}->{$elem}) ) { push @{$self->{dc}->{$elem}}, $hash->{Text}; } $self->SUPER::end_element($hash); } sub end_document { my $self = shift; $self->SUPER::end_document(); $self->metadata($self->dom); } 1; __END__ =head1 NAME HTTP::OAI::Metadata::OAI_DC - Easy access to OAI Dublin Core =head1 DESCRIPTION HTTP::OAI::Metadata::OAI_DC provides a simple interface to parsing and generating OAI Dublin Core ("oai_dc"). =head1 SYNOPSIS use HTTP::OAI::Metadata::OAI_DC; my $md = new HTTP::OAI::Metadata( dc=>{title=>['Hello, World!','Hi, World!']}, ); # Prints "Hello, World!" print $md->dc->{title}->[0], "\n"; my $xml = $md->metadata(); $md->metadata($xml); =head1 NOTE HTTP::OAI::Metadata::OAI_DC will automatically (and silently) convert OAI version 1.x oai_dc records into OAI version 2.0 oai_dc records. HTTP-OAI-3.27/lib/HTTP/OAI/Repository.pm0000644000076400007640000001671611437454434015610 0ustar tdb2tdb2package HTTP::OAI::Repository; use strict; use warnings; use vars qw(@ISA @EXPORT @EXPORT_OK %EXPORT_TAGS); require Exporter; @ISA = qw(Exporter); @EXPORT = qw(); @EXPORT_OK = qw( &validate_request &validate_request_1_1 &validate_date &validate_metadataPrefix &validate_responseDate &validate_setSpec ); %EXPORT_TAGS = (validate=>[qw(&validate_request &validate_date &validate_metadataPrefix &validate_responseDate &validate_setSpec)]); use HTTP::OAI::Error qw(%OAI_ERRORS); # Copied from Simeon Warner's tutorial at # http://library.cern.ch/HEPLW/4/papers/3/OAIServer.pm # (note: corrected grammer for ListSets) # 0 = optional, 1 = required, 2 = exclusive my %grammer = ( 'GetRecord' => { 'identifier' => [1, \&validate_identifier], 'metadataPrefix' => [1, \&validate_metadataPrefix] }, 'Identify' => {}, 'ListIdentifiers' => { 'from' => [0, \&validate_date], 'until' => [0, \&validate_date], 'set' => [0, \&validate_setSpec_2_0], 'metadataPrefix' => [1, \&validate_metadataPrefix], 'resumptionToken' => [2, sub { 0 }] }, 'ListMetadataFormats' => { 'identifier' => [0, \&validate_identifier] }, 'ListRecords' => { 'from' => [0, \&validate_date], 'until' => [0, \&validate_date], 'set' => [0, \&validate_setSpec_2_0], 'metadataPrefix' => [1, \&validate_metadataPrefix], 'resumptionToken' => [2, sub { 0 }] }, 'ListSets' => { 'resumptionToken' => [2, sub { 0 }] } ); sub new { my ($class,%args) = @_; my $self = bless {}, $class; $self; } sub validate_request { validate_request_2_0(@_); } sub validate_request_2_0 { my %params = @_; my $verb = $params{'verb'}; delete $params{'verb'}; my @errors; return (new HTTP::OAI::Error(code=>'badVerb',message=>'No verb supplied')) unless defined $verb; my $grm = $grammer{$verb} or return (new HTTP::OAI::Error(code=>'badVerb',message=>"Unknown verb '$verb'")); if( defined $params{'from'} && defined $params{'until'} ) { if( granularity($params{'from'}) ne granularity($params{'until'}) ) { return (new HTTP::OAI::Error( code=>'badArgument', message=>'Granularity used in from and until must be the same' )); } } # Check exclusivity foreach my $arg (keys %$grm) { my ($type, $valid_func) = @{$grm->{$arg}}; next unless ($type == 2 && defined($params{$arg})); if( my $err = &$valid_func($params{$arg}) ) { return (new HTTP::OAI::Error( code=>'badArgument', message=>("Bad argument ($arg): " . $err) )); } delete $params{$arg}; if( %params ) { for(keys %params) { push @errors, new HTTP::OAI::Error( code=>'badArgument', message=>"'$_' can not be used in conjunction with $arg" ); } return @errors; } else { return (); } } # Check required/optional foreach my $arg (keys %$grm) { my ($type, $valid_func) = @{$grm->{$arg}}; if( $params{$arg} ) { if( my $err = &$valid_func($params{$arg}) ) { return (new HTTP::OAI::Error(code=>'badArgument',message=>"Bad argument ($arg): " . $err)) } } if( $type == 1 && (!defined($params{$arg}) || $params{$arg} eq '') ) { return (new HTTP::OAI::Error(code=>'badArgument',message=>"Required argument '$arg' was undefined")); } delete $params{$arg}; } if( %params ) { for(keys %params) { push @errors, new HTTP::OAI::Error( code=>'badArgument', message=>"'$_' is not a recognised argument for $verb" ); } return @errors; } else { return (); } } sub granularity { my $date = shift; return 'year' if $date =~ /^\d{4}-\d{2}-\d{2}$/; return 'seconds' if $date =~ /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$/; } sub validate_date { my $date = shift; return "Date not in OAI format (yyyy-mm-dd or yyyy-mm-ddThh:mm:ssZ)" unless $date =~ /^(\d{4})-(\d{2})-(\d{2})(T\d{2}:\d{2}:\d{2}Z)?$/; my( $y, $m, $d ) = ($1,($2||1),($3||1)); return "Month in date is not in range 1-12" if ($m < 1 || $m > 12); return "Day in date is not in range 1-31" if ($d < 1 || $d > 31); 0; } sub validate_responseDate { return shift =~ /^(\d{4})\-([01][0-9])\-([0-3][0-9])T([0-2][0-9]):([0-5][0-9]):([0-5][0-9])[\+\-]([0-2][0-9]):([0-5][0-9])$/ ? 0 : "responseDate not in OAI format (yyyy-mm-ddThh:mm:dd:ss[+-]hh:mm)"; } sub validate_setSpec { return shift =~ /^([A-Za-z0-9])+(:[A-Za-z0-9]+)*$/ ? 0 : "Set spec not in OAI format, must match ^([A-Za-z0-9])+(:[A-Za-z0-9]+)*\$"; } sub validate_setSpec_2_0 { return shift =~ /^([A-Za-z0-9_!'\$\(\)\+\-\.\*])+(:[A-Za-z0-9_!'\$\(\)\+\-\.\*]+)*$/ ? 0 : "Set spec not in OAI format, must match ([A-Za-z0-9_!'\\\$\(\\)\\+\\-\\.\\*])+(:[A-Za-z0-9_!'\\$\\(\\)\\+\\-\\.\\*]+)*"; } sub validate_metadataPrefix { return shift =~ /^[\w]+$/ ? 0 : "Metadata prefix not in OAI format, must match regexp ^[\\w]+\$"; } # OAI 2 requires identifiers by valid URIs # This doesn't check for invalid chars, merely : sub validate_identifier { return shift =~ /^[[:alpha:]][[:alnum:]\+\-\.]*:.+/ ? 0 : "Identifier not in OAI format, must match regexp ^[[:alpha:]][[:alnum:]\\+\\-\\.]*:.+"; } 1; __END__ =head1 NAME HTTP::OAI::Repository - Documentation for building an OAI compliant repository using OAI-PERL =head1 DESCRIPTION Using the OAI-PERL library in a repository context requires the user to build the OAI responses to be sent to OAI harvesters. =head1 SYNOPSIS 1 use HTTP::OAI::Harvester; use HTTP::OAI::Metadata::OAI_DC; use XML::SAX::Writer; use XML::LibXML; # (all of these options _must_ be supplied to comply with the OAI protocol) # (protocolVersion and responseDate both have sensible defaults) my $r = new HTTP::OAI::Identify( baseURL=>'http://yourhost/cgi/oai', adminEmail=>'youremail@yourhost', repositoryName=>'agoodname', requestURL=>self_url() ); # Include a description (an XML::LibXML Dom object) $r->description(new HTTP::OAI::Metadata(dom=>$dom)); my $r = HTTP::OAI::Record->new( header=>HTTP::OAI::Header->new( identifier=>'oai:myrepo:10', datestamp=>'2004-10-01' ), metadata=>HTTP::OAI::Metadata::OAI_DC->new( dc=>{title=>['Hello, World!'],description=>['My Record']} ) ); $r->about(HTTP::OAI::Metadata->new(dom=>$dom)); my $writer = XML::SAX::Writer->new(); $r->set_handler($writer); $r->generate; =head1 Building an OAI compliant repository The validation scripts included in this module provide the repository admin with a number of tools for helping with being OAI compliant, however they can not be exhaustive in themselves. =head1 METHODS =over 4 =item $r = HTTP::OAI::Repository::validate_request(%paramlist) =item $r = HTTP::OAI::Repository::validate_request_2_0(%paramlist) These functions, exported by the Repository module, validate an OAI request against the protocol requirements. Returns an L object, with the code set to 200 if the request is well-formed, or an error code and the message set. e.g: my $r = validate_request(%paramlist); print header(-status=>$r->code.' '.$r->message), $r->error_as_HTML; Note that validate_request attempts to be as strict to the Protocol as possible. =item $b = HTTP::OAI::Repository::validate_date($date) =item $b = HTTP::OAI::Repository::validate_metadataPrefix($mdp) =item $b = HTTP::OAI::Repository::validate_responseDate($date) =item $b = HTTP::OAI::Repository::validate_setSpec($set) These functions, exported by the Repository module, validate the given type of OAI data. Returns true if the given value is sane, false otherwise. =back =head1 EXAMPLE See the bin/gateway.pl for an example implementation (it's actually for creating a static repository gateway, but you get the idea!). HTTP-OAI-3.27/lib/HTTP/OAI/Identify.pm0000644000076400007640000001367511600640155015172 0ustar tdb2tdb2package HTTP::OAI::Identify; use strict; use warnings; use HTTP::OAI::SAXHandler qw( :SAX ); use vars qw( @ISA ); @ISA = qw( HTTP::OAI::Response ); sub new { my ($class,%args) = @_; delete $args{'harvestAgent'}; # Otherwise we get a memory cycle with $h->repository($id)! for(qw( adminEmail compression description )) { $args{$_} ||= []; } $args{handlers}->{description} ||= "HTTP::OAI::Metadata"; my $self = $class->SUPER::new(%args); $self->verb('Identify') unless $self->verb; $self->baseURL($args{baseURL}) unless $self->baseURL; $self->adminEmail($args{adminEmail}) if !ref($args{adminEmail}) && !$self->adminEmail; $self->protocolVersion($args{protocolVersion} || '2.0') unless $self->protocolVersion; $self->repositoryName($args{repositoryName}) unless $self->repositoryName; $self->earliestDatestamp($args{earliestDatestamp}) unless $self->earliestDatestamp; $self->deletedRecord($args{deletedRecord}) unless $self->deletedRecord; $self->granularity($args{granularity}) unless $self->granularity; $self; } sub adminEmail { my $self = shift; push @{$self->{adminEmail}}, @_; return wantarray ? @{$self->{adminEmail}} : $self->{adminEmail}->[0] } sub baseURL { shift->headers->header('baseURL',@_) } sub compression { my $self = shift; push @{$self->{compression}}, @_; return wantarray ? @{$self->{compression}} : $self->{compression}->[0]; } sub deletedRecord { return shift->headers->header('deletedRecord',@_) } sub description { my $self = shift; push(@{$self->{description}}, @_); return wantarray ? @{$self->{description}} : $self->{description}->[0]; }; sub earliestDatestamp { return shift->headers->header('earliestDatestamp',@_) } sub granularity { return shift->headers->header('granularity',@_) } sub protocolVersion { return shift->headers->header('protocolVersion',@_) }; sub repositoryName { return shift->headers->header('repositoryName',@_) }; sub next { my $self = shift; return shift @{$self->{description}}; } sub generate_body { my ($self) = @_; return unless defined(my $handler = $self->get_handler); g_data_element($handler,'http://www.openarchives.org/OAI/2.0/','repositoryName',{},$self->repositoryName); g_data_element($handler,'http://www.openarchives.org/OAI/2.0/','baseURL',{},"".$self->baseURL); g_data_element($handler,'http://www.openarchives.org/OAI/2.0/','protocolVersion',{},$self->protocolVersion); g_data_element($handler,'http://www.openarchives.org/OAI/2.0/','adminEmail',{},$_) for $self->adminEmail; g_data_element($handler,'http://www.openarchives.org/OAI/2.0/','earliestDatestamp',{},$self->earliestDatestamp||'0001-01-01'); g_data_element($handler,'http://www.openarchives.org/OAI/2.0/','deletedRecord',{},$self->deletedRecord||'no'); g_data_element($handler,'http://www.openarchives.org/OAI/2.0/','granularity',{},$self->granularity) if defined($self->granularity); g_data_element($handler,'http://www.openarchives.org/OAI/2.0/','compression',{},$_) for $self->compression; for($self->description) { g_data_element($handler,'http://www.openarchives.org/OAI/2.0/','description',{},$_); } } sub start_element { my ($self,$hash) = @_; my $elem = lc($hash->{LocalName}); $self->SUPER::start_element($hash); if( $elem eq 'description' && !$self->{"in_$elem"} ) { $self->{OLDHandler} = $self->get_handler(); $self->set_handler(my $handler = $self->{handlers}->{$elem}->new()); $self->description($handler); $self->{"in_$elem"} = $hash->{Depth}; g_start_document($handler); } } sub end_element { my ($self,$hash) = @_; my $elem = $hash->{LocalName}; my $text = $hash->{Text}; if( defined $text ) { $text =~ s/^\s+//; $text =~ s/\s+$//; } if( defined($self->get_handler) ) { if( $elem eq 'description' && $self->{"in_$elem"} == $hash->{Depth} ) { $self->SUPER::end_document(); $self->set_handler($self->{OLDHandler}); $self->{"in_$elem"} = undef; } } elsif( $elem eq 'adminEmail' ) { $self->adminEmail($text); } elsif( $elem eq 'compression' ) { $self->compression($text); } elsif( $elem eq 'baseURL' ) { $self->baseURL($text); } elsif( $elem eq 'protocolVersion' ) { $text = '2.0' if $text =~ /\D/ or $text < 2.0; $self->protocolVersion($text); } elsif( defined($text) && length($text) ) { $self->headers->header($elem,$text); } $self->SUPER::end_element($hash); } 1; __END__ =head1 NAME HTTP::OAI::Identify - Provide access to an OAI Identify response =head1 SYNOPSIS use HTTP::OAI::Identify; my $i = new HTTP::OAI::Identify( adminEmail=>'billg@microsoft.com', baseURL=>'http://www.myarchives.org/oai', repositoryName=>'www.myarchives.org' ); for( $i->adminEmail ) { print $_, "\n"; } =head1 METHODS =over 4 =item $i = new HTTP::OAI::Identify(-baseURL=>'http://arXiv.org/oai1'[, adminEmail=>$email, protocolVersion=>'2.0', repositoryName=>'myarchive']) This constructor method returns a new instance of the OAI::Identify module. =item $i->version Return the original version of the OAI response, according to the given XML namespace. =item $i->headers Returns an HTTP::OAI::Headers object. Use $headers->header('headername') to retrive field values. =item $burl = $i->baseURL([$burl]) =item $eds = $i->earliestDatestamp([$eds]) =item $gran = $i->granularity([$gran]) =item $version = $i->protocolVersion($version) =item $name = $i->repositoryName($name) Returns and optionally sets the relevent header. NOTE: protocolVersion will always be '2.0'. Use $i->version to find out the protocol version used by the repository. =item @addys = $i->adminEmail([$email]) =item @cmps = $i->compression([$cmp]) Returns and optionally adds to the multi-value headers. =item @dl = $i->description([$d]) Returns the description list and optionally appends a new description $d. Returns an array ref of Ls, or an empty ref if there are no description. =item $d = $i->next Returns the next description or undef if no more description left. =item $dom = $i->toDOM Returns a XML::DOM object representing the Identify response. =back HTTP-OAI-3.27/lib/HTTP/OAI/ListSets.pm0000644000076400007640000000516110640714230015157 0ustar tdb2tdb2package HTTP::OAI::ListSets; use strict; use warnings; use vars qw( @ISA ); @ISA = qw( HTTP::OAI::PartialList ); sub new { my ($class,%args) = @_; $args{handlers} ||= {}; $args{handlers}->{description} ||= 'HTTP::OAI::Metadata'; my $self = $class->SUPER::new(%args); $self->{in_set} = 0; $self; } sub set { shift->item(@_) } sub generate_body { my ($self) = @_; return unless defined(my $handler = $self->get_handler); for( $self->set ) { $_->set_handler($handler); $_->generate; } if( defined($self->resumptionToken) ) { $self->resumptionToken->set_handler($handler); $self->resumptionToken->generate; } } sub start_element { my ($self,$hash) = @_; my $elem = lc($hash->{Name}); if( !$self->{in_set} ) { if( $elem eq 'set' ) { $self->set_handler(new HTTP::OAI::Set( version=>$self->version, handlers=>$self->{handlers} )); $self->{'in_set'} = $hash->{Depth}; } elsif( $elem eq 'resumptiontoken' ) { $self->set_handler(new HTTP::OAI::ResumptionToken( version=>$self->version )); $self->{'in_set'} = $hash->{Depth}; } } $self->SUPER::start_element($hash); } sub end_element { my ($self,$hash) = @_; my $elem = lc($hash->{LocalName}); $self->SUPER::end_element($hash); if( $self->{'in_set'} == $hash->{Depth} ) { if( $elem eq 'set' ) { $self->set( $self->get_handler ); $self->set_handler( undef ); $self->{in_set} = 0; } elsif( $elem eq 'resumptionToken' ) { $self->resumptionToken( $self->get_handler ); $self->set_handler( undef ); $self->{in_set} = 0; } } } 1; __END__ =head1 NAME HTTP::OAI::ListSets - Provide access to an OAI ListSets response =head1 SYNOPSIS my $r = $h->ListSets(); while( my $rec = $r->next ) { print $rec->setSpec, "\n"; } die $r->message if $r->is_error; =head1 METHODS =over 4 =item $ls = new HTTP::OAI::ListSets This constructor method returns a new OAI::ListSets object. =item $set = $ls->next Returns either an L object, or undef, if no more records are available. Use $set->is_error to test whether there was an error getting the next record. If -resume was set to false in the Harvest Agent, next may return a string (the resumptionToken). =item @setl = $ls->set([$set]) Returns the set list and optionally adds a new set or resumptionToken, $set. Returns an array ref of Ls, with an optional resumptionToken string. =item $token = $ls->resumptionToken([$token]) Returns and optionally sets the L. =item $dom = $ls->toDOM Returns a XML::DOM object representing the ListSets response. =back HTTP-OAI-3.27/lib/HTTP/OAI/Debug.pm0000644000076400007640000000264611220134171014433 0ustar tdb2tdb2package HTTP::OAI::Debug; =pod =head1 NAME B - debug the HTTP::OAI libraries =head1 DESCRIPTION This package is a copy of L and exposes the same API. In addition to "trace", "debug" and "conns" this exposes a "sax" level for debugging SAX events. =cut require Exporter; @ISA = qw(Exporter); @EXPORT_OK = qw(level trace debug conns); use Carp (); my @levels = qw(trace debug conns sax); %current_level = (); sub import { my $pack = shift; my $callpkg = caller(0); my @symbols = (); my @levels = (); for (@_) { if (/^[-+]/) { push(@levels, $_); } else { push(@symbols, $_); } } Exporter::export($pack, $callpkg, @symbols); level(@levels); } sub level { for (@_) { if ($_ eq '+') { # all on # switch on all levels %current_level = map { $_ => 1 } @levels; } elsif ($_ eq '-') { # all off %current_level = (); } elsif (/^([-+])(\w+)$/) { $current_level{$2} = $1 eq '+'; } else { Carp::croak("Illegal level format $_"); } } } sub trace { _log(@_) if $current_level{'trace'}; } sub debug { _log(@_) if $current_level{'debug'}; } sub conns { _log(@_) if $current_level{'conns'}; } sub sax { _log(@_) if $current_level{'sax'}; } sub _log { my $msg = shift; $msg =~ s/\n$//; $msg =~ s/\n/\\n/g; my($package,$filename,$line,$sub) = caller(2); print STDERR "$sub: $msg\n"; } 1; HTTP-OAI-3.27/lib/HTTP/OAI/Response.pm0000644000076400007640000002367311600640110015203 0ustar tdb2tdb2package HTTP::OAI::Response; use strict; use warnings; =head1 NAME HTTP::OAI::Response - An OAI response =head1 DESCRIPTION C inherits from L and supplies some utility methods for OAI. =head1 METHODS =over 4 =cut use vars qw($BAD_REPLACEMENT_CHAR @ISA); our $USE_EVAL = 1; use utf8; use POSIX qw/strftime/; use CGI qw/-oldstyle_urls/; $CGI::USE_PARAM_SEMICOLON = 0; use HTTP::OAI::SAXHandler qw/ :SAX /; @ISA = qw( HTTP::Response XML::SAX::Base ); $BAD_REPLACEMENT_CHAR = '?'; =item $r = new HTTP::OAI::Response([responseDate=>$rd][, requestURL=>$ru]) This constructor method returns a new HTTP::OAI::Response object. Optionally set the responseDate and requestURL. Use $r->is_error to test whether the request was successful. In addition to the HTTP response codes, the following codes may be returned: 600 - Error parsing XML or invalid OAI response Use $r->message to obtain a human-readable error message. =cut sub new { my ($class,%args) = @_; my $self = $class->SUPER::new( $args{code}, $args{message} ); # Force headers $self->{handlers} = $args{handlers} || {}; $self->{_headers} = new HTTP::OAI::Headers(handlers=>$args{handlers}); $self->{errors} = $args{errors} || []; $self->{resume} = $args{resume}; # Force the version of OAI to try to parse $self->version($args{version}); # Add the harvestAgent $self->harvestAgent($args{harvestAgent}); # OAI initialisation if( $args{responseDate} ) { $self->responseDate($args{responseDate}); } if( $args{requestURL} ) { $self->requestURL($args{requestURL}); } if( $args{xslt} ) { $self->xslt($args{xslt}); } # Do some intelligent filling of undefined values unless( defined($self->responseDate) ) { $self->responseDate(strftime("%Y-%m-%dT%H:%M:%S",gmtime).'Z'); } unless( defined($self->requestURL) ) { $self->requestURL(CGI::self_url()); } unless( defined($self->verb) ) { my $verb = ref($self); $verb =~ s/.*:://; $self->verb($verb); } return $self; } =item $r->copy_from( $r ) Copies an L $r into this object. =cut sub copy_from { my( $self, $r ) = @_; # The DOM stuff will break if headers isn't an HTTP::OAI::Headers object $self->{_headers}->{$_} = $r->{_headers}->{$_} for keys %{$r->{_headers}}; $self->{_content} = $r->{_content}; $self->code( $r->code ); $self->message( $r->message ); $self->request( $r->request ); $self; } =item $headers = $r->headers Returns an L object. =cut sub parse_file { my ($self, $fh) = @_; $self->code(200); $self->message('parse_file'); my $parser = XML::LibXML::SAX->new( Handler=>HTTP::OAI::SAXHandler->new( Handler=>$self->headers )); HTTP::OAI::Debug::trace( $self->verb . " " . ref($parser) . "->parse_file( ".ref($fh)." )" ); $self->headers->set_handler($self); $USE_EVAL ? eval { $parser->parse_file($fh) } : $parser->parse_file($fh); $self->headers->set_handler(undef); # Otherwise we memory leak! if( $@ ) { $self->code(600); my $msg = $@; $msg =~ s/^\s+//s; $msg =~ s/\s+$//s; if( $self->request ) { $msg = "Error parsing XML from " . $self->request->uri . " " . $msg; } else { $msg = "Error parsing XML from string: $msg\n"; } $self->message($msg); $self->errors(new HTTP::OAI::Error( code=>'parseError', message=>$msg )); } } sub parse_string { my ($self, $str) = @_; $self->code(200); $self->message('parse_string'); do { my $parser = XML::LibXML::SAX->new( Handler=>HTTP::OAI::SAXHandler->new( Handler=>$self->headers )); HTTP::OAI::Debug::trace( $self->verb . " " . ref($parser) . "->parse_string(...)" ); $self->headers->set_handler($self); eval { local $SIG{__DIE__}; $parser->parse_string( $str ) }; $self->headers->set_handler(undef); undef $@ if $@ && $@ =~ /^done\n/; if( $@ ) { die $@ if !$USE_EVAL; # rethrow $self->errors(new HTTP::OAI::Error( code=>'parseError', message=>"Error while parsing XML: $@", )); } } while( $@ && fix_xml(\$str,$@) ); if( $@ ) { $self->code(600); my $msg = $@; $msg =~ s/^\s+//s; $msg =~ s/\s+$//s; if( $self->request ) { $msg = "Error parsing XML from " . $self->request->uri . " " . $msg; } else { $msg = "Error parsing XML from string: $msg\n"; } $self->message($msg); $self->errors(new HTTP::OAI::Error( code=>'parseError', message=>$msg )); } $self; } sub harvestAgent { shift->headers->header('harvestAgent',@_) } # Resume a request using a resumptionToken sub resume { my ($self,%args) = @_; my $ha = $args{harvestAgent} || $self->harvestAgent || Carp::confess "Required argument harvestAgent is undefined"; my $token = $args{resumptionToken} || Carp::confess "Required argument resumptionToken is undefined"; my $verb = $args{verb} || $self->verb || Carp::confess "Required argument verb is undefined"; if( !ref($token) or !$token->isa( "HTTP::OAI::ResumptionToken" ) ) { $token = HTTP::OAI::ResumptionToken->new( resumptionToken => $token ); } HTTP::OAI::Debug::trace( "'" . $token->resumptionToken . "'" ); my $response; %args = ( baseURL=>$ha->repository->baseURL, verb=>$verb, resumptionToken=>$token->resumptionToken, ); $self->headers->{_args} = \%args; # Reset the resumptionToken $self->headers->header('resumptionToken',undef); # Retry the request upto 3 times (leave a minute between retries) my $tries = 3; do { $response = $ha->request(\%args, undef, undef, undef, $self); unless( $response->is_success ) { # If the token is expired, we need to break out (no point wasting 3 # minutes) if( my @errors = $response->errors ) { for( grep { $_->code eq 'badResumptionToken' } @errors ) { $tries = 0; } } HTTP::OAI::Debug::trace( sprintf("Error response to '%s': %d '%s'\n", $args{resumptionToken}, $response->code, $response->message ) ); } } while( !$response->is_success and $tries-- and sleep(60) ); if( $self->resumptionToken and !$self->resumptionToken->is_empty and $self->resumptionToken->resumptionToken eq $token->resumptionToken ) { $self->code(600); $self->message("Flow-control error: Resumption token hasn't changed (" . $response->request->uri . ")."); } $self; } sub generate { my ($self) = @_; return unless defined(my $handler = $self->get_handler); $self->headers->set_handler($handler); g_start_document($handler); $handler->xml_decl({'Version'=>'1.0','Encoding'=>'UTF-8'}); $handler->characters({'Data'=>"\n"}); if( $self->xslt ) { $handler->processing_instruction({ 'Target' => 'xml-stylesheet', 'Data' => 'type=\'text/xsl\' href=\''. $self->xslt . '\'' }); } $self->headers->generate_start(); if( $self->errors ) { for( $self->errors ) { $_->set_handler($handler); $_->generate(); } } else { g_start_element($handler,'http://www.openarchives.org/OAI/2.0/',$self->verb,{}); $self->generate_body(); g_end_element($handler,'http://www.openarchives.org/OAI/2.0/',$self->verb,{}); } $self->headers->generate_end(); $handler->end_document(); } sub toDOM { my $self = shift; $self->set_handler(my $builder = XML::LibXML::SAX::Builder->new()); $self->generate(); $builder->result; } =item $errs = $r->errors([$err]) Returns and optionally adds to the OAI error list. Returns a reference to an array. =cut sub errors { my $self = shift; push @{$self->{errors}}, @_; for (@_) { if( $_->code eq 'badVerb' || $_->code eq 'badArgument' ) { my $uri = URI->new($self->requestURL || ''); $uri->query(''); $self->requestURL($uri->as_string); last; } } @{$self->{errors}}; } sub next { undef } =item $rd = $r->responseDate( [$rd] ) Returns and optionally sets the response date. =cut sub responseDate { shift->headers->header('responseDate',@_) } =item $ru = $r->requestURL( [$ru] ) Returns and optionally sets the request URL. =cut sub requestURL { my $self = shift; $_[0] =~ s/;/&/sg if @_ && $_[0] !~ /&/; $self->headers->header('requestURL',@_) } =item $verb = $r->verb( [$verb] ) Returns and optionally sets the OAI verb. =cut sub verb { shift->headers->header('verb',@_) } =item $r->version Return the version of the OAI protocol used by the remote site (protocolVersion is automatically changed by the underlying API). =cut sub version { shift->headers->header('version',@_) } =item $r->xslt( $url ) Set the stylesheet to use in a response. =cut sub xslt { shift->headers->header('xslt',@_) } # HTTP::Response::is_error doesn't consider 0 an error sub is_error { return shift->code != 200 } sub end_element { my ($self,$hash) = @_; my $elem = lc($hash->{Name}); $self->SUPER::end_element($hash); if( $elem eq 'error' ) { my $code = $hash->{Attributes}->{'{}code'}->{'Value'} || 'oai-lib: Undefined error code'; my $msg = $hash->{Text} || 'oai-lib: Undefined error message'; $self->errors(new HTTP::OAI::Error( code=>$code, message=>$msg, )); if( $code !~ '^noRecordsMatch|noSetHierarchy$' ) { $self->verb($elem); $self->code(600); $self->message("Response contains error(s): " . $self->{errors}->[0]->code . " (" . $self->{errors}->[0]->message . ")"); } } } sub fix_xml { my ($str, $err) = @_; return 0 unless( $err =~ /not well-formed.*byte (\d+)/ ); my $offset = $1; if( substr($$str,$offset-1,1) eq '&' ) { substr($$str,$offset-1,1) = '&'; return 1; } elsif( substr($$str,$offset-1,1) eq '<' ) { substr($$str,$offset-1,1) = '<'; return 1; } elsif( substr($$str,$offset,1) ne $BAD_REPLACEMENT_CHAR ) { substr($$str,$offset,1) = $BAD_REPLACEMENT_CHAR; return 1; } else { return 0; } } 1; __END__ =back =head1 NOTE - requestURI/request Version 2.0 of OAI uses a "request" element to contain the client's request, rather than a URI. The OAI-PERL library automatically converts from a URI into the appropriate request structure, and back again when harvesting. The exception to this rule is for badVerb errors, where the arguments will not be available for conversion into a URI. HTTP-OAI-3.27/lib/HTTP/OAI/ListIdentifiers.pm0000644000076400007640000000533710640714230016513 0ustar tdb2tdb2package HTTP::OAI::ListIdentifiers; use strict; use warnings; use vars qw( @ISA ); @ISA = qw( HTTP::OAI::PartialList ); sub new { my $class = shift; my %args = @_; my $self = $class->SUPER::new(@_); $self->{in_record} = 0; $self; } sub identifier { shift->item(@_) } sub generate_body { my ($self) = @_; return unless defined(my $handler = $self->get_handler); for($self->identifier) { $_->set_handler($handler); $_->generate; } if( defined($self->resumptionToken) ) { $self->resumptionToken->set_handler($handler); $self->resumptionToken->generate; } } sub start_element { my ($self,$hash) = @_; my $elem = lc($hash->{LocalName}); if( $elem eq 'header' ) { $self->set_handler(new HTTP::OAI::Header( version=>$self->version )); } elsif( $elem eq 'resumptiontoken' ) { $self->set_handler(new HTTP::OAI::ResumptionToken( version=>$self->version )); } $self->SUPER::start_element($hash); } sub end_element { my ($self,$hash) = @_; my $elem = lc($hash->{LocalName}); $self->SUPER::end_element($hash); if( $elem eq 'header' ) { $self->identifier( $self->get_handler ); $self->set_handler( undef ); } elsif( $elem eq 'resumptiontoken' ) { $self->resumptionToken( $self->get_handler ); $self->set_handler( undef ); } # OAI 1.x if( $self->version eq '1.1' && $elem eq 'identifier' ) { $self->identifier(new HTTP::OAI::Header( version=>$self->version, identifier=>$hash->{Text}, datestamp=>'0000-00-00', )); } } 1; __END__ =head1 NAME HTTP::OAI::ListIdentifiers - Provide access to an OAI ListIdentifiers response =head1 SYNOPSIS my $r = $h->ListIdentifiers; while(my $rec = $r->next) { print "identifier => ", $rec->identifier, "\n", print "datestamp => ", $rec->datestamp, "\n" if $rec->datestamp; print "status => ", ($rec->status || 'undef'), "\n"; } die $r->message if $r->is_error; =head1 METHODS =over 4 =item $li = new OAI::ListIdentifiers This constructor method returns a new OAI::ListIdentifiers object. =item $rec = $li->next Returns either an L object, or undef, if there are no more records. Use $rec->is_error to test whether there was an error getting the next record (otherwise things will break). If -resume was set to false in the Harvest Agent, next may return a string (the resumptionToken). =item @il = $li->identifier([$idobj]) Returns the identifier list and optionally adds an identifier or resumptionToken, $idobj. Returns an array ref of Ls. =item $dom = $li->toDOM Returns a XML::DOM object representing the ListIdentifiers response. =item $token = $li->resumptionToken([$token]) Returns and optionally sets the L. =back HTTP-OAI-3.27/lib/HTTP/OAI/Headers.pm0000644000076400007640000001512611600640110014752 0ustar tdb2tdb2package HTTP::OAI::Headers; use strict; use warnings; use HTTP::OAI::SAXHandler qw( :SAX ); use vars qw( @ISA ); @ISA = qw( XML::SAX::Base ); my %VERSIONS = ( 'http://www.openarchives.org/oai/1.0/oai_getrecord' => '1.0', 'http://www.openarchives.org/oai/1.0/oai_identify' => '1.0', 'http://www.openarchives.org/oai/1.0/oai_listidentifiers' => '1.0', 'http://www.openarchives.org/oai/1.0/oai_listmetadataformats' => '1.0', 'http://www.openarchives.org/oai/1.0/oai_listrecords' => '1.0', 'http://www.openarchives.org/oai/1.0/oai_listsets' => '1.0', 'http://www.openarchives.org/oai/1.1/oai_getrecord' => '1.1', 'http://www.openarchives.org/oai/1.1/oai_identify' => '1.1', 'http://www.openarchives.org/oai/1.1/oai_listidentifiers' => '1.1', 'http://www.openarchives.org/oai/1.1/oai_listmetadataformats' => '1.1', 'http://www.openarchives.org/oai/1.1/oai_listrecords' => '1.1', 'http://www.openarchives.org/oai/1.1/oai_listsets' => '1.1', 'http://www.openarchives.org/oai/2.0/' => '2.0', 'http://www.openarchives.org/oai/2.0/static-repository' => '2.0s', ); sub new { my ($class,%args) = @_; my $self = bless { 'field'=>{ 'xmlns'=>'http://www.openarchives.org/OAI/2.0/', 'xmlns:xsi'=>'http://www.w3.org/2001/XMLSchema-instance', 'xsi:schemaLocation'=>'http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd' }, %args, }, ref($class) || $class; return $self; } sub set_error { my ($self,$error,$code) = @_; $code ||= 600; if( $self->get_handler ) { $self->get_handler->errors($error); $self->get_handler->code($code); } else { Carp::carp ref($self)." tried to set_error without having a handler to set it on!"; } } sub generate_start { my ($self) = @_; return unless defined(my $handler = $self->get_handler); $handler->start_prefix_mapping({ 'Prefix'=>'xsi', 'NamespaceURI'=>'http://www.w3.org/2001/XMLSchema-instance' }); $handler->start_prefix_mapping({ 'Prefix'=>'', 'NamespaceURI'=>'http://www.openarchives.org/OAI/2.0/' }); g_start_element($handler, 'http://www.openarchives.org/OAI/2.0/', 'OAI-PMH', { '{http://www.w3.org/2001/XMLSchema-instance}schemaLocation'=>{ 'LocalName' => 'schemaLocation', 'Prefix' => 'xsi', 'Value' => 'http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd', 'Name' => 'xsi:schemaLocation', 'NamespaceURI' => 'http://www.w3.org/2001/XMLSchema-instance', }, '{}xmlns' => { 'Prefix' => '', 'LocalName' => 'xmlns', 'Value' => 'http://www.openarchives.org/OAI/2.0/', 'Name' => 'xmlns', 'NamespaceURI' => '', }, '{http://www.w3.org/2000/xmlns/}xsi'=>{ 'LocalName' => 'xsi', 'Prefix' => 'xmlns', 'Value' => 'http://www.w3.org/2001/XMLSchema-instance', 'Name' => 'xmlns:xsi', 'NamespaceURI' => 'http://www.w3.org/2000/xmlns/', }, }); g_data_element($handler, 'http://www.openarchives.org/OAI/2.0/', 'responseDate', {}, $self->header('responseDate') ); my $uri = URI->new($self->header('requestURL')); my $attr; my %QUERY = $uri->query_form; while(my ($key,$value) = each %QUERY) { $attr->{"{}$key"} = { 'Name'=>$key, 'LocalName'=>$key, 'Value'=>$value, 'Prefix'=>'', 'NamespaceURI'=>'', }; } $uri->query( undef ); g_data_element($handler, 'http://www.openarchives.org/OAI/2.0/', 'request', $attr, $uri->as_string ); } sub generate_end { my ($self) = @_; return unless defined(my $handler = $self->get_handler); g_end_element($handler, 'http://www.openarchives.org/OAI/2.0/', 'OAI-PMH' ); $handler->end_prefix_mapping({ 'Prefix'=>'xsi', 'NamespaceURI'=>'http://www.w3.org/2001/XMLSchema-instance' }); $handler->end_prefix_mapping({ 'Prefix'=>'', 'NamespaceURI'=>'http://www.openarchives.org/OAI/2.0/' }); } sub header { my $self = shift; return @_ > 1 ? $self->{field}->{$_[0]} = $_[1] : $self->{field}->{$_[0]}; } sub end_document { my $self = shift; $self->set_handler(undef); unless( defined($self->header('version')) ) { die "Not an OAI-PMH response: No recognised OAI-PMH namespace found before end of document\n"; } } sub start_element { my ($self,$hash) = @_; return $self->SUPER::start_element($hash) if $self->{State}; my $elem = $hash->{LocalName}; my $attr = $hash->{Attributes}; # Root element unless( defined($self->header('version')) ) { my $xmlns = $hash->{NamespaceURI}; if( !defined($xmlns) || !length($xmlns) ) { die "Error parsing response: no namespace on root element"; } elsif( !exists $VERSIONS{lc($xmlns)} ) { die "Error parsing response: unrecognised OAI namespace '$xmlns'"; } else { $self->header('version',$VERSIONS{lc($xmlns)}) } } # With a static repository, don't process any headers if( $self->header('version') && $self->header('version') eq '2.0s' ) { my %args = %{$self->{_args}}; # ListRecords and the correct prefix if( $elem eq 'ListRecords' && $elem eq $args{'verb'} && $attr->{'{}metadataPrefix'}->{'Value'} eq $args{'metadataPrefix'} ) { $self->{State} = 1; # Start of the verb we're looking for } elsif( $elem ne 'ListRecords' && $elem eq $args{'verb'} ) { $self->{State} = 1; } } else { $self->{State} = 1; } } sub end_element { my ($self,$hash) = @_; my $elem = $hash->{LocalName}; my $attr = $hash->{Attributes}; my $text = $hash->{Text}; # Static repository, don't process any headers if( $self->header('version') && $self->header('version') eq '2.0s' ) { # Stop parsing when we get to the closing verb if( $self->{State} && $elem eq $self->{_args}->{'verb'} && $hash->{NamespaceURI} eq 'http://www.openarchives.org/OAI/2.0/static-repository' ) { $self->{State} = 0; die "done\n\n"; } return $self->{State} ? $self->SUPER::end_element($hash) : undef; } $self->SUPER::end_element($hash); if( $elem eq 'responseDate' || $elem eq 'requestURL' ) { $self->header($elem,$text); } elsif( $elem eq 'request' ) { $self->header("request",$text); my $uri = new URI($text); $uri->query_form(map { ($_->{LocalName},$_->{Value}) } values %$attr); $self->header("requestURL",$uri); } else { die "Still in headers, but came across an unrecognised element: $elem"; } if( $elem eq 'requestURL' || $elem eq 'request' ) { die "Oops! Root handler isn't \$self - $self != $hash->{State}" unless ref($self) eq ref($hash->{State}->get_handler); $hash->{State}->set_handler($self->get_handler); } return 1; } 1; __END__ =head1 NAME HTTP::OAI::Headers - Encapsulation of 'header' values =head1 METHODS =over 4 =item $value = $hdrs->header($name,[$value]) Return and optionally set the header field $name to $value. =back HTTP-OAI-3.27/lib/HTTP/OAI/Record.pm0000644000076400007640000000755111600640155014631 0ustar tdb2tdb2package HTTP::OAI::Record; use strict; use warnings; use vars qw(@ISA); use HTTP::OAI::SAXHandler qw/ :SAX /; @ISA = qw(HTTP::OAI::Encapsulation); sub new { my ($class,%args) = @_; my $self = $class->SUPER::new(%args); $self->{handlers} = $args{handlers}; $self->header($args{header}) unless defined($self->header); $self->metadata($args{metadata}) unless defined($self->metadata); $self->{about} = $args{about} || [] unless defined($self->{about}); $self->{in_record} = 0; $self->header(new HTTP::OAI::Header(%args)) unless defined $self->header; $self; } sub header { shift->_elem('header',@_) } sub metadata { shift->_elem('metadata',@_) } sub about { my $self = shift; push @{$self->{about}}, @_ if @_; return @{$self->{about}}; } sub identifier { shift->header->identifier(@_) } sub datestamp { shift->header->datestamp(@_) } sub status { shift->header->status(@_) } sub is_deleted { shift->header->is_deleted(@_) } sub generate { my ($self) = @_; return unless defined(my $handler = $self->get_handler); g_start_element($handler,'http://www.openarchives.org/OAI/2.0/','record',{}); $self->header->set_handler($handler); $self->header->generate; g_data_element($handler,'http://www.openarchives.org/OAI/2.0/','metadata',{},$self->metadata) if defined($self->metadata); g_data_element($handler,'http://www.openarchives.org/OAI/2.0/','about',{},$_) for $self->about; g_end_element($handler,'http://www.openarchives.org/OAI/2.0/','record'); } sub start_element { my ($self,$hash) = @_; return $self->SUPER::start_element( $hash ) if $self->{in_record}; my $elem = lc($hash->{LocalName}); if( $elem eq 'record' && $self->version eq '1.1' ) { $self->status($hash->{Attributes}->{'{}status'}->{Value}); } elsif( $elem =~ /^header|metadata|about$/ ) { my $handler = $self->{handlers}->{$elem}->new() or die "Error getting handler for <$elem> (failed to create new $self->{handlers}->{$elem})"; $self->set_handler($handler); $self->{in_record} = $hash->{Depth}; g_start_document( $handler ); $self->SUPER::start_element( $hash ); } } sub end_element { my ($self,$hash) = @_; $self->SUPER::end_element($hash); if( $self->{in_record} == $hash->{Depth} ) { $self->SUPER::end_document(); my $elem = lc ($hash->{LocalName}); $self->$elem ($self->get_handler); $self->set_handler ( undef ); $self->{in_record} = 0; } } 1; __END__ =head1 NAME HTTP::OAI::Record - Encapsulates an OAI record =head1 SYNOPSIS use HTTP::OAI::Record; # Create a new HTTP::OAI Record my $r = new HTTP::OAI::Record(); $r->header->identifier('oai:myarchive.org:oid-233'); $r->header->datestamp('2002-04-01'); $r->header->setSpec('all:novels'); $r->header->setSpec('all:books'); $r->metadata(new HTTP::OAI::Metadata(dom=>$md)); $r->about(new HTTP::OAI::Metadata(dom=>$ab)); =head1 METHODS =over 4 =item $r = new HTTP::OAI::Record( %opts ) This constructor method returns a new L object. Options (see methods below): header => $header metadata => $metadata about => [$about] =item $r->header([HTTP::OAI::Header]) Returns and optionally sets the record header (an L object). =item $r->metadata([HTTP::OAI::Metadata]) Returns and optionally sets the record metadata (an L object). =item $r->about([HTTP::OAI::Metadata]) Optionally adds a new About record (an L object) and returns an array of objects (may be empty). =back =head2 Header Accessor Methods These methods are equivalent to C<< $rec->header->$method([$value]) >>. =over 4 =item $r->identifier([$identifier]) Get and optionally set the record OAI identifier. =item $r->datestamp([$datestamp]) Get and optionally set the record datestamp. =item $r->status([$status]) Get and optionally set the record status (valid values are 'deleted' or undef). =item $r->is_deleted() Returns whether this record's status is deleted. =back HTTP-OAI-3.27/lib/HTTP/OAI/Error.pm0000644000076400007640000000644010640714230014477 0ustar tdb2tdb2package HTTP::OAI::Error; use strict; use warnings; use vars qw(@ISA @EXPORT @EXPORT_OK %EXPORT_TAG); use vars qw( $PARSER ); $PARSER = 600; use Exporter; use HTTP::OAI::SAXHandler qw( :SAX ); @ISA = qw(HTTP::OAI::Encapsulation Exporter); @EXPORT = qw(); @EXPORT_OK = qw(%OAI_ERRORS); %EXPORT_TAG = (); my %OAI_ERRORS = ( badArgument => 'The request includes illegal arguments, is missing required arguments, includes a repeated argument, or values for arguments have an illegal syntax.', # badGranularity => 'The values of the from and until arguments are illegal or specify a finer granularity than is supported by the repository.', badResumptionToken => 'The value of the resumptionToken argument is invalid or expired.', badVerb => 'Value of the verb argument is not a legal OAI-PMH verb, the verb argument is missing, or the verb argument is repeated.', cannotDisseminateFormat => 'The metadata format identified by the value given for the metadataPrefix argument is not supported by the item or by the repository', idDoesNotExist => 'The value of the identifier argument is unknown or illegal in this repository.', noRecordsMatch => 'The combination of the values of the from, until, set, and metadataPrefix arguments results in an empty list.', noMetadataFormats => 'There are no metadata formats available for the specified item.', noSetHierarchy => 'The repository does not support sets.' ); sub new { my ($class,%args) = @_; my $self = $class->SUPER::new(%args); $self->code($args{code}); $self->message($args{message}); $self; } sub code { shift->_elem('code',@_) } sub message { shift->_elem('message',@_) } sub toString { my $self = shift; return $self->code . " (\"" . ($self->message || 'No further information available') . "\")"; } sub generate { my ($self) = @_; return unless defined(my $handler = $self->get_handler); Carp::croak ref($self)."::generate Error code undefined" unless defined($self->code); g_data_element($handler, 'http://www.openarchives.org/OAI/2.0/', 'error', { '{}code'=>{ 'LocalName' => 'code', 'Prefix' => '', 'Value' => $self->code, 'Name' => 'code', 'NamespaceURI' => '', }, }, ($self->message || $OAI_ERRORS{$self->code} || '') ); } 1; __END__ =head1 NAME HTTP::OAI::Error - Encapsulates OAI error codes =head1 METHODS =over 4 =item $err = new HTTP::OAI::Error(code=>'badArgument',[message=>'An incorrect argument was supplied']) This constructor method returns a new HTTP::OAI::Error object. If no message is specified, and the code is a valid OAI error code, the appropriate message from the OAI protocol document is the default message. =item $code = $err->code([$code]) Returns and optionally sets the error name. =item $msg = $err->message([$msg]) Returns and optionally sets the error message. =back =head1 NOTE - noRecordsMatch noRecordsMatch, without additional errors, is not treated as an error code. If noRecordsMatch was returned by a repository the HTTP::OAI::Response object will have a verb 'error' and will contain the noRecordsMatch error, however is_success will return true. e.g. my $r = $ha->ListIdentifiers(metadataPrefix='oai_dc',from=>'3000-02-02'); if( $r->is_success ) { print "Successful\n"; } else { print "Failed\n"; } print $r->verb, "\n"; Will print "Successful" followed by "error". HTTP-OAI-3.27/lib/HTTP/OAI/ResumptionToken.pm0000644000076400007640000000505610640714230016556 0ustar tdb2tdb2package HTTP::OAI::ResumptionToken; use strict; use warnings; use HTTP::OAI::SAXHandler qw/ :SAX /; use vars qw( @ISA ); @ISA = qw( HTTP::OAI::Encapsulation ); use overload "bool" => \¬_empty; sub new { my ($class,%args) = @_; my $self = $class->SUPER::new(%args); $self->resumptionToken($args{resumptionToken}) unless $self->resumptionToken; $self->expirationDate($args{expirationDate}) unless $self->expirationDate; $self->completeListSize($args{completeListSize}) unless $self->completeListSize; $self->cursor($args{cursor}) unless $self->cursor; $self; } sub resumptionToken { shift->_elem('resumptionToken',@_) } sub expirationDate { shift->_attr('expirationDate',@_) } sub completeListSize { shift->_attr('completeListSize',@_) } sub cursor { shift->_attr('cursor',@_) } sub not_empty { defined($_[0]->resumptionToken) and length($_[0]->resumptionToken) > 0 } sub is_empty { !not_empty(@_) } sub generate { my ($self) = @_; return unless (my $handler = $self->get_handler); my $attr; while(my ($key,$value) = each %{$self->_attr}) { $attr->{"{}$key"} = {'Name'=>$key,'LocalName'=>$key,'Value'=>$value,'Prefix'=>'','NamespaceURI'=>'http://www.openarchives.org/OAI/2.0/'}; } g_data_element($handler,'http://www.openarchives.org/OAI/2.0/','resumptionToken',$attr,$self->resumptionToken); } sub end_element { my ($self,$hash) = @_; $self->SUPER::end_element($hash); if( lc($hash->{Name}) eq 'resumptiontoken' ) { my $attr = $hash->{Attributes}; $self->resumptionToken($hash->{Text}); $self->expirationDate($attr->{'{}expirationDate'}->{'Value'}); $self->completeListSize($attr->{'{}completeListSize'}->{'Value'}); $self->cursor($attr->{'{}cursor'}->{'Value'}); } #warn "Got RT: $hash->{Text}"; } 1; __END__ =head1 NAME HTTP::OAI::ResumptionToken - Encapsulates an OAI resumption token =head1 METHODS =over 4 =item $rt = new HTTP::OAI::ResumptionToken This constructor method returns a new HTTP::OAI::ResumptionToken object. =item $token = $rt->resumptionToken([$token]) Returns and optionally sets the resumption token string. =item $ed = $rt->expirationDate([$rt]) Returns and optionally sets the expiration date of the resumption token. =item $cls = $rt->completeListSize([$cls]) Returns and optionally sets the cardinality of the result set. =item $cur = $rt->cursor([$cur]) Returns and optionally sets the index of the first record (of the current page) in the result set. =back =head1 NOTE - Completing incomplete list The final page of a record list which has been split using resumption tokens must contain an empty resumption token. HTTP-OAI-3.27/lib/HTTP/OAI/PartialList.pm0000644000076400007640000000162211616525006015640 0ustar tdb2tdb2package HTTP::OAI::PartialList; use strict; use warnings; use vars qw( @ISA ); @ISA = qw( HTTP::OAI::Response ); sub new { my( $class, %args ) = @_; my $self = $class->SUPER::new(%args); $self->{onRecord} = delete $args{onRecord}; $self->{item} ||= []; return $self; } sub resumptionToken { shift->headers->header('resumptionToken',@_) } sub item { my $self = shift; if( defined($self->{onRecord}) ) { $self->{onRecord}->($_, $self) for @_; } else { push(@{$self->{item}}, @_); } return wantarray ? @{$self->{item}} : $self->{item}->[0]; } sub next { my $self = shift; return shift @{$self->{item}} if @{$self->{item}}; return undef unless $self->{'resume'} and $self->resumptionToken; do { $self->resume(resumptionToken=>$self->resumptionToken); } while( $self->{onRecord} and $self->is_success and $self->resumptionToken ); return $self->is_success ? $self->next : undef; } 1; HTTP-OAI-3.27/lib/HTTP/OAI/MetadataFormat.pm0000644000076400007640000000463411600641404016301 0ustar tdb2tdb2package HTTP::OAI::MetadataFormat; use strict; use warnings; use HTTP::OAI::SAXHandler qw/ :SAX /; use vars qw( @ISA ); @ISA = qw( HTTP::OAI::Encapsulation ); sub new { my ($class,%args) = @_; my $self = $class->SUPER::new(%args); $self->metadataPrefix($args{metadataPrefix}) if $args{metadataPrefix}; $self->schema($args{schema}) if $args{schema}; $self->metadataNamespace($args{metadataNamespace}) if $args{metadataNamespace}; $self; } sub metadataPrefix { my $self = shift; return @_ ? $self->{metadataPrefix} = shift : $self->{metadataPrefix} } sub schema { my $self = shift; return @_ ? $self->{schema} = shift : $self->{schema} } sub metadataNamespace { my $self = shift; return @_ ? $self->{metadataNamespace} = shift : $self->{metadataNamespace} } sub generate { my ($self) = @_; return unless defined(my $handler = $self->get_handler); g_start_element($handler,'http://www.openarchives.org/OAI/2.0/','metadataFormat',{}); g_data_element($handler,'http://www.openarchives.org/OAI/2.0/','metadataPrefix',{},$self->metadataPrefix); g_data_element($handler,'http://www.openarchives.org/OAI/2.0/','schema',{},$self->schema); if( defined($self->metadataNamespace) ) { g_data_element($handler,'http://www.openarchives.org/OAI/2.0/','metadataNamespace',{},$self->metadataNamespace); } g_end_element($handler,'http://www.openarchives.org/OAI/2.0/','metadataFormat'); } sub end_element { my ($self,$hash) = @_; $self->SUPER::end_element($hash); my $elem = lc($hash->{LocalName}); if( defined $hash->{Text} ) { $hash->{Text} =~ s/^\s+//; $hash->{Text} =~ s/\s+$//; } if( $elem eq 'metadataprefix' ) { $self->metadataPrefix($hash->{Text}); } elsif( $elem eq 'schema' ) { $self->schema($hash->{Text}); } elsif( $elem eq 'metadatanamespace' ) { $self->metadataNamespace($hash->{Text}); } } 1; __END__ =head1 NAME HTTP::OAI::MetadataFormat - Encapsulates OAI metadataFormat XML data =head1 METHODS =over 4 =item $mdf = new HTTP::OAI::MetadataFormat This constructor method returns a new HTTP::OAI::MetadataFormat object. =item $mdp = $mdf->metadataPrefix([$mdp]) =item $schema = $mdf->schema([$schema]) =item $ns = $mdf->metadataNamespace([$ns]) These methods respectively return and optionally set the metadataPrefix, schema and, metadataNamespace, for the metadataFormat record. metadataNamespace is optional in OAI 1.x and therefore may be undef when harvesting pre OAI 2 repositories. =back HTTP-OAI-3.27/lib/HTTP/OAI/Harvester.pm0000644000076400007640000003462211600640517015357 0ustar tdb2tdb2package HTTP::OAI::Harvester; use strict; use warnings; use vars qw( @ISA ); @ISA = qw( HTTP::OAI::UserAgent ); sub new { my ($class,%args) = @_; my %ARGS = %args; delete @ARGS{qw(baseURL resume repository handlers onRecord)}; my $self = $class->SUPER::new(%ARGS); $self->{'resume'} = exists($args{resume}) ? $args{resume} : 1; $self->{'handlers'} = $args{'handlers'}; $self->{'onRecord'} = $args{'onRecord'}; $self->agent('OAI-PERL/'.$HTTP::OAI::VERSION); # Record the base URL this harvester instance is associated with $self->{repository} = $args{repository} || HTTP::OAI::Identify->new(baseURL=>$args{baseURL}); Carp::croak "Requires repository or baseURL" unless $self->repository and $self->repository->baseURL; # Canonicalise $self->baseURL($self->baseURL); return $self; } sub resume { my $self = shift; return @_ ? $self->{resume} = shift : $self->{resume}; } sub repository { my $self = shift; return $self->{repository} unless @_; my $id = shift; # Don't clobber a good existing base URL with a bad one if( $self->{repository} && $self->{repository}->baseURL ) { if( !$id->baseURL ) { Carp::carp "Attempt to set a non-existant baseURL"; $id->baseURL($self->baseURL); } else { my $uri = URI->new($id->baseURL); if( $uri && $uri->scheme ) { $id->baseURL($uri->canonical); } else { Carp::carp "Ignoring attempt to use an invalid base URL: " . $id->baseURL; $id->baseURL($self->baseURL); } } } return $self->{repository} = $id; } sub baseURL { my $self = shift; return @_ ? $self->repository->baseURL(URI->new(shift)->canonical) : $self->repository->baseURL(); } sub version { shift->repository->version(@_); } # build the methods for each OAI verb foreach my $verb (qw( GetRecord Identify ListIdentifiers ListMetadataFormats ListRecords ListSets )) { no strict "refs"; *$verb = sub { shift->_oai( verb => $verb, @_ )}; } sub _oai { my( $self, %args ) = @_; my $verb = $args{verb} or Carp::croak "Requires verb argument"; my $handlers = delete($args{handlers}) || $self->{'handlers'}; my $onRecord = delete($args{onRecord}) || $self->{'onRecord'}; if( !$args{force} && defined($self->repository->version) && '2.0' eq $self->repository->version && (my @errors = HTTP::OAI::Repository::validate_request(%args)) ) { return new HTTP::OAI::Response( code=>503, message=>'Invalid Request (use \'force\' to force a non-conformant request): ' . $errors[0]->toString, errors=>\@errors ); } delete $args{force}; # Get rid of any empty arguments for( keys %args ) { delete $args{$_} if !defined($args{$_}) || !length($args{$_}); } # Check for a static repository (sets _static) if( !$self->{_interogated} ) { $self->interogate(); $self->{_interogated} = 1; } if( 'ListIdentifiers' eq $verb && defined($self->repository->version) && '1.1' eq $self->repository->version ) { delete $args{metadataPrefix}; } my $r = "HTTP::OAI::$verb"->new( harvestAgent => $self, resume => $self->resume, handlers => $handlers, onRecord => $onRecord, ); $r->headers->{_args} = \%args; # Parse all the records if _static set if( defined($self->{_static}) && !defined($self->{_records}) ) { my $lmdf = HTTP::OAI::ListMetadataFormats->new( handlers => $handlers, ); $lmdf->headers->{_args} = { %args, verb=>'ListMetadataFormats', }; # Find the metadata formats $lmdf = $lmdf->parse_string($self->{_static}); return $lmdf unless $lmdf->is_success; @{$self->{_formats}} = $lmdf->metadataFormat; # Extract all records $self->{_records} = {}; for($lmdf->metadataFormat) { my $lr = HTTP::OAI::ListRecords->new( handlers => $handlers, ); $lr->headers->{_args} = { %args, verb=>'ListRecords', metadataPrefix=>$_->metadataPrefix, }; $lr->parse_string($self->{_static}); return $lr if !$lr->is_success; @{$self->{_records}->{$_->metadataPrefix}} = $lr->record; } undef($self->{_static}); } # Make the remote request and return the result if( !defined($self->{_records}) ) { $r = $self->request({baseURL=>$self->baseURL,%args},undef,undef,undef,$r); # Lets call next() for the user if she's using the callback interface if( $onRecord and $r->is_success and $r->isa("HTTP::OAI::PartialList") ) { $r->next; } return $r; # Parse our memory copy of the static repository } else { $r->code(200); # Format doesn't exist if( $verb =~ /^GetRecord|ListIdentifiers|ListRecords$/ && !exists($self->{_records}->{$args{metadataPrefix}}) ) { $r->code(600); $r->errors(HTTP::OAI::Error->new( code=>'cannotDisseminateFormat', )); # GetRecord } elsif( $verb eq 'GetRecord' ) { for(@{$self->{_records}->{$args{metadataPrefix}}}) { if( $_->identifier eq $args{identifier} ) { $r->record($_); return $r; } } $r->code(600); $r->errors(HTTP::OAI::Error->new( code=>'idDoesNotExist' )); # Identify } elsif( $verb eq 'Identify' ) { $r = $self->repository(); # ListIdentifiers } elsif( $verb eq 'ListIdentifiers' ) { $r->identifier(map { $_->header } @{$self->{_records}->{$args{metadataPrefix}}}) # ListMetadataFormats } elsif( $verb eq 'ListMetadataFormats' ) { $r->metadataFormat(@{$self->{_formats}}); # ListRecords } elsif( $verb eq 'ListRecords' ) { $r->record(@{$self->{_records}->{$args{metadataPrefix}}}); # ListSets } elsif( $verb eq 'ListSets' ) { $r->errors(HTTP::OAI::Error->new( code=>'noSetHierarchy', message=>'Static Repositories do not support sets', )); } return $r; } } sub interogate { my $self = shift; Carp::croak "Requires baseURL" unless $self->baseURL; HTTP::OAI::Debug::trace($self->baseURL); my $r = $self->request(HTTP::Request->new(GET => $self->baseURL)); return unless length($r->content); my $id = HTTP::OAI::Identify->new( handlers=>$self->{handlers}, ); $id->headers->{_args} = {verb=>'Identify'}; $id->parse_string($r->content); if( $id->is_success && $id->version eq '2.0s' ) { $self->{_static} = $r->content; $self->repository($id); } HTTP::OAI::Debug::trace("version = ".$id->version) if $id->is_success; } 1; __END__ =head1 NAME HTTP::OAI::Harvester - Agent for harvesting from Open Archives version 1.0, 1.1, 2.0 and static ('2.0s') compatible repositories =head1 DESCRIPTION C is the harvesting front-end in the OAI-PERL library. To harvest from an OAI-PMH compliant repository create an C object using the baseURL option and then call OAI-PMH methods to request data from the repository. To handle version 1.0/1.1 repositories automatically you B request C first. It is recommended that you request an Identify from the Repository and use the C method to update the Identify object used by the harvester. When making OAI requests the underlying L module will take care of automatic redirection (http code 302) and retry-after (http code 503). OAI-PMH flow control (i.e. resumption tokens) is handled transparently by C. =head2 Static Repository Support Static repositories are automatically and transparently supported within the existing API. To harvest a static repository specify the repository XML file using the baseURL argument to HTTP::OAI::Harvester. An initial request is made that determines whether the base URL specifies a static repository or a normal OAI 1.x/2.0 CGI repository. To prevent this initial request state the OAI version using an HTTP::OAI::Identify object e.g. $h = HTTP::OAI::Harvester->new( repository=>HTTP::OAI::Identify->new( baseURL => 'http://arXiv.org/oai2', version => '2.0', )); If a static repository is found the response is cached, and further requests are served by that cache. Static repositories do not support sets, and will result in a noSetHierarchy error if you try to use sets. You can determine whether the repository is static by checking the version ($ha->repository->version), which will be "2.0s" for static repositories. =head1 FURTHER READING You should refer to the Open Archives Protocol version 2.0 and other OAI documentation, available from http://www.openarchives.org/. Note OAI-PMH 1.0 and 1.1 are deprecated. =head1 BEFORE USING EXAMPLES In the examples I use arXiv.org's and cogprints OAI interfaces. To avoid causing annoyance to their server administrators please contact them before performing testing or large downloads (or use other, less loaded, servers for testing). =head1 SYNOPSIS use HTTP::OAI; my $h = new HTTP::OAI::Harvester(baseURL=>'http://arXiv.org/oai2'); my $response = $h->repository($h->Identify) if( $response->is_error ) { print "Error requesting Identify:\n", $response->code . " " . $response->message, "\n"; exit; } # Note: repositoryVersion will always be 2.0, $r->version returns # the actual version the repository is running print "Repository supports protocol version ", $response->version, "\n"; # Version 1.x repositories don't support metadataPrefix, # but OAI-PERL will drop the prefix automatically # if an Identify was requested first (as above) $response = $h->ListIdentifiers( metadataPrefix=>'oai_dc', from=>'2001-02-03', until=>'2001-04-10' ); if( $response->is_error ) { die("Error harvesting: " . $response->message . "\n"); } print "responseDate => ", $response->responseDate, "\n", "requestURL => ", $response->requestURL, "\n"; while( my $id = $response->next ) { print "identifier => ", $id->identifier; # Only available from OAI 2.0 repositories print " (", $id->datestamp, ")" if $id->datestamp; print " (", $id->status, ")" if $id->status; print "\n"; # Only available from OAI 2.0 repositories for( $id->setSpec ) { print "\t", $_, "\n"; } } # Using a handler $response = $h->ListRecords( metadataPrefix=>'oai_dc', handlers=>{metadata=>'HTTP::OAI::Metadata::OAI_DC'}, ); while( my $rec = $response->next ) { print $rec->identifier, "\t", $rec->datestamp, "\n", $rec->metadata, "\n"; print join(',', @{$rec->metadata->dc->{'title'}}), "\n"; } if( $rec->is_error ) { die $response->message; } # Offline parsing $I = HTTP::OAI::Identify->new(); $I->parse_string($content); $I->parse_file($fh); =head1 METHODS =over 4 =item HTTP::OAI::Harvester->new( %params ) This constructor method returns a new instance of C. Requires either an L object, which in turn must contain a baseURL, or a baseURL from which to construct an Identify object. Any other parameters are passed to the L module, and from there to the L module. $h = HTTP::OAI::Harvester->new( baseURL => 'http://arXiv.org/oai2', resume=>0, # Suppress automatic resumption ) $id = $h->repository(); $h->repository($h->Identify); $h = HTTP::OAI::Harvester->new( HTTP::OAI::Identify->new( baseURL => 'http://arXiv.org/oai2', )); =item $h->repository() Returns and optionally sets the L object used by the Harvester agent. =item $h->resume( [1] ) If set to true (default) resumption tokens will automatically be handled by requesting the next partial list during C calls. =back =head1 OAI-PMH Verbs The 6 OAI-PMH Verbs are the requests supported by an OAI-PMH interface. =head2 Error Messages Use C or C on the returned object to determine whether an error occurred (see L). C and C return the error code (200 is success) and a human-readable message respectively. L returned by the repository can be retrieved using the C method: foreach my $error ($r->errors) { print $error->code, "\t", $error->message, "\n"; } Note: C is true for the OAI Error Code C (i.e. empty set), although C will still contain the OAI error. =head2 Flow Control If the response contained a L this can be retrieved using the $r->resumptionToken method. =head2 Methods These methods return an object subclassed from L (where the class corresponds to the verb requested, e.g. C requests return an C object). =over 4 =item $r = $h->GetRecord( %params ) Get a single record from the repository identified by identifier, in format metadataPrefix. $gr = $h->GetRecord( identifier => 'oai:arXiv:hep-th/0001001', # Required metadataPrefix => 'oai_dc' # Required ); $rec = $gr->next; die $rec->message if $rec->is_error; printf("%s (%s)\n", $rec->identifier, $rec->datestamp); $dom = $rec->metadata->dom; =item $r = $h->Identify() Get information about the repository. $id = $h->Identify(); print join ',', $id->adminEmail; =item $r = $h->ListIdentifiers( %params ) Retrieve the identifiers, datestamps, sets and deleted status for all records within the specified date range (from/until) and set spec (set). 1.x repositories will only return the identifier. Or, resume an existing harvest by specifying resumptionToken. $lr = $h->ListIdentifiers( metadataPrefix => 'oai_dc', # Required from => '2001-10-01', until => '2001-10-31', set=>'physics:hep-th', ); while($rec = $lr->next) { { ... do something with $rec ... } } die $lr->message if $lr->is_error; =item $r = $h->ListMetadataFormats( %params ) List available metadata formats. Given an identifier the repository should only return those metadata formats for which that item can be disseminated. $lmdf = $h->ListMetadataFormats( identifier => 'oai:arXiv.org:hep-th/0001001' ); for($lmdf->metadataFormat) { print $_->metadataPrefix, "\n"; } die $lmdf->message if $lmdf->is_error; =item $r = $h->ListRecords( %params ) Return full records within the specified date range (from/until), set and metadata format. Or, specify a resumption token to resume a previous partial harvest. $lr = $h->ListRecords( metadataPrefix=>'oai_dc', # Required from => '2001-10-01', until => '2001-10-01', set => 'physics:hep-th', ); while($rec = $lr->next) { { ... do something with $rec ... } } die $lr->message if $lr->is_error; =item $r = $h->ListSets( %params ) Return a list of sets provided by the repository. The scope of sets is undefined by OAI-PMH, so therefore may represent any subset of a collection. Optionally provide a resumption token to resume a previous partial request. $ls = $h->ListSets(); while($set = $ls->next) { print $set->setSpec, "\n"; } die $ls->message if $ls->is_error; =back =head1 AUTHOR These modules have been written by Tim Brody Etdb01r@ecs.soton.ac.ukE. HTTP-OAI-3.27/lib/HTTP/OAI/Set.pm0000644000076400007640000000454311616521277014156 0ustar tdb2tdb2package HTTP::OAI::Set; use strict; use warnings; use HTTP::OAI::SAXHandler qw/ :SAX /; use vars qw( @ISA ); @ISA = qw( HTTP::OAI::Encapsulation ); sub new { my ($class,%args) = @_; my $self = $class->SUPER::new(%args); $self->{handlers} = $args{handlers}; $self->setSpec($args{setSpec}); $self->setName($args{setName}); $self->{setDescription} = $args{setDescription} || []; $self; } sub setSpec { shift->_elem('setSpec',@_) } sub setName { shift->_elem('setName',@_) } sub setDescription { my $self = shift; push(@{$self->{setDescription}}, @_); return @{$self->{setDescription}}; } sub next { shift @{shift->{setDescription}} } sub generate { my ($self) = @_; return unless defined(my $handler = $self->get_handler); g_start_element($handler,'http://www.openarchives.org/OAI/2.0/','set',{}); g_data_element($handler,'http://www.openarchives.org/OAI/2.0/','setSpec',{},$self->setSpec); g_data_element($handler,'http://www.openarchives.org/OAI/2.0/','setName',{},$self->setName); for( $self->setDescription ) { $_->set_handler($handler); $_->generate; } g_end_element($handler,'http://www.openarchives.org/OAI/2.0/','set'); } sub start_element { my ($self,$hash) = @_; my $elem = lc($hash->{Name}); if( $elem eq 'setdescription' ) { $self->setDescription(my $d = $self->{handlers}->{description}->new(version=>$self->version)); $self->set_handler($d); g_start_document($d); } $self->SUPER::start_element($hash); } sub end_element { my ($self,$hash) = @_; $self->SUPER::end_element($hash); my $elem = lc($hash->{Name}); if( $elem eq 'setspec' ) { die ref($self)." Parse error: Empty setSpec\n" unless $hash->{Text}; $self->setSpec($hash->{Text}); } elsif( $elem eq 'setname' ) { warn ref($self)." Parse error: Empty setName\n", return unless $hash->{Text}; $self->setName($hash->{Text}); } elsif( $elem eq 'setdescription' ) { $self->SUPER::end_document(); $self->set_handler(undef); } } 1; __END__ =head1 NAME HTTP::OAI::Set - Encapsulates OAI set XML data =head1 METHODS =over 4 =item $spec = $s->setSpec([$spec]) =item $name = $s->setName([$name]) These methods return respectively, the setSpec and setName of the OAI Set. =item @descs = $s->setDescription([$desc]) Returns and optionally adds the list of set descriptions. Returns a reference to an array of L objects. =back HTTP-OAI-3.27/lib/HTTP/OAI/ListRecords.pm0000644000076400007640000000611111600640342015635 0ustar tdb2tdb2package HTTP::OAI::ListRecords; use strict; use warnings; use vars qw( @ISA ); @ISA = qw( HTTP::OAI::PartialList ); sub new { my ($class,%args) = @_; $args{handlers} ||= {}; $args{handlers}->{header} ||= "HTTP::OAI::Header"; $args{handlers}->{metadata} ||= "HTTP::OAI::Metadata"; $args{handlers}->{about} ||= "HTTP::OAI::Metadata"; my $self = $class->SUPER::new(%args); $self->{in_record} = 0; $self; } sub record { shift->item(@_) } sub generate_body { my ($self) = @_; return unless defined(my $handler = $self->get_handler); for( $self->record ) { $_->set_handler($self->get_handler); $_->generate; } if( defined($self->resumptionToken) ) { $self->resumptionToken->set_handler($handler); $self->resumptionToken->generate; } } sub start_element { my ($self,$hash) = @_; if( !$self->{'in_record'} ) { my $elem = lc($hash->{LocalName}); if( $elem eq 'record' ) { $self->set_handler(new HTTP::OAI::Record( version=>$self->version, handlers=>$self->{handlers}, )); $self->{'in_record'} = $hash->{Depth}; } elsif( $elem eq 'resumptiontoken' ) { $self->set_handler(new HTTP::OAI::ResumptionToken( version=>$self->version )); $self->{'in_record'} = $hash->{Depth}; } } $self->SUPER::start_element($hash); } sub end_element { my ($self,$hash) = @_; $self->SUPER::end_element($hash); if( $self->{'in_record'} == $hash->{Depth} ) { my $elem = lc($hash->{LocalName}); if( $elem eq 'record' ) { HTTP::OAI::Debug::trace( "record: " . $self->get_handler->identifier ); $self->record( $self->get_handler ); $self->set_handler( undef ); $self->{'in_record'} = 0; } elsif( $elem eq 'resumptiontoken' ) { $self->resumptionToken( $self->get_handler ); $self->set_handler( undef ); $self->{'in_record'} = 0; } } } 1; __END__ =head1 NAME HTTP::OAI::ListRecords - Provide access to an OAI ListRecords response =head1 SYNOPSIS my $r = $h->ListRecords( metadataPrefix=>'oai_dc', ); while( my $rec = $r->next ) { print "Identifier => ", $rec->identifier, "\n"; } die $r->message if $r->is_error; # Using callback method sub callback { my $rec = shift; print "Identifier => ", $rec->identifier, "\n"; }; my $r = $h->ListRecords( metadataPrefix=>'oai_dc', onRecord=>\&callback ); die $r->message if $r->is_error; =head1 METHODS =over 4 =item $lr = new HTTP::OAI::ListRecords This constructor method returns a new HTTP::OAI::ListRecords object. =item $rec = $lr->next Returns either an L object, or undef, if no more record are available. Use $rec->is_error to test whether there was an error getting the next record. =item @recl = $lr->record([$rec]) Returns the record list and optionally adds a new record or resumptionToken, $rec. Returns an array ref of Ls, including an optional resumptionToken string. =item $token = $lr->resumptionToken([$token]) Returns and optionally sets the L. =item $dom = $lr->toDOM Returns a XML::DOM object representing the ListRecords response. =back HTTP-OAI-3.27/lib/HTTP/OAI/Header.pm0000644000076400007640000001006311601066343014575 0ustar tdb2tdb2package HTTP::OAI::Header; use strict; use warnings; use POSIX qw/strftime/; use vars qw(@ISA); use HTTP::OAI::SAXHandler qw( :SAX ); @ISA = qw(HTTP::OAI::Encapsulation); sub new { my ($class,%args) = @_; my $self = $class->SUPER::new(%args); $self->identifier($args{identifier}) unless $self->identifier; $self->datestamp($args{datestamp}) unless $self->datestamp; $self->status($args{status}) unless $self->status; $self->{setSpec} ||= $args{setSpec} || []; $self; } sub identifier { shift->_elem('identifier',@_) } sub now { return strftime("%Y-%m-%dT%H:%M:%SZ",gmtime()) } sub datestamp { my $self = shift; return $self->_elem('datestamp') unless @_; my $ds = shift or return $self->_elem('datestamp',undef); if( $ds =~ /^(\d{4})(\d{2})(\d{2})$/ ) { $ds = "$1-$2-$3"; } elsif( $ds =~ /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$/ ) { $ds = "$1-$2-$3T$4:$5:$6Z"; } return $self->_elem('datestamp',$ds); } sub status { shift->_attr('status',@_) } sub is_deleted { my $s = shift->status(); return defined($s) && $s eq 'deleted'; } sub setSpec { my $self = shift; push(@{$self->{setSpec}},@_); @{$self->{setSpec}}; } sub dom { my $self = shift; if( my $dom = shift ) { my $driver = XML::LibXML::SAX::Parser->new( Handler=>HTTP::OAI::SAXHandler->new( Handler=>$self )); $driver->generate($dom->ownerDocument); } else { $self->set_handler(my $builder = XML::LibXML::SAX::Builder->new()); g_start_document($self); $self->xml_decl({'Version'=>'1.0','Encoding'=>'UTF-8'}); $self->characters({'Data'=>"\n"}); $self->generate(); $self->end_document(); return $builder->result; } } sub generate { my ($self) = @_; return unless defined(my $handler = $self->get_handler); if( defined($self->status) ) { g_start_element($handler,'http://www.openarchives.org/OAI/2.0/','header', { "{}status"=>{ 'Name'=>'status', 'LocalName'=>'status', 'Value'=>$self->status, 'Prefix'=>'', 'NamespaceURI'=>'' } }); } else { g_start_element($handler,'http://www.openarchives.org/OAI/2.0/','header',{}); } g_data_element($handler,'http://www.openarchives.org/OAI/2.0/','identifier',{},$self->identifier); g_data_element($handler,'http://www.openarchives.org/OAI/2.0/','datestamp',{},($self->datestamp || $self->now)); for($self->setSpec) { g_data_element($handler,'http://www.openarchives.org/OAI/2.0/','setSpec',{},$_); } g_end_element($handler,'http://www.openarchives.org/OAI/2.0/','header'); } sub end_element { my ($self,$hash) = @_; my $elem = lc($hash->{LocalName}); my $text = $hash->{Text}; if( defined $text ) { $text =~ s/^\s+//; $text =~ s/\s+$//; } if( $elem eq 'identifier' ) { die "HTTP::OAI::Header parse error: Empty identifier\n" unless $text; $self->identifier($text); } elsif( $elem eq 'datestamp' ) { warn "HTTP::OAI::Header parse warning: Empty datestamp for ".$self->identifier."\n" unless $text; $self->datestamp($text); } elsif( $elem eq 'setspec' ) { $self->setSpec($text); } elsif( $elem eq 'header' ) { $self->status($hash->{Attributes}->{'{}status'}->{Value}); } } 1; __END__ =head1 NAME HTTP::OAI::Header - Encapsulates an OAI header structure =head1 SYNOPSIS use HTTP::OAI::Header; my $h = new HTTP::OAI::Header( identifier=>'oai:myarchive.org:2233-add', datestamp=>'2002-04-12T20:31:00Z', ); $h->setSpec('all:novels'); =head1 METHODS =over 4 =item $h = new HTTP::OAI::Header This constructor method returns a new C. =item $h->identifier([$identifier]) Get and optionally set the record OAI identifier. =item $h->datestamp([$datestamp]) Get and optionally set the record datestamp (OAI 2.0+). =item $h->status([$status]) Get and optionally set the record status (valid values are 'deleted' or undef). =item $h->is_deleted() Returns whether this record's status is deleted. =item @sets = $h->setSpec([$setSpec]) Returns the list of setSpecs and optionally appends a new setSpec C<$setSpec> (OAI 2.0+). =item $dom_fragment = $id->generate() Act as a SAX driver (use C<< $h->set_handler() >> to specify the filter to pass events to). =back HTTP-OAI-3.27/lib/HTTP/OAI/SAXHandler.pm0000644000076400007640000001307511600621020015330 0ustar tdb2tdb2package HTTP::OAI::SAXHandler; use strict; use warnings; use vars qw(@ISA @EXPORT @EXPORT_OK %EXPORT_TAGS); use Data::Dumper; # debugging for here @ISA = qw( Exporter XML::SAX::Base ); @EXPORT_OK = qw( g_start_document g_start_element g_end_element g_data_element ); %EXPORT_TAGS = (SAX=>[qw( g_start_document g_start_element g_end_element g_data_element )]); =pod =head1 NAME HTTP::OAI::SAXHandler - SAX2 utility filter =head1 DESCRIPTION This module provides utility methods for SAX2, including collapsing multiple "characters" events into a single event. This module exports methods for generating SAX2 events with Namespace support. This *isn't* a fully-fledged SAX2 generator! =over 4 =item $h = HTTP::OAI::SAXHandler->new() Class constructor. =cut sub new { my ($class,%args) = @_; $class = ref($class) || $class; my $self = $class->SUPER::new(%args); $self->{Depth} = 0; $self; } sub g_start_document { my ($handler) = @_; $handler->start_document(); $handler->start_prefix_mapping({ 'Prefix'=>'xsi', 'NamespaceURI'=>'http://www.w3.org/2001/XMLSchema-instance' }); $handler->start_prefix_mapping({ 'Prefix'=>'', 'NamespaceURI'=>'http://www.openarchives.org/OAI/2.0/' }); } sub g_data_element { my ($handler,$uri,$qName,$attr,$value) = @_; g_start_element($handler,$uri,$qName,$attr); if( ref($value) ) { $value->set_handler($handler); $value->generate; } else { $handler->characters({'Data'=>$value}); } g_end_element($handler,$uri,$qName); } sub g_start_element { my ($handler,$uri,$qName,$attr) = @_; $attr ||= {}; my ($prefix,$localName) = split /:/, $qName; unless(defined($localName)) { $localName = $prefix; $prefix = ''; } $handler->start_element({ 'NamespaceURI'=>$uri, 'Name'=>$qName, 'Prefix'=>$prefix, 'LocalName'=>$localName, 'Attributes'=>$attr }); } sub g_end_element { my ($handler,$uri,$qName) = @_; my ($prefix,$localName) = split /:/, $qName; unless(defined($localName)) { $localName = $prefix; $prefix = ''; } $handler->end_element({ 'NamespaceURI'=>$uri, 'Name'=>$qName, 'Prefix'=>$prefix, 'LocalName'=>$localName, }); } sub current_state { my $self = shift; return $self->{State}->[$#{$self->{State}}]; } sub current_element { my $self = shift; return $self->{Elem}->[$#{$self->{Elem}}]; } sub start_document { HTTP::OAI::Debug::sax( Dumper($_[1]) ); $_[0]->SUPER::start_document(); } sub end_document { $_[0]->SUPER::end_document(); HTTP::OAI::Debug::sax( Dumper($_[1]) ); } # Char data is rolled together by this module sub characters { my ($self,$hash) = @_; $self->{Text} .= $hash->{Data}; # characters are traced in {start,end}_element #HTTP::OAI::Debug::sax( "'" . substr($hash->{Data},0,40) . "'" ); } sub start_element { my ($self,$hash) = @_; push @{$self->{Attributes}}, $hash->{Attributes}; # Call characters with the joined character data if( defined($self->{Text}) ) { HTTP::OAI::Debug::sax( "'".substr($self->{Text},0,40) . "'" ); $self->SUPER::characters({Data=>$self->{Text}}); $self->{Text} = undef; } $hash->{State} = $self; $hash->{Depth} = ++$self->{Depth}; HTTP::OAI::Debug::sax( (" " x $hash->{Depth}) . '<'.$hash->{Name}.'>' ); $self->SUPER::start_element($hash); } sub end_element { my ($self,$hash) = @_; # Call characters with the joined character data $hash->{Text} = $self->{Text}; if( defined($self->{Text}) ) { # Trailing whitespace causes problems if( $self->{Text} =~ /\S/ ) { HTTP::OAI::Debug::sax( "'".substr($self->{Text},0,40) . "'" ); $self->SUPER::characters({Data=>$self->{Text}}); } $self->{Text} = undef; } $hash->{Attributes} = pop @{$self->{Attributes}} || {}; $hash->{State} = $self; $hash->{Depth} = $self->{Depth}--; HTTP::OAI::Debug::sax( (" " x $hash->{Depth}) . ' <'.$hash->{Name}.'>' ); $self->SUPER::end_element($hash); } sub entity_reference { my ($self,$hash) = @_; HTTP::OAI::Debug::sax( $hash->{Name} ); } sub start_cdata { HTTP::OAI::Debug::sax(); } sub end_cdata { HTTP::OAI::Debug::sax(); } sub comment { HTTP::OAI::Debug::sax( $_[1]->{Data} ); } sub doctype_decl { # {SystemId,PublicId,Internal} HTTP::OAI::Debug::sax( $_[1]->{Name} ); } sub attlist_decl { # {ElementName,AttributeName,Type,Default,Fixed} HTTP::OAI::Debug::sax( $_[1]->{ElementName} ); } sub xml_decl { # {Version,Encoding,Standalone} HTTP::OAI::Debug::sax( join ", ", map { defined($_) ? $_ : "null" } @{$_[1]}{qw( Version Encoding Standalone )} ); } sub entity_decl { # {Value,SystemId,PublicId,Notation} HTTP::OAI::Debug::sax( $_[1]->{Name} ); } sub unparsed_decl { HTTP::OAI::Debug::sax(); } sub element_decl { # {Model} HTTP::OAI::Debug::sax( $_[1]->{Name} ); } sub notation_decl { # {Name,Base,SystemId,PublicId} HTTP::OAI::Debug::sax( $_[1]->{Name} ); } sub processing_instruction { # {Target,Data} HTTP::OAI::Debug::sax( $_[1]->{Target} . " => " . $_[1]->{Data} ); } package HTTP::OAI::FilterDOMFragment; use vars qw( @ISA ); @ISA = qw( XML::SAX::Base ); # Trap things that don't apply to a balanced fragment sub start_document {} sub end_document {} sub xml_decl {} package XML::SAX::Debug; use Data::Dumper; use vars qw( @ISA $AUTOLOAD ); @ISA = qw( XML::SAX::Base ); sub DEBUG { my ($event,$self,$hash) = @_; warn "$event(".Dumper($hash).")\n"; my $superior = "SUPER::$event"; $self->$superior($hash); } sub start_document { DEBUG('start_document',@_) } sub end_document { DEBUG('end_document',@_) } sub start_element { DEBUG('start_element',@_) } sub end_element { DEBUG('end_element',@_) } sub characters { DEBUG('characters',@_) } sub xml_decl { DEBUG('xml_decl',@_) } 1; __END__ =back =head1 AUTHOR Tim Brody HTTP-OAI-3.27/lib/HTTP/OAI/UserAgent.pm0000644000076400007640000002037311600640717015311 0ustar tdb2tdb2package HTTP::OAI::UserAgent; use strict; use warnings; use vars qw(@ISA $ACCEPT); # Do not use eval() our $USE_EVAL = 1; # Ignore bad utf8 characters our $IGNORE_BAD_CHARS = 1; # Silence bad utf8 warnings our $SILENT_BAD_CHARS = 0; use constant MAX_UTF8_BYTES => 4; require LWP::UserAgent; @ISA = qw(LWP::UserAgent); unless( $@ ) { $ACCEPT = "gzip"; } sub delay { shift->_elem( "delay", @_ ) } sub last_request_completed { shift->_elem( "last_request_completed", @_ ) } sub redirect_ok { 1 } sub request { my $self = shift; my ($request, $arg, $size, $previous, $response) = @_; if( ref($request) eq 'HASH' ) { $request = HTTP::Request->new(GET => _buildurl(%$request)); } my $delay = $self->delay; if( defined $delay ) { if( ref($delay) eq "CODE" ) { $delay = &$delay( $self->last_request_completed ); } select(undef,undef,undef,$delay) if $delay > 0; } if( !defined $response ) { $response = $self->SUPER::request(@_); $self->last_request_completed( time ); return $response; } my $parser = XML::LibXML->new( Handler => HTTP::OAI::SAXHandler->new( Handler => $response->headers )); $parser->{request} = $request; $parser->{content_length} = 0; $parser->{content_buffer} = Encode::encode('UTF-8',''); $response->code(200); $response->message('lwp_callback'); $response->headers->set_handler($response); HTTP::OAI::Debug::trace( $response->verb . " " . ref($parser) . "->parse_chunk()" ); my $r; { local $SIG{__DIE__}; $r = $self->SUPER::request($request,sub { $self->lwp_callback( $parser, @_ ) }); $self->lwp_endparse( $parser ) if $r->is_success; } if( defined($r) && defined($r->headers->header( 'Client-Aborted' )) && $r->headers->header( 'Client-Aborted' ) eq 'die' ) { my $err = $r->headers->header( 'X-Died' ); if( $err !~ /^done\n/ ) { $r->code(500); $r->message( 'An error occurred while parsing: ' . $err ); } } $response->headers->set_handler(undef); # Allow access to the original headers through 'previous' $response->previous($r); my $cnt_len = $parser->{content_length}; undef $parser; # OAI retry-after if( defined($r) && $r->code == 503 && defined(my $timeout = $r->headers->header('Retry-After')) ) { $self->last_request_completed( time ); if( $self->{recursion}++ > 10 ) { $self->{recursion} = 0; warn ref($self)."::request (retry-after) Given up requesting after 10 retries\n"; return $response->copy_from( $r ); } if( !$timeout or $timeout =~ /\D/ or $timeout < 0 or $timeout > 86400 ) { warn ref($self)." Archive specified an odd duration to wait (\"".($timeout||'null')."\")\n"; return $response->copy_from( $r ); } HTTP::OAI::Debug::trace( "Waiting $timeout seconds" ); sleep($timeout+10); # We wait an extra 10 secs for safety return $self->request($request,undef,undef,undef,$response); # Got an empty response } elsif( defined($r) && $r->is_success && $cnt_len == 0 ) { $self->last_request_completed( time ); if( $self->{recursion}++ > 10 ) { $self->{recursion} = 0; warn ref($self)."::request (empty response) Given up requesting after 10 retries\n"; return $response->copy_from( $r ); } HTTP::OAI::Debug::trace( "Retrying on empty response" ); sleep(5); return $self->request($request,undef,undef,undef,$response); # An HTTP error occurred } elsif( $r->is_error ) { $response->copy_from( $r ); $response->errors(HTTP::OAI::Error->new( code=>$r->code, message=>$r->message, )); # An error occurred during parsing } elsif( $@ ) { $response->code(my $code = $@ =~ /read timeout/ ? 504 : 600); $response->message($@); $response->errors(HTTP::OAI::Error->new( code=>$code, message=>$@, )); } # Reset the recursion timer $self->{recursion} = 0; # Copy original $request => OAI $response to allow easy # access to the requested URL $response->request($request); $self->last_request_completed( time ); $response; } sub lwp_badchar { my $codepoint = sprintf('U+%04x', ord($_[2])); unless( $SILENT_BAD_CHARS ) { warn "Bad Unicode character $codepoint at byte offset ".$_[1]->{content_length}." from ".$_[1]->{request}->uri."\n"; } return $codepoint; } sub lwp_endparse { my( $self, $parser ) = @_; my $utf8 = $parser->{content_buffer}; # Replace bad chars with '?' if( $IGNORE_BAD_CHARS and length($utf8) ) { $utf8 = Encode::decode('UTF-8', $utf8, sub { $self->lwp_badchar($parser, @_) }); } if( length($utf8) > 0 ) { _ccchars($utf8); # Fix control chars $parser->{content_length} += length($utf8); $parser->parse_chunk($utf8); } delete($parser->{content_buffer}); $parser->parse_chunk('', 1); } sub lwp_callback { my( $self, $parser ) = @_; use bytes; # fixing utf-8 will need byte semantics $parser->{content_buffer} .= $_[2]; do { # FB_QUIET won't split multi-byte chars on input my $utf8 = Encode::decode('UTF-8', $parser->{content_buffer}, Encode::FB_QUIET); if( length($utf8) > 0 ) { use utf8; _ccchars($utf8); # Fix control chars $parser->{content_length} += length($utf8); $parser->parse_chunk($utf8); } if( length($parser->{content_buffer}) > MAX_UTF8_BYTES ) { $parser->{content_buffer} =~ s/^([\x80-\xff]{1,4})//s; my $badbytes = $1; if( length($badbytes) == 0 ) { Carp::confess "Internal error - bad bytes but not in 0x80-0xff range???"; } if( $IGNORE_BAD_CHARS ) { $badbytes = join('', map { $self->lwp_badchar($parser, $_) } split //, $badbytes); } $parser->parse_chunk( $badbytes ); } } while( length($parser->{content_buffer}) > MAX_UTF8_BYTES ); } sub _ccchars { $_[0] =~ s/([\x00-\x08\x0b-\x0c\x0e-\x1f])/sprintf("\\%04d",ord($1))/seg; } sub _buildurl { my %attr = @_; Carp::confess "_buildurl requires baseURL" unless $attr{'baseURL'}; Carp::confess "_buildurl requires verb" unless $attr{'verb'}; my $uri = new URI(delete($attr{'baseURL'})); if( defined($attr{resumptionToken}) && !$attr{force} ) { $uri->query_form(verb=>$attr{'verb'},resumptionToken=>$attr{'resumptionToken'}); } else { delete $attr{force}; # http://www.cshc.ubc.ca/oai/ breaks if verb isn't first, doh $uri->query_form(verb=>delete($attr{'verb'}),%attr); } return $uri->as_string; } sub url { my $self = shift; return _buildurl(@_); } sub decompress { my ($response) = @_; my $type = $response->headers->header("Content-Encoding"); return $response->{_content_filename} unless defined($type); if( $type eq 'gzip' ) { my $filename = File::Temp->new( UNLINK => 1 ); my $gz = Compress::Zlib::gzopen($response->{_content_filename}, "r") or die $!; my ($buffer,$c); my $fh = IO::File->new($filename,"w"); binmode($fh,":utf8"); while( ($c = $gz->gzread($buffer)) > 0 ) { print $fh $buffer; } $fh->close(); $gz->gzclose(); die "Error decompressing gziped response: " . $gz->gzerror() if -1 == $c; return $response->{_content_filename} = $filename; } else { die "Unsupported compression returned: $type\n"; } } 1; __END__ =head1 NAME HTTP::OAI::UserAgent - Extension of the LWP::UserAgent for OAI HTTP requests =head1 DESCRIPTION This module provides a simplified mechanism for making requests to an OAI repository, using the existing LWP::UserAgent module. =head1 SYNOPSIS require HTTP::OAI::UserAgent; my $ua = new HTTP::OAI::UserAgent; my $response = $ua->request( baseURL=>'http://arXiv.org/oai1', verb=>'ListRecords', from=>'2001-08-01', until=>'2001-08-31' ); print $response->content; =head1 METHODS =over 4 =item $ua = new HTTP::OAI::UserAgent(proxy=>'www-cache',...) This constructor method returns a new instance of a HTTP::OAI::UserAgent module. All arguments are passed to the L constructor. =item $r = $ua->request($req) Requests the HTTP response defined by $req, which is a L object. =item $r = $ua->request(baseURL=>$baseref, verb=>$verb, %opts) Makes an HTTP request to the given OAI server (baseURL) with OAI arguments. Returns an L object. OAI-PMH related options: from => $from until => $until resumptionToken => $token metadataPrefix => $mdp set => $set =item $str = $ua->url(baseURL=>$baseref, verb=>$verb, ...) Takes the same arguments as request, but returns the URL that would be requested. =item $time_d = $ua->delay( $time_d ) Return and optionally set a time (in seconds) to wait between requests. $time_d may be a CODEREF. =back HTTP-OAI-3.27/lib/HTTP/OAI/GetRecord.pm0000644000076400007640000000505310640714230015263 0ustar tdb2tdb2package HTTP::OAI::GetRecord; use strict; use warnings; use HTTP::OAI::SAXHandler qw/ :SAX /; use vars qw(@ISA); @ISA = qw( HTTP::OAI::Response ); sub new { my ($class,%args) = @_; $args{handlers} ||= {}; $args{handlers}->{header} ||= "HTTP::OAI::Header"; $args{handlers}->{metadata} ||= "HTTP::OAI::Metadata"; $args{handlers}->{about} ||= "HTTP::OAI::Metadata"; my $self = $class->SUPER::new(%args); $self->verb('GetRecord') unless $self->verb; $self->{record} ||= []; $self->record($args{record}) if defined($args{record}); return $self; } sub record { my $self = shift; $self->{record} = [shift] if @_; return wantarray ? @{$self->{record}} : $self->{record}->[0]; } sub next { shift @{shift->{record}} } sub generate_body { my ($self) = @_; for( $self->record ) { $_->set_handler($self->get_handler); $_->generate; } } sub start_element { my ($self,$hash) = @_; my $elem = $hash->{LocalName}; if( $elem eq 'record' && !exists($self->{"in_record"}) ) { $self->{OLDHandler} = $self->get_handler; my $rec = HTTP::OAI::Record->new( version=>$self->version, handlers=>$self->{handlers}, ); $self->record($rec); $self->set_handler($rec); $self->{"in_record"} = $hash->{Depth}; } $self->SUPER::start_element($hash); } sub end_element { my ($self,$hash) = @_; $self->SUPER::end_element($hash); my $elem = lc($hash->{LocalName}); if( $elem eq 'record' && exists($self->{"in_record"}) && $self->{"in_record"} == $hash->{Depth} ) { $self->set_handler($self->{OLDHandler}); } } 1; __END__ =head1 NAME HTTP::OAI::GetRecord - An OAI GetRecord response =head1 DESCRIPTION HTTP::OAI::GetRecord is derived from L and provides access to the data contained in an OAI GetRecord response in addition to the header information provided by OAI::Response. =head1 SYNOPSIS use HTTP::OAI::GetRecord(); $res = new HTTP::OAI::GetRecord(); $res->record($rec); =head1 METHODS =over 4 =item $gr = new HTTP::OAI::GetRecord This constructor method returns a new HTTP::OAI::GetRecord object. =item $rec = $gr->next Returns the next record stored in the response, or undef if no more record are available. The record is returned as an L. =item @recs = $gr->record([$rec]) Returns the record list, and optionally adds a record to the end of the queue. GetRecord will only store one record at a time, so this method will replace any existing record if called with argument(s). =item $dom = $gr->toDOM() Returns an XML::DOM object representing the GetRecord response. =back HTTP-OAI-3.27/lib/HTTP/OAI/ListMetadataFormats.pm0000644000076400007640000000436111600640342017315 0ustar tdb2tdb2package HTTP::OAI::ListMetadataFormats; use strict; use warnings; use vars qw( @ISA ); @ISA = qw( HTTP::OAI::Response ); sub new { my $class = shift; my $self = $class->SUPER::new(@_); $self->{'metadataFormat'} ||= []; $self->{in_mdf} = 0; $self->verb('ListMetadataFormats') unless $self->verb; $self; } sub metadataFormat { my $self = shift; push(@{$self->{metadataformat}}, @_); return wantarray ? @{$self->{metadataformat}} : $self->{metadataformat}->[0]; } sub next { shift @{shift->{metadataformat}} } sub generate_body { my ($self) = @_; return unless defined(my $handler = $self->get_handler); for( $self->metadataFormat ) { $_->set_handler($handler); $_->generate; } } sub start_element { my ($self,$hash) = @_; if( !$self->{'in_mdf'} ) { if( lc($hash->{LocalName}) eq 'metadataformat' ) { $self->set_handler(new HTTP::OAI::MetadataFormat()); $self->{'in_mdf'} = $hash->{Depth}; } } $self->SUPER::start_element($hash); } sub end_element { my ($self,$hash) = @_; $self->SUPER::end_element($hash); if( $self->{'in_mdf'} == $hash->{Depth} ) { if( lc($hash->{LocalName}) eq 'metadataformat' ) { HTTP::OAI::Debug::trace( "metadataFormat: " . $self->get_handler->metadataPrefix ); $self->metadataFormat( $self->get_handler ); $self->set_handler( undef ); $self->{'in_mdf'} = 0; } } } 1; __END__ =head1 NAME HTTP::OAI::ListMetadataFormats - Provide access to an OAI ListMetadataFormats response =head1 SYNOPSIS my $r = $h->ListMetadataFormats; # ListMetadataFormats doesn't use flow control while( my $rec = $r->next ) { print $rec->metadataPrefix, "\n"; } die $r->message if $r->is_error; =head1 METHODS =over 4 =item $lmdf = new HTTP::OAI::ListMetadataFormats This constructor method returns a new HTTP::OAI::ListMetadataFormats object. =item $mdf = $lmdf->next Returns either an L object, or undef, if no more records are available. =item @mdfl = $lmdf->metadataFormat([$mdf]) Returns the metadataFormat list and optionally adds a new metadataFormat, $mdf. Returns an array ref of Ls. =item $dom = $lmdf->toDOM Returns a XML::DOM object representing the ListMetadataFormats response. =back HTTP-OAI-3.27/lib/HTTP/OAI.pm0000644000076400007640000000347611616526042013422 0ustar tdb2tdb2package HTTP::OAI; use strict; our $VERSION = '3.27'; # perlcore use Carp; use Encode; # http related stuff use URI; use HTTP::Headers; use HTTP::Request; use HTTP::Response; # xml related stuff use XML::SAX; use XML::SAX::ParserFactory; use XML::LibXML; use XML::LibXML::SAX; use XML::LibXML::SAX::Parser; use XML::LibXML::SAX::Builder; # debug use HTTP::OAI::Debug; # oai data objects use HTTP::OAI::Encapsulation; # Basic XML handling stuff use HTTP::OAI::Metadata; # Super class of all data objects use HTTP::OAI::Error; use HTTP::OAI::Header; use HTTP::OAI::MetadataFormat; use HTTP::OAI::Record; use HTTP::OAI::ResumptionToken; use HTTP::OAI::Set; # parses OAI headers and other utility bits use HTTP::OAI::Headers; # generic superclasses use HTTP::OAI::Response; use HTTP::OAI::PartialList; # oai verbs use HTTP::OAI::GetRecord; use HTTP::OAI::Identify; use HTTP::OAI::ListIdentifiers; use HTTP::OAI::ListMetadataFormats; use HTTP::OAI::ListRecords; use HTTP::OAI::ListSets; # oai agents use HTTP::OAI::UserAgent; use HTTP::OAI::Harvester; use HTTP::OAI::Repository; $HTTP::OAI::Harvester::VERSION = $VERSION; if( $ENV{HTTP_OAI_TRACE} ) { HTTP::OAI::Debug::level( '+trace' ); } if( $ENV{HTTP_OAI_SAX_TRACE} ) { HTTP::OAI::Debug::level( '+sax' ); } 1; __END__ =head1 NAME HTTP::OAI - API for the OAI-PMH =head1 DESCRIPTION This is a stub module, you probably want to look at L or L. =head1 SEE ALSO You can find links to this and other OAI tools (perl, C++, java) at: http://www.openarchives.org/tools/tools.html. Ed Summers L module. =head1 AUTHOR Copyright 2004-2010 Tim Brody , University of Southampton. This module is free software and is released under the BSD License (see LICENSE). HTTP-OAI-3.27/bin/0000755000076400007640000000000011616526067011674 5ustar tdb2tdb2HTTP-OAI-3.27/bin/oai_static_gateway.pl0000755000076400007640000000327310640714230016064 0ustar tdb2tdb2#!/usr/bin/perl -w # Change this to the location of your static repository # XML file my $STATIC_REPO = 'file:../examples/repository.xml'; use strict; use HTTP::OAI; use HTTP::OAI::Repository qw/:validate/; use XML::SAX::Writer; use CGI qw/:standard -oldstyle_urls/; use vars qw( $GZIP ); BEGIN { eval { require PerlIO::gzip }; $GZIP = $@ ? 0 : 1; } # Create a new harvester object to read the xml file my $h = HTTP::OAI::Harvester->new(baseURL=>$STATIC_REPO); binmode(STDOUT,':utf8'); my @encodings = http('HTTP_ACCEPT_ENCODING'); if( $GZIP && grep { defined($_) && $_ eq 'gzip' } @encodings ) { print header( -type=>'text/xml; charset=utf-8', -charset=>'utf-8', '-Content-Encoding'=>'gzip', ); binmode(STDOUT, ":gzip"); } else { print header( -type=>'text/xml; charset=utf-8', -charset=>'utf-8', ); } # Check for grammatical errors in the request my @errs = validate_request(CGI::Vars()); my $mdp = param('metadataPrefix') || ''; my @mdfs = $h->ListMetadataFormats()->metadataFormat; if( $mdp && !grep { $_->metadataPrefix } @mdfs ) { push @errs, new HTTP::OAI::Error(code=>'cannotDisseminateFormat',message=>"Dissemination as '$mdp' is not supported"); } if( param('resumptionToken') ) { push @errs, new HTTP::OAI::Error(code=>'badArgument',message=>'This repository does not support flow-control'); } my $r; if( @errs ) { $r = HTTP::OAI::Response->new( requestURL=>self_url() ); $r->errors(@errs); } else { my %attr = CGI::Vars(); my $verb = delete($attr{'verb'}); $r = $h->$verb(%attr); $r->requestURL(self_url()); if( 'Identify' eq $verb && ref($r) eq 'HTTP::OAI::Identify' ) { $r->baseURL(url()); } } $r->set_handler(XML::SAX::Writer->new(Output=>\*STDOUT)); $r->generate; HTTP-OAI-3.27/bin/oai_browser.pl0000755000076400007640000002157011616520461014544 0ustar tdb2tdb2#!/usr/bin/perl -w =head1 NAME oai_browser - Command line OAI repository browser =head1 DESCRIPTION The oai_browser utility provides a command-line tool to browse an OAI-compliant repository. =head1 SYNOPSIS oai_browser.pl B<[options]> I =head1 ARGUMENTS =over 4 =item I Specify baseURL to connect to. =back =head1 OPTIONS =over 8 =item B<--help> Show this page. =item B<--silent> Don't display data harvested from the repository - only shows a record count. =item B<--trace> Turn on trace debugging. =item B<--tracesax> Turn on trace debugging of SAX calls. =item B<--skip-identify> Don't perform an initial Identify to check the repository's baseURL. =cut BEGIN { unshift @INC, "."; } use vars qw($VERSION $PROTOCOL_VERSION $h); use lib "../lib"; use lib "lib"; use HTTP::OAI; use Pod::Usage; $VERSION = $HTTP::OAI::VERSION; use vars qw( @ARCHIVES ); @ARCHIVES = qw( http://cogprints.soton.ac.uk/perl/oai2 http://citebase.eprints.org/cgi-bin/oai2 http://arXiv.org/oai2 http://www.biomedcentral.com/oai/2.0/ ); use strict; use warnings; #use sigtrap qw( die INT ); # This is just confusing ... #binmode(STDOUT,":encoding(iso-8859-1)"); # Causes Out of memory! errors :-( binmode(STDOUT,":utf8"); use Getopt::Long; eval "use Term::ReadLine"; if( $@ ) { die "Requires Term::ReadLine perl module\n"; } eval "use Term::ReadKey"; if( $@ ) { die "Requires Term::ReadKey perl module\n"; } use HTTP::OAI::Harvester; use HTTP::OAI::Metadata::OAI_DC; my ($opt_silent, $opt_help, $opt_trace, $opt_tracesax, $opt_skip_identify); $opt_silent = 0; GetOptions ( 'silent' => \$opt_silent, 'help' => \$opt_help, 'trace' => \$opt_trace, 'tracesax' => \$opt_tracesax, 'skip-identify' => \$opt_skip_identify, ); pod2usage(1) if $opt_help; if( $opt_trace ) { HTTP::OAI::Debug::level( '+trace' ); } if( $opt_tracesax ) { HTTP::OAI::Debug::level( '+sax' ); } print < Use CTRL+C to quit at any time --- EOF my $DEFAULTID = ''; use vars qw($TERM @SETS @PREFIXES); $TERM = Term::ReadLine->new($0); $TERM->addhistory(@ARCHIVES); while(1) { # my $burl = input('Enter the base URL to use [http://cogprints.soton.ac.uk/perl/oai2]: ') || 'http://cogprints.soton.ac.uk/perl/oai2'; my $burl = shift || $TERM->readline('OAI Base URL to query>','http://cogprints.soton.ac.uk/perl/oai2') || next; $h = new HTTP::OAI::Harvester(baseURL=>$burl); last if $opt_skip_identify; if( my $id = Identify() ) { $h->repository($id); $PROTOCOL_VERSION = $id->version; last; } } my $archive = $h->repository; &mainloop(); sub mainloop { while(1) { print "\nMenu\n----\n\n", "1. GetRecord\n2. Identify\n3. ListIdentifiers\n4. ListMetadataFormats\n5. ListRecords\n6. ListSets\nq. Quit\n\n>"; my $cmd; ReadMode(4); $cmd = ReadKey(); ReadMode(0); last unless defined($cmd); print $cmd . "\n"; if( $cmd eq 'q' ) { last; } elsif($cmd eq '1') { eval { GetRecord() }; } elsif($cmd eq '2') { eval { Identify() }; } elsif($cmd eq '3') { eval { ListIdentifiers() }; } elsif($cmd eq '4') { eval { ListMetadataFormats() }; } elsif($cmd eq '5') { eval { ListRecords() }; } elsif($cmd eq '6') { eval { ListSets() }; } if( $@ ) { warn "Internal error occurred: $@\n"; } } } sub GetRecord { printtitle("GetRecord"); my $id = $TERM->readline("Enter the identifier to request>",$DEFAULTID) || $DEFAULTID; $TERM->addhistory(@PREFIXES); my $mdp = $TERM->readline("Enter the metadataPrefix to use>",'oai_dc') || 'oai_dc'; my $r = $h->GetRecord( identifier=>$id, metadataPrefix=>$mdp, handlers=>{ metadata=>($mdp eq 'oai_dc' ? 'HTTP::OAI::Metadata::OAI_DC' : undef), }, ); if( defined(my $rec = $r->next) ) { printheader($r); print "identifier => ", $rec->identifier, ($rec->status ? " (".$rec->status.") " : ''), "\n", "datestamp => ", $rec->datestamp, "\n"; foreach($rec->header->setSpec) { print "setSpec => ", $_, "\n"; } print "\nHeader:\n", $rec->header->dom->toString; print "\nMetadata:\n", $rec->metadata->toString if defined($rec->metadata); print "\nAbout data:\n", join("\n",map { $_->toString } $rec->about) if $rec->about; } iserror($r); } sub Identify { printtitle("Identify"); my $r = $h->Identify; return if iserror($r); print map({ "adminEmail => " . $_ . "\n" } $r->adminEmail), "baseURL => ", $r->baseURL, "\n", "protocolVersion => ", $r->protocolVersion, "\n", "repositoryName => ", $r->repositoryName, "\n"; foreach my $dom (grep { defined } map { $_->dom } $r->description) { foreach my $md ($dom->getElementsByTagNameNS('http://www.openarchives.org/OAI/2.0/oai-identifier','oai-identifier')) { foreach my $elem ($md->getElementsByTagNameNS('http://www.openarchives.org/OAI/2.0/oai-identifier','sampleIdentifier')) { $DEFAULTID = $elem->getFirstChild->toString; print "sampleIdentifier => ", $DEFAULTID, "\n"; } } } $r; } sub ListIdentifiers { printtitle("ListIdentifiers"); my $resumptionToken = $TERM->readline("Enter an optional resumptionToken>"); my ($from, $until, $set, $mdp); if( !$resumptionToken ) { $from = $TERM->readline("Enter an optional from period (yyyy-mm-dd)>"); $until = $TERM->readline("Enter an optional until period (yyyy-mm-dd)>"); $TERM->addhistory(@SETS); $set = $TERM->readline("Enter an optional set ([A-Z0-9_]+)>"); $TERM->addhistory(@PREFIXES); $mdp = $TERM->readline("Enter the metadataPrefix to use>",'oai_dc') || 'oai_dc'; } my $c = 0; my $cb = $opt_silent ? sub { print STDERR $c++, "\r"; } : sub { my $rec = shift; $c++; print "identifier => ", $rec->identifier, (defined($rec->datestamp) ? " / " . $rec->datestamp : ''), ($rec->status ? " (".$rec->status.") " : ''), "\n"; }; #printheader($r); my $r = $h->ListIdentifiers( checkargs(resumptionToken=>$resumptionToken,from=>$from,until=>$until,set=>$set,metadataPrefix=>$mdp), onRecord => $cb, ); print "\nRead a total of $c records\n"; return if iserror($r); } sub ListMetadataFormats { printtitle("ListMetadataFormats"); my $id = $TERM->readline("Enter an optional identifier>"); my $r = $h->ListMetadataFormats(checkargs(identifier=>$id)); return if iserror($r); @PREFIXES = (); printheader($r); while( my $mdf = $r->next ) { push @PREFIXES, $mdf->metadataPrefix; print "metadataPrefix => ", $mdf->metadataPrefix, "\n", "schema => ", $mdf->schema, "\n", "metadataNamespace => ", ($mdf->metadataNamespace || ''), "\n"; } } sub ListRecords { printtitle("ListRecords"); my $resumptionToken = $TERM->readline("Enter an optional resumptionToken>"); my ($from, $until, $set, $mdp); if( !$resumptionToken ) { $from = $TERM->readline("Enter an optional from period (yyyy-mm-dd)>"); $until = $TERM->readline("Enter an optional until period (yyyy-mm-dd)>"); $TERM->addhistory(@SETS); $set = $TERM->readline("Enter an optional set ([A-Z0-9_]+)>"); $TERM->addhistory(@PREFIXES); $mdp = $TERM->readline("Enter the metadataPrefix to use>",'oai_dc') || 'oai_dc'; } my $c = 0; my $cb = $opt_silent ? sub { print STDERR $c++, "\r"; } : sub { my $rec = shift; $c++; print "\nidentifier => ", $rec->identifier, ($rec->status ? " (".$rec->status.") " : ''), "\n", "datestamp => ", $rec->datestamp, "\n"; foreach($rec->header->setSpec) { print "setSpec => ", $_, "\n"; } print "\nMetadata:\n", ($rec->metadata->toString||'(null)') if $rec->metadata; print "\nAbout data:\n", join("\n",map { ($_->toString||'(null)') } $rec->about) if $rec->about; }; #printheader($r); my $r = $h->ListRecords( checkargs(resumptionToken=>$resumptionToken,from=>$from,until=>$until,set=>$set,metadataPrefix=>$mdp), handlers=>{ metadata=>(($mdp and $mdp eq 'oai_dc') ? 'HTTP::OAI::Metadata::OAI_DC' : undef), }, onRecord => $cb, ); print "\nRead a total of $c records\n"; return if iserror($r); } sub ListSets { printtitle("ListSets"); sub cb { my $rec = shift; push @SETS, $rec->setSpec; print "setSpec => ", $rec->setSpec, "\n", "setName => ", ($rec->setName||'(null)'), "\n"; }; my $r = $h->ListSets(onRecord=>\&cb); return if iserror($r); } sub input { my $q = shift; print $q; my $r = <>; return unless defined($r); chomp($r); return $r; } sub printtitle { my $t = shift; print "\n$t\n"; for( my $i = 0; $i < length($t); $i++ ) { print "-"; } print "\n"; } sub printheader { my $r = shift; print "verb => ", $r->verb, "\n", "responseDate => ", $r->responseDate, "\n", "requestURL => ", $r->requestURL, "\n"; } sub checkargs { my %args = @_; foreach my $key (keys %args) { delete $args{$key} if( !defined($args{$key}) || $args{$key} eq '' ); } %args; } sub iserror { my $r = shift; if( $r->is_success ) { return undef; } else { print "An error ", $r->code, " occurred while making the request", ($r->request ? " (" . $r->request->uri . ") " : ''), ":\n", $r->message, "\n"; return 1; } }