Installing Biblio::Citation::Parser
===================================
The easiest way to do this is like so:
perl Build.PL
./Build
./Build install
You can also optionally run './Build test', which will run a few checks on
the modules.
Requirements
============
The following modules are required for Biblio::Citation::Parser to function correctly:
- URI
- Text::Unidecode
The most recent versions of these are located at http://www.cpan.org.
To use the web service support, you will need the SOAP::Lite module
(and any dependencies that it requires). Known working versions
are available at http://paracite.eprints.org/files/perlmods/soap/, while the
latest versions are at http://www.cpan.org.
Biblio::Citation::Parser 1.10 Documentation - Problems, Questions and Feedback
There is currently no online bug tracking system. Known bugs are listed in the BUGLIST file in the distribution and
a list will be kept on the http://paracite.eprints.org/developers/ site.
If you identify a bug or "issue" (issues are not bugs, but are things which could be clearer or better), and it's not already listed on the site, please let us know at paracite@ecs.soton.ac.uk. Include all the information you can: which version of Biblio::Citation::Parser you are using (see VERSION if you're not sure), which operating system, and so on.
There is a mailing list for ParaTools (encompassing Biblio::Citation::Parser) which may be the right place to ask general questions and start discussions on broad design issues.
The examples directory contains two categories of examples - parsing examples and web service examples. Note that the web service examples require the SOAP::Lite module (see Required Software for more information). To try out these samples after installation, simply cd into the directory and execute the example. More information about the examples is in the README file inside the examples directory.
Biblio::Citation::Parser 1.10 Documentation - How to Install Biblio::Citation::Parser
If you cannot find a solution to your problem here, make sure you are using the latest version of the toolkit and ask on the ParaTools mailing list (see http://paracite.eprints.org/developers/).
If you are using the Standard parsing module, make sure that a template for the reference exists in the package. See the HOWTO for more information on how to do this. If you are using a contributed module, please email your query to the author of the module.
ParaTools, short for ParaCite Toolkit, is a collection of Perl modules for reference parsing that is designed to be easily expanded and yet simple to use. The parsing modules make up the core of the package, but there are also useful modules to assist with OpenURL creation and the extraction of references from documents. The toolkit is released under the GNU General Public License, so it can be used freely as long as the source code is provided (see the COPYING file in the root directory of the distribution for more information).
The toolkit came about as a result of the ParaCite resource, a reference search engine located at http://paracite.eprints.org, which uses a template-based reference parser to extract metadata from provided references and then provides search results based on this metadata. The ParaCite parser is provided directly as the Biblio::Citation::Parser::Standard module, with a separate Templates module that can be replaced as new reference templates are located.
As well as providing examples for the provided parsing modules, ParaTools also includes examples for using the ParaCite web service. This is an alternate interface which provides access to ParaCite's search and parsing functionality for any language that supports the Web Services Description Language (WSDL).
The ParaTools package has many applications, including:
Converting reference lists into valid OpenURLs
Converting existing metadata into valid OpenURLs
Collecting metadata from references to carry out internal searches
Extracting reference lists from documents
Carrying out searches using ParaCite
The modularity of ParaTools means that it is very easy to add new techniques (and we would be very pleased to hear of new ones!).
ParaTools should work on any platform that supports Perl 5.6.0 or higher, although testing was primarily carried out using Red Hat Linux 7.3 with Perl 5.6. Where possible, platform-agnostic modules have been used for file functionality, so temporary files should be placed in the correct location for the operating system. Memory requirements for ParaTools are minimal, although the template parser and document parser will require more memory as the number of templates and the size of documents increase.
The Biblio::Citation::Parser package includes several examples that demonstrate the ParaCite web service, as well as the WSDL definition file. This section explains the web service, and gives an introduction to using it.
As ParaCite is written entirely in Perl, there are obvious issues if you wish to use Java, PHP, or another language. The ParaCite web service provides an interface to the reference parsing features of ParaCite while remaining language agnostic.
To access the web service from Perl requires the SOAP::Lite module (see Required Software). Once this is present, this is all that is required to connect to the web service:
use SOAP::Lite;
my $service = SOAP::Lite
-> service("http://paracite.eprints.org/paracite.wsdl");
my $base_url = "http://paracite.eprints.org/cgi-bin/openurl.cgi?";
my $query = "Harnad, Stevan (1995) The PostGutenberg Galaxy.";
my $result = $service
-> doParaciteSearch($query, $base_url);
my $first_result = $result->{resultElements}->[0];
print "First result is: ".$first_result->{URL}."\n";
The web service automatically adds Google, Scirus, and Vivisimo as resources to the search request, so if no resources match the publication or subject these will be used as fall-backs.
Most of the ParaCite structures have been modelled very closely on the Google web service structures to allow some degree of standardisation. Some additions have been made, and some fields are not yet used, but these may change in future versions.
Adding new templates to the Standard parser is relatively easy:
Locate where your Templates.pm file has been installed.
On Linux systems this should just involve doing 'locate Templates.pm', otherwise 'find / -name Templates.pm' should work.
Alternatively, you can edit the Templates.pm in the Biblio/Citation/Parser/ directory of an unpacked distribution, and install it once you have finished.
Add the template to the list.
If you are editing an already installed Templates.pm file you will probably have to be root to do this. If you are editing the Templates.pm inside an unpacked distribution, you will have to reinstall the modules once you are finished (see the Installation section).
The Templates.pm file should contain a structure similar to this:
Each template is a string containing a set of placeholders. For example, '_AUTHORS_ (_YEAR_) _TITLE_' can match 'Jewell, M (2002) Title'. The following are valid field names:
EPrints already contains ParaCite support, but it uses a specially built version of the module that predates ParaTools. To alter your cgi/paracite script to use ParaTools, you need to do the following:
First replace
use Citation::Parser::Simple;
with
use Biblio::Citation::Parser::Standard;
Next, replace this line:
my $parser = new Citation::Parser::Simple();
with this line:
my $parser = new Biblio::Citation::Parser::Standard();
This should work fine, although you can obviously integrate ParaCite more if you wish.
All new citation parsers should be named Biblio::Citation::Parser::SomeName, where SomeName is replaced with a unique name (ideally the author's surname). The parser should extend the Biblio::Citation::Parser module like so:
If you wish to create valid OpenURLs, URI::OpenURL provides a set of functions for this purpose. The metadata produced by Biblio::Citation::Parser can be used with this module.
This module is required if you wish to use the ParaCite web services, but optional otherwise. This requires several other modules, which are available in the soap subdirectory
of http://paracite.eprints.org/files/perlmods/.
There are also some dependencies for the above modules, including MIME::Base64, HTML::TagSet, and Digest::MD5. The latest versions of these can be obtained from http://www.cpan.org/.
Biblio::Citation::Parser is designed for parsing citations, and this can be done very simply:
use Biblio::Citation::Parser::Standard;
my $parser = new Biblio::Citation::Parser::Standard();
my $metadata = $parser->parse("Jewell, M (2002) Parsing Examples");
The $metadata variable is a reference to a hash containing the information extracted from the reference.
If you'd prefer to use another parser, simply substitute 'Standard' with the name of the appropriate module. Biblio::Citation::Parser is distributed with the Jiao module, which is a slightly modified version of a module created by Zhuoan Jiao. To use this instead of the Standard module, you would do the following:
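As a rough illustration of working with the parsed result, the sketch below prints every non-empty field. It assumes $metadata is a hash reference, as the call above suggests. The field names (authors, year, title) and the format_metadata helper are hypothetical stand-ins; the real keys depend on the parser and template used.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the value returned by $parser->parse(...).
# The actual field names depend on the parser and template in use.
my $metadata = {
    authors => 'Jewell, M',
    year    => '2002',
    title   => 'Parsing Examples',
};

# Return one "field: value" line per non-empty field, sorted by field name.
# format_metadata is an illustrative helper, not part of the distribution.
sub format_metadata {
    my ($md) = @_;
    return map  { $_ . ': ' . $md->{$_} }
           grep { defined $md->{$_} && length $md->{$_} }
           sort keys %$md;
}

print "$_\n" for format_metadata($metadata);
```

Skipping empty or undefined fields keeps the output readable when a parser only fills in part of the record.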
use Biblio::Citation::Parser::Jiao;
my $parser = new Biblio::Citation::Parser::Jiao();
my $metadata = $parser->parse("Jewell, M (2002) Parsing Examples");
The Standard module provides slightly richer metadata than the Jiao module, but it does rely on templates (see Biblio::Citation::Parser::Templates) so requires updating as new citation formats are found.
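To make the idea of a template concrete, the following sketch shows one way a placeholder template such as '_AUTHORS_ (_YEAR_) _TITLE_' could be turned into a regular expression and matched against a reference. This is an illustration only, not the Standard module's actual implementation. The helper names template_to_regex and match_template are hypothetical, and the named captures used here need Perl 5.10 or later.

```perl
use strict;
use warnings;

# Build a regex from a template: each _NAME_ placeholder becomes a named,
# non-greedy capture, and everything else must match literally.
sub template_to_regex {
    my ($template) = @_;
    my $pattern = '';
    for my $part (split /(_[A-Z]+_)/, $template) {
        next if $part eq '';
        if ($part =~ /\A_([A-Z]+)_\z/) {
            $pattern .= "(?<$1>.+?)";
        }
        else {
            $pattern .= quotemeta $part;
        }
    }
    return qr/\A$pattern\z/;
}

# Return a hash reference of captured fields, or nothing if the reference
# does not fit the template.
sub match_template {
    my ($template, $reference) = @_;
    my $regex = template_to_regex($template);
    return unless $reference =~ $regex;
    my %captured = %+;    # copy the named captures before they are reset
    return \%captured;
}

my $fields = match_template('_AUTHORS_ (_YEAR_) _TITLE_', 'Jewell, M (2002) Title');
if ($fields) {
    print "$_ = $fields->{$_}\n" for sort keys %$fields;
}
```

A reference that does not fit the template simply fails to match, which is why new citation formats need new templates.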
Once you have the metadata from the reference, it is easy to create an OpenURL from it:
use Biblio::Citation::Parser::Standard;
use Biblio::Citation::Parser::Utils;
my $parser = new Biblio::Citation::Parser::Standard();
my $metadata = $parser->parse("Jewell, M (2002) Parsing Examples");
my $openurl = create_openurl($metadata);
The OpenURLs created by Biblio::Citation::Parser do not have a Base URL prefixed, so this should be carried out before they are used (the ParaCite base URL is http://paracite.eprints.org/cgi-bin/openurl.cgi).
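A minimal sketch of prefixing the base URL is given below. It assumes that create_openurl returns a bare query string; the value of $openurl is a hypothetical stand-in, and with_base_url is an illustrative helper rather than part of Biblio::Citation::Parser::Utils.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the string returned by create_openurl($metadata).
my $openurl  = 'genre=article&aulast=Jewell&date=2002';
my $base_url = 'http://paracite.eprints.org/cgi-bin/openurl.cgi';

# Join a base URL and a query string with exactly one '?' between them,
# whether or not either side already carries one.
sub with_base_url {
    my ($base, $query) = @_;
    $base  =~ s/\?\z//;    # drop a trailing '?' from the base URL
    $query =~ s/\A\?//;    # drop a leading '?' from the query string
    return $base . '?' . $query;
}

print with_base_url($base_url, $openurl), "\n";
```

Normalising the separator means the same helper works with a base URL written with or without a trailing '?'.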
If you would like to try to extract more information from the metadata, you can use the decompose_openurl function:
my ($enriched_metadata, @errors) = decompose_openurl($metadata);
This tries to extract information from SICIs, page ranges, etc., and also checks the fields for validity (the @errors array contains any mistakes).
Note that create_openurl has been superseded by URI::OpenURL, but the metadata returned by trim_openurl is in the correct format to be passed to this module.
Biblio::Citation::Parser supports all of the fields specified in Table 1 of the OpenURL specification (http://www.sfxit.com/openurl/openurl.html). Specific parsers can add their own fields, but these are not exported when OpenURLs are created. Biblio::Citation::Parser::Standard provides the following extra fields: