Bio-DB-SeqFeature-1.7.4/Changes
Summary of important user-visible changes for Bio-DB-SeqFeature
---------------------------------------------------------------

1.7.4 2020-01-08 22:03:25-06:00 America/Chicago
  * The prior release required both DBD::Pg and DBD::SQLite; we now skip these as prerequisites.
  * We now explicitly require Bio::DB::GFF::Typename; tests depend on it, and the dynamic loading previously used (which caused issue #1) no longer makes sense.

1.7.3 2019-02-19 14:26:06-05:00 America/Detroit
  * First release after split from bioperl-live.

Bio-DB-SeqFeature-1.7.4/LICENSE
This software is copyright (c) 2020 by Cold Spring Harbor Laboratory, Nathan Weeks, Ontario Institute for Cancer Research.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.

Terms of the Perl programming language system itself

a) the GNU General Public License as published by the Free Software Foundation; either version 1, or (at your option) any later version, or
b) the "Artistic License"

--- The GNU General Public License, Version 1, February 1989 ---

This software is Copyright (c) 2020 by Cold Spring Harbor Laboratory, Nathan Weeks, Ontario Institute for Cancer Research.

This is free software, licensed under:

  The GNU General Public License, Version 1, February 1989

GNU GENERAL PUBLIC LICENSE
Version 1, February 1989

Copyright (C) 1989 Free Software Foundation, Inc.
51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA

Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

Preamble

The license agreements of most software companies try to keep users at the mercy of those companies.
By contrast, our General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. The General Public License applies to the Free Software Foundation's software and to any other program whose authors commit to using it. You can use it for your programs, too. When we speak of free software, we are referring to freedom, not price. Specifically, the General Public License is designed to make sure that you have the freedom to give away or sell copies of free software, that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of a such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must tell them their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. The precise terms and conditions for copying, distribution and modification follow. GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. 
This License Agreement applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any work containing the Program or a portion of it, either verbatim or with modifications. Each licensee is addressed as "you". 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this General Public License and to the absence of any warranty; and give any other recipients of the Program a copy of this General Public License along with the Program. You may charge a fee for the physical act of transferring a copy. 2. You may modify your copy or copies of the Program or any portion of it, and copy and distribute such modifications under the terms of Paragraph 1 above, provided that you also do the following: a) cause the modified files to carry prominent notices stating that you changed the files and the date of any change; and b) cause the whole of any work that you distribute or publish, that in whole or in part contains the Program or any part thereof, either with or without modifications, to be licensed at no charge to all third parties under the terms of this General Public License (except that you may choose to grant warranty protection to some or all third parties, at your option). 
c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the simplest and most usual way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this General Public License. d) You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. Mere aggregation of another independent work with the Program (or its derivative) on a volume of a storage or distribution medium does not bring the other work under the scope of these terms. 3. You may copy and distribute the Program (or a portion or derivative of it, under Paragraph 2) in object code or executable form under the terms of Paragraphs 1 and 2 above provided that you also do one of the following: a) accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Paragraphs 1 and 2 above; or, b) accompany it with a written offer, valid for at least three years, to give any third party free (except for a nominal charge for the cost of distribution) a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Paragraphs 1 and 2 above; or, c) accompany it with the information you received as to where the corresponding source code may be obtained. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form alone.) Source code for a work means the preferred form of the work for making modifications to it. 
For an executable file, complete source code means all the source code for all modules it contains; but, as a special exception, it need not include source code for modules which are standard libraries that accompany the operating system on which the executable file runs, or for standard header files or definitions files that accompany that operating system. 4. You may not copy, modify, sublicense, distribute or transfer the Program except as expressly provided under this General Public License. Any attempt otherwise to copy, modify, sublicense, distribute or transfer the Program is void, and will automatically terminate your rights to use the Program under this License. However, parties who have received copies, or rights to use copies, from you under this General Public License will not have their licenses terminated so long as such parties remain in full compliance. 5. By copying, distributing or modifying the Program (or any work based on the Program) you indicate your acceptance of this license to do so, and all its terms and conditions. 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. 7. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of the license which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. 
If the Program does not specify a version number of the license, you may choose any version ever published by the Free Software Foundation. 8. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 9. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 10. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 
END OF TERMS AND CONDITIONS Appendix: How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to humanity, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. Copyright (C) 19yy This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 1, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston MA 02110-1301 USA Also add information on how to contact you by electronic and paper mail. If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright (C) 19xx name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program. 
You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (a program to direct compilers to make passes at assemblers) written by James Hacker. , 1 April 1989 Ty Coon, President of Vice That's all there is to it! --- The Artistic License 1.0 --- This software is Copyright (c) 2020 by Cold Spring Harbor Laboratory, Nathan Weeks, Ontario Institute for Cancer Research. This is free software, licensed under: The Artistic License 1.0 The Artistic License Preamble The intent of this document is to state the conditions under which a Package may be copied, such that the Copyright Holder maintains some semblance of artistic control over the development of the package, while giving the users of the package the right to use and distribute the Package in a more-or-less customary fashion, plus the right to make reasonable modifications. Definitions: - "Package" refers to the collection of files distributed by the Copyright Holder, and derivatives of that collection of files created through textual modification. - "Standard Version" refers to such a Package if it has not been modified, or has been modified in accordance with the wishes of the Copyright Holder. - "Copyright Holder" is whoever is named in the copyright or copyrights for the package. - "You" is you, if you're thinking about copying or distributing this Package. - "Reasonable copying fee" is whatever you can justify on the basis of media cost, duplication charges, time of people involved, and so on. (You will not be required to justify it to the Copyright Holder, but only to the computing community at large as a market that must bear the fee.) - "Freely Available" means that no fee is charged for the item itself, though there may be fees involved in handling the item. 
It also means that recipients of the item may redistribute it under the same conditions they received it. 1. You may make and give away verbatim copies of the source form of the Standard Version of this Package without restriction, provided that you duplicate all of the original copyright notices and associated disclaimers. 2. You may apply bug fixes, portability fixes and other modifications derived from the Public Domain or from the Copyright Holder. A Package modified in such a way shall still be considered the Standard Version. 3. You may otherwise modify your copy of this Package in any way, provided that you insert a prominent notice in each changed file stating how and when you changed that file, and provided that you do at least ONE of the following: a) place your modifications in the Public Domain or otherwise make them Freely Available, such as by posting said modifications to Usenet or an equivalent medium, or placing the modifications on a major archive site such as ftp.uu.net, or by allowing the Copyright Holder to include your modifications in the Standard Version of the Package. b) use the modified Package only within your corporation or organization. c) rename any non-standard executables so the names do not conflict with standard executables, which must also be provided, and provide a separate manual page for each non-standard executable that clearly documents how it differs from the Standard Version. d) make other distribution arrangements with the Copyright Holder. 4. You may distribute the programs of this Package in object code or executable form, provided that you do at least ONE of the following: a) distribute a Standard Version of the executables and library files, together with instructions (in the manual page or equivalent) on where to get the Standard Version. b) accompany the distribution with the machine-readable source of the Package with your modifications. 
c) accompany any non-standard executables with their corresponding Standard Version executables, giving the non-standard executables non-standard names, and clearly documenting the differences in manual pages (or equivalent), together with instructions on where to get the Standard Version. d) make other distribution arrangements with the Copyright Holder. 5. You may charge a reasonable copying fee for any distribution of this Package. You may charge any fee you choose for support of this Package. You may not charge a fee for this Package itself. However, you may distribute this Package in aggregate with other (possibly commercial) programs as part of a larger (possibly commercial) software distribution provided that you do not advertise this Package as a product of your own. 6. The scripts and library files supplied as input to or produced as output from the programs of this Package do not automatically fall under the copyright of this Package, but belong to whomever generated them, and may be sold commercially, and may be aggregated with this Package. 7. C or perl subroutines supplied by you and linked into this Package shall not be considered part of this Package. 8. The name of the Copyright Holder may not be used to endorse or promote products derived from this software without specific prior written permission. 9. THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.

The End

Bio-DB-SeqFeature-1.7.4/dist.ini
name = Bio-DB-SeqFeature
version = 1.7.4
author = Lincoln Stein
author = Nathan Weeks
copyright_holder = Cold Spring Harbor Laboratory, Nathan Weeks, Ontario Institute for Cancer Research
license = Perl_5

;; Modules should be fixed so that these don't have to be removed.
[@BioPerl]
-remove = PodCoverageTests
-remove = PodWeaver
-remove = Test::EOL
-remove = Test::NoTabs
;; Skip requiring DBD::* drivers (e.g. DBD::Pg, DBD::SQLite)
AutoPrereqs.skips[0] = ^DBD::\S+$
;; skip compilation tests, which fail if DBD::* modules aren't installed
Test::Compile.skip[0] = Bio::DB::SeqFeature::Store::DBI::Pg
Test::Compile.skip[1] = Bio::DB::SeqFeature::Store::DBI::SQLite

Bio-DB-SeqFeature-1.7.4/META.yml
---
abstract: 'Normalized feature for use with Bio::DB::SeqFeature::Store'
author:
  - 'Lincoln Stein '
  - 'Nathan Weeks '
build_requires:
  Bio::Root::Test: '0'
  File::Spec: '0'
  IO::Handle: '0'
  IPC::Open3: '0'
  Test::More: '0'
  lib: '0'
  perl: '5.006'
configure_requires:
  ExtUtils::MakeMaker: '0'
dynamic_config: 0
generated_by: 'Dist::Zilla version 6.012, CPAN::Meta::Converter version 2.150010'
license: perl
meta-spec:
  url: http://module-build.sourceforge.net/META-spec-v1.4.html
  version: '1.4'
name: Bio-DB-SeqFeature
requires:
  Bio::DB::Fasta: '0'
  Bio::DB::GFF::Typename: '0'
  Bio::DB::GFF::Util::Rearrange: '0'
  Bio::Location::Simple: '0'
  Bio::PrimarySeq: '0'
  Bio::RangeI: '0'
  Bio::Root::Root: '0'
  Bio::Seq: '0'
  Bio::SeqFeature::CollectionI: '0'
  Bio::SeqFeature::Lite: '0'
  Carp: '0'
  Cwd: '0'
  DBI: '0'
  DB_File: '0'
  Fcntl: '0'
  File::Basename: '0'
  File::Copy: '0'
  File::Glob: '0'
  File::Path: '0'
  File::Spec: '0'
  File::Temp: '0'
  Getopt::Long: '0'
  IO::File: '0'
  MIME::Base64: '0'
  Memoize: '0'
  Pod::Usage: '0'
  Scalar::Util: '0'
  Text::ParseWords: '0'
  base: '0'
  constant: '0'
  overload: '0'
  strict: '0'
  vars: '0'
  warnings: '0'
resources:
  bugtracker: https://github.com/bioperl/bio-db-seqfeature/issues
  homepage: https://metacpan.org/release/Bio-DB-SeqFeature
  repository: git://github.com/bioperl/bio-db-seqfeature.git
version: 1.7.4
x_Dist_Zilla:
  perl:
    version: '5.028001'
  plugins:
    -
      class: Dist::Zilla::Plugin::GatherDir
      config:
        Dist::Zilla::Plugin::GatherDir:
          exclude_filename: []
          exclude_match: []
          follow_symlinks: 0
          include_dotfiles: 0
          prefix: ''
          prune_directory: []
          root: .
      name: '@BioPerl/@Filter/GatherDir'
      version: '6.012'
    -
      class: Dist::Zilla::Plugin::PruneCruft
      name: '@BioPerl/@Filter/PruneCruft'
      version: '6.012'
    -
      class: Dist::Zilla::Plugin::ManifestSkip
      name: '@BioPerl/@Filter/ManifestSkip'
      version: '6.012'
    -
      class: Dist::Zilla::Plugin::MetaYAML
      name: '@BioPerl/@Filter/MetaYAML'
      version: '6.012'
    -
      class: Dist::Zilla::Plugin::License
      name: '@BioPerl/@Filter/License'
      version: '6.012'
    -
      class: Dist::Zilla::Plugin::ExtraTests
      name: '@BioPerl/@Filter/ExtraTests'
      version: '6.012'
    -
      class: Dist::Zilla::Plugin::ExecDir
      name: '@BioPerl/@Filter/ExecDir'
      version: '6.012'
    -
      class: Dist::Zilla::Plugin::ShareDir
      name: '@BioPerl/@Filter/ShareDir'
      version: '6.012'
    -
      class: Dist::Zilla::Plugin::MakeMaker
      config:
        Dist::Zilla::Role::TestRunner:
          default_jobs: 1
      name: '@BioPerl/@Filter/MakeMaker'
      version: '6.012'
    -
      class: Dist::Zilla::Plugin::Manifest
      name: '@BioPerl/@Filter/Manifest'
      version: '6.012'
    -
      class: Dist::Zilla::Plugin::TestRelease
      name: '@BioPerl/@Filter/TestRelease'
      version: '6.012'
    -
      class: Dist::Zilla::Plugin::ConfirmRelease
      name: '@BioPerl/@Filter/ConfirmRelease'
      version: '6.012'
    -
      class: Dist::Zilla::Plugin::UploadToCPAN
      name: '@BioPerl/@Filter/UploadToCPAN'
      version: '6.012'
    -
      class: Dist::Zilla::Plugin::MetaConfig
      name: '@BioPerl/MetaConfig'
      version: '6.012'
    -
      class: Dist::Zilla::Plugin::MetaJSON
      name: '@BioPerl/MetaJSON'
      version: '6.012'
    -
      class: Dist::Zilla::Plugin::PkgVersion
      name: '@BioPerl/PkgVersion'
      version: '6.012'
    -
      class: Dist::Zilla::Plugin::PodSyntaxTests
      name: '@BioPerl/PodSyntaxTests'
      version: '6.012'
    -
      class: Dist::Zilla::Plugin::Test::Compile
      config:
        Dist::Zilla::Plugin::Test::Compile:
          bail_out_on_fail: '0'
          fail_on_warning: author
          fake_home: 0
          filename: t/00-compile.t
          module_finder:
            - ':InstallModules'
          needs_display: 0
          phase: test
          script_finder:
            - ':PerlExecFiles'
          skips:
            - Bio::DB::SeqFeature::Store::DBI::Pg
            - Bio::DB::SeqFeature::Store::DBI::SQLite
          switch: []
      name: '@BioPerl/Test::Compile'
      version: '2.058'
    -
      class: Dist::Zilla::Plugin::MojibakeTests
      name: '@BioPerl/MojibakeTests'
      version: '0.8'
    -
      class: Dist::Zilla::Plugin::AutoPrereqs
      name: '@BioPerl/AutoPrereqs'
      version: '6.012'
    -
      class: Dist::Zilla::Plugin::AutoMetaResources
      name: '@BioPerl/AutoMetaResources'
      version: '1.21'
    -
      class: Dist::Zilla::Plugin::MetaResources
      name: '@BioPerl/MetaResources'
      version: '6.012'
    -
      class: Dist::Zilla::Plugin::Encoding
      name: '@BioPerl/Encoding'
      version: '6.012'
    -
      class: Dist::Zilla::Plugin::NextRelease
      name: '@BioPerl/NextRelease'
      version: '6.012'
    -
      class: Dist::Zilla::Plugin::Git::Check
      config:
        Dist::Zilla::Plugin::Git::Check:
          untracked_files: die
        Dist::Zilla::Role::Git::DirtyFiles:
          allow_dirty:
            - Changes
            - dist.ini
          allow_dirty_match: []
          changelog: Changes
        Dist::Zilla::Role::Git::Repo:
          git_version: 2.20.1
          repo_root: .
      name: '@BioPerl/Git::Check'
      version: '2.045'
    -
      class: Dist::Zilla::Plugin::Git::Commit
      config:
        Dist::Zilla::Plugin::Git::Commit:
          add_files_in: []
          commit_msg: v%v%n%n%c
        Dist::Zilla::Role::Git::DirtyFiles:
          allow_dirty:
            - Changes
            - dist.ini
          allow_dirty_match: []
          changelog: Changes
        Dist::Zilla::Role::Git::Repo:
          git_version: 2.20.1
          repo_root: .
        Dist::Zilla::Role::Git::StringFormatter:
          time_zone: local
      name: '@BioPerl/Git::Commit'
      version: '2.045'
    -
      class: Dist::Zilla::Plugin::Git::Tag
      config:
        Dist::Zilla::Plugin::Git::Tag:
          branch: ~
          changelog: Changes
          signed: 0
          tag: Bio-DB-SeqFeature-v1.7.4
          tag_format: '%N-v%v'
          tag_message: '%N-v%v'
        Dist::Zilla::Role::Git::Repo:
          git_version: 2.20.1
          repo_root: .
        Dist::Zilla::Role::Git::StringFormatter:
          time_zone: local
      name: '@BioPerl/Git::Tag'
      version: '2.045'
    -
      class: Dist::Zilla::Plugin::FinderCode
      name: ':InstallModules'
      version: '6.012'
    -
      class: Dist::Zilla::Plugin::FinderCode
      name: ':IncModules'
      version: '6.012'
    -
      class: Dist::Zilla::Plugin::FinderCode
      name: ':TestFiles'
      version: '6.012'
    -
      class: Dist::Zilla::Plugin::FinderCode
      name: ':ExtraTestFiles'
      version: '6.012'
    -
      class: Dist::Zilla::Plugin::FinderCode
      name: ':ExecFiles'
      version: '6.012'
    -
      class: Dist::Zilla::Plugin::FinderCode
      name: ':PerlExecFiles'
      version: '6.012'
    -
      class: Dist::Zilla::Plugin::FinderCode
      name: ':ShareFiles'
      version: '6.012'
    -
      class: Dist::Zilla::Plugin::FinderCode
      name: ':MainModule'
      version: '6.012'
    -
      class: Dist::Zilla::Plugin::FinderCode
      name: ':AllFiles'
      version: '6.012'
    -
      class: Dist::Zilla::Plugin::FinderCode
      name: ':NoFiles'
      version: '6.012'
  zilla:
    class: Dist::Zilla::Dist::Builder
    config:
      is_trial: '0'
    version: '6.012'
x_generated_by_perl: v5.28.1
x_serialization_backend: 'YAML::Tiny version 1.73'

Bio-DB-SeqFeature-1.7.4/MANIFEST
# This file was automatically generated by Dist::Zilla::Plugin::Manifest v6.012.
Changes
LICENSE
MANIFEST
META.json
META.yml
Makefile.PL
bin/bp_seqfeature_delete
bin/bp_seqfeature_gff3
bin/bp_seqfeature_load
dist.ini
lib/Bio/DB/SeqFeature.pm
lib/Bio/DB/SeqFeature/NormalizedFeature.pm
lib/Bio/DB/SeqFeature/NormalizedFeatureI.pm
lib/Bio/DB/SeqFeature/NormalizedTableFeatureI.pm
lib/Bio/DB/SeqFeature/Segment.pm
lib/Bio/DB/SeqFeature/Store.pm
lib/Bio/DB/SeqFeature/Store/DBI/Iterator.pm
lib/Bio/DB/SeqFeature/Store/DBI/Pg.pm
lib/Bio/DB/SeqFeature/Store/DBI/SQLite.pm
lib/Bio/DB/SeqFeature/Store/DBI/mysql.pm
lib/Bio/DB/SeqFeature/Store/FeatureFileLoader.pm
lib/Bio/DB/SeqFeature/Store/GFF2Loader.pm
lib/Bio/DB/SeqFeature/Store/GFF3Loader.pm
lib/Bio/DB/SeqFeature/Store/LoadHelper.pm
lib/Bio/DB/SeqFeature/Store/Loader.pm
lib/Bio/DB/SeqFeature/Store/bdb.pm
lib/Bio/DB/SeqFeature/Store/berkeleydb.pm
lib/Bio/DB/SeqFeature/Store/berkeleydb3.pm
lib/Bio/DB/SeqFeature/Store/memory.pm
t/00-compile.t
t/SeqFeature.t
t/author-mojibake.t
t/author-pod-syntax.t
t/data/dbfa/1.fa
t/data/dbfa/2.fa
t/data/dbfa/3.fa
t/data/dbfa/4.fa
t/data/dbfa/5.fa
t/data/dbfa/6.fa
t/data/dbfa/7.fa
t/data/dbfa/mixed_alphabet.fasta
t/data/test.gff3

Bio-DB-SeqFeature-1.7.4/META.json
{ "abstract" : "Normalized feature for use with Bio::DB::SeqFeature::Store", "author" : [ "Lincoln Stein ", "Nathan Weeks " ], "dynamic_config" : 0, "generated_by" : "Dist::Zilla version 6.012, CPAN::Meta::Converter version 2.150010", "license" : [ "perl_5" ], "meta-spec" : { "url" : "http://search.cpan.org/perldoc?CPAN::Meta::Spec", "version" : 2 }, "name" : "Bio-DB-SeqFeature", "prereqs" : { "configure" : { "requires" : { "ExtUtils::MakeMaker" : "0" } }, "develop" : { "requires" : { "Test::Mojibake" : "0", "Test::Pod" : "1.41" } }, "runtime" : { "requires" : { "Bio::DB::Fasta" : "0", "Bio::DB::GFF::Typename" : "0", "Bio::DB::GFF::Util::Rearrange" : "0", "Bio::Location::Simple" : "0", "Bio::PrimarySeq" : "0", "Bio::RangeI" : "0",
"Bio::Root::Root" : "0", "Bio::Seq" : "0", "Bio::SeqFeature::CollectionI" : "0", "Bio::SeqFeature::Lite" : "0", "Carp" : "0", "Cwd" : "0", "DBI" : "0", "DB_File" : "0", "Fcntl" : "0", "File::Basename" : "0", "File::Copy" : "0", "File::Glob" : "0", "File::Path" : "0", "File::Spec" : "0", "File::Temp" : "0", "Getopt::Long" : "0", "IO::File" : "0", "MIME::Base64" : "0", "Memoize" : "0", "Pod::Usage" : "0", "Scalar::Util" : "0", "Text::ParseWords" : "0", "base" : "0", "constant" : "0", "overload" : "0", "strict" : "0", "vars" : "0", "warnings" : "0" } }, "test" : { "requires" : { "Bio::Root::Test" : "0", "File::Spec" : "0", "IO::Handle" : "0", "IPC::Open3" : "0", "Test::More" : "0", "lib" : "0", "perl" : "5.006" } } }, "release_status" : "stable", "resources" : { "bugtracker" : { "mailto" : "bioperl-l@bioperl.org", "web" : "https://github.com/bioperl/bio-db-seqfeature/issues" }, "homepage" : "https://metacpan.org/release/Bio-DB-SeqFeature", "repository" : { "type" : "git", "url" : "git://github.com/bioperl/bio-db-seqfeature.git", "web" : "https://github.com/bioperl/bio-db-seqfeature" } }, "version" : "1.7.4", "x_Dist_Zilla" : { "perl" : { "version" : "5.028001" }, "plugins" : [ { "class" : "Dist::Zilla::Plugin::GatherDir", "config" : { "Dist::Zilla::Plugin::GatherDir" : { "exclude_filename" : [], "exclude_match" : [], "follow_symlinks" : 0, "include_dotfiles" : 0, "prefix" : "", "prune_directory" : [], "root" : "." 
} }, "name" : "@BioPerl/@Filter/GatherDir", "version" : "6.012" }, { "class" : "Dist::Zilla::Plugin::PruneCruft", "name" : "@BioPerl/@Filter/PruneCruft", "version" : "6.012" }, { "class" : "Dist::Zilla::Plugin::ManifestSkip", "name" : "@BioPerl/@Filter/ManifestSkip", "version" : "6.012" }, { "class" : "Dist::Zilla::Plugin::MetaYAML", "name" : "@BioPerl/@Filter/MetaYAML", "version" : "6.012" }, { "class" : "Dist::Zilla::Plugin::License", "name" : "@BioPerl/@Filter/License", "version" : "6.012" }, { "class" : "Dist::Zilla::Plugin::ExtraTests", "name" : "@BioPerl/@Filter/ExtraTests", "version" : "6.012" }, { "class" : "Dist::Zilla::Plugin::ExecDir", "name" : "@BioPerl/@Filter/ExecDir", "version" : "6.012" }, { "class" : "Dist::Zilla::Plugin::ShareDir", "name" : "@BioPerl/@Filter/ShareDir", "version" : "6.012" }, { "class" : "Dist::Zilla::Plugin::MakeMaker", "config" : { "Dist::Zilla::Role::TestRunner" : { "default_jobs" : 1 } }, "name" : "@BioPerl/@Filter/MakeMaker", "version" : "6.012" }, { "class" : "Dist::Zilla::Plugin::Manifest", "name" : "@BioPerl/@Filter/Manifest", "version" : "6.012" }, { "class" : "Dist::Zilla::Plugin::TestRelease", "name" : "@BioPerl/@Filter/TestRelease", "version" : "6.012" }, { "class" : "Dist::Zilla::Plugin::ConfirmRelease", "name" : "@BioPerl/@Filter/ConfirmRelease", "version" : "6.012" }, { "class" : "Dist::Zilla::Plugin::UploadToCPAN", "name" : "@BioPerl/@Filter/UploadToCPAN", "version" : "6.012" }, { "class" : "Dist::Zilla::Plugin::MetaConfig", "name" : "@BioPerl/MetaConfig", "version" : "6.012" }, { "class" : "Dist::Zilla::Plugin::MetaJSON", "name" : "@BioPerl/MetaJSON", "version" : "6.012" }, { "class" : "Dist::Zilla::Plugin::PkgVersion", "name" : "@BioPerl/PkgVersion", "version" : "6.012" }, { "class" : "Dist::Zilla::Plugin::PodSyntaxTests", "name" : "@BioPerl/PodSyntaxTests", "version" : "6.012" }, { "class" : "Dist::Zilla::Plugin::Test::Compile", "config" : { "Dist::Zilla::Plugin::Test::Compile" : { "bail_out_on_fail" : 0, 
"fail_on_warning" : "author", "fake_home" : 0, "filename" : "t/00-compile.t", "module_finder" : [ ":InstallModules" ], "needs_display" : 0, "phase" : "test", "script_finder" : [ ":PerlExecFiles" ], "skips" : [ "Bio::DB::SeqFeature::Store::DBI::Pg", "Bio::DB::SeqFeature::Store::DBI::SQLite" ], "switch" : [] } }, "name" : "@BioPerl/Test::Compile", "version" : "2.058" }, { "class" : "Dist::Zilla::Plugin::MojibakeTests", "name" : "@BioPerl/MojibakeTests", "version" : "0.8" }, { "class" : "Dist::Zilla::Plugin::AutoPrereqs", "name" : "@BioPerl/AutoPrereqs", "version" : "6.012" }, { "class" : "Dist::Zilla::Plugin::AutoMetaResources", "name" : "@BioPerl/AutoMetaResources", "version" : "1.21" }, { "class" : "Dist::Zilla::Plugin::MetaResources", "name" : "@BioPerl/MetaResources", "version" : "6.012" }, { "class" : "Dist::Zilla::Plugin::Encoding", "name" : "@BioPerl/Encoding", "version" : "6.012" }, { "class" : "Dist::Zilla::Plugin::NextRelease", "name" : "@BioPerl/NextRelease", "version" : "6.012" }, { "class" : "Dist::Zilla::Plugin::Git::Check", "config" : { "Dist::Zilla::Plugin::Git::Check" : { "untracked_files" : "die" }, "Dist::Zilla::Role::Git::DirtyFiles" : { "allow_dirty" : [ "Changes", "dist.ini" ], "allow_dirty_match" : [], "changelog" : "Changes" }, "Dist::Zilla::Role::Git::Repo" : { "git_version" : "2.20.1", "repo_root" : "." } }, "name" : "@BioPerl/Git::Check", "version" : "2.045" }, { "class" : "Dist::Zilla::Plugin::Git::Commit", "config" : { "Dist::Zilla::Plugin::Git::Commit" : { "add_files_in" : [], "commit_msg" : "v%v%n%n%c" }, "Dist::Zilla::Role::Git::DirtyFiles" : { "allow_dirty" : [ "Changes", "dist.ini" ], "allow_dirty_match" : [], "changelog" : "Changes" }, "Dist::Zilla::Role::Git::Repo" : { "git_version" : "2.20.1", "repo_root" : "." 
}, "Dist::Zilla::Role::Git::StringFormatter" : { "time_zone" : "local" } }, "name" : "@BioPerl/Git::Commit", "version" : "2.045" }, { "class" : "Dist::Zilla::Plugin::Git::Tag", "config" : { "Dist::Zilla::Plugin::Git::Tag" : { "branch" : null, "changelog" : "Changes", "signed" : 0, "tag" : "Bio-DB-SeqFeature-v1.7.4", "tag_format" : "%N-v%v", "tag_message" : "%N-v%v" }, "Dist::Zilla::Role::Git::Repo" : { "git_version" : "2.20.1", "repo_root" : "." }, "Dist::Zilla::Role::Git::StringFormatter" : { "time_zone" : "local" } }, "name" : "@BioPerl/Git::Tag", "version" : "2.045" }, { "class" : "Dist::Zilla::Plugin::FinderCode", "name" : ":InstallModules", "version" : "6.012" }, { "class" : "Dist::Zilla::Plugin::FinderCode", "name" : ":IncModules", "version" : "6.012" }, { "class" : "Dist::Zilla::Plugin::FinderCode", "name" : ":TestFiles", "version" : "6.012" }, { "class" : "Dist::Zilla::Plugin::FinderCode", "name" : ":ExtraTestFiles", "version" : "6.012" }, { "class" : "Dist::Zilla::Plugin::FinderCode", "name" : ":ExecFiles", "version" : "6.012" }, { "class" : "Dist::Zilla::Plugin::FinderCode", "name" : ":PerlExecFiles", "version" : "6.012" }, { "class" : "Dist::Zilla::Plugin::FinderCode", "name" : ":ShareFiles", "version" : "6.012" }, { "class" : "Dist::Zilla::Plugin::FinderCode", "name" : ":MainModule", "version" : "6.012" }, { "class" : "Dist::Zilla::Plugin::FinderCode", "name" : ":AllFiles", "version" : "6.012" }, { "class" : "Dist::Zilla::Plugin::FinderCode", "name" : ":NoFiles", "version" : "6.012" } ], "zilla" : { "class" : "Dist::Zilla::Dist::Builder", "config" : { "is_trial" : 0 }, "version" : "6.012" } }, "x_generated_by_perl" : "v5.28.1", "x_serialization_backend" : "Cpanel::JSON::XS version 4.09" }

Bio-DB-SeqFeature-1.7.4/Makefile.PL
# This file was automatically generated by Dist::Zilla::Plugin::MakeMaker v6.012.
use strict;
use warnings;

use 5.006;

use ExtUtils::MakeMaker;

my %WriteMakefileArgs = (
  "ABSTRACT" => "Normalized feature for use with Bio::DB::SeqFeature::Store",
  "AUTHOR" => "Lincoln Stein , Nathan Weeks ",
  "CONFIGURE_REQUIRES" => {
    "ExtUtils::MakeMaker" => 0
  },
  "DISTNAME" => "Bio-DB-SeqFeature",
  "EXE_FILES" => [
    "bin/bp_seqfeature_delete",
    "bin/bp_seqfeature_gff3",
    "bin/bp_seqfeature_load"
  ],
  "LICENSE" => "perl",
  "MIN_PERL_VERSION" => "5.006",
  "NAME" => "Bio::DB::SeqFeature",
  "PREREQ_PM" => {
    "Bio::DB::Fasta" => 0,
    "Bio::DB::GFF::Typename" => 0,
    "Bio::DB::GFF::Util::Rearrange" => 0,
    "Bio::Location::Simple" => 0,
    "Bio::PrimarySeq" => 0,
    "Bio::RangeI" => 0,
    "Bio::Root::Root" => 0,
    "Bio::Seq" => 0,
    "Bio::SeqFeature::CollectionI" => 0,
    "Bio::SeqFeature::Lite" => 0,
    "Carp" => 0,
    "Cwd" => 0,
    "DBI" => 0,
    "DB_File" => 0,
    "Fcntl" => 0,
    "File::Basename" => 0,
    "File::Copy" => 0,
    "File::Glob" => 0,
    "File::Path" => 0,
    "File::Spec" => 0,
    "File::Temp" => 0,
    "Getopt::Long" => 0,
    "IO::File" => 0,
    "MIME::Base64" => 0,
    "Memoize" => 0,
    "Pod::Usage" => 0,
    "Scalar::Util" => 0,
    "Text::ParseWords" => 0,
    "base" => 0,
    "constant" => 0,
    "overload" => 0,
    "strict" => 0,
    "vars" => 0,
    "warnings" => 0
  },
  "TEST_REQUIRES" => {
    "Bio::Root::Test" => 0,
    "File::Spec" => 0,
    "IO::Handle" => 0,
    "IPC::Open3" => 0,
    "Test::More" => 0,
    "lib" => 0
  },
  "VERSION" => "1.7.4",
  "test" => {
    "TESTS" => "t/*.t"
  }
);


my %FallbackPrereqs = (
  "Bio::DB::Fasta" => 0,
  "Bio::DB::GFF::Typename" => 0,
  "Bio::DB::GFF::Util::Rearrange" => 0,
  "Bio::Location::Simple" => 0,
  "Bio::PrimarySeq" => 0,
  "Bio::RangeI" => 0,
  "Bio::Root::Root" => 0,
  "Bio::Root::Test" => 0,
  "Bio::Seq" => 0,
  "Bio::SeqFeature::CollectionI" => 0,
  "Bio::SeqFeature::Lite" => 0,
  "Carp" => 0,
  "Cwd" => 0,
  "DBI" => 0,
  "DB_File" => 0,
  "Fcntl" => 0,
  "File::Basename" => 0,
  "File::Copy" => 0,
  "File::Glob" => 0,
  "File::Path" => 0,
  "File::Spec" => 0,
  "File::Temp" => 0,
  "Getopt::Long" => 0,
  "IO::File" => 0,
  "IO::Handle" => 0,
  "IPC::Open3" => 0,
  "MIME::Base64" => 0,
  "Memoize" => 0,
  "Pod::Usage" => 0,
  "Scalar::Util" => 0,
  "Test::More" => 0,
  "Text::ParseWords" => 0,
  "base" => 0,
  "constant" => 0,
  "lib" => 0,
  "overload" => 0,
  "strict" => 0,
  "vars" => 0,
  "warnings" => 0
);


unless ( eval { ExtUtils::MakeMaker->VERSION(6.63_03) } ) {
  delete $WriteMakefileArgs{TEST_REQUIRES};
  delete $WriteMakefileArgs{BUILD_REQUIRES};
  $WriteMakefileArgs{PREREQ_PM} = \%FallbackPrereqs;
}

delete $WriteMakefileArgs{CONFIGURE_REQUIRES}
  unless eval { ExtUtils::MakeMaker->VERSION(6.52) };

WriteMakefile(%WriteMakefileArgs);

Bio-DB-SeqFeature-1.7.4/t/SeqFeature.t
# -*-Perl-*- Test Harness script for Bioperl
# $Id$

use strict;
use constant TEST_COUNT => 116;

BEGIN {
    use lib '.','..','./t/lib';
    use Bio::Root::Test;

    test_begin(-tests => TEST_COUNT,
               -requires_module => 'DB_File');

    $ENV{ORACLE_HOME} ||= '/home/oracle/Home';

    use_ok('Bio::SeqFeature::Generic');
    use_ok('Bio::DB::SeqFeature::Store');
    use_ok('Bio::DB::SeqFeature::Store::GFF3Loader');
    use_ok('Bio::Root::IO');
    use_ok('Bio::DB::Fasta');
    use_ok('File::Copy');
}

my $DEBUG = test_debug();

my $gff_file = test_input_file('test.gff3');

my (@f,$f,$f2,$sf1,$sf2,$sf3,@s,$s,$seq1,$seq2,$count,$new_features);

my @args = @ARGV;
@args = (-adaptor => 'memory') unless @args;

SKIP: {
my $db = eval { Bio::DB::SeqFeature::Store->new(@args) };
skip "DB load failed? Skipping all! $@", (TEST_COUNT - 6) if $@;
ok($db);

is( $db->isa('Bio::SeqFeature::CollectionI'), 1 );

my $loader = eval { Bio::DB::SeqFeature::Store::GFF3Loader->new(-store=>$db) };
skip "GFF3 loader failed? Skipping all! $@", (TEST_COUNT - 6) if $@;
ok($loader);

$new_features = 0;
SKIP: {
    # skip("skipping memory adaptor-specific tests",27)
    #     unless $db->isa('Bio::DB::SeqFeature::Store::memory');

    # test adding
    my $n = Bio::SeqFeature::Generic->new(
        # -primary_id => '_some_id', # you're not allowed to do this!!
        -primary => 'repeat_123',
        -start => 23,
        -end => 512,
        -strand => '+',
        -display_name => 'My favorite feature'
    );
    ok( my $id = $db->add_features([$n]), 'adding a feature' );
    ok( @f = $db->fetch($n->primary_id));
    is( scalar @f, 1 );
    $f = $f[0];
    is( $f->primary_id, $n->primary_id);

    $f2 = Bio::SeqFeature::Generic->new(
        -start => 10,
        -end => 100,
        -strand => -1,
        -primary => 'repeat_123', # -primary_tag is a synonym
        -source_tag => 'repeatmasker',
        -display_name => 'alu family',
        -score => 1000,
        -tag => { new => 1, author => 'someone', sillytag => 'this is silly!' }
    );
    ok( $db->store($f2) , 'adding a feature with no primary_id' );
    ok( $f2->primary_id );

    # test fetching features
    is( $db->fetch('-1'), undef, 'searching for a feature that shouldnt exist');
    is( $db->get_features_by_type('repeat_123:repeatmasker'), 1, 'simple type' );
    is( $db->get_features_by_type('repeat_123:'), 2, 'base type with colon' );
    is( $db->get_features_by_type('repeat_123'), 2, 'base type alone' );
    is( $db->get_features_by_type('rep.*'), 0, 'queried types are not interpolated' );
    ok( @f = $db->types );
    is( @f, 2 );
    isa_ok($f[0], 'Bio::DB::GFF::Typename');

    # test removing features
    ok( $db->delete( $f ), 'feature deletion' );
    is( $db->fetch( $f->primary_id ), undef );
    $db->delete( $f2 );
    ok( $db->store($f, $f2) );

    # test adding seqfeatures
    $sf1 = Bio::SeqFeature::Generic->new( -primary=>'seqfeat1', -start=>23, -end=>512 );
    $sf2 = Bio::SeqFeature::Generic->new( -primary=>'seqfeat2', -start=>23, -end=>512 );
    $sf3 = Bio::SeqFeature::Generic->new( -primary=>'seqfeat1', -start=>23, -end=>512, source_tag => 'dna' );
    ok $db->add_features([$sf1, $sf2, $sf3]), 'adding subfeatures';
    is $db->add_SeqFeature($f, $sf1), 1;
    is $db->add_SeqFeature($f, $sf2, $sf3), 2;
    is $db->add_SeqFeature($f, $sf1, $sf2, $sf3), 3;

    # test fetching seqfeatures
    is $db->fetch_SeqFeatures($f), 3;
    is $db->fetch_SeqFeatures($f, 'seqfeat2'), 1;
    is $db->fetch_SeqFeatures($f, 'seqfeat1:dna'), 1;
    is $db->fetch_SeqFeatures($f, 'seqfeat1'), 2;
    is $db->fetch_SeqFeatures($f, 'seqfeat1', 'seqfeat2'), 3;
    is $db->fetch_SeqFeatures($f, 'seqfeat4'), 0;

    $new_features = scalar $db->features;
}

# exercise the loader
ok($loader->load($gff_file));

# there should be one gene named 'abc-1'
@f = $db->get_features_by_name('abc-1');
is(@f,1);

$f = $f[0];
# there should be three subfeatures of type "exon" and three of type "CDS"
is($f->get_SeqFeatures('exon'),3);
is($f->get_SeqFeatures('CDS'),3);

# the sequence of feature abc-1 should match the sequence of the first exon at the beginning
$seq1 = $f->seq->seq;
$seq2 = (sort {$a->start<=>$b->start} $f->get_SeqFeatures('exon'))[0]->seq->seq;
is(substr($seq1,0,length $seq2),$seq2);

# sequence lengths should match
is(length $seq1, $f->length);

# if we pull out abc-1 again we should get the same object
($s) = $db->get_features_by_name('abc-1');
is($s, $f);

# test case-sensitivity
($s) = $db->get_features_by_name('Abc-1');
is($s, $f, 'feature names should be case insensitive');

# we should get two objects when we ask for abc-1 using get_features_by_alias
# this also depends on selective subfeature indexing
@f = $db->get_features_by_alias('abc-1');
is(@f,2);

# the two features should be different
isnt($f[0], $f[1]);

# test that targets are working
($f) = $db->get_features_by_name('match1');
ok(defined $f);
$s = $f->target;
ok(defined $s);
ok($s->seq_id eq 'CEESC13F');
$seq1 = $s->seq->seq;
is(substr($seq1,0,10), 'ttgcgttcgg');

# can we fetch subfeatures?
# gene3.a has the index=1 attribute, so we should fetch it
($f) = $db->get_features_by_name('gene3.a');
ok($f);

# gene3.b doesn't have an index, so we shouldn't get it
($f) = $db->get_features_by_name('gene3.b');
ok(!$f);

# test three-tiered genes
($f) = $db->get_features_by_name('gene3');
ok($f);
my @transcripts = $f->get_SeqFeatures;
is(@transcripts, 2);
is($transcripts[0]->method,'mRNA');
is($transcripts[0]->source,'confirmed');

# test that exon #2 is shared between the two transcripts
my @exons1 = $transcripts[0]->get_SeqFeatures('CDS');
is(@exons1, 3);
my @exons2 = $transcripts[1]->get_SeqFeatures('CDS');
# parenthesize the || fallback: eq binds tighter than ||, so without the
# parentheses any feature with a true display_name would match
my ($shared1) = grep {($_->display_name||'') eq 'shared_exon'} @exons1;
my ($shared2) = grep {($_->display_name||'') eq 'shared_exon'} @exons2;
ok($shared1 && $shared2);
is($shared1, $shared2);
is($shared1->primary_id, $shared2->primary_id);

# test attributes
is($shared1->phase, 0);
is($shared1->strand, +1);
is(($f->attributes('expressed'))[0], 'yes');

# test type getting
is (scalar $db->get_features_by_type('transcript'), 4, 'base type');
is (scalar $db->get_features_by_type('transcript:confirmed'), 2, 'base:source type');

# test autoloading
my ($gene3a) = grep { $_->display_name eq 'gene3.a'} @transcripts;
my ($gene3b) = grep { $_->display_name eq 'gene3.b'} @transcripts;
ok($gene3a);
ok($gene3b);
ok($gene3a->Is_expressed);
ok(!$gene3b->Is_expressed);

# the representation of the 3'-UTR in the two transcripts a and b is
# different (not recommended but supported by the GFF3 spec). In the
# first case, there are two 3'UTRs existing as independent
# features. In the second, there is one UTR with a split location.
is($gene3a->Three_prime_UTR, 2);
is($gene3b->Three_prime_UTR, 1);
my ($utr) = $gene3b->Three_prime_UTR;
is($utr->segments, 2);
my $location = $utr->location;
isa_ok($location, 'Bio::Location::Split');
is($location->sub_Location,2);

# ok, test that queries are working properly.
# find all features with the attribute "expressed"
@f = $db->get_features_by_attribute({expressed=>'yes'});
is(@f, 2);

# find all top-level features on Contig3 -- there should be two
@f = $db->get_features_by_location(-seq_id=>'Contig3');
is(@f, 2);

# find all top-level features on Contig3 that overlap a range -- only one
@f = $db->get_features_by_location(-seq_id=>'Contig3',-start=>40000,-end=>50000);
is(@f,1);

# find all top-level features on Contig3 of type 'assembly_component'
@f = $db->features(-seq_id=>'Contig3',-type=>'assembly_component');
is(@f, 1);

# test iteration
@f = $db->features;
is(scalar @f, 27+$new_features);

my $i = $db->get_seq_stream;
ok($i);

my $feature_count = @f;
while ($i->next_seq) { $count++ }
is($feature_count,$count);

# regression test on bug in which get_SeqFeatures('type') did not filter inline segments
@f = $db->get_features_by_name('agt830.3');
ok(@f && !$f[0]->get_SeqFeatures('exon'));
ok(@f && $f[0]->get_SeqFeatures('EST_match'));

# regression test on bug in which the load_id disappeared
is(@f && $f[0]->load_id, 'Match2');

# regress on proper handling of multiple ID features
my ($alignment) = $db->get_features_by_name('agt830.5');
ok($alignment);
is($alignment->target->start,1);
is($alignment->target->end, 654);
is($alignment->get_SeqFeatures, 2);
my $gff3 = $alignment->gff3_string(1);
my @lines = split "\n",$gff3;
is (@lines, 2);
ok("@lines" !~ /Parent=/s);
ok("@lines" =~ /ID=/s);

# regress on multiple parentage
my ($gp) = $db->get_features_by_name('gparent1');
my ($p1,$p2) = $gp->get_SeqFeatures;
my @c = sort {$a->start<=>$b->start} $p1->get_SeqFeatures;
is(scalar @c,2);
is($c[0]->phase,0);
is($c[1]->phase,1);
@c = sort {$a->start<=>$b->start} $p2->get_SeqFeatures;
is(scalar @c,2);
is($c[0]->phase,0);
is($c[1]->phase,1);

SKIP: {
    test_skip(-tests => 2, -excludes_os => 'mswin');

    if (my $child = open(F,"-|")) { # parent reads from child
        cmp_ok(scalar <F>,'>',0);
        close F;
        # The challenge is to make sure that the handle
        # still works in the parent!
        my @f = $db->features();
        cmp_ok(scalar @f,'>',0);
    }
    else { # in child
        $db->clone;
        my @f = $db->features();
        my $feature_count = @f;
        print $feature_count;
        exit 0;
    }
}

# test the -ignore_seqregion flag

# the original should have a single feature named 'Contig1'
my @f = $db->get_features_by_name('Contig1');
is(scalar @f,1);

$db = eval { Bio::DB::SeqFeature::Store->new(@args) };
$loader = eval { Bio::DB::SeqFeature::Store::GFF3Loader->new(-store=>$db, -ignore_seqregion=>1) };
$loader->load($gff_file);
@f = $db->get_features_by_name('Contig1');
is(scalar @f,0);

# test keyword search
my @results = $db->search_notes('interesting');
is(scalar @results,2,'keyword search; 1 term');

@results = $db->search_notes('terribly interesting');
is(scalar @results,2,'keyword search; 2 terms');

# test our ability to substitute a FASTA file for the database
my $fasta_dir = make_fasta_testdir();
my $dbfa = Bio::DB::Fasta->new($fasta_dir, -reindex => 1);
ok($dbfa);
ok(my $contig1=$dbfa->seq('Contig1'));

$db = Bio::DB::SeqFeature::Store->new(@args,-fasta=>$dbfa);
$loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store=>$db);
ok($loader->load($gff_file));

ok($db->dna_accessor);
my $f = $db->segment('Contig1');
ok($f->dna eq $contig1);

ok(my $contig2 = $dbfa->seq('Contig2'));
($f) = $db->get_feature_by_name('match4');
my $length = $f->length;
ok(substr($contig2,0,$length) eq $f->dna);

# DESTROY for $dbfa sometimes is not being called at script end,
# so call it explicitly to close temporary filehandles
# and allow their deletion
$dbfa->DESTROY;

# Remove temporary database file used for SQLite tests
if ($db->isa('Bio::DB::SeqFeature::Store::DBI::SQLite')) {
    $db->DESTROY;
    unlink $db->{dbh_file};
}

# testing namespaces for mysql and Pg adaptor
SKIP: {
    my $adaptor;
    for (my $i=0; $i < @args; $i++) {
        if ($args[$i] eq '-adaptor') {
            $adaptor = $args[$i+1];
            last;
        }
    }

    skip "Namespaces only supported for DBI::mysql and DBI::Pg adaptors", 6
        if ($adaptor ne 'DBI::mysql' && $adaptor ne
            'DBI::Pg');

    push(@args, ('-namespace', 'bioperl_seqfeature_t_test_schema'));

    $db = eval { Bio::DB::SeqFeature::Store->new(@args) };
    ok($db);

    $loader = eval { Bio::DB::SeqFeature::Store::GFF3Loader->new(-store=>$db) };
    ok($loader);

    $loader->load($gff_file);

    # there should be one gene named 'abc-1'
    ok( @f = $db->get_features_by_name('abc-1') );
    is(@f,1);

    $f = $f[0];
    # there should be three subfeatures of type "exon" and three of type "CDS"
    is($f->get_SeqFeatures('exon'),3);
    is($f->get_SeqFeatures('CDS'),3);

    $db->remove_namespace();
}

sub make_fasta_testdir {
    # this obfuscation is to deal with lockfiles by GDBM_File which can
    # only be created on local filesystems apparently so will cause test
    # to block and then fail when the testdir is on an NFS mounted system
    my $io = Bio::Root::IO->new(-verbose => $DEBUG);
    my $tempdir = test_output_dir();
    my $test_dbdir = $io->catfile($tempdir, 'dbfa');
    mkdir($test_dbdir); # make the directory
    my $indir = test_input_file('dbfa');
    opendir(INDIR,$indir) || die("cannot open dir $indir");
    # effectively do a cp -r but only copy the files that are in there, no subdirs
    for my $file ( map { $io->catfile($indir,$_) } readdir(INDIR) ) {
        next unless (-f $file );
        copy($file, $test_dbdir);
    }
    closedir(INDIR);
    return $test_dbdir;
}

} # SKIP

Bio-DB-SeqFeature-1.7.4/t/00-compile.t
use 5.006;
use strict;
use warnings;

# this test was generated with Dist::Zilla::Plugin::Test::Compile 2.058

use Test::More;

plan tests => 20 + ($ENV{AUTHOR_TESTING} ?
    1 : 0);

my @module_files = (
    'Bio/DB/SeqFeature.pm',
    'Bio/DB/SeqFeature/NormalizedFeature.pm',
    'Bio/DB/SeqFeature/NormalizedFeatureI.pm',
    'Bio/DB/SeqFeature/NormalizedTableFeatureI.pm',
    'Bio/DB/SeqFeature/Segment.pm',
    'Bio/DB/SeqFeature/Store.pm',
    'Bio/DB/SeqFeature/Store/DBI/Iterator.pm',
    'Bio/DB/SeqFeature/Store/DBI/mysql.pm',
    'Bio/DB/SeqFeature/Store/FeatureFileLoader.pm',
    'Bio/DB/SeqFeature/Store/GFF2Loader.pm',
    'Bio/DB/SeqFeature/Store/GFF3Loader.pm',
    'Bio/DB/SeqFeature/Store/LoadHelper.pm',
    'Bio/DB/SeqFeature/Store/Loader.pm',
    'Bio/DB/SeqFeature/Store/bdb.pm',
    'Bio/DB/SeqFeature/Store/berkeleydb.pm',
    'Bio/DB/SeqFeature/Store/berkeleydb3.pm',
    'Bio/DB/SeqFeature/Store/memory.pm'
);

my @scripts = (
    'bin/bp_seqfeature_delete',
    'bin/bp_seqfeature_gff3',
    'bin/bp_seqfeature_load'
);

# no fake home requested

my @switches = (
    -d 'blib' ? '-Mblib' : '-Ilib',
);

use File::Spec;
use IPC::Open3;
use IO::Handle;

open my $stdin, '<', File::Spec->devnull or die "can't open devnull: $!";

my @warnings;
for my $lib (@module_files)
{
    # see L
    my $stderr = IO::Handle->new;

    diag('Running: ', join(', ', map { my $str = $_; $str =~ s/'/\\'/g; q{'} . $str .
            q{'} } $^X, @switches, '-e', "require q[$lib]"))
        if $ENV{PERL_COMPILE_TEST_DEBUG};

    my $pid = open3($stdin, '>&STDERR', $stderr, $^X, @switches, '-e', "require q[$lib]");
    binmode $stderr, ':crlf' if $^O eq 'MSWin32';
    my @_warnings = <$stderr>;
    waitpid($pid, 0);
    is($?, 0, "$lib loaded ok");

    shift @_warnings if @_warnings and $_warnings[0] =~ /^Using .*\bblib/
        and not eval { +require blib; blib->VERSION('1.01') };

    if (@_warnings)
    {
        warn @_warnings;
        push @warnings, @_warnings;
    }
}

foreach my $file (@scripts)
{ SKIP: {
    open my $fh, '<', $file or warn("Unable to open $file: $!"), next;
    my $line = <$fh>;

    close $fh and skip("$file isn't perl", 1)
        unless $line =~ /^#!\s*(?:\S*perl\S*)((?:\s+-\w*)*)(?:\s*#.*)?$/;

    @switches = (@switches, split(' ', $1)) if $1;

    close $fh and skip("$file uses -T; not testable with PERL5LIB", 1)
        if grep { $_ eq '-T' } @switches and $ENV{PERL5LIB};

    my $stderr = IO::Handle->new;

    diag('Running: ', join(', ', map { my $str = $_; $str =~ s/'/\\'/g; q{'} . $str . q{'} }
            $^X, @switches, '-c', $file))
        if $ENV{PERL_COMPILE_TEST_DEBUG};

    my $pid = open3($stdin, '>&STDERR', $stderr, $^X, @switches, '-c', $file);
    binmode $stderr, ':crlf' if $^O eq 'MSWin32';
    my @_warnings = <$stderr>;
    waitpid($pid, 0);
    is($?, 0, "$file compiled ok");

    shift @_warnings if @_warnings and $_warnings[0] =~ /^Using .*\bblib/
        and not eval { +require blib; blib->VERSION('1.01') };

    # in older perls, -c output is simply the file portion of the path being tested
    if (@_warnings = grep { !/\bsyntax OK$/ }
        grep { chomp; $_ ne (File::Spec->splitpath($file))[2] } @_warnings)
    {
        warn @_warnings;
        push @warnings, @_warnings;
    }
} }



is(scalar(@warnings), 0, 'no warnings found')
    or diag 'got warnings: ', ( Test::More->can('explain') ?
        Test::More::explain(\@warnings) : join("\n", '', @warnings) ) if $ENV{AUTHOR_TESTING};

Bio-DB-SeqFeature-1.7.4/t/data/test.gff3
##gff-version 3
##sequence-region Contig1 1 37450
# #index-subfeatures 0
Contig1	confirmed	transcript	1001	2000	42	+	.	ID=trans-1;Name=abc-1;Alias=xyz-2;Note=function+unknown
Contig1	confirmed	exon	1001	1100	.	+	.	Parent=trans-1
Contig1	confirmed	exon	1201	1300	.	+	.	Parent=trans-1
Contig1	confirmed	exon	1401	1450	.	+	.	Parent=trans-1
Contig1	confirmed	CDS	1051	1100	.	+	0	Parent=trans-1
Contig1	confirmed	CDS	1201	1300	.	+	2	Parent=trans-1
Contig1	confirmed	CDS	1401	1440	.	+	0	Parent=trans-1
Contig1	est	match	1001	1100	96	.	.	Target=CEESC13F 1 100 +;Name=match1
Contig1	est	match	1201	1300	99	.	.	Target=CEESC13F 101 200 +;Name=match2
Contig1	est	match	1401	1450	99	.	.	Target=CEESC13F 201 250 +;Name=match3
Contig1	tc1	transposable_element	5001	6000	.	+	.	ID=c128.1;Name=c128.1
Contig1	tc1	transposable_element	8001	9000	.	-	.	ID=c128.2;Name=c128.2
Contig1	confirmed	transcript	30001	31000	.	-	.	ID=trans-2;Name=trans-2;Alias=xyz-2;Note=Terribly+interesting
Contig1	confirmed	exon	30001	30100	.	-	.	Parent=trans-2;Alias=abc-1;Note=function+unknown;index=1
Contig1	confirmed	exon	30701	30800	.	-	.	Parent=trans-2
Contig1	confirmed	exon	30801	31000	.	-	.	Parent=trans-2
##sequence-region Contig2 1 37450
Contig2	clone	assembly_component	1	2000	.	.	.	Target=AL12345.1 1 2000 +;Name=match4;Note=Terribly+interesting
Contig2	clone	assembly_component	2001	5000	.	.	.	Target=AL11111.1 6000 3001 +;Name=match5
Contig2	clone	assembly_component	5001	20000	.	.	.	Target=AC13221.2 1 15000 +;Name=match6
Contig2	clone	assembly_component	2001	37450	.	.	.	Target=M7.3 1001 36450 +;Name=match7
Contig2	predicted	transcript	2501	4500	.	+	.	ID=trans-3;Name=trans-3;Alias=trans-18
Contig2	predicted	transcript	5001	8001	.	-	.	ID=trans-4;Name=trans-4
# processed_transcript
Contig3	clone	assembly_component	1	50000	.	.	.	ID=AL12345.2
Contig3	confirmed	mRNA	32000	35000	.	+	.	ID=trans-8
Contig3	confirmed	UTR	32000	32100	.	+	.	Parent=trans-8
Contig3	confirmed	CDS	32101	33000	.	+	.	Parent=trans-8
Contig3	confirmed	CDS	34000	34500	.	+	.	Parent=trans-8
Contig3	confirmed	CDS	34600	34900	.	+	.	Parent=trans-8
Contig3	confirmed	UTR	34901	35000	.	+	.	Parent=trans-8
# associative attributes
# these are not intended to have any implied parent/child relationship, but their attributes can be
# used to group them arbitrarily.
Contig4	clone	assembly_component	1	50000	.	.	.	ID=ABC123
Contig4	confirmed	gene	32000	35000	.	+	.	ID=thing1;gene=gene-9
Contig4	confirmed	mRNA	32000	35000	.	+	.	ID=thing2;mRNA=trans-9;gene=gene-9
Contig4	confirmed	CDS	32000	35000	.	+	.	ID=thing3;mRNA=trans-9
# three-tiered gene
Contig1	confirmed	gene	2000	3000	.	.	.	ID=tier0;Name=gene3;expressed=yes;in_process=1
Contig1	confirmed	mRNA	2000	3000	.	+	.	Parent=tier0;ID=tier0.1;expressed=yes;Name=gene3.a;index=1
Contig1	confirmed	mRNA	2500	3000	.	+	.	Parent=tier0;ID=tier0.2;Name=gene3.b
Contig1	confirmed	five_prime_UTR	2000	2100	.	+	.	Parent=tier0.1
Contig1	confirmed	CDS	2101	2200	.	+	0	Parent=tier0.1
Contig1	confirmed	CDS	2500	2800	.	+	0	Parent=tier0.1
Contig1	confirmed	three_prime_UTR	2801	2900	.	+	.	Parent=tier0.1
Contig1	confirmed	three_prime_UTR	2910	3000	.	+	.	Parent=tier0.1
Contig1	confirmed	five_prime_UTR	2500	2510	.	+	.	Parent=tier0.2
Contig1	confirmed	CDS	2511	2520	.	+	0	Parent=tier0.2
Contig1	confirmed	CDS	2300	2400	.	+	0	Parent=tier0.2,tier0.1;Name=shared_exon
Contig1	confirmed	CDS	2500	2800	.	+	0	Parent=tier0.2
Contig1	confirmed	three_prime_UTR	2801	2900	.	+	.	Parent=tier0.2;ID=utr1
Contig1	confirmed	three_prime_UTR	2910	3000	.	+	.	Parent=tier0.2;ID=utr1
ctgA	est	EST_match	5410	5500	.	-	.	ID=Match2;Name=agt830.3;Target=agt830.3 505 595
ctgA	est	EST_match	7000	7503	.	-	.	ID=Match2;Name=agt830.3;Target=agt830.3 1 504
ctgA	est	EST_match	1050	1500	.	+	.	ID=Match1;Name=agt830.5;Target=agt830.5 1 451
ctgA	est	EST_match	3000	3202	.	+	.	ID=Match1;Name=agt830.5;Target=agt830.5 452 654
# test multiple parentage handling
Contig1	test	gparent	2500	2550	.	+	.	ID=gparent1;Name=gparent1
Contig1	test	parent	2500	2550	.	+	.	ID=parent1;Name=parent1;Parent=gparent1
Contig1	test	parent	2501	2551	.	+	.	ID=parent2;Name=parent2;Parent=gparent1
Contig1	test	child	2500	2550	.	+	0	ID=child1;Parent=parent1,parent2;Name=child1
Contig1	test	child	2501	2551	.	+	1	ID=child2;Parent=parent1,parent2;Name=child2
# test out DNA loading
>Contig1
ttcttcgacactaagaaccatgccccatgtcacgacttcaccctcattgacgagcgttgt tcactcatctcgctgggccactagctacccatggggttaattggccagggatcggataca cgttaatcgatcggaaaactgcgcgtgtcatcaactcttcttacaatgctataagtaacg gaaactgaatagctaaacataagtgagcttcagagcagcgagagcacttaacaaaccttg ggttgggctaagtgctttagttccatcatctcaagagtgcatagaagtctgcacgcggcc gccggctggaagcatggtgaatactgactattagctgttattcctctggaacacaccggt acagaaggtacatcgacacttctttggtgctataagaccagttgcgtggccgataggggt gatgacgagtaaccctttcgcagctcaagtccagaacgagacctgtcaattcccgctctc ttgcattgattttcacgatcgttacaagctaaaaatagttgaaggagaatgagcaggatg tgctccccacggtctcttacccctgacactcttcgcgttctatatctccggaaagcccgg aaccttttgtgcttcgtgtgagcataccgaacttgtttatgccctgccgccggataagct tttcacgttagtaagaatgctcactgagacaggttgtcaatagccggcgttaactgccat tataggaatttttaggacgaccgaagattaaatggcggagatttgttggcggattaagtt ttccgcgcaccggggtaggagatcactcggctatcgtgctttaatgatgcggctatcgga gatgtttgcgcagccgaactccaacgggtattaaatctcggcagcgtcttaccgctttta tacagtacatgatttcgccgaaaggaaagtatatggtgttctcggttggttgcattatcg acttattcaactaacacaacgttcggggttattattcctcccgaagatgatgccaaatat acatttggctcataaccgaagtgtaatggatcttaggttggctagcggctgcgtacagac aatagatacagtattcccgactttggacgaccgaggccggtcctgagagttgaaagtcac gggacgtttggccctgtgtctttgttccaccgttaaatattgggggcgacgtatcggcct
ggacgtatacttcatgacctttcgtccatggccatacggcggcggggattttaatctaga tccagtcgtaggcttggagctatcctctctttatcgcgcttgacaatagcatccccaaga tacgatctacccaagtgacgcgagctcaaaccctattttgtggtatacggctgatgtaga tggcaacaacggtcgagtatagagttcctacctacgcctacggagtctgtacgtctgtgt atatcgagcgcattcgaggatattgatcgttggcgagccggtttgcggaaaacagacagc aaatttgatgcagatggtattgccaactgtcccagcgcggcttaagggccgtggtatgac gaagggatctatgtgaattctcccgggccctaatagatgtggcataggaaatgcccgaag cccggaagccttcgtactgcacgtcgttgccatacggatcatgtagagagaggtccctat accctctaatgtggcctgaccttttaagcatttacgtacttgaatatttaggcgattcct ccttgataacttcgttgcaccggttcctctcccggatctgagagattaaagcgtgactca catttctaggtaaaccgtagttctggacgaacccgtcgtcggatgatggacgagctggaa aagacccatcatttccacgcagacaggaatacaagagtgtctggcaacaaggcctgccag attactttatctcatacatatgggatgagaacacgacctcatcgaaaaaagttagtgtat ggccgccgacagtcgtacaatatcaataatcctgctgtttgcaacttcgctggacatgta ctgaataatgcaagcttgaaaacgtataccctgcaagcccctgctatggtcgacccccgg tggagcatatcgcgatagctgcttttgaccatccacgggccaccgacactctgaaaatgg gtatcctctcggactgcatacggaccaacctctctagtagaattaacccagagtatttag ccgggatactctcgcgagttccaatctatatgccactcccgtaggcatgatttgatcttg cgctttgcaagcaactggcgttcagtgattacggaatcctcgagatcgccagggagatga cagggggcggttggcaccccggcgtatgagttactactggggccaggtcggcatccgttt attacgtggcgcctcgaatgggttcgtttacagacgccgcgagtatcaccacacgattag ttggctcggcacacgacgggccgcaactgaatattggtatgccctgacccacagcagatg gatcccctacgtgaaattactggaaacccatctctccagttaatgtggagcttactgtat tacgcgggcttaacctggtcgtggtacgggaggtccgcctaatacatcgatggtcctcac cacgaacgttatatgaagctctgtcagcacgacgttgcaaagacacataaacatttgctc ctcttcgtcatacatatccgtggtgccgcgaaagaatgggacgcatttgtgaatggacgc ttgggggacgcccgggttcacgagcctagccctccagtttattgctctataaccccggcg ttcgatcgatagacgtgctgggaagcgagcagtacctagctcggtcatttaacctccgag actctcgtccaccgcaagcgacctttccagacagtcgaggccagccgcaccaatgacatc caggcctcaggtacttgaaacgctcttgagagcctgaattcatgaaaagcccaagacggt tcactcacaccacgcaattattattgctgcctgtgtggtcccaatcacgtggggtcttac acggagagagggatatacagagccttctgcttttgtcggcggtgaccttcaggccggatc 
agcgttgaactctggtgtaaggatatatgaagtttactaccacaggagatttgcatttat cccatggcaggtcccggaaattggaggaaaagtggtagccttcatctactcaaaagatgc acctaaccggacactcggaccgatgtttgacgaagaaatttatattgcaacagggccatc ccgatgaaagagggcagtgacagagtagggaaaaaacgcaacaaaatttgattagtggga cactgaattagatgtgatgtacaggtccagtgaggctccgcatggggcttaggtgtcaac tagggtaagaccccccagctcaccgtgtcgaacaaacacgagtacccccgcctctcggca agaataacaccttattcgttctcatctatcagaccctccactgtaggtagctcggtactc ccgttaggtctactgcaatccggctgcccggcgaagcgggcttagcgaggggccgaaagt ggaatgacgaaatggtccggagctgcgcgggcagacgacggtcgtcgactcacctcgaat tcaagccgcctcgcccggattacaaccctcatacatgggccaattcagctcggtccgata cgacagcttagtatagcagcctggccggttttgtagccttctttctacctctaaaagaga caaagtccctgtgttattgtaaccgtgcatcgatctcgtcaccttcaatatgcaacgtga agactgaggccctagcctctttgagggtcacccgaatgagatatgggatgtcgaatacat gggccaagtttcaggacacgaggatacagggttaaccgtcgaatttcttcttgcaaggta gagatcccgaagtctctagaacgggttacgtacaattgggattagcgattaggaggagag gcacgttagacaaccgacaaagccgaacaccgccctgtcgtatccagagcgggtaagttt taattgcgcgtccctcttctcaaagtcccgatatgcccggctaaatatagttgcggatag gtttgccgtccggcggtaggagagggaggctccctgtgatgtcgacgcttatcgtcacca ctgacagcctcgacctcgatattgagtgccaaatcgtctggtcacccatcacgatccaga ccgtcgattaaaccccgtgtaccccgtgttatacctgacgtccgtacctccaataacctt aggtcaaataaatcttcttgagtctatctcattcggatacgctgcaaaggagaaggtccc atagttcggttcggtcattggatacaatgctgtacgctaacaggccttcactttagattc gcgtcgtgttgaagttctattatatagttccccagcaactttttggagatacgatggtct acatcttaaaataagcaaatttggcatgcgatggacattgtagaagcccgataccaggct aactacgacgccctgtgcgggattattcagtaacagactatatcgggggccgactctgtg tcgtacccccaccggagactctaggacgttgggcaccccaacccagtacgatacagactt ccttgatctaaacgtattgtcatttttcgttaacacatatcaagctgtttttcgataacc tacgggtactcgtttgacaggcaccaggtatgtctccccccggtcagactgaaggctaaa tagccgtacaatttcagctcgcgtaggtagttcgattcgcaggtgcatgatagcacagaa gaggtcaccgaactgcacgactaaaatgaccgctggttgccgtagtagcatgcgaagcag cttgccaacatcatcactgtcaccagccccctgtattcacacatagggctaagattaaac cctcatagcgattgttggatttgatcggacactttgccttgctagggggctatttactgt 
agcaaggcccgacatcgtggtccttgtcgagtctctgttccactaattcttcattgagat ctcattctcgaaatccaacgaatggggcggatatcgacaaaccattggcaagttgctata agccagcaacaatatccccgtgacatcatcgtgccgggttaaatacaagagggtgtttgg gtccgcccccggaatcgcttctatggcattatatggtcccccctcctttactgtgacatt gtgcagacccttcgttacgactatcaattccaccccggaaagttagttctccacatattt tacctgggtcggcgttctgccggctacgcaggtcttctgtacaagcgcgggaaggtacac tagtctatagacgccgagttccgctacggcccctaatccatgagatctttggtgttattt cgatcatctcctgggagggtatacttcaatgacccattctgggaaagataaccaaggcga acgctcgcaattaacgctactgtggaattgaacatttcctcagtgtacgagtgtctctac gcagcgtacgcctatggagtctttgcatgtgactgcgacccatatctacaaaccggcatt cccgggttctaccaactggttgtgccgcgcgaagggagatttccccgtacgccagtgtcc accggaagcgctccataacagctgtggaagcgaatagggtgccggggtctacacaattct cacgagccaagatccccaacaacgattacggccctccgaccaaaatacacatttgcactg attcctttgcggccctgaatcatctcagagccgatactcgctaaggcctggtcggggaat cccctgtacagtcataatccatccattggcagagtaacttttgcatctttgctacttgag atatcgtgtagctctgaaagacggccatgaatcttagctctgggaccgtatctaggggca ggctccgtctgtcatggttgggccgtaagcctgccttttacgttaatgccccggtaccac gatcagtacagcgtcacccagtaccccttgccattgtcacccctgcgttagacaaataac actagaacagcctgatcgctagctcatatctggtccctcgaaaaactgcgactcggtttc atcctagattgtgtgtattccctctatggacatgcttggacagtaccatggagcgccgca ccaagatcagaagtgccgtcagcagatacctatagtacgacatctagagtacattagacc atggatagacatccttgtacttgtgtgatctcgagtaatcgcagatgctctgggcggggt tttttcccgagcccttttagtgttaagtacactcaatctcaccggaatggatcaccccgc acttaaaggtccctacccaaatcccacgaggccaccgagggtcgaaagtcggttgacgtc cctgtagtccgcattgccaacattaccactactttacccgtctatccaggcttgtagtag tgcgcatacgaaaagacttataaaaatcatctaggaatttcatttccagggagctgtcgc caaaactgatgatccacagaagaccgatcttcaaggtcctcccagctatctgccaaaccc tgctaccatcacccccacgaaggtctggtctctctcaatgcagtatttcaattcactccg ctccaagtgtagcgggaagttaacttttcataaaatataagaaggtatcacagctcccgg ccagtgagtatccgctccgaggctgtgtcttactgaagatgtacgcaaaattaatggaga tcccggttagtctcacggaaaggtgtcggggagggtctcacgttcaccatgtgcatttgt aaaataatagctgctgtaacttattcacattcctaacaatagagcctagccgataggtat 
atcccgccgacggtgtcgcccgtccttgaaaaataaagcgagccattgagcgcgacgctg tgtagaactcgttcgcatacacgcatacgcccggcgtcgagctgcggtttcctctgcaca agaagttctgttatcaaaataaagctaaggattccaaacatgtgacaccagctaccccac caccccatataagtatagcatgctcccattggcacagtgtaccgggtgttgacggtcgcg ggctcggtcttcatggaagctcctgttggaagcgtgtactattaatgtccccctcattta agtactgcgaacccaaaattttgggttaggaaaaagacggcatcctcttatatgacgagt cacctttttttttacactcgagccttacggaagagaagtcgaacaaaagcacgtggtgac ggtccctcaaatactatcgacatggccaccttcctattattaatagccgcttcacgattg ctattagagtgcctatctcatacttcctatgatcgtacgtggatggagacgaagcgggtg tggtgagctttgtattccggacgagtgtgcggcttggtaggaaccggggcgtatgaaatc tctcatgggaggaggtagtagagttagacaccgtctaagttagttctcggactgtatcgc agtcagaaccggccgttcagagcgggaaccacagaaatgggcccccccgcagcttcacac gcatactcggacatgctggcttgcacgtcgtaagatgaggtgaatagggctatgcgaata tcctgaaatcgaggagctagaacggcaggaacctaatggatagtcagttatgttcagtcg gctcccttcaaaatgggtgctgcagatacttaggtcgaccgtcacggagaccccgtagac gcgaacgggtagtacaaacatagcactagccaccacggagaagagttggctcgttaggtc tcgtcactctgcattggttgacaagcttgggctaccaatgccctgtcctaacactcaggg cgcggaatatagtactctccgactggcattggattagcattactcgttctttttgatttc gtgagcgagctaggctaaactttaacgccatgattgacgtcgttaggggatcggctccca atagctaatgccgtgatgtctttcactaacgtgggacaagtccattcgctaaccgcgcaa cggttatctcaaaactaaaggacgtttagaagaatacgttgagctggcgatcggcgccta tccgttggtcacctaaggtcctagtgttaaatatcctcggacgggccactaccatagctg agagcccaagtgttacaggcttcgacacattactcatgccgaccccccggtaaccagggc ctggccgtgcccggcgaatccttatgagcagtttacggaaaagctatgtgaccttattcg cctcaccctatgacgatgcatgaactagacgacagtaacgtaaagtgtaaaaaaaaactg gcgctatacctttatcccataaacaccctttttcgccggctgcgcgcggcaagcgctgat tgttatactccgaatgtccgggctagacactgatgcacactccctaggaacagtcaggtc actcgctcctcagtcgtgggcaatcgatgtcacctccagcgagaatcgcccatatcggtc tggaattaattccgttataacgtgacatcgacctctccgctgggcgcataggcaactttg cctgctaatttttccccttcaacaacaggcccgacagagttacacctaccgtaacagtga gctgacttaaacgcttcaactgttaaacactgactagtgacggggctcggtacgcccagg tgatcatcgaaggatagcgtagatttcgagcataaatgacgtcgggccgatcaggcgttc 
taaaatggcggaattataagccagttctgccagcgccgcgtaaatggtcttcgtggactc actgcatatatcgaatttttagactcaggcgcatcactgttggcggacgattactgacag ggttttcggaatggtcccttctcattcccagcggtacttgagctgtgctgtgcgttttgg cccctcggacagctgcgtttacatcttcaagcctcatacccgcgagtacttgcgacgcct tcagctagttgtcgcaacattgggcagtaacccaagaggggtcgctaacataggtagaga tccgccctcgtccttaaatgcaagggcgttatcttggggcgctaatcaattggacgaagc tcccttaggttatcgaagcccgaactaggctgggttcgtgtgagtatgaaaactggttaa gctatagaaagcttcatttgcccaagaggacttcctttgagtagcaacatgaggtaaccc atagggtgctcccaatatttgtaatgactctggctgttcggttgtgccgtgtcctaccat atgcgcacctgtacctgcgactaagatacgcgtgcaatagaagggaccgtgggttccacg caatctcaacacgatagggttgttttgattttattgccgccccgtgagatctgtgtatcg ttaggttaggtagcctgaactgggtagcgattatctgcatgacggatcatacccgttggc gtcccccaattctctaaaaaccctgggtccacgagtagaccgggacccagggcttagcag cgactacagaagcctgcattatcctgttacaaccccgaagtgaagacctcggacgcagtc atccctagcagttcaagcggtccggaggtccggtaaggattgaatcttgtgcttcgtgaa tagactgtagttggttcaaacacaatctaggcatgggcccggaacccacagtgaacagga gatgtgtctccaaagaatccgttatctcgcacttgacggacctgatactcaataatttta gagaggacgcacttactgctatgcgtcatttgggcggccagcgaacccggattggatctt ctagccacaatgaggcggaaatcttcgcacgttctagtggccctctaaaaggaacgatgg ttaattcgacggcctgggctcggcgttggtctgattatagactattttccggatgtagaa ggaaccgaacttcctcgctggtcttaaaggctagcaacaaagagacaaaggttaagcgtg aacatgagctccacagcttcagttgagccaggacttttaccttccatggcacgcaggatc gatccagatatccgacgcactaagaaatcgtcccagggctgtacgtctgggtgctatgcc agcatcatccgtcgtctattgacgagaaactactcgtacacttcagccatggataaccaa cttcctaagggtcgtgcgtcatgcatacctatacagggcagcttaatggtgatgagtctt gtatataagggcgacttttttacagcgtgtttacctctggccgacaaatagcatccatta tgcttagaaagctctccctaattgaggcgttcctcgcttgcccgtcctagctatgacgag gcccctttacgtcccagaataagctccgcatctaaccggaatagcgctctcaggattatg actcctctcatctatgccggttcaagctaacatagtctctgtgcggcgaggggttgtcat gtgatttaaagagatagaagcctgcttgtatctcgtagctgaccggtcacgtacgcggtt gttgaggctcttgaccatctataaacgctgtttcctcttttatttggcccgcaagctcta ctcactggccgttgacggctacattcgttgaagatatacgacagtggtgataattgactt 
ctgacgaattcggtacccgcgatatcgcgcctcccctcttttgcatagctctcatatagt tactccgaaggggcttggccacatataaatagtaaccgaatctgtgatgtgatagctgcc gtagggctcacgagttgcctaacagtaatgcagggcgaaccgcggctaacgtcacagctt accggtgggcggctcccgccatgttgttaccgttaacgtagtgcactcacacggccgtta accgtcctaccaccccgaggcagagctcgcaatccctaccataaaccgatcttctcaacc tttctagaggacaaaacgcaccgtaactactagccgccgcttgttgatagatatccccga gtgccactgcaggcggttatgtatcgaattaattacaagtttgggacgagagcgacccga gaagatttgataatacttaaaggctcacctcgctcgcagtatttcgggtgaaaatccgct cactaacatcgaccgaggtttttggaggtctacaaacaagcggctagccccgtgaagaac ggggtcttaactattttagaaggtaatagatactctgcagcttacttccataagtgggta ctgagcaaggatttaaacgcaataaatgtgcttgccggtatggatgctatggaagctccg cttctccagagaacttggcgagattatctcaataacgacatagcttcggagtgagagtga tacccgaacatgctcagagtctggggatacccggacgaagacgtaacgccaatgtcgtcc ccccgaccggcgagtcaagacccccacagagggtcgccagtcagccggtgcgtctcgagt cagtaggagatattccagagcctcatctgaactcggcatgcaccctcatccggttgagtt ggtcatcaatatcccctgcttaactaagattaagatattaccgctgataatagcacgtga gccttaatgcggtaagagcagcacactcacgagagatcagttacgtgtctcatcagtaag aactatccgctgccttttctcctcagcggcatcaagcctacaccctctcgtagcgaaggc aaggtgaaggttatggatcacctatcagaatcgagaacgtccccctgaccgtgtagactt cgacagcctgggcgcgcctcccgttatcctgcccaaaggctatgtccaagggtcccgaca agcatatggacaaaaagaacagcagattgatacagattgcgtccaaattgtccttgcgtt ataagtaagtaatggcgccgtcatggaaaaaggggtcatacttgggtggtgaaaccccac agtctgagcggcgactgtggccccgtccgtagccctatcccaggattaggtaaccggaat aggctaatacacgggtgatgcactcttcggatcacgaaagacggagcttcagccagtagg aaacggaatagtgatgctggtggggccgacataattaggggttccgcagtatgggtcgac ttagaaaaccgagaaaatttttttcacgtcgtagggtttacatgcgcagagcggcaagaa gatgaggggaaaccttgcttataccggggtgccctttgccgtaagcaatgaacagacaac tacccgcaacacttagcatgtcaggcggagttcaacgtcattcgtaggttctagtatcag tgtgggacgttattgccccggcacgattctaagttaatacatgaactctggtgccatgga gggctacgctgaggtagctctacttgcaaccccgtggtgtgttccctactcgcgctgagg gcaatattagagattttgaccctgaaacagcgtaaaccccgggataccaaggaccgactg cgtcgggctttagatacttgagggcccataactccgggggtcggcttgtacgttatttcc 
agtaggccggcacgctgggtatcattagatctacaggccccaacaccactctgtcattag attagggtcggttggtcgtcacggcaattattcgatgacaattgttgataccagctgcgg aagaagccggtccgggaactcctccaaatgaaaggacagtcttttcatatcgcagctatg ttcagtggatccaatggatgcagatacgtttaggatcgtgtaaacatttgaattatgtga acagcgtaaatggggcggatatgccgggtatcgaatacactcgtggttctaacttggaag tccacccagaaaagcctccggcacgcgtcctgtaaccgactggatcgttaagacaggctg gagacaatatcaatgcgtgttctaaagacagcaggtcttacacatggcgattgccgtagg caagcttgtagctaagagtacgatgcttcgtagataatcacgctgaactgcagggtctct tggcatatgaaccctttcgcctgcaagctaatcgtcctctctccataatattgtaacaca gcggtgtgcgccacgtttacagatgtccgtgacgtacttagtagtttgacgtgctgaatc tagatatcattcagggattagaggaaccgagtttgggcgccatcacgctatatccattaa ggctgcgacattaaggtgccaaatctctaagccctaattggggcgtatggctagcttcta atgtaacaatagaaaaccaacatcgtaacggttgatacttataattcccctttgggttgt gaagagcgggaagggaaggagaattcagtactcaaacagcagtgtatggtagttgagcca cataaatgtagagcggatgcatgaccatggctcggtaatgaggcttgcctagtctagttg gtgaagtggtgaacactgggagcgttagaatataagtccgtttacccaaacgttctgatt ctcatcgcccccttttgccggcgactgcactgcacgtttagcaatgtgttatcgccatcc atatccccaattgtagtttaggaaccacggggatttcctgcgcccaaggcgttatgtggg atacgaaaaggtcccgccgtatagaaaacgtttcatctaggggacaacaactataacttt atcaacgcttttgactgacgcggggaaccaactacacagagcataaggaatgggaccagc tactagagacaggggctgaactgagggtgagtggttagctccggaaaacatgcttcggtc ccgagaactggaaccagcggcgaacgggcgctccttctaattaggatcagtggagaacgg cccgttgccgctttacactgtatgcaccgagctcacacgctgagccgttatttacaatta tgctacactagcgcgtggccccgagagcacggagataaggatacaaatcggtgatataat ccattaatgtgtgaacttgtcccctccgcccgtgcatacttaatcgatcgagctacaaat cacccgcagctggatacttaattcagccgagtctcgtagcagtacgtatgatggatactg acggtcccaggatttgcagataccagtgtatcaaagtatacattgatgtttgttcataat attgcgcctcaggctgagtcttatttaaacagcgagagtaccacacgcccctcctatggc tacatacgatgaacagtttgcctagggtcattccggctgaggtccgaatagttgtacttt aggatcaggaacgaatggacctatcattaggaagcgcactccctgcttgccctggtgccg ccctaagtgctgaatgtgcctcaaacacctccgtcgaatgcgacggcatctcgggtcggg cgacttcccgattaattattagcactaaaaaccatcgtggatgagttggggctctggaca 
gaagatcttattagacgtgctttcggtgcgctccgttcagtgtttaaggtacgtcaatac cggaatccctggtagtaacgcgtgactaagaaaatactggagctgccaatgatactatcg gggtattccgcgagagattcaggcctgccgatgggcgcgatggatgcactctactgagtt acttgtcgctgtgtgatggataatgttcataacagccacgtatatacaatgggggggcgt tccacataggcatgtcccaagatctacccgttgcgcgtcatcctgccggttcacgattaa aaagtctcgtttagctcatggctgctacaggttaaggccacgacaccgaggaggtccaag agtgcttcccactctatattgatctcttcctaacgccaaagctatgtcctacatgatatg agttatactgagacaacagaaaatcccatcgctaagaaagacgatgcgcctacgcttagc ttcgtccgtaagaggcagtccgatgtttgaggtggccgctgggccgtatgccgaccaaga tacgaccccgggtattatgctccattgaacttagctaagaagatccaacgggatgtgttt agagcgggattggagctcccccatgcagttggatccggagtaggtccctcactcgagcct gactgtatgccggcacagccgtggcactcatttcgcccgtggagggtgttcgtgaacttt aataccaatactctaagctgtccacgcacgggacaggtatgagtttggaggacaccaatt taacatgctcagagtcttgtaatgctgccagcggtctctaaggtgctaccaaacaagtaa gcggaattgagatgacgtggtttgccgaggctggaataatgaggtggttccctagccttc gattctacgtccattagggtaacagcacgattagaactgtggatcacgggccaggatctc ccattcaagtatccagactcatctccgctgacgctagcatgcgttggcggtaaggcagtg tgtggaaacaacggttccacccgtggggatcgcgaggctgtcatattccattgaagtgtc ggtatctaccaaacaagcgcaggtcggggaatcctgaaagcctacgtcagacttagcgac tttcatgggctgggtgtcgagaggtcacttcagtacgctatgtagtgatagcgacccgcg cgtagttcacgaactctagcagaccagtgcgcgtcatccgtttttattgcctcaacttag cgtcataaccctagttatcgcgaggacacccaatacgaaggactgcccctaagaggcgat aattttacgcaggtagacacgtcggcacacgctggaattccatactagataactcccaca cttttggcccgacagggagaccctgctgacggcgtgtcacggagctgctcatcgaagcga acgctatctgaatagtggaagtggcgttgtgaaaaatccatggtgagtgcggaggggaac tcgggtagcgcataaatcactgtttccctaccgtcacccgattcctccactcgtcggcgt gctaagccacgtgtgagcacccgctaccgttgtagggtcaacgggacttcttatgggcct cggtcggccaatttcatcccttacagagattagtgtgtttgcgtggagtccctcattgta gcgcactcttcggggcagggattatcgtggattatcctcctcaagagcgaggcgccgatc gattgccagggacctacagcagcaccttatcattcgaaatcggcacgaccctcttttcta cgagtgcctgggtgttaaagagattgataatcaccggatgcgctcagggagtcattactc gtggtgttggccccgtttcacgggcttccaaaagacaatttcatcggcatttagtatcta 
ggcccagttgtcagaacgggtccctgtgatctgtaccccggctccgagcctgaccacttt acacacgcgtaactgatgtacgtatccgcgtcgaccgttcgattctcactagtcagagcc cgtgctcaaggagagaggtcttctcgaacacttgagacctaggcttaagttccgagtact gacccgaatcaataagtttacgtgtctagcgcaatagcgtcatgggggcgtgcgtcatac tggtccggcgtaatctcgactagataaatccagcgcctgatcaggttacagtaaagcata gattcattaccatggcctagggtctcaaggccgatgattgaccgcgcactagtacttagg ttttgagtcttcgagtagatccatgacccgtggggcgtctatagggctgcgacttctcta gtatcgggatttaaactagccgtcttccaggtagcgagcgattggcattcgtaacagctg taaccgtttattccttgctaccataatgtgcccaaaaattcgccactagaaggttgatag aatcttaactgtggtatagtttcgggccgtgctggaacggtgattgtactttgcgtcgac aagaaaggtgttgcggtgaagagcaccccaccatttgacttgccttggcgacttttcctc tttgcagctgatatactctgcgctatgttattcgggcccgtacaaccgtgcacctctcat cacctgactattactacctccccactccctttatgtcagctttctctaaattttcacatg caattcacgccctccacgcgtaacaagaaagcgggactgtcagataatcctaccgactaa gaccacaagcgtgaagatggataaccctcgggtattcgtagaggcgcgatcgactttaaa tttgcgtacggccaagattttatcacataaacgggcaatgagttgccgcggcgtgaagac cccctatacgagaggatatcgccatctactatatcatgtatcagtgtgtaggaggccttt ctaacgtatacagttgttcgtccccaattgtccccatcgggtgatagaaatcatttcacc cttaatggattgatatgaccaattactatactgttagatacagttaactcgtccaagaaa agatgctggactggtcttgtaagcagcgcgccgctaaagatgcgaatcttactagtcagg agtctggacccttttttcttatgactagacagatgttgcgccgtttccgggctaacttat aagagaactaagctacagttacccctgcctgcctgtcgcgtccatccgcgtcgtaccaac ttcacatgttatagtagtagcactgttcggattttttgttctatctagattcgaattgtg tgagcttcaagaaatggtgctccaattgttctgcaatgagtcttctctgctgacaacccg acaggttatcagaccgaaggatgggctccgtgcacgttgcacttcatagccgcatatccg cagattatgaagaatgcgtattatggttactgttcctagccgaattataagcgctatcaa ttcccgtctaaagtacaacgtccgaacgacaacttacacaagtccgcctcgcattactgc cctcctggtcacactccatgaagatcaggcagttctaaacgacgaggttgttgctatcta ctcaacagccagaagtatacagacatcatatatactcagctacaacgctggatgataagc tatcagcctcagccgcacatgtgggatacgcgcctctctgaacctgagaaaggcttcccc catcaaggcgtgtatctgcccctgcctgacgcgataattaacgtgtgatcaattcgcccc gtgtctgcgctagtactcaaactggttgcccttgttctcatgttggcggcgtgtgcgaca 
ggtactcagcgggagtatgaatgtagttgacgcacggtatgtgcagaagtaggaatgcgg atgacccgttcgttcgatctatatttcggaccgcgtcacctgagcggtctttcagccgtt taagtggagagtaaaagatcaccacaaacaacgacgcacactctatgtgggcttttattg gtctgagctcgtctagccccctaacactttcgaaccttactcttgttgcatttgtaacaa gttatgccggtgaccagaacggttaggttagttgcgtcacagatatggactcgctgctgc aaagtaggggtaatacgaacacggtagggtcaataagcgcactaattgcggctgtatagg gagcgggcgcgagaaggacgactactggtcattcccaattgttctctccagcctaatggc ttcggctctacacgtacgctactattagcccaatagttaaccaacgacttaaccggcgat ggcagcgcgcgtgagctaggaacagactttcacaacgcacgatataggggcttaaacggc agtcctatggcattcgacggtagtgttccaaagctacgtgataccggttgcatcaataat ccccggttacgttcgctacttggtagctgcgcatgtctgccctgtgagctcctgcaattg actttttcagagtgcagagttataaacagcgggagagcaacatggcctcaatctcgtttc tcaaaccttggctgacttgtgttatgccggagctcccactggtatgtggaacacgagacg caaggataagctaacgtctcaatggtacggcgcaacaggtaggtatcgaaagcataagca tgttcaggcctcttgcggtgcgtacaaaaatctctagacgtacaacttagaaatgtcttc cggtatgacactattttgtgcttggtcccatgtaaatctgattttcggataagcccctat aatcaacctactgccggacgtctcgggttccgagacggtcccactatgattggtacaatt ggactgagagaacacaaaatacgtcgcaagaaccctgaagtggaagacactgattaacgc aacgaatataaataacgggcctgcattgggtccccttgtccatgggaaatgttcacgcac atgtgcgggcccggggcgcaattactggcatttagatgaggtgcacctaacctagacacg gtcctaggccactgcaccggtatcactcacttagcacttatcgtcatcatagtgcgagct tgttttagcagccttggcatcagacgggatcgtccgaagtcacgtgacggagatcccatt gcgcgattccgactgcaaggaaacaaagaaacgcgggacgattctctccaatgcatcctt tccctatcaaaagcagtttactacaagttgcggcaattttttcggtaggaggggtctatg cagatcttggcgccgactgtccttgaggggccgctggtagtctgggttgtgcttacgttc attgatgtacgctcacatcgtaaagtgaaatggatgaaatatttaagtcctgcgttggaa cattttcgaagtctttacaagacacaacgggaaaccctcgcaaagtttatgatagaataa ccttatcacatgcacctaatgagggcccaagctagttgtccaattctacaaaaccaggta cctcacctgttcaaggatttcacccagtagaacaaggcgcccggaacacaggggataata cgattcgtgactcgggatgttcgagcccattgcgagcgcagttcaatcctacccggacct tggtggtccggtcccaaatcccagtggaatagctcttgtcggaactgcgcaggactttct ttgccctggcacgctgttaatcccctcgtaacctgtggtatagcctgtctgaaggcctac 
acaacgaacgtagaacagaaggcgtggctacctccaggtctctctcggaccttcatgaaa taccgagtttccacaggggctatggtgcggacaagctccgctcggcctgattttgattcg cagctgacgtcaaacgatgcgaaccctcttcttgttatcggtagggcggttgggctttat agaccgtttagctatgcctttcgagccgcacatccctggacggcaacctcggttcgcacc atattccgttgatctagccagccatccatcatcggaatgttcatataacacgtacgaaaa aaatagttaggggcgactattaccggctggggatccttcagggcacgacctgaaccagaa cggaggttcgcgcttctgttacagcctcaacttgcaaggacttccgcctatgcacgcgag agatctgcgctatgcagacgaacttccacagaggtttaatactagatagtgtcgtattga ggcttcaaggcggaacctgagagtggttcccctgcatctatctcaccgactgggctcctg tcactcgactctcactgatcgtacactagagggtgggcctacctaaccttactggagctt ctactcttctgacacagtagctgtcaggatcaataactacgtgtagctccaagtcgctac gcttgcaacgagccattcaacaattcgtgcccacagattttagttaagtaacgttagtga atgctgtaagactccaattctagtacttgaccgacaatacaggggtcacgacaaatacat gtgcgtaagttcccatgccaggagtgtttgacctcccctcaaaaaggacgggcggtttac taatatggttctgggcgggtgctcattaccatcgttccgatgtttcgatcatgatgatcc atcctctctgatcacttaaaatcttagtgcaaaagacttgattactggtcgacgggaaac cgaatagcaaccgggggtcgcgcttccatggaagcgggttgacgtcttgctcgggatgct tcccgttatctacgctatacggagatccgctaggtgcactggcctattggagcactatac cgaggaccaacgcgtatgaagaagtgtagtcgtatctgaagtttcgacctggaaagcagg ggctcttacaggggtgccaacgacaaccgatgatttaaacaagttctccaatctcacgag caggcgaggccacctagaccttattacagggatcctgccgagccagtacggtgcctagtg cgaccactcattcttttatgcgcgcgactccctggggcagcgaagaactacttaagagca acgttgcaggcgagcttagatgatatccttatctccctcacttgcagactcactgactgg tagacctgtcgtgtttaatttgcttttgcggctcttccaatgcgcacagtcgtaggatga taatgatgccaacgacatgtgaccgttatcgggatgacttaactgtagaccaacgattta agtttgtaatgatgccgattagaccagcttcggatttgaggtctccgtgcgtcgttggaa ttacaggttagaatgttgagatccaaaatggattgtacgcagttcacgaggcgaccgttg taccgacaaggtcctgtcatgagtctggttgccgtcccttatgaaagggctgacaaaccg ggacacccagtcgaactgaaagcagcgccaaagaggcgagcttaaatttcgggatagctc gcggctccggaacaacgaggcgagcaacatgcaagaaatgcagacagtactcgggtccgg tgggaggcgaacacttctagcctatccgaacatatggggccaagaaaactagtgctttct tcgggcgaaaagtaggacataagttcgcctggttggatgtagtgtaggaatgatgcaagc 
attcccgtacgataactgaatcagtcacctggggatgaagaccggtgagccctgatccct tcataagatcggtcacatgcacttacacttggaacgtattgagaggcaatatgaggagcc gatcgccgtgattccgattaaggcttcagttaccctgtcagaggccccagctagtcttaa ctaacataaacgatggtctaccggtcgcaacacaagtctcctaacctggcatttacggtc tctcacatacccgacatagctcatcgtttgggatcgaattgcgatgcagacgttgtaggt tgcgcacacaggatgcagcgtgtccccggcgataggacaccgtttgagttggcccatgaa cccttctaatttgtgacttttttaatgtaatcttcgtttgtgtagttcattcatcagtct atatccgtccggaccccatcgcaaatataatagcgcccagatttatcttacaccgctgat ggcaataccaccaggctatgacgcagtctactgttagcttctcactctgacgtctaaatc attttagtattggtgacccgggtcagacttcgcggatgaaatcttaccggacaccaccaa ctatacaatcggcctttagataggagtaagagccagtcaccgcgtcagcttgccagatgg tgatgactgaggcctggtgcgttgtcgttcaccaaaggttattcctcaactgacggcgca acttccagcacaggcccgagttgctagcctcggccgatccctgaatgggcattcatcagt attcaagcgtgacacatgtgacgcagttttcagcgccatgccttttataaactaaaaaat gtcatgaaaaaacaagacacgctcgacaacgacaattagggcgcgattgtattagaagca cattgaaagctactccccgacgtccggcttgcaaggctcaatcgggttgtggtcgtctgc acatgcctaatgaaatccaggtcgtaatcaagtcgggcagatcggatacgcattgtactg gctgattaagcccatcatccgtttcgggcacacgtaagataagaccctggtggcgtaata acaggtatcactcgctggttacgtgtgcacgacatcgtaaactacgctgcctgcgatatc tagctaattgcaccgcataaaataatagcgaccgaaatgatgcggcccggaaccgatggc tctgataacggagcacggggtccaagaggtagaacctgcgaacagtcgccgttcaatggc ggcctagggcacatctggtgacaaattgcagcagcaaaggactgatcggcactctctaag ttggtatcgtgcatataagagcttcagccgatgtccgcattgcgtgttattcgagtcagc tgaccttcggtgccctccaaccagatacatgagggatttgaaccgttgtgcctgttagtg atggattttactccattttccagaacggtggacagttttccaggtactgcgtaacactgg acgaacatggaccaaacagcagctttcgaagtacggtcgcttggatgttaagagtctaac gatcacaagtagctaccatattcacaattttgtgtttcttaagccattcgtaaataaagg aatacgaagcgtccgaccagaggtctgatgtgtctcgtgtcattggtagagatgtttacc ttagaaccggtcaccaggatttctgacgttttcggttagcggctctgccccgtagggatg cttacgtgtcgaagttaggacttcttattatcacgtctactagttcatggacgatctgta atgttattccccgaggccgatgacgtgaattaacgaaggcgaatgctagccgtcaaccca gaacagcaggggcggtggcctactgtctgagtcgaatagtcacgtcctaggtacccagta 
caagacctacaccaggatatgttgggatgtattaccggcaaccctttaagtagggcaaca gggccacactgagtgcagaactatacgagtcccaacaaagaggtggttcagccaccagcc agtaagtttgcagttcaggcgggttaggccacacaagccgtagcccgatgcaactatggc cttgggtaaacccccgtccaaaattattgactacttgtactgacaggttgccgtgtgatg cttacggtacctcaggtctggtatgacctcattgagtccccagtaacgtagacattgttt tcttagataatccgctagagcggcggtcggcgaagaagtccacgtcactgataagtcaag gcgactctgacaagctctccgtgtcatgcacttaagcctagcaaatttaggatgaggaga aatattgcatcagggacggggaatccgaggataaagcactcataagcctgttgacacccg catgctgaatgctaagctagaggcgcaggctccaagccgtcctcgcaagtagatcttcgt gaggtagcgtatgtcagtagagtaccatcaggcaccctggcgctcatagcccagcgcctc tccgatgttggtctcccacgagagacccggtgtagcccctgtggaagagttaataagcat catcgacggatttggtgaatataattcctttagggaacatatctataatgtgaacaagcg ataacagccatgatattaattgaacaaattcgatgacttatgtcctcgtccaatgttttg gtcatggacagtacgccatatactcaactcatgggatgttgtctcccccctagcgggccc accgtatttaaagctaaccgttatttaaacctggcctgcatgtggtgtacgggagcagtc agtactatctcctagcgtaccacacaccacagaatgtttcgtttgaataccagttccagg gagtgggaatgttggtgaccagaggatacatcgaagttcaggcgcgttgagccagtggtt ggtgggccggtcgcgtaacgaggaagggccaaacggcccagcatctccgcagatataacg gtgcacgaagaagcgatctccatacaggtaggtcgcgtcgctaacgcaatcctcatagcg gtgccgatcaatgtgttcagttgatctggccactgcggtgcgcagtctaacctaacatga aaacccatgatccgaccagatgttatcggcaatgacggagcaaaatattgtggtgtctgc atgctattccgcatcggctttcctatccgcttaggaggtggaggacacgcgtaattcgac ttctcgacactacaaacgttcttatcagtggttgatccaatgcctcctgagtttccaaac caatcgatacctaccacggctagaacccagcttaagtcccggaccgcgcgctggagatgg cagggatgcttgcttcttcagtctcagggtagtcacgctcgttagagttacagtcaaact acaagtgccgaatcgcaagacatggccgtacatgttccaaatgtgcgcgaccgacgaaaa cgatgcatctggaacatccttcactttcggggattgttccgtgtgtggggacgaccctct ctgatagtagggaagcttacaccggatgaccagcggtacgggttttataaaattgaatgc cggaacacctggtgcatctgtgtctgtttacaaagtcaactgctaaagtccagtgcacct aagtgctagagccatctcagccaggtggagagataggaatggaactaatgagtgtccgac atataccgaaatagtgaatagcattatcggggtcacctacctcaccataatgttaaccag tacgtggaggtgagtagcattgatgttggttccacgactctagttaaaagtagggatgtt 
gcgtggtcaggaaactccagcacgcagcaattattcgatgataatggcgcggtcttgtgg aaccgcatgattcattaatcacgacctcaacacattcggttgaaagtaacgaaagtacca ggacggcaaattgtggatcgatgtcggcttgaaacagtctcttgtctgtcaatgattgca gctattgggctcctttttgagattactcatctcatacttgaatgtacggtcaattcccgc tcaggtataagggctaggaccaactacgggcttagagctaagtcaccggtgcagagcaag gacgtctctcccagatatataagggctttaccggtatcgaattaggcctttatccagtgg cctagttacggtcgatcgtttctcgtgaggttcctatacacgacgtggagggtcgcaatt gcgaggcacttctaggtctttccggacagaccatacggccccgccgcacgatgggatgaa ggggatagaggtcgtgacgctaagtatgacattaacggggtctatctgacgccagcatta acgcgttgtgaccggaggaagtcgaaaccggtgggagggcgttcttctagcggtccagag cccatattaaccgcaagcgtgatcggagtcgaccttacctctcagctgagacgaagtgta gtggcttgtctgagctccggttggtccctcgactatgctacacaggactacagtgtgctc ccgcattgacaaatgactcttggggcatggactaacgagtgatcaagtttcaccatttat tccctagcgtaagtcgcgtaaggatatcaggtcagcctaatttagaggatttcgtgacgc ctactggacgaagaggtgttactgcggaaggttccagaaaaggggcaatgatccaaaaag caagatagggacggacttatgatgcaactgttgagcggccggcagaccaaagcgcttatt gctgcgagaggagatgggcagactgtgtgcgaaataaaagtgtcctgtcgcggatgagtt ctagattgtcggacacctggtagaggcgcactagcaacaagaattcttggaatcggtagt tgcctacagtcctcgttgtaccacaaggccctcagaaagccagggtaagttccattacta tcaccttcgtttccttcctataaattttgcgtacgctcagtgacgtaatttcgtcgccgt atgtgtgttccaagaccttacgggttataggtgtcgcttccctagtcggaacttcgattg gagttcacgcccaattcaacagagggaagatgcgacccacaagctgcaaccagtcgaaat aagaaggagcattgtagcggacccttcacagtggggctcttagcgcactcgcgccaggga tatcgtgccccccgcactgtacccaaaagggtcagatcatttcaatgacagagcaatcga tctacgcaaagctcctcggtgtaatagacaggttagaggcaatttctggttagcgcggtg cgtagccctagcagataagacactagaactccgaggctatgggtaagacatcgccgccgt tggagttagtgagcagcgaaatcccccgactggtgctagggtaagaagacccgtttcgtg cacccgggagaagactgtagccgacattcggttgagcagatccatcatctaagtgttgaa taaacaagctttggtccggcagtcttcgcgcattccctaccttcaattcgcttccctcat atactaaatcaagagatcgataataaactgtattgccacctctgttttgctggtcaaagt cttgcgactaccacggcggaatctcgtctttggcatagaatgcccacttggccccgagta tgcaacgacttaagcagcgaaaaatacggataagcaccatgaggcgaacacgctcgcagg 
ggactcccagtgctcggaccgcgattgcgatccatgtacgaatgagtaggatctccaccg gtggatcgccgtcataccctcagggaggttccccccaagctctacgtccaacggaaaaat caggcgtgctcatcttcattcgtacagtgcccaagaccgctcgcacttgcgagtgctggg accatgacaggtcgcggcatgaatagtacgaagcgggaaccacggacgattcgtcacaac aggtcccgattcgtcttgaatactactgcaaagccagcgaatgacaccgactgctaacca cggaggaaataccatgcgaactgttaacatgcaatacattggtgctgggctcatccctgg cgcaaggccacatctggactgaccgtccagattaaaagtatgccgccggacgcgttcgaa ctggtcaaaaacctttcgataaggtgttcacgttactcttatacgaacaaatctaagcct agaggaactagacatagcagacctggttgaacttgcgcttaagcgtcgtcaaaaagcgct actagtttataacctgcaaccttctgccggctcgcatagcgaaacgcgagaacgcttggt tttagtcgacgcgctcaaatctatgcttttgaacttcgtggctttcgtgtgaaacatcgt atcgtagcatcatcacagatcacaattcataacttcatgccgcatcgcgatagccccccc tctttctagaccagacagatgtagacgatcaactgaatcggccgtacgccgtactggcat ggttatgctgcaattattttcttagggcagatatcgatctgacaggtaagactaagacca tctcggcatttccgagagcttataaagctccgtaatatgcgtgctacacctgcgatgaca agtgactcccgagaaaaaacaaagatcttgcacactggaagaggtgttttcactttcaat tgaggatatcactttgcgctcgctacgggacattagccacataacacacgtgaagcccaa tgtgctcaataagcggtggtttggacaatagggtccaaaattcctatcgctactcaaatt tttgccggtaaatggctctgcgtgcctagcagaatctctttttgcagacaagcggcgacg gcccgagaccggctggtcagtcctggtttgcactgatactctccataggacccttgcagg tatgggcgagaaatcctcgggatgttatccagcaacacgtgcgttcgcaaattctgtaga cttttggactaaaataagtgcactggctgttcacgttatcgagcgacgttcccgattcct tattgctctgcgcgaaccacggtccgattgtagaaagacggagtacggtaaaaacgccgt caagtataatgtcagtgactttctataaaaggttgggaagtacgttatgtaagttgcact cttagtccgcatcggttccatgtgccccggtactacagcgaaggtcgtccacactcagca aggagaaggcgagacgtacgtttagctcttaacgtaactggtccaacagcctccttcggt gaggcgttgagcgtagcagggtcactaatcatgtagggagagacagcctctggcagacaa tcgttatcgaaacacaccaatacaggcgacagccggcccaattacaaggatacagctgct ctgggacagcatcgtttcctgtgaaagctcgtcacgattttacactcatccttggccgtt tacaccacgatgcggcttggataagagaattaagaccaagtgatgccgacatcattatcc gctctatctaccacctattcgtctttcgcctacggcctagttctactagggtctttgtta ggtaatgtaatccgtcccgagtggcctctgcacgtcgcgctttgcaaaaaatagcgccca 
tgatcgaggaattctgtattatacgagatacctggcgtcaaaatacagggaatggggtgg cgcgcaacttgggactactcctcgcccaccagtgacgtctgaggatgatcagccgcgcag gtgcaattaccccgccgtctaagctaagtctaaaatcccgagacgtttcgctttgattag gggattgctgaagccaggcacccggggtctcagctgcacgcctgacactggacttgccct cggcgcagcagccttcctcctctgtgaggtcaaaaagtccttattaccatagtcttttcc gtgtgtcacacttctagatacgcgcagtgaacctagcggtgtctgagatagttcatactg gtatatcccgtttatttgtcaactgttacccctgagccgaatggacacgtgtgttacggc agtgtgacagacctccgcccattttggatgatggtatagcgctacatgcacggcgaaggc ctgccaaatacgctgtagcggaattaccattgatggcattcgatggactaaggcacctac cgccacgaacaaggtgtcagccttccattgaggcattgtgaatcaaagttctgcctaacg ctgtcctctatcagctggcgcagtgtttgtaaaccatcgtatagtccgtcataaccttct tccttatggtttcgcaatctcgcgccaactacatgggtctgatcataggcgctccaatgt acaacttagccggccaggtgaagattgaaatcaacactacactttccagggtcgaaggag tgaacacccaacgggcgtttccagagtgcgacgactgcacagttgcccaccgctgaggct ctgagattaatgcgctacatgtattgtatgcagcctttccttatggaaacgagtttcacg gcataacggtttcggttgtgcagggacctctcgggtttacagagcctaattggattcatg tgtgggatgcgtcaacgggcaacttccaatgtcgtcctaggcgccgatgaccaccattct agctcatagtacgtaggaaataggttgtgctgattgtacgctactagtataactcccgta tcctcgctgtgaggagatcggtgagattttcacttgagagagagaattacctcacgagca aggttaaaattactcaaaagcgattttcaggttaataggatgcttgaggctgcctacagt tcagatgaacgggcattgcgtcgcagaggttcggataacagtgaaatattcggtctaatg atattcgggtaaggagactataattacccggtgcagcataatattgtaaccgcggtcgtg cgttaggctgatctacgacggtaaagaaagccggtatgctcgaagactgctggtcccagg acctgtgaatcaaaactgaagccctgtgctcccacgggtattagagcaactgactagttc cggaccagtaatccgggggccctggagtgggcggaaacacgtcagattaacaccttcgag tgctattggctgattggccttgctacgaacctcccccagccagcagaaacacatctggcg acggtgagttcgcctcgcaagaagaaaccgttaatctaacgtgagtcacggccccagacg gtcataggcaggggtaggcgtagaggtactcgtcatgtacaaaccgtcccgtaaaattaa agattaccatgcaaggctctgaaggtttcgacgccgctttcaaaagcgagacatgaagac actcccttccgcacaaagaaagggaagtcttaggaagtcttgtaagtgccacttcacctc tgacatgagcaatttgatccttggactttcttaattcaagccgcataccggtcaaacaca tttactatttgctttcacgttcccgatagagtacctatggtggctctataaaatgacatt 
ttagggagataggatcgcttctagtgaggcgcggagcatggggtcatcgctttcgctgcg agatcaatcgttgggactcggctcctaattcatacctatgaataggtttgtggcatgctg tatttccagacaacaaattcgttgagatgtccgtagccgccttactttacccagatgggt tgttatttcagtcggcaagttctacctcgtgggctgtacctcagattgcaaatactccga agtggatcggattccgccgtgcgttacaggaacaatgggggttgctgccttgggcgttca aagttaccaacggtcaatgtccgggagcagttacgagcgggcgtccgtgtcaaaggtttc aataaggccatcacggctagttcattccgtctcgtcaactgggtgggttatgactgtgtg attacacaaaaagtcatttttcttttgatcctgatggccatgtagttctcctagaagagc acagcggatagtgatcgagccgcgatgtgcgagcaacccagcccgttgttttcaagttcg gctttgcctagatagaatcaggggctgtatcattgagttcgattctcccggtcagccagc ctgtcgccaaagaaagcttcaggccgcgttggtcagcgcggcgagtagcgaggagttcgg ctcgagtctgatctacttccctgtatttccacggtgtccacccccgtgtaccgctgggtt aagtaagccagctcgaaggttaaccagtttattagcgcgtagtcaacatgggtatctacg acggcccaccctgaatacacgcatcaaacacttggttcaggatgactctaccctgatacg tcagagggacttatttactattgtctcgcagggacttaaatcatcaggcggaactgtgta cctgtgatcggatagtgagagttgtgatgacggatacaagctagctcgctgcctactgga acatgtagtgtgtaagtgtcaggctactctgtactatagagtaatgcgggtcctaagagt tcccgctctaggttcaccaatggtcatataaactgccggtgtagggatccttggcctact atcgacgggcctcctgcgggtgcggtccaacgagtatccggttcgtcaaggagttgagac tgttgatctctcttgctcatatggagaagtgatagattatgctatcttcgttttctgcaa gaacagtagaacgaacgtgtacaataacctgggccactcgtgtcgtccaaagcctcaaga aatactcctgcacgagatgagagcatttctatatcgtcgaagttcatgccgaaaacgatt atacgttgcacgttacttagaggatcgcagcaagagtgtattagaacatgaaggagaata gaaaggaggtgtccgatggatacaggtgcgactcttggcgattaagataaactagggagg gcgaccttctcgtgtgttccggtaacgagcactcgtcctttgacagagtctgcattacat aaggccggtaccaggagaataactcaggcattacgaacaaatctttcgaacggagtcgat aggcatgccgctgatacccaggcggtggggtatactagtcaaccgccgtgaagccaagcg ggcagtctactagccaaacgtcgagcttctcttgcacttactaggactaaaccctcgggg ttttagatggtttcttccgccgggcgccatttgactccgacctcaccgcgtgtatggtcc ggtggaatccaacacggagaagcaaacgggactctgcacagtttgagcgctgtccgagaa gcgtgctgccgacttccaggattttgtagattcaccctatgtgggtgctaacggcgtacc ccgcgcctaccgcagacgaccgggttggaggaggaattgcgtgcactaatctcctgccag 
ctgcaccgggtcggtgccacgttctccttggacggaggacacttccactagccggtcctg tgagtgatctgctaccacgcaacagcaagtagtgaactcggcgattgcagctcgcgccac ctaactcgttaaaggcgccgtaggcgcatgccaggtcgcaaaaccggtcatattccccaa gatgacgcaagttatctttgctcgcagtcgatcctacgacgtataccaaagaaggtaccc aatcattagtctcaacactaaagatagtccctcaagtagtggagcaaggttcgcacttgc acatacaatcgtatgcagagtttgagatgctgttcgtactcaaggctattaacgctatat gttagaagagctgtaatccgacatttatctgacgctgcttctaccccgtgcaatcgttgg tgaagggtattcgtgctgcctcttttccttcgagagtactgcgcgctccagttatgaact acccctgttaggaggccaatttaggggcatactgcaacgtttgcgactcatttttcgcgc gtaactccgtggagaatatacaagaattgcccacttcagtttatacgccgatatggtgga aagccggagagttgatgtagtaacgaggctccagcgaaaatgagtgcgactgcgatagag tggagatctatccaacggcatggcgactcagctaggggtggggtaggatcctgttggtaa acctagaccggaggctgcctccggggataatcatctggcagtgaatccggagaattaatg acctgaccgatatttcaagaaagtgcaggggctgagatgcatcactgattccattgtggt ctcgatgttaagattagaataatggattagaggcacttgttattcgtaagtgttaattac ataacctctcttggatagtcctttgtcatctcgggttgtgcaatagcggggcttcagggg tcacaaatagacttacccacccaacaatctacagtcttgaatgacggggagcacacaacg accacatctgcaccgttcattattggagcatatggccataaaagccgtacctacctctct ccggagcaatggcgaaagcggtaatgttacgtaactacaaaccgctaaagacgaagaaat ggcaaccactctgtcccggaccgggcagctaacggctgaaatcttgtgaactatattaac tgctgtacagcgctgtagtacggtttctggtttcggctagtagtaccttgcagaagcacc gattaccaaccaccgtacgattcgccaacgagttagcctcatgcgttcaaggcgttcccc gcaccgtctctcgctcgattggacaccatgctattgacggcatgattacgagggcagcta gtaatgacaactactggcccgcgttgccaattcttttgcatcacgtggctagtacgtcag cagcgtccctttctgcagaacgcagggtctctgtaaccgtcgctgcgcacaacgtttagc atcaggaattctcgtttctactctgttaaagggtaacgtggcaaacatagccatcgcttt tggcggactcgctgtggtccaggtcgacgaattgggggtccagctcttcgtgagctgcag ctggcgtgaagcaagaactgagtggcctgaatgtgagcagagtattgggactggtgatcc acgcaacggtttgagagcgtatgggctgcaccggctagtttatgggagtaaacttgaaca gtagtgaacttgcaggggccgattttatccgggaggactgaaagcgagaatagcacgtca cgcacgcggtgctactgttgctgatcggtatgagtcctacgatattcgccgtctatggcc accacacaaaaaaacggggagggcagctgattatcagtcggtgtaatgtattcggcatct 
gcccgccctcatccgcctctgtcccgtgtcaactgtactccacgtcagtgacattttcgc atcacgtcggtgtacacaaatctgaataccgcaatcacgggccaagcagttatcctgcac tgatcactccaactggaggcatcgggttacacgtctccagacctggtctcggagccgatc ggcgattgggtaaaaaaactgatttttgtgcgcggtggcaggtggggcacagaccatgtc agtccgagtgtatctttggtgaattccggactggagtacattcgcctaattccgtctggt tccgtatcgctggggactccatgacatgtctgttggctccgagggtcgtagtaagaggga ctcatgctatactcgtgttgcacgattcctttagtatatagtaccatcggaggtctatca tggcacacacaacttgagtcgcaccggtaacgcatacattttaaattcggggaggaagtt tcgagcgtacatgatcgcgatagagggaccaagaatagggggtggagtggacttgacgag ctcgacctgccctgtcgctgccagaatgccatccatctagacacaaatcttacggacccc ccctggctcgtagcaaccgacggcgatgactcgacggcgcacgctacgcatgggggccgg gccacggaacatatagttaatgctccctaaacggcctcgagcaggaccaaaaggccgccg ttaatccgtagcgactgctagacgcactagactctgcctgaatttatagcggggtgtgtc ttattacatcaaggtcctgaggccgtaaacggatgctcggtaaggcagagatttgacttt gacaagctgaatccattgcacccagatttaaggggatccctgtatcccgaaacttccctg actactcaagcagtaaccaagcatgcgatacaaccgatctgtatgctggtttcggggcgt agtcgaggagcctagcgacttggagctatcgaaccaggcccgaaaccgtgttaggactcg gatatagattcggaggaacttggcatagatcgagtaaattgggtaacactttactggacc ttgatttgcgttgcctatcgcaaccggactactagccgacagcgtaagacccagcaggca ttctaaggcaaccgacctgatagatctaaagttcttctcgaaaaaaccgcgtcgacaggt ttttaacggaccgcccctcagatcaaacagtctgagatagcagcgcttagaaggcagagg tggtggtccaccaaccgattcagggccaccagcattgctggagtaaaggccaaataagca aaatgaggaacgatttccggaatactcgatacgtcccagctctaagggtgctcgcggcgg agtcgcgaagtgaagtccgtgcgatgcataggctataacgggacgatctgctgacgtatt gttgttcgccctgtattcacacattttatgttggccgtagggcggcggcctggtgcaccc aaccgtgcaacacggcacgcgttctggggcaggcggataccctcattggtgtggagggtt agctgcgtgtaggtttgctctatccataaaaaggaaaactacagttccagagcttgccgg tgctttcacactactatagcggcccttcaacggcttagccgcatgaatattcttgagcct cgtattgtgc >Contig2 gattaatgtctggacccacggtccgtgctatcacagtgtccaccgtaggccacgacaatg gcacagaagatacccaattccaaaacgctgtcgggtggtctgattagggcgtccgatctt gtaaataggaccaagcttaacctggaccactaattcctggaagctctgccagagtatgtt tttccgggcgttcgatcgttgtctgttccgccatcaaaagcacagttttgagacgccata 
ctctccttctccgataacataagaggagctacgtgaactgttctcggacccgcgagtgga gtacaactatattctcgtcccataaatgaagtggcttatggggcacgaacatcgtgtcat actctagagtccgattccatttcgacctttccctccaggcgccctgggttatccgacaca aatacgggtaccagacgacgcattaccaagaatgatgtaccacactacgattacattgcc cgatattccgtgagacaaagccacattaaattttacaagggtggtatttcgcttctgatc ctaactaggaagatgcaagacatcgactcgagttgcacgggagagtgggagcattgttca gccgaaagccaatgagcgacaacggcatgggaagacagcggtaaaagcacaggctctcgc gttaccttcagtcttatatgcgttcatacttttggaaatcggctgcccgtacattaaagc ggcgccgcgggccgaagctcagcgcagcatggtatctcaaggcatatccgctgaaatgaa attgataacgcattctgatatccaacgcaggaaagatgaacgaacttgtaagttgaactg gagtttttagacagccgccatgtcgataccagaaagacctcacctaggtagtaagcgact tacatacgaccgaatatctatggtgaacgtaaggagctatttatcagaccctaaaaggct tttattgtgctcaatatcgcatatgggtgcataggcttagctggagttggtacctgcggg atggttcctggtccacgttatttgagtaactggtgtgccatctccgactcaattgaaaaa cttacctctacgcattccaacgcgtcatggatgaggtatcacctcacaatacggctttat ctggatctgaacgggtgaggaacttcgaaacactaatattaggcatttggggagtaggcc tatccacgacgactggagatggtccaaacgtcttacgtagacgggttctatcagtgttgc atggctattactattagtcttaacttggtacctgtaaaactcatttgccccagaaaaatt cttgccctgaaaactgccctaaaccatcgacctcgactactagagacgcacgcgcggata ttagtgccggactaagcgccataatatgtttagcatgtcaacaggtgcgtgctgacgcct gtgcctcattactgaggtaagcgggtgcttgatgtactactaattggtgtacaattgtgt tttgtgacatgcttgtctcccgctccgatagttggaatgtggatcatgtggagctgtact gacacggtacgatggctgtgtggtagtcccgggtcctgacacctaataactccagaaaca gcgacagggggggtatgcaattacggaatcagcgagtagcaacataagcggaggacgtac accgttagttgtcagttgtattggattgcagatgcaggaatcggtgcctagaaaagttat ggatggcaagggttccgatccggtgcctacaaaattacacgtatcaatctcgtcctagtt ggtaacatacacgacgcctagatcctgatacatgcggcaccatgtggcaccaggtggcaa cataaggttctaccactagacacttaagcaagggaattaagccggtgttagcaactgcct acccgcgaccttgaacggaccttcgaagaccttagaaaagtagctgtaggtgcaggccta ccgtccagtacgaaaagtggtccatgcgctcgggctagttattgaaagttgactggtaca cgctgcaagctacattatatgttgcgccgaagtacttggtacgggtgtgtctccttccgg gtctatggtcacatcttcaccttactgcccctttacgcgagaatcatgtgtgacattttg 
gacgccgaggactcgccagaaccttgcggagactggtcaacggcccgttatgtaggcagc cgagttccgtttgatgatcgaacattgcatatattagtcctagtaatgcaatgatagcag gccccgctcaagtccactagaaacgagcatcaccggctcagtcaatctttcagctaagga gtccatgaatgtgggactccataaacttttgcgggccccaagttggcggctggctgagga ttctgtcatagactgctccagcagcccgcacggtcgcgaagtgcgtggggtcccgtcggc gccctgtacgggcagtgcaccatccgatgccccttaccgacgtgatagtccgtgatgctt cacggcacatgtgagctaatgcgtgatagctttctggggctatgtttcagtggtccaatg acatagcacgccactatccgagtgaatgggaaggctcgatatcgaagattcaaaatgcgg gagtcttgggttggattctgcctgggggtcttaaagatcaagaacggcccaaagcacact cgggcgggcccctagcagacgcgccttgtcctcagccgtttagtttaaaatgttttaagc tgtcgggccagctatgaaagaactccggagttgtgatggacgctcatgcgtccggacggg atcgtactgattggggtaggcccaaccgacactcttgcagacgccgcgtccacctgcagg aacgcccccttttcgaatagtatgcctacccgtacatagggaacttgtccacaacgcgga ggttgatctcgatgaagatagaggtttggcctgaccttaaagctggaggctctgaacagc tgggtcatacagcaggttcacggcgccgggagctagctgaagagtaaggacgacgagaga cataagcttcgccctttattaaaacacaatactcttttattggcacctaccgcaattacg cttccttgttttcaccccgggcagtgtctttgatgggaccattttgtaaggggactgtca ataaaccaacgcgtagcttccgtatcatctggtcgctatgcttgctacggctcgtacgct tcgcaggtagagctcccggggggtccagaacgcgctagcaaattcaggactaatctgaac tctgttttttgaggtacgcttgctccgatgcactctagcatcgtcactagattctcggca gtgcgtggtgccagtgattggatctaggtgccggtcgagatcgcgctcaccgaaatgggt ggcgacacccgatggcgggcagttgcgtctcattctagtacagtttcgaactgtgtcctt ccggtcacaacgaatgattgtctgcgcggcctgggaaaccttaccgagttccccagatct ctaatggttgagcttatcactgtcatagacgagcggggggccacaacctaggattattgc ggaagcacatttgccggctaagggtcagaaataaaagagggcgtcccttagcgcctacgt taccaaaccctgggctgttgatcaacgcttgggataacttaagacgaattcagcgtctgc acgatacggacgcaactactcaacggtatagtaaaattagaagaccggtagcgttcagtg gctggcaagataatccaccctcgagccgctaacttaacctagcgtcctcatcttctcctg ctggtcacggagtgggttgttactacggacgctatctctagaacactattttaactcatt cctggcgttcagcccttcgctaaagtaccagataagaggggccgcatatacatgaaagat cgttgtcagccatgggtggactgtatattccatagatacactcgcaccgagatgaggcat ttttcttttcaagatcgcacacctattgcggtgaaacggcgcatctgcgaacaatacgtc 
tggggagtcagaaacgtcctgtggtgtgcatctcaagctggtggtgttcggtaggcggtc ccgttctcacactaaacttcggtccgtcactataagcaccacataacctatcgttggcat acgggacgacgctttcgggctggttttctccgttgagagcatgattgaagcttctgggca tctgggcctctgttgcctgactctctaccatccgaaccttagggctcggtcgcgggtgcg attctggactagtcagatgaaagacgcctagcgattggcctttgacccaaaagctgcatc acctaggcgattggacttgacgattttgagtaccgactgctccctaatacggcgcaactc tatgtgcgccccgcatgacacgagatcgtctaggaaaatagaagcagccgacagattgac cgccttactccaattgataacagggaagcggcccagttcgtggttagagccttgtttaga attacggcaccgccaagactgtggtggtgcgcgcgtgcttacaattgtccctatggaata tgccgaagatttgcacctgccaggttaggactctcaaaaatgatagcgtaaacgtaggtg aaccgcattctcgttaccatgaccaaacgtgcatacgccatgagaaattacttctatctc aacaacaacgtacaatccggcacatacgttaggatggtaagctattctcgtctagtagca gattacggcgagcctgggctacttcatctgtctagtagtcagaaggtattcctagattgt gcggtagtacaccgtatcaagcgaaagtgatacttcgagtgagattagaaatgaagcgga agcatgggataatctacccccccggtaggtctcggcctcctaccttaagacttgggccgt aattgggaggagggtgtcgaaaaacggtaccgtatagaaataaacctaacccaaagattc aattttctgatgaagcacaccaccgggagggcacttgataaaattttgtcatttccatcg cctttaatgaagtcctgcataacctaatcactaggaacacacgaaaaacccgttgtatca cggaggtagcgcgatccctctttcactctcttccatatgcgccccttggagggtagggga tatgcttcaatcagactctgatggacatatgaacgatgtgggttagaagtgtggggcaca ggcacgcatccgatttcagtgctcgaatccgaaactaatatttacccttccccgcacttc aatgtagagacaggagaatgttaatctgactgagcagataaacaactcgtccatgccgat agattacgcaatatcacgagcgttaccatttagtttggcgattaagatggcgcaccagat ttatgcgcttcctcaagtctagatcggcctttgctttaaaacttaggaattgcgccgtga gccagacagcggatttatgatcgcacgtcttttgggaggcgtgcgataaaattatagcat gtgtgcccgaaatggtcgctgcacgtcaaatacctcgcgtacgaacgtgaacctgagttc cggagagacccgcctactccactatggggaaagagatccgtataaagcgggaagcccttg ggtattgagcggatttagacgatccgacgagtcgcccttacatcgtgctgacactgaagt cttaccgtatttacgcgagctggctcggactttgccgcacagaggttgccatattttacg aggatataacttttgatcagccgtcgattacatcgtcgagatcacgacactctatccgct cactgcgtcagtctaactacgaaacacgaccccccaccaacgcgcggggtgtcgttccat aataaacactcgcattttacaatgacgatatgatcatgtacaagaaagttgacgtgaaaa 
atctagtgctccgtgtttcccagggcttatatcggctgacagtcacaactacgcgaattt ttatactctcagtgttctcagcaaaatacgtctccgcaccgaaccccttcccattgaacc aactggtgttagttagagttttataacccgtcgcgactgtgagctaatccgacgcaactt ggaaagatcccctcgttggcagagcgccgacctacacacctgctagatcgccttgagcac ctttttaacacctcagcgctgcgcgctgtataattcgatgagacctaaaatggaccctaa cctcaggacttagcttccctaaagtaagcctatgcatatggaaaccgtcgttgaggctcg cgactggtgtatccccggctggccataacgaaatcatggtgtcgggtcgcgaatagccgc catcattgtcacaacctgcttttattattggcatagtgcaaccgactcttgagcaagctg caatgaggcagtgcggaccggaggccaagagacaacaatcatgtcagaagatcaggcccg tcaggcttgtgttctagggtatcagaccgggtggaatcttcccttgtgctgcgagggaac aggtcgagaagaacccgtatcgacacgccgtgctcacgatctgttgccactggaccaaac gcactagtttggtttcgcaggttgtaccggcccggggacttattaacgaactggcctcgt ggatgccgtcgggtaagtagcaatatacgggacagctttatcgagcatcaaactagtcga tccacggaacacactgccctccttaattttacgcaggagaagacttgcaaatctggcccg ggtacttttgatgacgcgtcagctagtagcacgtgctagtagtattgctcgagagaactg tgtcatactaaatagtgaaactttcgcgggaacgagacgtcttccgccgggctttgcact tcgcgacgtagaacaggtaagttaaatgacaggtcctaccagtgtgttgacggaccccac caggccacacggcccggagtgagtaacacctcaagctggaaccaagctcctggcctgacg gccaccgatgggcaggccgagccaagaatactacatcctctacgagggtagtgcggcacc agccaggccgccacttgagtcaggtttgatgaatggaaaatcaagtactcccctgcttgg tagccatgccagcatcatgaaggcatccttaatagcacgagtatggaggtatccggtatg acaaaaggcttctacttcagcttaccagcggacttcccgaagaccgagcacgttcgcgta attagttcaaagcccctttcgacttttcatggtacactgactttcacccacaagtggctg cctctgtttcatacccagggcagcaatctaagtggatactattgcaagccggtatatcat ttactcgtaacggcgaggcggtgaaccagcacgatcttaccctcaaattgatagaaatgt cggcaattgaatgaagtgaaagaggtttaaacgccctttcttattatgaaggaaatgttt cacgagtggctccattcatcgcggctttcctgcgtcaagagtgtcccgtgcggagcgtag tagaccccaagaccccaaagtcacccgttacacggctcggatgccctcgaaacagcgggt tatacgtcaaactgtacaaggcttatgattagtaatttgcttcgagatatacagagttcg gactcgccctcgcccctgcctaggcttagctcgcatatcgcggtgatgtaattatggcct attggggggaacacaaaccacgccagaattgatggcccgacgtggggccctgacgtatgc tactgtagcgatgttgagtgcacgggatcacgcttcacctgcgaacgtgcgaataagggt 
ggccatactagccatgttgtctggcagtacccagatgatctccgttctaggcaaaccaca aggggcaagcacctcgacacgaaggccaaagtcaggacctgaacgctgcgaggcacgaat gagacaggaggcctgcgactgcaggtgcttactgtctaataaactgttgttactcacccc cccaaaacttttcccattcatatgaattcgtaggttgaatatacctcagacatcccaaat cccgaaccgcgccctgaccttgtgtggaatgactatctagaaacgaatcagtagaaactc ttatgacggtatctaatcggtttgcacatgagctgaaggcttttaaggagattaaggccc tagtttgaaggccgaccctgacgtagcgaggtactctgcaccaggccctaagcgaattat tgaataactaggccgaatgaaatacttgtctagacactccgggcattggaagcttagggg gtgttacctcgctctttgcggcttcggtctaaaggcttggacctcgtcatcctctattct acgcgcccgacatcgctcgcatacgatcgtacatcttcttcggtagatcgcacagagaac aggtggatttttagcagacctggctgaaggtactatcatgtcacatcgaccgcataatgg agcccctggccgaaacccggagactaattgcgacaactagccgagcacatatatttctat catggtactcaaatgccataagggattggagcgcgcgcaagcagtccaagcgcaaggctg aggttctcttgacacattcggtatcgagcgagagctagcacagaacccacgcatggctta aaaacaaggcggccgtagggcataaactaaccgacgcgaatagtacccgctttttgctac cgccaatgggacccaccctgcgccgcaatacgtctgttgcatgcctgccccgctgtcacg ccttcaacgatctgttccaccgcatccatggtctattactctttggccttactcgatcgg gtgtgcataaaggaaacctcttctgtggtgaaaagggcgagacgcgcttttcgtagtgaa gacttactccttatcaccaacgcccctggaggcatggtgattactgcgcgcttatccgta tcacgccctgccaatcgtgcgttattcagcacgtccgttgactgtcatctgtgtgtggga ggcccgaggacgaaaatgggaacaatcacgaggcatcgatctctgagctcacagcggctg gtcccgtgcctataatttaaatcctgactcagagccgtgtggctcgcgacggttacataa agagccaattctttgcctttcggttatcgaaatatctctggtgggaagctcatctcgtgg aatctccgcgtaagtgcagtatcggtcgtatctcatattgtctagttgcctaggtggtcg ttgactccttggacaagcactggtcctacaggtggtacgttggagcgaacggatgtgctg aaatatagccaggtacgattgccctgccaggtgaacatgacgagttgaaatggattaggg cgcgcatgagtggagatccgctccacagtagcgccacttatactttgctttataagggtc agggtcctccacggctaatgtacactaacgaagccaaactaagggtctactcgacgaaat agtgcatgatcgacgagttcaggtaagaaaggagtttttactggtaaagatgaatatatt tcctactgaaagggataggcaggtggcaatttaatattcccttagacgactcttacgcgg aggccgacgcttgagcagggcatcgcggtccttagtcgagtatgtcttcggaagaatcct tcgaaagcaaaagccgcttaatttgtcgacccgtctaaattaatcatgtgcaagaaattc 
ttccggttcaatccaagatattattcatcgtcgagaatcgacgaggctaccgaaagaggc cctgttcatttatccgccaacacaccacgtaaattcgtatgcttagaaaggcgccgtgta gcccctaccataagtgtcgctctgggcttactctaagtaggcttccctgttacgccgact agcacccatctggatacgagtttcgtcgaacctttgattgtgaccgatccccccattaat atcgttcgttgagacctgatattggtaatcaaggtagatgattaaccgcttttgagggaa aataccgatagcaaggccaaggtctcacgtctccgcgaatcatacgaaattgcccatagt aaacagcatcccagtccatcatgaatgtgtccgtcgtagcctcgggaaaaaatccaacga ttacagtggaccgtacctgaagactgtcactgttcgttcaaagagcgcgcgtagttttac cctggtcgaaaagtcaagtgtgagatcggtcatagtattcgatgccggcatagcgcgagt acactcccaccttcattcgatctatcatgggcaagaccgtgttgccaagtgcccgttcag gttgcgccgaaggtaaagtcgcgggtgagggtagaccagtatttagatctcaatggagtc tgccagcatacggcaggccacgtaggtccttcgtgtaaacacagataccactaagcattc ccagtgcacagggggcgaaaaggtcacgatatctgcggagttcatccgagtcaaggaggc ggaaatgacccgcgcgagcaaggggccaggttaactaaacagcaaattctgtcgactgac tcaaaggcagcgtacatgataagcacgttggtcctctgggctccctccccgttgtaggct aacttgtgaacgcttacggacacttttctgctgggaacagcaagctacagcattacaaaa actagactcatgccgggtaggcaacgcgtagcgacctgttataatgcggcgcaatctcga tccacagtgtctagcgaacactacaaaataggcaaggctgagacatccctcatctggggc tcgaagagaacataatgctttaattcgtgcacattgtcaaaccccagacgaacattaagc ataatttccgcttggggtgatgtctacacctcgccaatccatcccgcagggtatctattt gaagggaccctcgaacacctgctccctgtttcacacgcctagcatgatggcaagacgaac atttcaatggccggcatgagaaggcaaaggattcgtacttttccgaggggggaaagggat gagtctgagcgtcgcatggggttgccaatttattgggcgcccatggtatccccgtatcta gggattgagttgatgccagcagtaaattactactaatttgatcagaacgtaataccggta gccgactagcccgtcacgacgtcgtttatcaactatagacagctcacttgaggttaaaac gaaggacaaaaggcctggcgttgtccgtgttttaagtctgagccgggttgtgggtgtgag tcggcaaatatcttttaatagggtaaacaggccagatgccatgcagctattcggaatctg aggacagggctgccgttgttgcgctggttttaagttcagtcattactgacgcgccaaaac aagagtaacctaatggatctcgtaatctccgagacgatcagcgtgacagaaatgtctgct gcggcacgtcttggcggagctaagcctaccatattttctatatccaatcgtattcatccc aggcgaccccaacagaatcaactcggccttggaaaggaagtgacgaactcgatgggtcca ctgtcacacggagtcaccagtcgtcctctgtgatttagagttctaatggagccggtcagc 
cacgcagagccagaccaggactcccgacgcttggcaacgtagtccttcaggtacgggcga catcggtttggtcgtatgcatgcagcacatacaagcgatcctcgtggaatcacaccggtc tattcgctatctgtcttaacgcgcacggagcacgtcattccgaagataaagtggattcga ctattgaagcatacaggatccttatgactggctgcggatggtctgggagttgatcgattt tacaggaggacattacgggtggagggtccccataccgcttgactcaaaactgggtagggc gcacctatatacgagccatacggttttcaatcaacatagcgttcaaccactccagcgcat accacttcaagtactctgtaccaaagatcgcagtggacatccctcatactgtttatcacc cttcgagcagagtcttatagttccttgggatttcgatttgcgaagttaatcgagcatgct ttccttggatcccgtcgggagggcgtcgcgtgtataggcctcacaattttcccgtcgcca tgtgacttggaacttatcagagaatctcgtttccctaggtgaacgtagctaggcaatcgt cttgggtaagcctcgttggtaggtgacttttcaattgaatcgagctgctttgatagggct tgcgccctcaaataacagacagcttctattgccgccaccggccctatgccagtttcaaag cgaacggcatccagccagctgccccgtgatgtactaccctgcgtatcaagcgtatgcgtc gcggccacttggaacacgtcgtcatctcaccgtatgtggcatatgccgccgaacgcgagt cagacgcagcgtactacatcataaaggccagcggtgagcatgactttgaggcgtatcggc actccgtcttaacttggtcgtggtaaagctcgccggtccaaggcactatacgtagcaatg gaataatacggggagtttacgatggagcgccaaaattggtgttcgcccaccttcgtagag gcagtgatatctctccctgtcctaacaggtaaaaacctaggtttgacgtgttcgcgatgt gagtgcccgtgatttaagcaactcgagacactgcaaaccagcgacggtctcactattgac tgtcgggtgctgtattagttatacataggtgagtcccaaaacctaaaagttaagaggtta caagatttagggaggagatattccgctttccacctggtcgcgttggcgatccagtcttca aggccggagaactcgacagtcagaaccccaagtggaccaaatatgacacgtagtggacag tggggctcttcacagattactctctcagcaatccgcgcttgccagggagcagcgtcaatg cataaaccggcgactaggcattgcgagattaggaggggaattaaggtactgcaagtgaag actcgatgctttacgtgggccatccaagatgactcaacggtccccttcgtataaactcgc gaattgcaaccaagacagtgtattctacgccgttgtgaatcccgcatcggttcgttgaga ctgcttaattttccggaggagccaataccctcgccgttcagtacaagtcgtagatccgcg tgtcgtgtcttgggcaccggaccggtattaacggaccgttgcagcgcaggtgaaacaccc tttgtaggtgctcggggtgggggacgtacctatgatgggactactatccagggcagagac gtttactacttaaggtcaagggtgttagggtgcgtatcgggtcacggttaaggcgacgta ccaaccgagtccaacagtagtaaatgctcactgggagtagccatattcgaggctcgggtc cggctggttattaccagctggtaaggccttttaatctatcacgcgcctaatagcttggag 
cgatttgattccgtcacttatgccatcactagcaccgggtgcggaaccctcttcatggca accgcaggtctcatttatggatgagtatttagttaactattaacatacttaacggggcct accaaagtcggtgactaagggccccgggcacgtctcaccctgctgtacatactgccttac tcatggtagttcgtcggactccactcgtcggggctgaacttctagaccgcgtgggagggc gatatattaccgttcgcttcattcgatatcgcctaactaagggggggggatcccatggcc ccgcaacacgcaaaaacttgaacagtgatcgggatttctaaccacatatccaacaagctg gtactttccgaaataggatactgctcggttttcatcgggacataggatagagacggatac agggactcaatatagtgaaaactccgccacgcctgcttacacgtcccaacgcgttacacc ggaggcacgtggtctgtcttatacgactggggtaaccatggcaacaaaatagttcctctg gctgggtcggactctggtgtttagggatgcgatataagcttttggaagccggacgcctaa agcttgggtgagaacatgaggttacataccagggagaaatctgttcgtgcattggtttct gcttcgtacagattcgcgtaatgggggctgattagcttctggcaagtaaggtatagaatc aaccaccaaatgaatgtcttacgaaatggtgtgacggtcacccagaggaccgcgagcatt tcaatcagaccgtgaatctagacattcttgaataacgagcactcaatgtagtttaggagc ggatcttctccgcaaactgtgtaccaggacgacttcctgtggtggaaatcgctgtactag gggtagaactctgggctggaattccagcgcggcgtgatagcactgtctacccttcacgtc actggcagtcgttcgtgtcagttagcgctagaccttcgagctctagatatcataggccgc tagttactgggatttatatgaccataagcatccattcgtgttacgacagattgcctctgg caccctggcccaccgagacatgacagtcacaagcttgtatcccccatggtgtccgcagag gtttggtatgttgtaatttactgagtttaagagatgcaatatatagattttagccgaaat ctgtgaagatcactagtcaaggcgcgcccaattctataatctcacccaagtaaccccttc agttcgccgactccgcccaacccctcttgccctgttctgtctgccttcgagagagaccca gtttcaattcgagctgctacggataaaggattcgaggctccgccccgtgcatggcgtgaa ctgtcagcgaaaaacgtcctgtggtaaccctgtcaagacaggcaagggtgttctttctat gcatggtcctctacgtactttactcactaatgaactctgagctgctcagaaccatcacca agaattacgcggtctagaccccgggcagggaacacgcatacatcatatacgctagggaac tgacgaacagattacatgcctcctatgataccggaaggcgtgcttctacttttcccaaca tgagatggatattgtcccagtatcccctcaaacgatcgggagagtcgagcaggtgtctca agctaatttaacgtaatcggggctgctgcggtggtagcggtcgtgaaaacccggcggtct cgaagtcggtagtaatagtagtgtcccgaagatggagatggacgttggcatgtgcgtttg atactgctgttgcgcgcggcggaatgatcttttcgcgaccgccgagcacggtcgaaaagg ccaggaggtggtcattgtagatgatatttcgatttacaaaatggtgttctggaggagctg 
atttgtttgggtttatggacagatggggaggccgtacccccgaacgtgatagtatgatag tggtggcgttcagttgactcagagatatcacctggccgacgagttgagtaaactaacctg gcaacgtcgtaccgtatttcgttacattcgattggagaagagcaatttaaatattaaaaa accacatgcggagtctaatcctatgaccccactatataggtaccgacagttacgtccaga ttacagtttatccctcgggcagttcgcctgctgattctcaggcacaccagctccgctatt actgggcatggctggacgagatctagcatcgactgagaacatggccgagcagagtccacc accttcttctggaagcgctgaatgtgccactggtgttcctgctgggagtaaacaccgttg ttgtcaagagtccgccgcttatcacctcaggggcgtgtgaactgaattagtcactattcg ttacagcacccggtttacttaattgaatctatccaccggacggcaggtagtcaaccactg ccatgctagctccgatcacccaaagacggtccaggatgcgtgcacgttcctatgattatg cgcacacgatctcaatcctatcttaccagactgattatactatgcggtaggacgccatgt ttcctgtcagttccttgactcagcatctcgactagtctgaaaattctaactctccacagc ggctttcacgtgggattgtgagtatttcctgtcatcagatgtattcgagagaaatcgtgc agtgacttcctatcaattttgttgcgatcagccgcttacgtggcaccaagtagcggtgac acgaccgcgccccacattccgccacaacatggaccttcaatttaagctccacaccagcgg ccgtatcatcgtcatgagcttccccctaccccacatccgaacacgttgctctacactggc atggcgagcgtggcctgaggccaattcggatagtattccgtagagtcgtaagggaatcgt ggtaaatagcgtgaggagcttgtcaccggctgcacgacgagtttgagactctagttccat aaagcaggatcgctacatagtgagtaattatcagtcaaccacagcaagcaccccttactc ataatgacggtacacaaggtgtaacattgtagaactcaaacaccaggtgtggaaccgacc gacaaaaagcgagccctatttttcatacaaatcaccgaatcttagggccagcagtacatt taaataaaatgacttccccacaggaatcgcgatcgcgtcatagcaccgaaaatgtgtggt gtcgattagaccttactgcagtgagaacaatgtaggcagtgcgccccaacgcccgtgacg gccgaaagaagccttggtccgccaacaagtactgtcgtccctgcatacgtatctctaccg tattgctggttgcggactagaagtccgatcgcctagttttacaaacccgatgcattcacg tctcaaacaacacaaagaatgttacactgactgggtattcaactcgcccatctagctact catcttcggtcgagactaaaaggtgccccgcagtatgctgtgttatgggagttcatagtt tagcaaatccgggattaagagggtcaagctgtaaccgtgcggtgtctgctgacagaataa atcagaggactagcgaaaccgtagctataaaccgccccagaaattaatctgcgttggtac tcagtgatgtttcgccatgccactaaccagcaagtgtcaaccgtgcggagttccaagaca cggcaacaaaagcagctctacgcgggtttcaagaaataatctaaagctacaattcatgga agtgagataccatggtgacccacaatcagacttaaccagtaccgcaaaagcttgctcgcg 
ctgtcaaaagcctgttgtacacccaataatcttggtatatttaaaagttcactaaagggg cccaagggagtcattaattgcggtgcgcgaatggtcacttcatccccttgcccctccgcc cgtcgataacgcctgcttgacgtccggatccctgggcagtcggagttcgtctgagctggg ggttacatttacttatatttcaacagaggggcaataaaacggagggaggcactaaaacac actcatagcgactctattagttgtgcgatgtccgggcaggtactgtacaatcaacatcgt cttacccgcccctcctcagaccaagagtgtctttgtttacgcacgaatggggaaaagacc ttgtccgaagtgtatcagttggtacagtcctagtctttgatcggtcctatcaacaatcgc acgttcatgctgcgtttgcacgatggtcccctctttgagtctcctcaaaggcgccaaggt tcctaaacacatgactcgagtcatgggggaacgttagcagaacctaatgcacgggaggca cggcagggcgttcctcaggcgattagctccctttcaagtaggggtgaagcacgcccagag tctcgtgtgctccgtttgagcacgatgtggagagcgttacaacggtcccaataaacgcct agccactcgttagccatctcctatcaggtatggcgttgaaaatatcccgcgacaacccat cataagtggtattagatggcatttctaacaccattaatggttgaggactgttgtcccgtg ggataatctgaaaatacattattgagtctaccaattattaagggaaaccgtccccgggga agtcgacctgcctggtccgtataacgggtccactgtcccgcgtcttgattagagatcttg agatatgcaggccatataatactcggaccacttagccgcctgcaatctttagtgcttcag ccgcaggcaaataaaagataacttaggtacctaagctacccgtagcgcaaagcaggtata tgtaccgtcattcgattgcccgtccgcaccgagagattacgacaggcaactgtcctgggt gatcttaccgactcagcctagtcccgatcttcacttctttacgccctaattcctcgtggg tgggcggcgcatgatttctgctcagtaaacacaaaggtgcttcaagcccctatcacgtta ggttcgagtcaggttagttcggacagcacgccaatcctcttaatacaggcctgtacgaca tgtttttggaagacccttctggtcgggatgtcgtaattgcaaggggcacctcaggggcag cggggtaagagaaatacgccgtattgtagggcgtataggtttgcacgccctaatcgatac ataatcctccaataacacgaagagtgtttgtcgagcaggcactgtcgatcggatagtggt caagcaggttctttcgcacagcctactcaagaattggtggtacgctgttgtgtgacgtat ccggggaatacaatcctgtatcatgatgacaaatcagatggggcgaataggtgattagca aagctgtatgattcaacagtagtatgtaagtagaattagggcaccgaagtattgcgtcgt tcattcagcccgtagcttaccgccatttatattgataggttaaagcacaagtgaagattt aggctacagcgaagcggttccgctgaccgagtccgggtccatcccttcgcccatggtgag aggcagtcaggcagtgcagcctcagaggatgtcttctagtgttgaaatttctgatcggtc gagatatctccactactgcagcctgggggccattatcagtcgagtgaacttggctcacct cagatcttaacatgaacaatgatcgtaaaaagttgattacgccgtaacctttcgcactta 
accgttgggaattgcgcagagacatgagaattagtagtctgcgtaaaagagagtaccgca agggcgagctatataattattcactcaaaaccagttcacacaaatagatgtcctcttgca tttgcacatatcaaagatatttttacttaacctctaagcaatggacagtgccccgtctag cgagttgaaaggcgccacgactacgcacgtatgatctatatgtgaacgcaagcacttact acgccacaacagaactcaatgacggtgccgggtctacgcaaaccgcagggatagatccgg acgaacgggaacagcataatgcaccgtcgcgcttctcatggctaggtacttgggcggacg gttccccaccgccgtcatatcgtcaccagctagttgtggttagagcgattgtctctagcg aatcgtcgcttctataatgcccctccgctctacccgctctgggatcgtcttttgtatttg tatctcccggagggttgtgctgctatcgcccacgatcctggtcgtcaagtggttacagat ggagtaatccttcaataacatcaggtacagcaacctctgctagggtgtaacacggggtgt agctgactgccaggatcgctgcctccaaggtcagtgcgatcgcgagaaaggtaatctaac tcggccgggaccactcagaatggaggaaaagcagggcgcttgcatacggcttagtaggaa aattgtataactggcatgtttctgccccccggtttcacggacagttcgagagggcggagt ttgcagatactgactgtcgtggagggcatgtatagctgcaaagtgtatcaacatcgcctt cttttctctaagtgctctagattcaccgcttcttgccggcgagccgctttacttgcaatc tgtcgtccgacatgactttcgcaactacaaagtagcgttcgtctattttgcccaccagca attttgggtaccacattacggaaatctatattgccgtggagataatgaattagacagcac tatacagctcaggtaaattgctgagtttgctggtctgctgagtgacaacggaacctgagt tgcctcgccgcgtgaaacaggggagggagctgtaatattaaatctagtctcgggtatgtc ggccgatagtgcttgtcggcgcagttatcgcgacaaagaaatccatattcggtgacctgc gttctaagttatgtatcgtcagatcctacgaccgggaagaaaggtgcctctataccggta tttattgctggagctgtcgaactacttaggactccgatgctagtgcaccggataaaaagc ttaaaatcaaacatccggagacggacacgcgcatactctatgccccgagtttagttatct aatggtcttggcgtgtgttagcccctacggaagcgcgccggtcttctcggtcttgggcag ttgtgatcgggcgaactataaggaccttcccgtcctcccctcagggagctcgttagtggt aggcaaccagcttcgaacggatggctgccatccggtaggggcgccgattggtaccagctc tcttatatgactacaccggggtaatgacataggagctctgataagaacggccacctataa ctcagacatagcgttagatgtccgcactggaactcaccgacatcgcgagggccgaaacag cgaatgaattagcctccagtttttcagtagcagcccctatcctcgcctctcgctgcagac gtttaaggggccgaccatataggaatgcgcctcctcgttgggcgttgattagccccgtga ggtgacaagatctctccaagtgattagggcggtggtggctaaatatacatttacaatatg tgtgcccctcagcctggagaactctgtctcctcattcgaggcgccctcacgccgtgggaa 
ctcagctactctagtcgcagtggtcaggagtcctattaaactatgccggaagatactcag agatcagtgctatagacgatccgtagccgtattgctgcgcattgccttcctgatgtagga ccacatcgtcgtccagcacctgacgtcaagtcggtcaaaaatttaattttgacctaaccc tgtatacccttcgagccggtaactcctccgtgatacggttggtctttcctgggtaaccga agatttggccgaggatttgtgtcctccaccacatataaagctgtccaaaccgcctactcg aatgatccagctcgggcttcgacctgttctaggaagtacacataataacctgagggggca agtcttggacaacttatattcttcttctccgcgcgtggaaaatctacctattaattataa atgaaccacgttattcacaactacttgcgggtttctccgcagctccacaggcgacttacg ggacaacgagcaaagtgctgacagcgctcccatagctgtattgaatactacgattcgcac cccttgtatcagtaataacctatgccgttccgagtatccctgacctaggatctattgtcc tacctggagggcacatggccagaagaggatagataatacgcgcaatgcccgagcaatcgg ggagaccgagtgtgtggaaccccaaatcagatctaaacgcgagcgcgagcgcacaatcgc gaacgaaattaccaaacaatcattggcgaacaaacgtcaagggctcgcctgtcacagggt atcaaatccggcgttaaccaataatgcatgagtttaagacagtcgaccccctatatgcca ccatcgagaaacggcagttgatgaccaagtacgtaatggttcttggctatatgcgcgctg agaaagaagactagcattagctgacggctattacggaatggtaagagctcattcgttctc ctcttccgcggagagcaagtaagcatagtagtcgtgggtaattaccccctcgagtgaaga ccgagaagaacgacgacacattgtttcatcctaaaagcagttagatagtgaataatcccc gaggccaaacttgtgaaacaagaattgagggcacaacactctcactctgggagtgcgacg tcaagtccttgacagctagttgtctgtgtagggggcagcgctatgagcctttccgattct cgatctacgccaggcatgtcagacggcctatatttgtcgctgggggagagactactaagg gaagctcactaggacgcttgactatcgtttttggtgagagaaccctaaatcggctgcctt gatcttctcaatcattcctaacgtgtgtctcgccgtcgcagacgctcagcgatgagccat cttacacccctattaactgtacttctggtgtacgcatcatcacaatctcgagatttcgga cgttatgagggcagtactcccggacgccaaccgtgagtacgtccagtatacgggatctca gattgccaaatcaacatcctagcgggaaggatgtgtcgtgcctccctgaaaccgggtgcg gcgagcctgcgtcaaaacacggtcgcagtagctaagttcgtcaaataagcagtttggcat cccctcccaatagggctgacctattgggtcctcgttttttgccctagagaaacgaggttg ccaattcgtcacaggttggcgatgcaacccgacccttgttaatatattagttgcaccgta gaggatttacagaaagctggcgcagcaggtaggataataccgagcatgggcaggtatggt gggcgtataccaacccattatgcaagttgccgtatttagtaaactagcggtaatctaaag aaacatcttacccaatttgaggtcgccgacttgattgtgggaggttaaggtatgatctgt 
cattactattttggaaagaccctggcaagtgcaccgcgtggactggcaaaagtatcattg ccccgacttagccagccgcgggtctaaacggaacagaaactttatcctggcccgacggga cgtttcccttctggggttggatcacagcgctgcctgataggacggaggatcactcgctat ggtccggacacacgcctcggtatgatacacgtagactagactgccgccccgggggcgagt gagcggtccttcgatttgtgcgaggagggtgcgcaattgtttcctacctgctactaccac gtaacgaagtggctcgttgagcttcggtatatcaccggcatggatgaatacggagcgtat gagcgcgagttccatgactaaacagtgcgtcatcgatccaatgaccggatcccaaaatgt ggcggaggagctaaaccgcatgctcactaatcgatttgggtatcgtcagctgcctccggg ccagcccaggtaaagcgccgtaagacctcagcacagtgagggctgatggtatatcagcga agttgagggttgggttccagtagcgaacgcatatcagtccttaaatggagcacatggggc tatcgacagtccgttaaccgtgagagggaggagtttttacggtgcattgtggggacggcg cgaacctaaggtaagagttatctgatcggtgcggtggtagacccatgaggctaatgaaag tacacaactagattgctggaacccggcgagcgagctaggcatcattcttagcaaaggcgg caacgccaactagcagcacataatccactcccaacctacaaaagtctgcctgttccgtgt gatcggcgacgaggcgtcttaccgacttgcaaagtcgtttccgtgattgccgtcaattca tcgagccttgtccactccttgtggcacggatcactgtcccccagctttatcactttcgat tgacgtcaggtatgaacctaaagcaatggcgggggcaccaacacttactataacttaaaa accaagaccttctctgcctaagcaagtaacctcgcacgaataaatcttagtcacacattg tattatatgcttggtagcgcggcaccaaatatctacgacaaagtaactcatttgaaatta gaagttcgcaaccagagtcgtaagtgcagttcgtgatggcgcgccgctccctacccataa ttctcgatattctatgtcgtggcccatcctcaacaacgatcagtgccccgtcggcggtca catacgccaacactgatgttactcgcaatccacgatggcgaggagccatgctcctcgtgc gagtaaaaactgcccgtagcatgtgtcgagctgtgatgttgcacaagaaacgacttacat aggcactaccactcccgcccgtcctataactacgaaaatacatctaggtcggcaatgggg cgcttgcaggccgattctgccatcgtccaaacgcccacctagtacgacatgtggccgcga gggggcatgacagctccgtcgcattctgaacatcctgtgatagcgacgctagcagtagag cagtaatcaatggagtgatcgctctagtgcgattatgtcgggtttcaacgagtgcctcta cctctattgtgaattggtccggactacacagacactgagtacaaccccgctggtccttgg tacgctccaacgcaagtaaccgccacatgcatttttctgtgagcactagcagctcccttg aagattacgctagctagtgctggggacatgtgatcctttatccctcctaattcacctcct ggactttggaaactagtgtccccatcggcttggataaaagaaaatcgcgttgtcacgtgg acgtgtaattgcgaactgtccctggacatctttccgtgtcctgcgctacgtcgcctcgtc 
tgtagttataaccgcactccgggcacgagatgaacctgatcaaaagcctcggaacgtaat ggaattacctgctctctctttaaccaagcctaatcaaacactcccactgtttaggcagat agggacgcccactcctttctgattatactcaattccagtagataagttgcccggaaccgg tatgcgtataacacccgggagttcattgtcgagcccgcctcatgcttagccgtgacactt gatacgttatggcagcagcgacggatacttcgttccacccgggtaggtagcggttatcag gaccgccgccttcccgtgtcaggttatcctggaacgcccgccgggaacaccggcaggtat gatgtaggctataacatctggtaacaaccaagagtggtcgccgtaagtcggggggacaac gtatataaattcgacaagtcgatgtcgcgacgccattctggtcacgatatacttcgataa acgatctgtgttagatggtcacctggttgttatcctgtatctctccgccgcatgactcgc gcctgatccttctgtggccaggggacgaatgtacgggttctaatactgtatccaacataa ctgttggtttagtgctcttcgggtcctcgactattgtgcagaacatggacacacgctgcg gaattaaatggaacagcatcaattgcctctctaggagctccgcttaaatcacgggattac aatagcttgggtagtatgcacacgacgcactggattcgctatatgactatcagctgttac ctgcacaggcttgtctcccgtgagcatgaaacttatgcccggccaaaacctcaactcaga tgtgcaaagctggcgagtttcctcatgccccgctactccttgtatcgacggggttcactt gtccggcctccctttcatatgttcgccacttagagcagggagaattgagcggacgattgt tgtaacgggtgcttgcatctcatgcttttaggctctgtgggtaagaggaccgtacatggt ttgtagttttcgctgtgaacaccgaggtaatctcgacggtacgctatcatgagtgtccac gcgagaaggcccccaccggctcgttgtgccgtatttgcctctttctagcgaataacttaa cccggacctagccatgtggtctagcggaaaaataattgtcctcttattacctaaattgct tccaaggtttccgacggttgctgcaattgcgcatagaggttggtattagctatagcccca cggttaaatgttttataaaggcgtacaacatgaacgcgcggagcgttaatgctgccacga tgtgctaagaaccatagcacttcaagacgattactacaggcaaaaaacttatcagggcgc ccgaggtccaggggacccgaatcagtatgtccatgatagcgctgacgttgccagctcgct cgctcaaggtatcttcgggtaaggggtcccttctaaataggtcgtcaccatgcttaacca tctgcctcgtagttatatactctcgtcatttccggacgggaaaaccaagatgcatcaaca cgaacgctaacttaatcatatgtcatctgcttcagacagtctgccccgtaatcgtttagg ttatgctattttcggtttgatacagtacgcattggattcttgtctgtaatcttgttaaat ggaatatggggccacgtagggtccagaccgttgttggtggtactaaagcctgagtagagc agaaatagcaccggctatgtcgaccaagcaagaacagtgcggacactccctccgtgtcaa cacttgcgctacgttttatcgccgttgtgagcaaacctcagtttacaaaggcaattcccg gttgtttacacggtcattcgctcggtattgcctttgttggatcctatatcggatccgacc 
gttctctccatgatattcggtacgctccctcggttgtagactgcgaatgattggcaacgc gtgatctgactactaggattaagaaccgcgtttagaagtctagggtggacaatagtttgt gttcggaagtagatcccctatgtgatctgatcatagcatgaggcgcgttcatacactcat actgttcgcctgctaatttcggacataagcataccgacgtccgacgtatggacagccatg aaccgtttaggcctggtgagatttgcacaccagacgacccattacagtacaccagccgag taagttaggtacgggtgctacgctcatacatcaccagccacgggggagcggtctgatatt tgacgggaggtccgcgccttgtcttctttcgttaaatatgcaatatcgcttatagaccta ttaaccgcgaccgactggtattattctccccgtataagcaattgaccgttgccgctctcc cttgtgtctttgctgacttgatggcagcctgtggcatgttagtgacttacactactaggc gatgtgggaacgcgcttttccttttttctaagtgttgtaagccacatcgccgtatattta gatacttgcccgacagccaggcgtaggcttgcgcggccccacacctgaactgtctacaac cagcgcccttagtcagtcagaagaacagtaacatatggcatttcacaaacactgagccag ttgagacaaatccactgagtatgggagggtgatacatttatagtatggcatactcgacgg ggtgatatgacatggtttcttgggttcgggtgttctcgttttgaacttggccccgcggcg gcgttacttggggatctaccttataacgggccctcctgggcataaaggtacgacgtaacg aagcaagcccactattcttttgaggggggtccatacttcttcaatttttagatcagagtg ttcaggcacgccaccgttcagataagagccccataaaagactactaagattatgaccctt aaaataaaaggattggagcgaagtcagcacactaaatcatcttctagctctgtacttcgg ccgcgtgcttcgtgcgaattcaaaaaccgagtggggcgcgtcggcgcacttggctcggat tagaatcttgtgagacatcttgtcacttagatcttggggcagagctctagcatgttcgat caaaactcacgtagactcactaatataaaagtctcctcttctgtatcagtgatgtgagcc aattggaagttcacgcgcgtctgctctacttcttcaatgagcacggtgacgcaaaacacc ccggtaatatgtcgatcagagacactctccgtccttggtagtgctttacacttcaatccc tgacacaataactcgctattatagcaaacgcaccggtacatccacatatatatgatacac gctcgacggatggactacccgtagagccgatgtacgtgctaagacttcacatccaagaga cgaccaaaattgtttcacatgagcactaaggtcactgataacggtatggcgaatattatt ggaaatctgtgaagtgttgggtagtgggacttgagacctaatcttatatagccttccgac cggataaattggaagtctggctcgtgatcggatatcgataggagacccagcacgccaccg gacggcgcgtagaaccttgtggaagccacgttaccccagctcgacagcccacaggtcctc tgagcaagagaccataggtaaacaccgacgtcaggaggagctgggtccatacctgaggtc gcctactaacgccaacacgagggttgggaaaactaataggtgtacggtatagacaatccg ctggcatttagagggcctggtatgagtgagggggcacctctgttctcaataactcgctca 
agccgggatgtacgacgaccgtaaatggtgataccattgcaccacagcttgtcacaggca cgtacgtagggtgtacgatcttgaattgatgcatcctccccttttctttcacgttagaga aggtatcagctgaatgtccgggatggagccacggctacgctcgaactcagccagatccgt aagtccgcatatccctgcaggtcttgggttcgtcctaatcatggataacgtgtagaaggc actcctctgtggccggtagtcacgacccacgtacgcggtgttcaccgttcagccgtcatc cctttgtgccgagttataattggcagtttcctgctgatcccggcgcggtcgagtcgcaac ttcgaccatagggagtctcgttagctgaaccgccagcaaacctggcttaggaatgtccgc cttctgcaacgctggttgacggcacagaggactgtaggcctattcctggccagacggatg gtctgggctagtttacactgggacaagtaaggacaactacacagcgagcacctccactgg taatgggcctagccactgtgccactcattcccgctcaagttggggggtgcggtgagggta tggctatgggatatctgctttacctttgtgactcgctgtggatgtacgaggcgcgcagct ctttctagtttccctgcgctgtacagcagtgactcagaaaccaggaacctccacaagatt agagacgtacacgcgatgccacccgaactatctgttacttttgtcctcatcagccttacg atgatcaggatggccacgaacttaagtagatcaagttcgtcagtgctgtccaggtggtcg ttgcaaacgacgattagtcacggacaagtgccactcaataggcatcgttagattcaatcg tacgttaccatcgcactgtttccggaggtgtgagctctcagcggttcagcatgacgtttt cagacaccaggcccctacggatgagcgccgactgaccacccacgaggcgggctaggcatg acttgatgggcttcgcattggcccacatgccaatctcttaactagtagcatctacgacca ggcccatgggtcagtccctgactctgtgggcacctggcctcaatcaatgcgggccattga ggggctcccctatcttctactttggtagagcatgtgcgaggatggggtccatgaagtcta gcagtagtttataatagattatttaggccctgaccagtgtattgagacgcataatactgt cccttcagggggcacgctcttactcgtctacaattctttatgatcatggacgtcagatgg ttcttagataacctaattatatcttcccacggtcgtgaatgccagtgtcatccggtcatc atgtatgcggcggagaaagcctttgctctcagaagcccgttcctaaaattcgaaccgtat agtatactttactcgcttcgagggcggtttctcagttgagtcctctttagcgtgatcagg tactgtaatagtggtctttaacgtagcacctcgaatctcgaagggtctgcgcttcagtcc tttcggtcgcacagagcatctcacacttcgctcatggcctacgggaatgcgcccaagtga ggtctgttgggtggccggataaggatatgcctatccttgatacccttgactatgcccagt gtgcacggaaagcgcccggggaacgccgataatgcacaataggcgcaacaactatgggca tactgccaatagctgaagcgtcatcatagatgatcagtgattgtttgctctgctcaggtt ccatatgagctcacgcgggatctctcagacctattggcttggaataatggcgctacataa aggatagctgacgtgcactccgagcgtgtcgaaggctgcaacttcaccgtggtaacgact 
ccctcaacagtgcccaataagtgatccatttacgttctagtgctaagcaactcatggccc tagtaagcagctagacacacagatcaggcaacttaaatggaacctacttcgtcactttca tgggacacgaagctcaaggagcggtgaccagcaaacagccgcagcttcacaattgaattt cgaccgaccggatagaggacaagcatactttgtcaacaactaatttgctcgcgaggaaca ttgttggttacctaggcttaagtcgggatagacgttatcggctctaatcatctttccgac aaatacttgacgtgtacacggattctgcatttagggatcactgtggtgtgctacaagtac accgtctacgatggacccatagatgaccctcgaggcttctcactaagtacagctctcggg ccggttgttttgccgggcggttacccttctctacgagctcgtccatgagtcttaggtcgt gacaaattcggtctcattctagcagtacgaatcgacggtccgcgctaggcgaatctttta gctatattcaaatactgggaaaagacaaacagcaagataagagcgttcaacggagccgaa aaatgtacagagtgctacgaaccgaaagccacgagcgacttgactgtactatgggcgcaa accgttgctctatacacggagtcatcttagtctagttaaatcttttcagggttagaattc gacatggatcgactcattagggaaaactactctacacctctattttcttgtcccatccag gtaaatgggcctcgcgtggtgtaggcggcgcattctcctagaggcgactggacggaaagt agggggtctcgagtactggatttcgatgcctgttgaacgaggccatcaaagcgaagttgg gggttgatgagatgcgcagccaatccggcgacaaccgtggtctcccatatcgtctacgac gatggcattttccatcatgtagactgaaaataatgaagaaatccgtgacacagtggactc agcgttaaagtcttttctcgcgtgctcggctatcccctttaccgtcccgggggccacccc gggtacagtgctttacccggctcgcctcaacaaaacacacctccccataaatggtgagcg gggacttagaagggcacgttagctacagaatgggtcaatacaatcgcctgcctatctcaa tggtgctccccgagccgcatgctagagtgcgaagtcgactactagtagtacacacgtctc cggttgaaagagctgggaggttgagtgcggtgtcgtagaagaggtagggccagcatgtgc tactagtcaccctacaaagcaattcaaacaaacaactccgccacacccggtcacagagca ttgttgtataccatgggtcatatcgctagcaacgatttaatcatccctaaccctacacta ccatgacctagcgatgggttcccggacgaaacagattgacgaaacgtgtagcggactcgc agtactctttctctggtctagaagcaggcgcggctgtagaattaatctgtcgttgattca gctgataatggacctgttggcttcattacttcacgagccagtgacgctccagaaggcagc atgttcgagaacctgccaccgtcatctaacccaaaggctggctgttaaatacatctttgc ggcatcccaggacaaggtaatgaatgctcagcagtctctcacaacctccgcaccgacgcg tttctatgcaagtactagtacctgaacgtatatgttcctaaaagcctcatcccgttggag tctgtcagacgtgaactagttaagcgcaaatgtggacaccgtgagttgcctgactggtaa agtagcagttaagaaaaacttaatgtatatcgtcccagactgaatgggtccacacttagg 
tggcgtaccgtcttcgtgtttccacacgggacgctcgacactcaccaggaggagcggccg gtagacagaggacggggtattgaacctgtacttgcattatagaaattagaatccggaata ggcgcatgaagtctgccctctgtccgctcacgaagccccacgtcttacacgcatgtcgaa tatgagtcagatgtaccagaggcctgttcccaccacagtaactggccatcagtctccagt aggaggagcgagtgacagacgacgacaagaccggtttcctcccccttcctgtcccagccc atagacctgaagggacacaagtcaccgttggcacaactagatatcactacatactatcgc tccccacccccttcggaccacaacgagaaatggtcccccttctaagctactcctcgacgg ccccacgggtcaccggccttttgaccagcttccacaggagcacgtagcagtatattctcg tgtttaaaggtcgcggctgtggtggggctggttagtgcgtattgtcgatctacatcggga cttagaataactctacctcaatcgatccggatgtcgcaagatatctcaccgtctcgtctt cctttcgagggtgcatggcattggtaaccgctggtcttcagcacgtttgaaatttgtcga gtgcctaagagatagtgctactgaccggcgtaatcacggtctacaaacgtagatattgac catttcgtaggcgagtttgtgggctaactcccactcgcgttggtaatttggaggtcgccc gcaagaacccgaaccagcgctggagcgtggctaacactcgtcgaggttttaggcattgct cggtgtgtcccaaatttctattaccttatggcgcttctagggaggggaagtgctgcgagt aactgctaacatgtcggcctgaatagccctgcaaccggtccctgtgcgttcgggtgtcac atcgcccctaaatgaacttggaccgcctagtacaacacccggcgacaatacacgcatcat cggaaacaactgtatagcttcatacagttcagacttagaacattaattcttcctcgagag ggacccgtacgccctacctccgtcaagccccaagctgaaaggccacaataccgcgttcac agtctcagggagggaaaatcacatcaagtatcgcgagtggtgttaagtttaccgtcaggg tatactcctaactagacgtagctactcgactattgcacgcaccatgagacatgttagccg cgtgtttgcaatagattccccattcagagtatgatgaggatagttcgtgcactccatttg acctctacgcattgcaaccggtccacgtcattcgacggatgtccagataaggggtccaga tgcgtacacgactgtgttctctctacgcagtgtccccgctgaagttagtagggcctccgg acccacggatctactaccccgagtcgggctatcaagggaatgggacctagccggggcatg tacggtaccgatagatctgtgtatgttcggcttttgaaagtaggaccttgcggggttaga aggtagcgttatgggtccatgccgcgattcgctcgaaccagtaccagtgcttaattgtcc gcgtcgcaggcgcacgtgcaggattaaacgtctcttacaactataactgacgcctacgga tttgaagaccacaaccaaccgcgagaatgcctagtgttggcgaagacctatctccttgat gcgcgcagtgcatcttctaaccgtggcaagatcatgaacctgtgagaactttcttttaga tatcagacccccgtggcacccagaaactgccgctgttttgggcaagcgccgttgatacaa tacaggtgcatggacattcatccggacgagtagttctttaggcgggcccaccctaccacc 
gggcgcttacatccatccttcgctctcaatatcaattttatgtccagaggctccaaaatt tgcaaggtgactaagactggtagatctattacttcaccgttggggggacgtcctccacac caactaagtcattttgatatgaacattgaatcgatgctataaacggacatattagaaatt ttcagaggttttacgtgagtactccaaaggctccacactgaaagttattgctcagcgaca atgtctacgtcgacggactaaacgcagccgccttccataggacgattaccgaccgaacgt agcgagtacgggtctaaccttcggtcttgcaatcaagtagacgtcgtttctaagagtttg tagcagagaaacgactgagagtgtatacatattggcacatcctggagtaccaattcatgg aagcggaacgtccgaccgtaccgtccacccggatgaagttcaccatactatcggcgggct gtcccaccaaccctacgccttacgagaacgcacggcggtgtggacaccctctgattgtcc tgtagccgagcctaggaccgtctcgtgttggacagatcgcattgtatgcaaactcgtgat ccaatgaggtcccttggctatacagacaccgcgaggactctagggtcctggggttaaatg attcgtcttttcacttagatcgtgggccctccgtgtttatgttgggccaggctactagta tattcgatacatccgctggcctttgataacgtttgttaaatgttatctcggaggcttttt ttgaatcatacagctatgtcagacgagagttcctcagtgtcacccttcgttatcaagaaa tggtgaatttaatgttcgtagttttcagttattcagcagtctgtacccaaaacgtcaagt tcgttatcttcagcacaaacgggtcgggccctgaaaaacaaatcacgaaactgctacact ctttattttggcctagattattagctattgtttggcttttacttacggactcgtaggacg atgtacgtgcacttggtgcacatcatccttaattgggccggacggttaatcttatcaact ttgcgttcggctaaagattgcgacccttgtatctgttttgtgtcatctcgcttggtgcta gggtgatgtctctactctggagtgtatacgacgcaagcccttgcaggctaactacggact gctgcacactctaattcgaccgaactcgagtacgactgcgattataaatgcaacaagtcc ggggaagcctccgacaccaagacgcgacgctcgcaccatctttcccgtcgctatcccagc tcggtaagtctcgctcccgcccggcttaccaaaacaacacactcctcggcaatcgaatga gcgtcttgtgccgataatactttccaaacccgagttttttcctaattgtttggggaacgc cagtgaggtcagaacgcctacatcagaaggcggacgagagcctcacgattactcgacctg cttttgtggcacattgagtgccccctggagtcagaacactccgtggctggtacaaaaatg gtgttagttcttcgacaattagatcatatcaggcggttttagccactattaccgaccatc cacgcacacctagcaactcaactttttatcacgctggcaagtaagtggaaggccctgctg ccgttaaacggcaggctcatcgcggtatccacatcaattgcggtgaccggggtctcgctt tagaaaagccgatatggtccaggacgtcgtggtgcgcaggaacccttagaatttagtatg cggcgcgcgacgcgctgagagtcatcggcgccttagacggagttactccgcgaagcgctc aaagcctaccaagttctatactggtaagggaggataaggccacgtttcaaatatatacgt 
attgcattcgcgtatccatcccgagaaagaagtcgcggccgcaccgatggctgggcagct atagggattcaggccaaacgattgcgggacgatcctaaattgttgcccctgttaagagtt tcataaccattagattccagtctactacttctgaggacttctctgatcatattaatgttc atgggacactgctttcaaccgtatcaatgtgcgaaacgtaccttatgatccctttcgctg acacacacgatcctgtggttcagggtgaaaacactcaaatccaataacgcggggtaacag gccggttctcaagttggagaagcatacagcgagtcccgtatagtccgagcagaggcttca tcacgagtacaagcttttcttcagtgccgaaccataggaaggtctacataaaccggccgc tctaatcttggcgccgcgagagcaaagagagggctcacccacatcaaattcgactgcggg tcggcctccggtccccgtagtagagtacacgcgatacgtctttactctacctagaagact tgtcttcaagcacaccagcttactccggatcagcgttacataaagcccataaattagcat tgggcgagccgaccaggggtgaatattacacgaccggcgtcggtgcacgacccaggcatc agtcatccttggaggtttcagccacgcgggtgtgcatgaggcacccgctgcagtcacctc tgacggtctg >CEESC13F ttgcgttcggctaaagattgcgacccttgtatctgttttgtgtcatctcgcttggtgcta gggtgatgtctctactctggagtgtatacgacgcaagcccttgcaggctaactacggact gctgcacactctaattcgaccgaactcgagtacgactgcgattataaatgcaacaagtcc ggggaagcctccgacaccaagacgcgacgctcgcaccatctttcccgtcgctatcccagc tcggtaagtctcgctcccgcccggcttaccaaaacaacacactcctcggcaatcgaatga gcgtcttgtgccgataatactttccaaacccgagttttttcctaattgtttggggaacgc cagtgaggtcagaacgcctacatcagaaggcggacgagagcctcacgattactcgacctg cttttgtggcacattgagtgccccctggagtcagaacactccgtggctggtacaaaaatg gtgttagttcttcgacaattagatcatatcaggcggttttagccactattaccgaccatc cacgcacacctagcaactcaactttttatcacgctggcaagtaagtggaaggccctgctg ccgttaaacggcaggctcatcgcggtatccacatcaattgcggtgaccggggtctcgctt tagaaaagccgatatggtccaggacgtcgtggtgcgcaggaacccttagaatttagtatg cggcgcgcgacgcgctgagagtcatcggcgccttagacggagttactccgcgaagcgctc dbfa000755000766000024 013605523026 16624 5ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/t/data4.fa100644000766000024 14022513605523026 17503 0ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/t/data/dbfa>AW057442 tgctgcatctaatggcaccactgatatggagcaggagcattatctgcagt tggagttgaaccagtgtcttcagttgatccttggaccacaaagacggcga gaa >AW057443 tgagaaggatgaaccgtcagacaactgtcttttctcctcatcgtctcgat ccttgccgtttttgtccaccatggatttgctgctgctgaagaagagaaga 
atacagcttcagtcgtcagccctgctccggactctgaagcagcccaacct gctggaaacggaaccgaaacaccaaaagatgaggtgaaggatgaggcacc aaaagaaggtagtgaaactgaagcttcaccagaagccaagacaaaaggat ctatggtattccatgctcttggagccatttccacaagtggttctggccgg cattatgtgaagaagtctgccgaa >AW057444 aggcgatgtctacaagaagccagtgcagttttacacgaatgtcacagtgc cagtgttcgctccagtggtatcacctctcgaggtctacacaaacaccacc aaggcaactgcttttgctccagctcagaacatcaaagtggctgctattct tgaagaagatgctgatgcaattcatgttaagtcaatgagaatcgctggat tcattgcacaatccatcctatttttgtttgtctacacaattgtcactatg gatgttgagaattccgaagaatctgaagcagaggttcccgtcttcaagct ctacagagcccgtgagattcagacatgcccactgccagctcgacaaaatg t >AW057446 tacagtgcagcttctggtgctcttcttcaagttgccttcaagaacttgac tgctcagaatagaatccacatgtatcagattctcttggtctcttcgttcc tcttctcgaaggccctcgtatacggattgttcggatctgctgaacc >AW057447 atcaatcaactacggatgatatggtgtgtacttatccgtcgagcttgtat ccagccttggccttacggcagagaacaatggtgttatcagcgcagttctt ggcagttgcagaatgctcctttccttggtatccataaccatacttctcgt tacggacatccaccttggcaagaatctgctttcctccagcaacgttaaca actccaacggagatatctcccttgtggttgttgacatgaactggatgcca agctttgtctccaaactgagcagcctccgggaatgggatccatccatagt caaaaccacgaacactgtccggaagatagatgagaagctggatagatcca acattcttgcaatactcgttgttgaaccacgagaagttggcagccacctt tcctccttcattccagacacgacccataacaggttcaccttgcatgtacc agagagcaacatattggtctggattgactcctggaagggtgtccagagtc ttgtcaagagccttgacaagttggcgagtcggccatggatctccctcacg aatgtccatccattcgtctctctcgacacgtggcactggcttntggtgtg gtggagcagctccgcacacagcacatgggcaatgtggtgtcctccgatca gtttggcgtgatgatgtacatctcgttgagctgtgcccgagctgcttatc acctcccattgacgagaccatgcttctcagtctgtgt >AW057448 aattccgaattccttcgtctcaactgtccttctcctctctgttacaattg ctttggtgtctggatatccatcccaactccaaacaacttgtgtcacaaaa gctaaaagttgcaccatgttctttttaaatggagtatattgcaccgagtg cacgtattctggaactcttgaactgaaaattggctcaacatgtacttttt ccatttacgagaaaaaagtggcgagccagccaaatgaaaattcacaaaat gaagtagctcaatgcaaacagtcatcatgctactcaaatcaattttgtac cagttgactgtgcggctgcttttggaaatgaatatat >AW057449 ataacctctcccaacaacacctcaagatgaatgccatctacactgccgtc cttgttgcttcaactctcgcctacactgcaatggcttggattggactcag 
cattgaagccgccaacgaggatatgatctgaagtggcgccc >AW057450 tcgatcaaatcaattcgaaaaaatcatgccgtcggaaaaggaggatgatg taatgaaaaatgtaactttcgctgaaggcaaaaaatttggtgactggaaa atcggctaaacgatcgatgaaggaggatttgggaaggtttacattgcaac atcaatcagcgatccaaagaaagtggctgctttgaaagccgaatcaaatg aaatcgaaggaggatctgcaatcaaattggaggcaatgatcctaaacaaa ctgaatgccaatggacccgttccccacattccaatcgtccacttatgcgc aaaaacgaagctctactgctacatggtgatgacgttgttggggagaaatc tacgaaaactgaaatccacaaatctcgtagtcaacaatggattctcccgt ggaacgtggagccgaatcggaattcaatgcctgtatgcattgaaatatgt gcatgacaatggatttattcatcgagatgtgaagccacaaaacttcttgc taggaaatgagacggatagtgaaagagcaagaattgttcatatcttggac tttggtcttgcgagacctttcgctgttttcatgcccgagagaataagtgg gatcgacgtagagctcgtggaactgcagaagttcgtggaactctccgttc acgtctccgatgtcatctccgaaggacaaggacgggtgacatgatggtcc tgcttatgtcatcatgagctcaacgtggaaagctctccatggcaacccga c >AW057451 aaaaaattaagtagttcacggacaccatctccaacacaccactggccagt tctatcaatcagttttcggttcgtttaaactcgaagaacagtcttgacca acacgagatgtattcacttttgacgctcctcttcgtcctcttcttctctg gaagcactctgctcgttcaatgtggtggaaagaaaaagggagcaacttct gccgaaggaaaatcttcgacgatgggcccggctcctggaggagctcctgc tgctgcttccgctcaaggagaacctgaagagaaggagtaatg >AW057452 ttttctcgtggtgatcccaagcttctagtatatgacgttcaggctctggc tcctcggtcatttgctactgttccacgtgttattgataaaatccataagg cagtcatgaagcaagttcaggataaaccactcaagaaaatgattttgaat gcggcaatagcctataaactataccattataagatgacaggcaaagctac tcgtaaaacatgggtagataagtatgttttgcataaaatccagatgcttc tcggtcctaacatcagacaattgattcttggagctgcaaaatcagacgtt tccgcaatgaggtttgctcgtggggcttttggagttgaggttctggaagg atacggacaaactgaaacatctggaccaacaacacttcaattggttggag atacacgtattggatgtgttggaccaccgatggcttgtgcgatgataaaa ttaattgatgttcctgaacttggatattctgttgttaaaaatggtggaga agtacttgtgaagggtcataacgtcacttcaggatattacaagaatccag aagcaactgcatcatctttcactgaagatggatacatgacaactggagat attggaagattcacttgctgaggatctcttcacattattgatcgacgcan acacgttntcaacatgccacaaggacagttngtggctcagatctcacaga atccctctacactttctcgagtttgttcaacagattacgttcatggcgat actgacaaccgtggcttgtagcaatcgttggtccagatccagag >AW057454 
cgtttatactgattgcctttgttcttcagtttggagaaggatcaattgca gttcaagaagttaaggatggcgaaaaagtgcaaattgaacttttcaaagg agccaaggcaatccagagatccgttgacgctggcgaacagattttccatt tcgaaggagaaaacaaaggagtgtttgtggatgctaatggaaaagctatt gactcgtcaaattatgaagagaataacggacatttggtcatcaaaaagct tacaaaggctgttgtttgatcatattctgaatattcaacgaaaattatca aaacgaaaacggatcatggattttctggagttgccgcgccagttctcaaa ctttctcttc >AW057455 agttactcaagatgctgaaagttcttttagcaatcggtctactgtgtttg atagatgtttcagctcagttccgtgctgaatgtgagcatccgcttcattt tggagttcagcaatgcaccaacacttcagttgtcagatatcacttcgaca tggaaagcaagaaatgccttgctttcaaatacactggatgtggaggaaac gagaacaatttcaaggattactcagcttgctcaaacttctgtattccaat ggactatttcacatgcccaggtggcagtgatagtgtcgctggaaaggaac gaaagagccactgtggaggaatggaacaattgaagtgcgatggcccgaat actttctgcttgaatggcccattcactggaatctgttgcgacacgagaat cagagacaaaatcgatgccgactacgccaaggagtgtggaccaggaaagc tgaagcatcaaattgatattggaggtgtcaagatcccaatgttcggaaaa acttgcgattctacattctgcccagcctatacaaagtgtcatcaaggaaa ttatcntgcttactgttgtgcttag >AW057456 tttgagatgtctaaatacgcaattctctgccttgttctggtcggcaccgt tgcctctttggactttatcggtcgtacacaatctgctgctataaagggaa gattagtatgcgagggaaagccagcttcaggagttaaagtcaagttgatg gagtccgataacagttttggacctggattccttgacagcgatgataagat ggcatcaggaaaggctgactcgcatggagaattcaatttgagtggatcta ccaaggaaatcaccggaatcgagccctatttggtagtggttcatgattgc aaggacggaattacaccatgccaacgcgtgttccgtgtcaacgttccaaa atcgtacaccaacagcggaagctctgccaagaaaacctatgatgctggag tcatcgagcttgccggaaagtatccaggagagaccagaagttgcctcaac >AW057457 tttctcggaacaactccaagcgaaaaaaattgttgacaagtcggcgtaca tgggtgctggtggctatggatccggatacatgggatccaacgcctcatcg tcgggatatgcccgcgaagattatgcacaaggaggaaatggaggcggaca acaacaaaaccagggaaacggaggaaacaccaacccaggaggacaggtct tcaaggcccgtaccgatcaatcgtgctaccttgggccataagtagctgct cgaataatgtgaagactcagccag >AW057459 aattccttggaaggttctagctatgaacgtcaccagtgtcacttcagagg atggtgttaaagaattcgaaaagattgttgtggaacctgaagatatcgaa tatgttgagattccggccgatgccaaaaacgttgacttgacgcgtcaccg tatcaaagaaatcggtgattattcgtggctcactcacgtcgaacacttct cgtttcgttggaatctgatcaaaaagattgaaaatctggattgtttgaca 
acgttgactcatctcgagttttacgataatcaaattacaaaagttgaaaa cttggatagcctcgtcaatttggagtcactcgacctgtcattcaatcgta tcaccaaaattgaaaatttggagaagttgacaaaactgaagactctcttt tttgttcataacaaaatcactaaaatcgagggtttggatacgttgactga gctggaatatctcgaattgggtgacaatagaattgcgaaaatcgagaatc tcgacaacaatctgaaactcgatagattgttccttggcgctaatcagatt cgtcttattgaaaatgttgatcatttgaagaagctcacagttctcagtct ttcagccaatgcgattactgtagttgataacatttcgggacttcacaact tgaaagagatttatctggctcanaatggcatcaagtacgtttgtggaatc gatgagcatcttnctcttgaaattctggatttcaatcagaatcgtcttga gaggtcgagatatccattcattgagacactacagacttttggcaagagga aatagnggattactgagcattatgg >AW057460 gcttcttgacatgataaccaagacggaagatcacttcgctcagagaattg tgtccagtgtcgcggtaaaccgaagcttcacgtgaaaacttcagaaattc attttcgtgaagtgggacaaccattgcaacttcgccaatcaatgcttgaa cttcatttgccgtttcgttgaaagcatagatggctttacgatacgtttcg ccttttgctttcacctctttcgtggtttttgaagacttcacatggtgacg agaatcatcgattgcatggaccaagttattcaaattttctacgttcgaat tcaatatctcactttcctctccagtgtacaaacgcatctttntaatcaga tgagaacgagttcttttctgcacctggatctccttttgagcaattcttcc acaggcgctactgtatttttccagcatcttagaatgttgcccatcaaact gatattttctcacataattcaaccatcctccaagaacttcccatggattc tcgccgggagcacacaaaactgctttcttgccattgccttggcatgacat gacttgaagtcgatacattcacattcgtgatta >AW057461 tcatacacatcctcctatggcttttctttaaattcaccggctactttctt tcctccgaagaaaatggcaatcgcgggcaaggcggtgatggcgagaacga aggaagcacttatcacatttg >AW057462 ttttttcactaacaaactttttattctttacatttctacaatacgagaac gcatttgtggatgctatttaaatgttctggctagatttctcgtgtggaaa ctgaataaacttaaagcttagacaacatctgcttcgcatcatcggcaaac tcttgttcagcgtcggaatacggttcgatgtcacacacttcggtaaggta cttgcgagcgttgttcttgtctttaattgctacgtagcatttggacacat aaaatgtgtnctccatccattttggattgaccttataagcagcaaggaaa tcctcgttggcctcttcataggaatgtgatggtggttgctgatagaacgt ggcagcaagcttcttctcaagccatgtcagagatgcaaccgagtacttgt aacggccacgcaaatggaggagagccgtatcctttggctccttagcaaga gctttgtcgagcaattccttgaacttcttactgcattccaacttttcttt cgttgccatatattcggttgcttgtccagtcaacacagcattccacttca gcgcctcgaaatggtttggatccttctggacagcttcttcagcatacttg agtccttcngtgacacttgccttacgttgagctnttggaacacatgcaga 
cttctcatgaattacttgagcaagtctncacaagacttccaccgaacgat cacatgtcgagacgcgctttgagcaaatcatatccttgatcgcgntcttg ggtccgaagtcttgtcattcatcaaactcat >AW057463 ttttactcaaaactatctatccaagttaatcagtagtgttagttctagtt aagttattaaggcgcacggtctgtctccttgcttcttctctttgtatccc ctttctcctttttcaaaacttcactttcatcaataattggttctttagaa tacagttttccaatttccacgtactctcttctcttccgatccttgtcaaa ctttttcttcgggagctcatcttctggaactactttcacatttttcgatg gaaccaaacgggaacgagttggcttttccaccaaaagattagcgtactcc gaactgtatttccccttctttttcttttcaagaggaacattttctcgttg agtatcatcgtcctccaaactttgttgagtagtcatggactgggtccgag agaattcaacggtaggcatggaacctttgctcttgtcgtcgtttgccttt ggtgcctttcccttttgaa >AW057464 tacaaaccctgcatgcctctgcaagattcttgctgatgaactggtactcc acacgatcctggtaagctgttccgtgtaatgtatccataactttactgag ataatacgtgtgtctgcgccccgcgaacatcgaggaccgctaacttctac ggcatttctctcttctccttgagcacctcatgcttgtcagagagctcacg ccaaggtataccgctagggactgattagcatctacacgcacttagtacca gctctcatagacatacttcggtcccatatcgctgttacgaccgtaggcga tcggatggaagcctaccgctctcttctatgcagctggttctgctagtgtc tccacctcattcttcgttttcacgcacttg >AW057465 tcatgtctcctgcagcaagatcctcgtccactgaatccttcttctctcct ttggttgcttctttctttcctccacagaagattgccacggccggtaagac ggtaacagcaagggca >AW057466 cgcggaccgaccgactatcagacgggtttgctaccactgtatgttcatga agattacgaatatgggattctgcttgaataagtgtttggtcggctttttg aacaattgcattgaatgcatgtgtggcgcctaagtattgtgccat >AW057467 gtggaaaaatccccttcatttctaaaaccgttacaatagtatctgtttaa cacatcaagtataggattcattgcccgacaagctccagtttctaataatg ttcgatgcacgaggccaagccattgagcaccggcagctgatactccgact ccattgacatctggttgaccggcatgaaggctccgtagctgttctgagca actggggcgatgcgctcgacaacaattggagcagaggcttgaactggagt ttgagcattcttaataagccctctgaatcttgaagtaacagctggcgagg ctggaacttgagtgatgtcctcac >AW057469 tttgtacgcaccggatgaagcacatagaatactttactgttcgccacatg cgtgatacagtatcacgcagtaccgacggatctgcgtactgttcgctatg cgccga >AW057470 ttttggtaaaaactaaaagatttattactgaaatccatgggggtacattt atatcaggccgcttgcacctctcattgcctcgaagtaggcgatccaccac atggagttgcttcaagacacacattcacctgactcttcttgttcttctta cgagactgtgaacggcaactcgacgatccagcagactttgcacttttctt 
cgaagaacgacttgaacgacgggatgctcctctaatagatatcttctctt tatgctcctcttcgccaccgcaaatgtagttggacaaccagcatccacct ttgttagtgttgcgacgtttgtattcctcgacaaatcccatcagatttcc aattggagcagcggtcagggcgtctttctcaatgttaactgcagtgatcg tgtgcagctcttcaattccaacatggtgatccgattgtttctccacacct ggtacttcctggcattgtggctacaacatgtgaacgctattactgacatg tggtcttgactggaagtgaaaatcgcttctgatgctgaaacctgctgaga ctttgatcgat >AW057471 taacctgctccttcttcttgaaaaccaacgtagcacgacgagtacggtcc ggagagccaaacagtgagcgcttcttcttcaaatggtccggaatcttttt gcccatcagctcaatcgtgtactttccacttggcggcttctttcctccaa attcttgaacctcataaaggtacccaatgacttgaccaatagtgtacgga tcgaaaatgttctcctcgctttttgtctgcaccggatgaagcacatagaa tcctttactgtttgccacaatcttgctccagtctcccgtaattccgtcgg atacggtgtactggtcgctttgagccgaaagtcgttcatttgcgttgggg aggatagggtccgcatgcatatattttccaccttttccaggcttgtatgg atagttcttgggctg >AW057472 ttcactccggaaatgatttattggataaagggtggctagtgttatttctt tgagcttttactcctcttcaaagtgcgatgatcactgattgtcgatccat tatccttttctgcagttgccttttccattattttagtatccagaaacatt ccctgacaatcatcctctctttttgtctcttcaaatgattttacatctcc gggtcccagaataagtcccatcagaataccacgatttctacggcggaaat ttccaaaagtgtcttctggtccatcttcaaatgatttgtgctttggatac gagaagaaaacacggaaccagttgattccattctttggcttttcggcttc agatttccattcattgaattttgcagtccatgtttttggaagatattggg gtgacaccaagaactctttcatcattgctgctttcagcttatccctcttc ttcataacagcctttgtctcctcatcaaccgcacaattattcttttgcaa ataatcaataatcttatccaatgcagaatccangacagccaatacatttg aatccagcttctgctgtcctgagaagaaacttgcangtacttcgttctcc tctggtgtcagacatatctcgattgacgcgtgctatttgctactataaga gacgtggcgcaanctcagatctcctatcgtcacatctgacgcatctcctc aactcatcatcatccagctatcaaatgctcagcttgatctcagccagtgc gtagtggaaagttccttgatacagaactgatggatcgagtagacacatgt catgggaccatctctcgtttattttatcgtctgtttcttggc >AW057473 aagtattgaggatcgcttgcttggcttgtgcaactgcgttcatgtcgagg atctcttaccatggaacacgtccaacagccagcttaacttgcatgtatag ccagacctccgcgtgattttttcttgcaagctgctgactcttatggcaag cgattggcgcatagcgaacaggtccacaggatccggcggcagctattggt ttgcggatcactccatagctgcgcgtgaacttgcgaccatggcgagagcg cgatataaacttacgaagtcattca >AW057474 
gatttggtgttaaccattgccttttcaaggagctcgtagatggtcgtgta ctccggagcatcgaagaactttcctttgtcaagaatcgggaaaacttcag tgaattcacgcggacaacctccaaacagacacctaagacgagtcgtttta cattccttcttgaacactcccacatcatcactttcagtcagattacgcca gggaagacgtccacatgtcatttcaacgaccatgtagagccaactttcaa tgtcatccttgcgacactgctcacgttgaatatggcaagccaatggagca tacttcaccgttccacggaatccagcacgagcacgtggattgcgaagagt tccgtcttcacgggcaaacttgcgtgccattccgaaatc >AW057475 ttcataattatttattaaacatttaataagagctacaatgtttcccgtat ccaagaaacttctcgaaccagttgtagtcttcgcctttaacagtatccat ttcgaaaactccaatgtcaaaaatatgaacatccggtcgacatggattga cataccattccggaatgggaatctttacaagattcttggtacatctctcc ccattcatcacttctgttccgcaaaaattattgatccatagatacggttt gaatccaaaattcagcaatggaatgtcgactagagtctgaaaagaaatta tgtaatttctgtgttttcccacgctgtaactattcgaaattaatgggaat ttgtagaaataaaatttggtaccacatcattcccgttagtcgccatctcg aatctcccatcgactcctgttttttgagcggaccacatttggtcgagagg tttgggctcccaaatctgaaccatctccgcaacatatggtcttnctctgc acattagctggccttttacgtgaacagaagcacacgctgcataacagtag tttggagccagaagacacagaagaattagaaagagtgtcttcat >AW057476 ctcatccttctcccaatcgtactttgaactcggtgttatcttctcgttct ccatcacttgccagaaacatttgaagatcatgtgataatccggtctttga tagtaatcgagagaggccaaatgaggcatcactttatccatacaggctgg catatttgacatgacaaccttcgccggcaagttcagcttcatttgctcca cacgttcacgttgagaatcggtttgccatggaagagcttttccaccgttg agctcaatgatgacatatagcagggaccatacatcgtcaacccg >AW057478 aattttaaaattaaaaagtcgtttttttacttgaaaagcaacaattgaag aacaacaatatcaacaaacaaacaacttggagaataaattatcatgactt taagaaaagtcttttcggaggaaggtcatctcccattccttcagtttcac gtactccggctccgtaggtgataatcgcgataaatccctgattatcaggt gctgttctaataaaatctggtgtagttttcaggcgatcgttcatctcaac aagagttccacgaacacctgctctcacgacatgctctgaaccatctttgc atttgaatgtacacaattttgagtcgggttgtagaataagagctcctttt tttccttttccagaaacttgatttgttgaacgatctgcgccgtgcttctt cgtcgttccaaaatccactt >AW057479 tttctccttacagaaacacaccatactcttctttcaccttattcatataa tcgggctcagtttcaacagacaatcacttcttccatataccaactccatc cattccgctagtgacgacaatgagcaaggtcttgttgccacccaatgaag catcagaataatcatctctagaagacttgcattgtgtgatcgtgatgaaa 
ccagcggctgacgcaggtggtgcatgctcgaattgatatgacgatacatc ggtgaagtccttcagaccaatctcagaaggcattccttcgattttctcaa gaacatgaaactcttgtaaagtgactccatgctcgcgcgctgcacgaaca gtgtctaacaacaccgagtgagcttgaatgatagcagtctgcctctatca ccagtgcagttcatgcgacttcactgcttg >AW057480 tcttcagatctcgtgatggattgaactggcattggatgaacgcaagaggt agctttccacagatttggcagtttttccaaaatcaggattctgatcgaaa tccaccgggattggttggaatttgtccaactgatgaaggcattcaattga atcaatgatcttcgcattcgaaaaattgctgcactccaatccaatcactt tgctatccggaatcccgagctcggcttgaaggcgctctacgtaggcggtg gtgtcgaatgttggatgagctgattttgcagtgtgctctgacgccaaatc ctgagcaacgttcttgagaacatcagccattgtgacgttggtctccgagg caagggcagtgtactccgagaagcttgcaacattcattgtaacatttgct tctgatggtggcattgagtagacagagtagtcaagaacgttcatcaagtc tttgcaccgctcgaacttctcattcatggtcactggngtctcggattttg gagcagagtaaacggacaaattgaattgattgatatcttgacatggctgg aacccttcgctagctgacacatttgtttctgattttgggatggntagatc gaagaatcagcttgatcattctgcgtctcattgacagcgatatttctttt tgcagcacctcattgatggagcaattgtctccgatgctgagcgagtacac gagacattccatctgtccatt >AW057482 tctagtccaccactgattttctctgtccgcctgcgcctcgtattctctcc tctctctggcattcgttggctccagcagaactacagtttcatcgctgaaa actgctcttggggtgcagaagatttcagagacaaagcttggctcatcgca gttggctttgtgttgttgctcggcgagcacttgctccttaggcttcagga cgagctccttctcgagtttctggcgaagagaccggacggagacatgggga agtggtatttgctcgatggcagtgggcttcacctcgtcgacttcatcatc ctcatcatagtcataagatgctgacgatgtagacgacatctcttcaaaga ttccagaatcctccaccgccgtcatcatcgcgacattaaggcttccatca tcgcggngctctgtagacggcaagcactttccggcggctctcataatcgc gcacgtccacttgtgaacatcattatggccgacaaagaggtgaacattgc cacgttgcatgcggattttgatgtgacagcggaaatgacgctgctttttc gacgatcagagtgcacagacattcctggcgatcgcaatcatcaccaatcg tgtcgtcatctcgtaatgacatcanacgcgtgacgtacgtatcatcgctc cttggctcgagat >AW057483 tttacctccaaactttattaaataaaccgaatgaattacgaaataacaca ttcatatttctctttcaaatacttcttatgcatgtccttgattggtgcaa atccatcatcacgtgccgattgattcttcccaatttccttcctcattcta tccatttccattttatcaatctcttcatatttcttttctccttcaagttt cacaccaaactcttctttcaccttcttcatctcatcggtttcagtttcaa ccgacaatttcttcttccagaatccaacttcttccattccgtcagtgatg 
acgatgagcaagttcttgttgggaccccatgaagcatcggaatcatcatc tctagaagacttgtagtgtgtgaacgtgataaaaccagcggcagacgcaa gttgcgcatgctcgaaattatcgtacgaaacattggtcaaatccttcaga ccaatctcaaaagtcatttcttcgttntttccaagaacatgaaactcttg agaagtgactccatgcttacgtcttgcaccaacaatttcgaagaaaaccg aatgaactcgaatgatcacatcctgtttctcatcagcagtgcattttcaa gtgtactccaatggcttgtcgctgcgaatctgaatttcggtgaagagatt tcgaactcaaactgtttgcacgtgaaaccaggtttcttggcagtttgttg ctttccttctgtcaaacacat >AW057485 ccgtttcgacgattcatcactagaatttgtgcgggatcttgcaaaagact tcgaccgatccatcgtcgcggtgaggttcaccagtcactggatcatacaa gaatccgcccatctcaagtccttcaaatccattattatcagttggcatga cacatgtgaacaagtcatcatcaacatcatatgtcccaatgcattttctg tagttggatccagccttgtatgtcatgagcacacgttctccgtatcgaac cacgtaattgccggcgtcaggcgtcttgaatgtgttttccggtcccgtgc aaacattatcaagagtttgagatttcttgaatacggttttttccaagttg acaaagtcggcattcaacaggaaatactccatattgcacttcttcacgac attgattcctttgaacggcatacaaaagtttgacccgcttttttcccaag ttggaatgtacgtttcctcgattccggaaaatgtgaagacgccgtcgaga aggataacgccaatcattgcgtatntataactatccctggtgcctgtacg canattgtccaccanaatcatagtgttgtggaaatcaccttcggccgncg gaacaaccatcggttttttcaattcgaagcccactttgtacatnncggtg ctgtgctttggctcntggtncgcgagtctgctgcaacgagactggcacag ntggagcatcataagactgcaagaaatgtgacgtc >AW057486 actagtaaagttcataatcctcaatactcggcacatctgggctttctcgg tgcgtaccatgtcttgtttctttccatccaggcctcgtgcttagtattaa cggagtccgtgtcctgccacgaatagaactcctttggcggaatgttttct ttccagttctgaatgacgcgttcgagataaaacgcgactacattggcgtt cgtattcgtcgcagacagacagtactgtagaatctttgccatcctctcat tagtatctgggtccggaaacgtgcacgtggccagatagttgtacttcatt gttgccgccactatgtgatcctcctcatcgtgccattcgaaggtttccag cgtgataaaccgtgcacaaatgtaaaaccacgactcgtaatcgtcggcag gtcctcgtgccaatcggaaatgctgacgcctcgacaaatacgtgttgatg acgttcttcgacagtggctctcgctccgatttgttacgggcatgtcggga ctgttgaacgtaaagaggctgcagcccattgtgcttcgtgtacgaattta cgactgatgtttgaccgtttcgagttgagaagtattgacacacgactgct tgtgttctgagagagagacgagnaggacgatgcgagatgtgagc >AW057488 tttttaacaatagcatttcattccataaatatttaaggggtggttattat atcagttctcttaccagtcttctccatcttcgaatgcattatcacgatct 
gcagtggatgccgttccatttagcttggcagttatcggatccgggatatc agttgaattggcaacaatagcatcagtgacagcaacagtgacaacagatt gggatgaactttcagcgagcgtagaagtttgaagcg >AW057490 tatttcaatgagtacaatttttcgaaagaaaaacagattgaggataaaac ttgagtgatgagataaccgtaatatggagaattatatcagtgtcaagaag gcacattgttcagtttcatatttacagatgtttgggattaaatgaagatt cggtatgcatcgacgatcagaacaatgaacgagtgagttgaaagaccttt tgttgaatgatatcagttcttggagaagaccatctttcaattgcttcaaa ttcacgacgtcttttcaccagttgtaagaaggcggtcaacgacccgtcaa cgtttgcgtagtgtgtgcggcaggagaagagacaaattccatcatacgag agtccttggtgcagaatttcgtgaaccaaatcaaattctgcttcttgccg gtttagaagatctggtgccaaaatgatgtcgaatttttttccgccaagga acttcattgcctcttcaattgtaccacatgaaaccttggtcttgatcatt gnaatattatttctcttaagtgttggacgacagtaaagctccaaactagt cttatncattgtgtgcattgcatttcttctgcttcattttcgaagcatat actgaatgggagtcagttacgaatccgattncaaaacagatttncattcg anaagtcagttcatctcacattgacatatatt >AW057491 ttttttagaataactttttatttcgaatgtaatctcagagcaagctttta gaatctttttggcagtccgtacgagtcaacagtggattggtaggaagaac catcagaagacgacgtatccgattccgaggatttcgcagagttcatctca cggtccgataggtcagcaggagtcttgggaagattcgatttgctgcatgc ttgcaaatcttccccgtcttgtttgaactttctggcacggcagatctgct tgttgatgaccttattgtagttggccttttccttccggaagtttccacgt ggttccgaccagcaatagaagttcgaacatccataggatccatccggatt ctttacaatgccgaagcaggccttacacaacatgtgatcacactttccca atgcaatcaaatgattggttggcaacttgtttccacatggaccatcacac ttcacaaggacaacgcccgcttgctcggtgcactgaaatgctccgacagt ctgcacacacaccgtgtacgtgtattctttacgtcgagtcaccggattca ccaccgacgtcagattacaaccgagaacttcgcacgtggcanaaatgagg gtccagaagatttgcggatcat >AW057492 caccatacaaactttctgccattgcattcaaggccttcattatatcacag cgagcggccttatcactttgggatgcaaagattccatagcttgccatcct gaacatacgcttaatattcgattccatactaagtggaactggatattaat acacccaacgacaaacacggcacttggctttcaagtattgtgtcatgaag aggctgagagacggataggcaccacttttacgcacatccttctgatcatg atcgctgtgagcttattgtgactacttctgggcagtgtcttgactctcga cgcgaggagaagaaacatccactgtcggatcttgcgtaagatgtgtacaa tcaccagatggttgaggaagggtcgctggatcatcgaccatttggtcagc tgtaccagttgaggtataggttgttggcgctgtggaagacaccgatctag tgtgatacagcgcctcaaaccgaggagacgg 
>AW057493 gattaagttcagaatgatttggaaccgataattgaagaggaacatcggtc cagtagaatggcttcaccaagtaaagactcttagtgagtggcaatattgg ggaatccataatttgtgcctctcctttcttgagcgaaataaagttcattt ggcaggtgggcttcttgatgcattgcaaaatatcatcgacagtctccaca tgctcatttcgtgtcaagtcaaccatcacattatcagaaatcacgaccac ttgtttttgctcaatttttgcaatatttttcttgaaaaatgcttgaatgc gcaagttgtagtcttcgatgttctctgcacgaataaatccttgcgatgga agatattggatattgattggataacccatattgaaaaacgcttttggaga gagcaacatttcttgaaaattagtcggcaacatttgacgatagctgagaa gagctggatcaatgttgaaggtggtgtaggaaagaccaagtcctttaatc aagaaagctgagtttgaactgatntcatgtcangagagcaaaatacttca natggacgaattnncgcattcaacagaccacgcccataagttcagacaca atttgnccacgtcagtgagccggagatcgaattttgtagtgtaaagtgtc tanngaactatttnggaagcatatcatatcatatggatgatattcattgg accacgacgatacagacacgaacccattttggaagatcaa >AW057494 ctattagcaccgaagagatctagtccaccactgattttctctgtccgcct gcgcctcgtattctctcctctctctggcattcgttggctccagcagaact acagtttcatcgctgaaaactgctcttggggtgcagaagatttcagagac aaagcttggctcatcgcagttggctttgtgttgttgctcggcgagcactt gctccttaggcttcaggacgagctctttctcgagtttctggcgaagagac cggacggagacatggggaagtggtatttgctcgatggcagtgggcttcac ctcgtcgacttcatcatcctcatcatagtcataagatgctgacgatgtag acgacatctcttcaaagattccagaatcctccaccgccgtcatcatcgcg acattaaggctgtcatcatcgcggcgctctgtagacggcaagcactctcc ggcggctctcataatcgcgcacgtccacttgtgaacatcattatggccga caaagaggtgaacattgccacgttgcatgcggatcctgatgtgacagcng aaatgacgctgcttttcgaacgatcagcagtgcacagaacatttctggcc gatcgcaaatcaatcaccaatccgtggtcgtcatgctcgtaaatgatcat cataccgcgtgacgtcacggtatcaatcgctcttttgactcgcagatatg atcgagcaggaaggctgctctggaatcggattgagttgagcacctgacca cgtnggtacccatnccttggcattcactcgggacgctgagccgttatgca ggagtgatgtcactgatcactcgctcgcggatgccgagtcacggtgaacg tatatgttgtagacgcaa >AW057495 atcttgctgaattcttatcatcacttgaccttgcgaagagcttttccttc atgatcatgtccgatctcgtgagtttgagtttgctcgatgatagaaacac ggcgagcatttggaccggtagacattgcactcttcgctggaggcttcagt cctggaagagcttgacgatattgcgaagcaatgcctttcacatcttcgtt catttggacctgatggtggttgcatgagattggtagccaacgcctgtttc gcccattgcttggcatcgatcgaatcagggcgctcgacgacaaaagtgac 
ttttcctttctcttggatattcttgacgagaagatctctggcaacatcct tgtcactgactggaattccatcaacatcacacaaatgatctccaagaaca agacacttctcggctagtgatccaggatcaacacgtgaaacaagcacacg gttctggaaatgcttgattcccaaaccaagttttggtccattntgaaccc agancagggttgccaactcataaacgtatccctcacggcgttgaatgatc tttgcacgatcctcangaatatgaacacgagcttctaactcttcagcctt tttttcgtcacggttcaccgtgatctttgcacatggagcagcatagcgaa gcgcacggaaaaaagtcgtaacatcccttgcattttggtccgtcactttt tttacntggtcaccatatctcatttttccttacgaatggttccagcctga ttctanggataccgagctgtcattnngagttgcacaaagcgttcccctcc ctcatgcnataccaatctcgacatcttcttggacccat >AW057496 tttattcactcgatttggtttccccatgtgccaacgtctcgacctttgtc ttcggattgcacaatgccttgaacaatggaatcatgaagcagggatcttc attctttgcttcgattggagcttttttgctggatttcagagaatccggtt ttgcaactcttgggaggcatttcagcacggatttcgattcgaattgagca gcgcaggccgtgtcattttccagaaaagcacgaagaagtggtttactgtt gttcattttgattgcatatcccgtcgactcaatcattgtagatctcccgg gcaggtcgatgacaatcgttttcgccgaatactcggcatcgtcggcgaga aaattgagtgtctggacgagatgatgtggcttatacgacacgtgtccgtc aatccagtgacgaatgttttttcgaatgtacgggtaaatggccctagcgt agatatcgaccggcttgtttttcagcagagcagctttggatggtgtttca aggattgcgacttcagtatacttattggatagaaccaagcattcgtaacg gtcaagttttgggctttcgttgggatttacaatcttcactgctacgtcat anatcgaagtgtttcggagcacgattcgatattcagatttcacattttca cgaactaccacagaaagagtcat >AW057497 tatgtggacaacaagactgaggaagcatggttctcgttcaatgggaaggt gattaagcagctcgggccacagctcaacgagatgtacatcatcacgcgca actgcatcggaggaccaccacattgcccatgtgctgtgtgcggagctgct ccaccaccaccaaagccagtgccacgtgtcgagagagacgaatggatgga cattcgtgagggagatccatggccgactcgccaacttgtcaaggctcttg acaagactctggacacccttccaggagtcaatccagaccaatatgttgct ctctggtacatgcaaggtgaacctgttatgggtcgtgtctggaatgaagg aggaaaggtggctgccaacttctcgtggttcaacaacgagtattgcaaga atgttggatctatccagcttctcatctatcttccggacagtgttcgtggt tttgactatggatggatcccattcccggaggctgctcagtttggagacaa agcttggcatccagttcatgtcaacaaaccacagggagatatctncgttg gagttgntaacgttgctggaggaaagcagattcttgccaaggtggatgtn ncgtacgagaagtatgttatggataccanngaaaggagcattctgcnnac tgccagaactgcgctgantaacacatttgtctcttgcgtnagggcaaggc tggataccagctcgac >AW057498 
acagtgataaacatgaaattcctaaattcctaaacagtaccccttcatat cttggccttttatcttcttctgtgagttctacaaattgtcgcagcacatc atctgcgagaatcggaatgttgcacaattgttcataggttttcccattga tgttgtgtttggttagcgcatatgaagcaagatactctggatctgggaca acgattgctacaagccacggtttttccatatcgccatgaacgtaaatctg ttggacaaaactcgaagaagtgtagagggattctgtgagatctggagcca caaactttccttgtggcatgttgaaaacgtttttgcgtcgatcaataatt tgaagagatccttcagcagtgaatcttccaatatctccagttttcatgta tccatcttcagtgaaagatgatgcagttgcttctggattcttgtaatatc ctgaagtgacgttatgacccttcacaagtacttctccaccatttttaaca acagaatatccaagttcaggaacatcaattaattttatcatcgcacaaag ccatcgtggtccaacacatnncatacgtgtatctccaaccaattgaagtg ttgtggtccagatgtttcagtttgtccgtattcttncagaaaacctcact ncaatagccccacgagcanacctcattgcngaaacgtctgattttgcagc ttccagaatcaattgtctgagggtaggaccgagaagcatctggatttatg caaacatacctatctaccatgtttacgatagcttgcctgcatcttata >AW057500 gttagatactttattgtttaaaaaatcgagtttttttaaaattcaaatga ccgtaatttcagaaggcactatgcaaaaaatatatcccatttaatttaaa aacactaagcacaacagtaagcaaaataatttccttgatgacactttgta taggctgggcagaatgtagaatcgcaagtttttccgaacattgggatctt gacacctccaatatcaatttgatgcttcagctttcctggtccacactcct tggcgtagtcggcatcgattttgtctctgattctcgtgtcgcaacagatt ccagtgaatgggccattcaagcagaaagtattcgggccatcgcacttcaa ttgttccattcctccacagtggctctttccgttctttccagcgacactat cactgccacctgggcatgtgaaatagtccattggaatacagaagtttgag caagctgagtaatccttgaaattgttctcgtttcctccacatccagtgta tttgaaagcaaggcatttcttgctttccatgtcgaagtgatatctgacaa ctgaagtgttggtgcattgctgaactccaaaatgaagcggatgctcacat tcagcacggaactgagctgaaacatctatcaaacacagtagaccgattgc taaaagaactttcagcat >AW057502 tcaagtattagacggactcagtgggatgatcatcagacggattatcaagt acaacacgcgaaagctctgtctcattcttctcttcttttggtggacggag catcgaatgctcacggcactgagttccatcaagacgagtaatttgttgca agagatgaataacctttccacgataagtatccactgatgcaattggattg aaatcgaggtaaacactattgagttgtggaagttcaaccaattcatccat aatgctccagttatccaacttatttcctcttgcccaaaagtctgtaagtg tcttcaattgatggatattctcgaccttctcaagacgattctgattgaaa tccagaatttcaagaggaagatgctcatcgattccacaaacgtacttgat gccattttgagccagataaatctctttcaagttgtgaagtcccgaaatgt 
tatcaactacagtaatcgcattggctggaagactgagaactgtgagcttc ttcanatgatcaacattntcaataagacgaatctgattagcgccaaggaa caatctatcgagtntcagaatggttgcgagaatctcgattttcgcaatnc tattgtcacccaaatcgagatatttcagctcagtcaacgatccaaaccct ncgatttagtgaatttgtatgaacaaaaaagaagtctcagtttgtcaact ctccaaattttcaatttgtgatacgat >AW057503 taactttatacttgtatttcacaacttttcccaatttgcaccaaatcgga agtgactgactgtcgaccatcggatgaaactctggagcctgtcctttgaa gtgagagatggatacgctcatcatattgtcacacgtaggtttgcccttga tcttattttcttcgaccacaaagtttgccgttttcttgccacctagcaca acctcatagtgaggattacttggggcttggcaataccggttggttctgta tctctcaacactgttcgagactagtggagaacgcataagtgaatacgcct ctggttcaagtttagcgagctgttccatgctatccgggtctgggaagaat tgcgactccttcgagtacatgtaaggcttcaacactttcttgcaaacatc cgattggttactacttctggtgtatgaggaagccgatctaacaataatcc catccttgaccggcactgttggatagtagatntccgaatatctacggccc atatagttaatctcggcgggggtgattgcaggcttgtattngcagagagc ttngcactncacttttncattaaatctgatcatcgcattttgtgggactg agaagtcaggtgtacatcgtagaaggacagagcangatatcaccatcnac tttgtcgtcttgatacatactcatcccacgttcngggctcgaatncagat ttgtggagtcttggagtagcttctccagcgtcagatgatctgtatgatgc cggcattctcctgaaactggcccac >AW057504 tttaacctccacgttttatttctgaatcagaattaaggcatgtatagttg agcgtgagttgggctgtagactttggaaaaatcgaaattttcaggcacac aatcagtgtagaccaggttacagacgtgcgaacgagttggttagtgtgtc aggaatggggtgagccagcaaaaactatccttattcagctgcccggatcg aacttcaaaactgacgaccgtcttctcggatgccttaatatcttttaaaa tctccgaaaacattgcctgagcgacacgtgtattttcagcataaataggc cctaaattacaatcttttctgttctcatagatcacagcttgcccgattcc aatcacatttccctcctcatcatacgccactttcccgaatccatcccgat cgtacatatgagagatgatgtacttttcgcggtgatacggatgaattgtc tgatcataatttatgatatctctcttcggaacttctcgagcatttttcac tgtaattgcagacaaatgcaacgtctcagggagcacgacgtcgtgtgcac tgtagaatgtcttgtagccaacgtcgccggctttgtgtccgagatccaag aattctttccgtttggacagctcgtgccagatatctgtgggccgttgggt tatctgtgagacaatattattatcttcgctgttcatttcactcgacacat atcattctgaaagtttgccgttccgggagatctgaattcaggatcaatcc atgcgcaaactcatgacattatctgatgattgattganggattcatgggg acgaatgtgcacaaatgtgctggtggatgactctcggcgtccttttaagc tatcactt >CEES071R 
ggcacgagtcaaccttcaccacaagcgtccgtgcaaccttctcatcattc cattttgatgaaaatcagcgcaaaaaggaatttggaaaagaagaagctgt gaagaagattcaaaagaaagcagcgaaggttgctcgtgntgattcaatgt tcagttctgaagaatttttccctgacattatcaagtgcatgtnncaccgt caaacgnagaatcgagcttcacacgaatgctttgacatacaatcacaagt taatgnagagatgcgaacaattctcatcgnttggttcagcgatgtcgtga aagagtacaattt >CEESA12F gaaagcaattagaagaagtactgcagagtgtgatcaagtcactttagaca atttaacgagtgtagaagaaaatcaaccaaactcaacantgtcgcctctt gcaaagataattgctaacattgaaaaaaggaatgaaaaagtgaaaatgac taagaaattcaaaaagtttggagttccgcttccaatgtnctcatcgaacc ttgaaagtggatataagcaatgtcgaatggacatcacatgtncttctggg tacagttgtgagaacantagtaaaacgagatgctgtatggaagcaaatca ttcaccggaaattgagcgaaaaactgaaganttcaagncatgcccgtttc aacttcaaatggcatacttctgccag >CEESA13F aaagtgatttatagatttacgaagcggactttgtttgggaattgtaaagt taaatacaaaggaaatagtgggaagaaattttnttttcgggacgaaaatt gaaatttaaaaaaagggttctcggggaatcacatgaggntagagactgga aaagagaaaatacaataaattaaaatcggttgaaaatgaacattggacag gagaaacaaacggaacggggaatcgttttttttatagaggcgacaaaaaa gagcaaaagcagcatcagaagcgtcaaatcagngtactcaaaaganggga atggatattccaattattcctgatcattcaacggcaagtgagt >CEESA14F acgtacgccagangagattcattcgaaaaaatgatgcaaacgtaaacaat aaagcaattttacacacaaaatagaaattattcccgagggttcagcgtct actttgaatcagctcgcgaagattcgtttcagcctccaccaatcctttct ccaaatagtctttctgcttctcgatagcttcaatttnctcttttgactgt ttagcttcagcttcatgtctggaaatttcagctggtttgtcggtcagcag gaaccatacggcccaacagatcgattagtacttgaggg >CEESA15F gaaaacgagtatttattgaggatttgtgagcaatggggatttgatgtgag gtaaaaaaaaataaaaacaaaaggtacaagantaaaatatacatatagga cccgcagaaattgagatttaaaaaaaaattcaaaaaaaagaaggaaattc aattaattgtgcatactattggtcatttctagcttaaaaggntcactgaa aagtgagggactttgtcggaaattataattatncgatgttgaagaagaag aagctccattaaccagcgtcggggaatcccaacttccaggttccattggg ttcttcacatccccttgatgattccgtattggntcctacatcatgatccg cttcaactgcggatagctctt >CEESA17F cgaagagtgaccatttatgcacattgatcggaaataaattaggaatccac gcacttcgaggatgaatcaataaaaacaggagaggggtgggaatagaata cgaaagatgaaagtaagaaaaaatcgcgtgagaaaattcgggagcgattc aaagggaacacagtgacaaccgggagtgaaggnttaancgtagtanttga 
gggacgcctttttctttgcctggncctccagantagcatccataaagtnt tcatgaagaatctcagtagcatctcgtcgaagagcaatcattccagcttc aacacaaaccgctttgcattggtgctccgttgaagtcatcggttacagcg gg >CEESA17R aggnaacaatattccgatatcggaggatgtgacaagcaaattcaggagct gattgaagctgttgtgcttccaatgactcacaaggntcgatttgttaatt tgggtattcancctccaaagggtgtgctcatgtatggaccaccaggaact ggtaaaacgatgatggcccgtgcggttgctgcccaaactaaatcaacatt cttgaagctcgcaggcccacaacttgttcagatgttcattggagatggag ccaagcttngttcgtgatgcttttgctcttgctaaggaaaagggttccag ctatttatttttcattgnatgagtttggatgcccattggtacgnagcgat ttcggattcagaggaaagcttggagg >CEESA18F ataacaaacagtttataaacaagaaatcacgcaacaatctcgaaacggag tgtggcgagaagttctggctcattgtnctgggtggtgacggtgtgagctc ctctaggggcgactggaagtctcttcaaagctgggactggttgtccagct gacttagcggccttcctactcgttacgacctggcgctagacaacgcccat gtcgcaacggcggagtataggtctctcgcttaagcgccatccatttncag ggctagttgattcggcaggtgagttgttacacactccttagcggataccg acttccatggccaccgtcctgctgtcaatatcaaccaaca >CEESA23F atagcacagaaaactatattnaatttaatattataatagcgattatnaag tcagctgctcactggaatganttccagagagggagagagaaatagacagt aaaacgagtctttgaagtaaagancaactcantacagaggcggggatgag tggtaaaaagattgcataatgtatttccaatattgaaagtagttatgtaa ttccgagacgacgggcttntccctcataanttaaaagtcgccacattt >CEESA24F aaaatttttttncattcttaaatattcgcagatgtcgtgggacagagttg caaaatgtcaagagtggtgtgtgtcggtatgggaaaataagaagcatgnc aatagtgtctgataagttaaaagaggggaagagggagggagagagagaga gtgtgcattttggnccagtagatgaagatgngtcgtaatnntgtaggaac acaattatatttatcagagaaaacgggataaaacaacaaactcgattcga gttgatcataaatctgtgttatcacaagaattcgacgnaacaggagttgg tggacgattgacgnggatattcgaccccgatggcaagggaaaaagtatt >CEESA49F ggaatgtgcatatatttatatatataantttaacaggaataacatggaaa acgtttcaaaaaactagcgagaaaacagagggttcgtattggaattatca caaaaggcacacaaggcgtagttggaaacatagtgatagattaggagtat agtgcaattaaaatacaatttnctttggtgaccataaaaaccctaattat gagaaaaggttagaaatttttaaaagcagattangagacggttacatagt taaaantgcatggcattataaagntcacaaattgggaaaggtntttttnc gagattcttcttctggtgggangtatagagcaagacggtcacgtcccata a >CEESA50F gataaatttcttatttagttgcacatgataaagtataaatgaaaataatt 
aaaattaaaaaaagagcaaaataatgtcacgtgaggtttaaaaaggagan taaaaagcccaaaaagtgaattgaacgnaggnaagatgtagangaganga gcattttgaaaaaataacgctaactatgctttaaaacagannganggtaa canaaaatgttgagaaccggtagag >CEESA51F gatcgaaatggtcagaccgttgtgaagcttgtggatcgtgccacattgct ccgagagcaagagcagaaggacaccgagaagaagcggaaggataaggaaa aggcggacaaggagcaaaaggctcgggagaaggctgataaggaggcggca gcgaagaagatcaagccggaagagctgttcaagcagggagagcacgtcgg gaaatactcgaagtttgatgaacgaggtgtaccgnctcatttggctgatg gaacggagatcacgaagagtcagatcaaaaagctggagaaggtgtacgga gctcaaaagnaaaagtatcagcaataaatattagtgcctaatataa >CEESA52F aaattttacaatgtttattgaagacgttgaacgtcaaattatcaaatttg atgaatgagataaataattataccagcatgaagattgtaagancacggag aacttacagggaagaaattggaaacaacataggacactagcgtagttcta tgtgtcattggggattgggaatgaggagatccgataagttagatgataga ngacagaggtaggancatattagaaggggaaaaggcagattatttaggcc ttggcggctggcttggcggccttcttggcagctggcttggcgatcttctt tggagcggccttcttggctggagactttngcgacccttctttggcaagct ggcttggcaaccttctttggggctcttgngccttccttgaccttcttaat ctccggtggccggccct >CEESA53F aacaataacaatttatttgaataaacaaatttaagccttagcttcggcct cggcaaggaaatctctcttgagttttcccatgaaagcacgcttctcggct gtggtttngaagcgaccgtgtccggtcttngaactggtgtcgatccactt gaggttgatcttctcgtgggcgactctcttggtttgggtgatgagcgact tncggagggtgataagacgcttctttggtccgagaacggntccacgaagc atgatgtagtcctcgttgacgataccgtatcttggggaatcctcccattt >CEESA54F attcttgaaaagttttttaaaagaaaagctgaaaattttacaataacgca gatgaaaatnccaaattttttggggatatttcgccaaaaaaatcattatg tgaatttntggtgatgaagatgatgacaatcgctttgaaaaaaaaaataa caaaaaatacaaaattcgggggaaaaaaatgaaaanttaacaataaaatt ggncattttagtttgagccaatttttggcagacagagagagngagagtaa gagtaaagagaggtagcagagannacagncaaaatttattcggggg >CEESA55R aaaagcttcgcaatgagttcaagttgtcagatttnnaacttctttacgat tacaagggaaataacttgagaagcgccatagttctnctnaaatatccaga tgcaatcaatctctgcaattctattcgttccaatccaactgtattcggaa aggaatggcacccgagagtcttcgaagtgctcgacgtagctgttcagccg ccaatcgataaataanctttttngtattttaatgcctgancctgtttttt ttgatgcttacatgaaaatttgt >CEESA56F actagttctctctnctttttttttttagaacaaacaatactttagtaaca 
actatgtgaaaaatgaaggcaaatgagagttaattcatcaattaatgtaa gagtgatatgacgattttaagcattgatttcagtttcccatttgtgcacc aatgttttgaacttccattcgtctgggtgtccgacgtggttgaccttgtc tttaaagtcaaactcagaccacttgaaagatggaaccttgccccatgttg gccctccttgtgcgacaaattgcaacttcttcatcaactcaacatttgna cccttgtaatcaagagctccgtggttaacgtgtccttgtcccggggaact cgtatgttccattagctggatt >CEESA56R gaggtttgctccacagcacattcgagatgaatgtntttattggcttactg ctagccactgttgtagcttctcaaagctcggaaggacgcgatgagagcta cacttacaagcaactttgcatagtggacgataagcctcaagttcttgatg gattcgactgccgcaaccaagttgctntcgccagatggcaaaacgctgtg aacacaactggctggactttcctggnagtcgnaaccaaggagaactactn nccacaaattcaaagcctactctgctgggatatctttaag >CEESA57F tccgatcaaatctacatggatatgcagaagttcggacgtgtccgtcgtca agccggaggatacggtggatatggtggatacggtagcggaccatctggac catccggaccatctggaccacacggtggattcccaggaggcccacaagga cacttcccaggaaatactggntcatcgaacaccccaactcttccaggagt tattggagttccaccatcagttactggacatccaggaggaagcccaatca acccagatggntccccatctgctggaccaggagacaagtgcaattgcaac accgaaaactcatgcccagctggnccagccggaccaaagggaactccagg acatgatggaccagatggaatttcaggagtt >CEESA58F atgagtgaggtgctttatttgaaaaatctttttggaattaaatttcaagt ttttttacagaaaaaaaaacaagttcagaaaggagcaaaaatacagaaac aaatttntggatgaaggggtacatgataattttnagggaggaaacatttt aagantagaattaagacaagatgcatcctggaaaattttgatcggcttcg gcgatttgcgacgcagtgaattgaagcatgaatagttggntccataaaat cctgataattcgattcagaaaccagagattctttatcagaagcaaaatcc atatctctatccaaataggaaatatcacttcccattgaacttcccgagtc gtcggcgatttgagaattcaatagcagt >CEESA59F gatcggcgccacagaattttttggagtaagactcgtcaatcgtgtatctg atatcttttataattggtctctcgatctcttgttcagctccactatggaa gacaggtagttgagattcttcaaaacctggagcataaccaagtcgtttga ttctagatagcatatccttgcaccgatttgcgacaaatttcgtttgaaaa tcatgaaaatcattttcagattgcttcttgatcagtttacaatgtttcgt gtaaatatcctcaaaggtcgattcgggattttcctcaatcattgtcttgg tgatctgttgatagtctgggttttccagaatatccttatgctgatcaatg ggntaaattggnttcatcacttcacaattgtttcgtcagctcattcccga ttttcc >CEESA60F ggtacagttcatttttncatttcagaataagcaacaaaaggtgttcatat gaatgtnctgtaggttagtaaaacactatgttaggttactctttgatgca 
aaaaggaatagttaaaaagttctcagaagctttcgaataaattataataa atacgttgcagaagtaactgggaaaggaatgatgatcgtnatttaaaaaa gatggaagagcttcaaaacaggggcttagaaaatncagttctataatant aaattgggaagagctcaaaanttcacaaaaaatggttaatgaatacgcat acaatgtcaccgcttgttgaccagcacactgagcgggtaaatttccgagt aagagatcgncgatatttcagcggatggagcacggtaggggaaagtagga ta >CEESA61F ataacaaacagtttataaacaagaaatcacgcaacaatctcgaaacggag tgtggcgagaagttctggctcattgtnctgggtggtgacggtgtgagctc ctctaggggcgactggaagtctctncaaagctgggactggttgtccagct gacttagcggccttnctctnctcgtcattggacttgacacggttcaagaa gtcggttctgcacttggatggcttgatgtgctcgatacggatgttgattc tctttggaagaatgtttccgcggactctcttgttgacgatgattccgacg gctcctctggtgacgttgaaagattctcccggttcttccgtggtaagcct tgaatggcata >CEESA64F aagagtttgaaacttttattagctgtttttttagttcaaagtgagaaaag atgaaagaaaaaacaataaacagtattatgttcagaagtgtattgaagag agatgggggcgaaactaatcctcaatgaatctaacttgaattatgttttt nctcatggaaaatcgcgataaaaggattactgtgtcttctacagtaaccc gaaacgtaagtttntgggtgttgggggtgggggagggttgattcgtgagc aggatttcggggcatttacacgaaacttttcctcatttttctcgtgccga attcctgcagcccgggggatccactagttctagagcggccgccaccgggt ggagctccagct >CEESA65F aataagtaacactttcatcacataaaacatcagtttagtgaaattgaccg gaaattgaagtaaaaataaacgcgggaaaggatggtgtgacttgactagg ttctaggcggcacgancagcaaatnttggttttnaagttattaaatgcaa aacgtttgatttttgantgttgggaaattgcacaatttagagggcattgc gagtntctgagaaaganatgaaatgttaattgttttgggcgctgaaatga aagatgaccagtggcaaagtacggatgagttaaggtgagtaaaaganata aatgcaaaggggtatgggtgggtaatgcgactagaaacactaagcnagta tatccgtaatggttggaaaattg >CEESA66F gaagcacacgaaactttatttttttttgttggagttcaacatgaaattca gcaattnacgaataaaataagacataaagaacggagagaaaagtggtgat gagatcggcggttcgntcgcaaaaatcaattttcgggatggaaaaatacg aggattatggtacaagttggnttaaatgaatattaaaagtgcttcgagaa ttggtgatggagcttaagcacgttctccgcggatgcgtctggcgagttgc atgtcctttggcatgatggtgacgcgcttggcgtggatggcgcacaggtt ggtgtcctcgaagagtccgacgaggtatgcttcaaatgcctcctggagag ctcccgattggcagccgnctggaaagccggnggtccagtcttt >CEESA67F acatagctaacatttatnagccatttgaggatcgggaataaatttgtata caaaacaagtataacaacgaacactaatgggggcggaaaaaaaggtgaca 
gagcaagtatttttttaaagagattcattgaaatcgatcaacagtaacaa gaaaaatgggatatactaatgcggatgctatccgtaccgttcatctcaca aaactcgcgaattggtcgatgaaagggtagtgatttattgctcatcggct ttcgtctcgttctttgaggcggcgtcgaagtctccaacaagttctggcac atcttcgtcctctcccttttccatctggtccgagctttggtaaccgttnt tggcaagttttcttgagggtggggtgaggggactctgggacc >CEESA68F aaaaaataaagaaattattcacgtaatcaaaaacagacagaaaaaaaaag taagctcgaataagcttatacatataccgagggtgaataagtgaattgta aatgtgagagttaggctttgaaacgttcgggagaagcgggaaagattgac aacttaaacgtgtaaaaccatganattccgtataatctagtttggggtgt gagcnttgaaatgtgcaggataacaacaacaaaanggtgggttgaaagan atctggngnaattaaacagttattagccgtgacgaacagaagcnccgggg ggctct >CEESA69F aaaaaaattcaattaaatttattatcaatgctccaaaactcatgccaaga agagatctgaaaacaggtgggtgtgtctgtgcaagtaaaaaattcaagaa aggacaagctggttggaaagaaaaatacaaaaaagtcgatggtctaacag aataaccagaacgagattgancgggaatncgnttgagangaaagcaagct tgtntgatggtagatgggatgnttggttgagatttcaatattaccaactg gctgagtattattnatgatttttnatcagcattgtcca >CEESA71F atcaattttttttattggaattcaacggtaaaacgagcgagggtggactg tattaattgaactacccaattgaggtctttncttgagaacacacacaaat taacaccaacgtatacaatattctncgatcggttttnttcggaggagatt tataaaaacactgccagagaactcatctttcaaaaaagaagacatcgggn ttgaaggacaacttgaaacaaatganggaaatgataatcacactaaaacc gagcatggtgcactaattanttataaaaaattaagagtgagagtaggacc gagagaaaagag >CEESA72R tcacaagtgattcaattgtttcgtaaaaatcaatagttttncttaattct gcttaaaaattggcctaaaatcttgaaaattaacaaagttatgaatttnc gaaaattttcaaaaaccaacaaaaaatttgattttttaaaatttaaaatc aataatctacaataaacttacaattaggcagatgaaaattccaattttng caaattttgaagctataacgctgaaaactcgtacagctaaaaactncgnc cattttggggtcccaccgcggncaacccaaaagtggggtgggaggcctag acgtnttagggggtcatttttcaaaaggtcttcggtg >CEESA73F agcagccaagtcctcacgagcctcggtgaactctccttcctccatacctt ctccgacgtaccantggacgaaggcacgcttggcgtacatcaagtcgaac ttgtagtcgagacgagaccaggcctcagcnatggcggtagtgttggagag catgcagacggcgcgtggcaccttggcaagatcacctcctggcacaacan ttggtggctggtagttgattccgaccttnaatccggttgggcaccaatcg acgaattggatggttctnttggtcttgattgcagcgatggcggtgttaac gtcctttngaacgacgtctactctgtacaagaggcacacagccatggtac 
tttccgtgacgtggatcaaaacttgaccatctggttaanccggcttcgaa agcangctattnggtgatg >CEESA74F gtgtgtgtgtgtgtgtcgaatcgttcgagaaaataggaaaatatgcgaaa aaaatgaaaaaaaatgaataagggagaaaaaagtacaagaaacagaaaaa ttagaagatatttttttattcaatcatcaccgggatgttcggggcagcaa ttcttccgtttgaaccaatcatcgatacattgtttatggtagatacatag acatggtaatcgtgctattttgtgtccagcttcgaggtcttccagacaaa tcgaacattctcctttgtcgtcttttagcacgncatcattataggnaatt ttgggtctcgtcaggcacatgnccagntgaatntccgcgtcgtccgatgg ttacaactttt >CEESA75R tatcaatgaatgtatttncccacctttcctatcaaattagcccttccagt caattcccccgccacctccctttccaatcatcagcacttgaccgatacag tcaacgcatctnagttgactccaatattttnccccgtctgatgttcttct tgtgttagtgaccttctcaatcatttctccccaaaattttttctctatca atgtgtactaacattgccaattctacggcggacttgtctccgttttagtg gtatganttatatacatatatataanntntnaatttaaaattgcatccta tttcgggtaatagg >CEESA76F accaaatgaggcatctcgattcattcgtagtattatggtacattcgacac aaaatgagcgacaaaaaaaaaagaaaaatgacaaaacaaaacaaagggaa aaaaaatggaatttgagttgggcataaattatatatatatntnnntatat aaancttgangaacttttttttgtgtttaanaagnggtgtggaacatttt tttaaggggaaaaggcattgaaacgtaagtagtcgngagggttttggctc gtgccgaattcctgcagcccgggggatccnctagttctagagcggccc >CEESA77F aagaatttctactttttattgatttnccgcataatgtaaagtaattttaa gaattacaagantaaaataattgaatgagaggncgtgggtgtgtttnctt aaaaacaacaacgagtgaagggggaattacagacaaaaagaaataaaana tgggatagagatgggggtgataggtggagatgaggggatatgaaaggtag aagancctggtaaaatggtctgncggggctcaangggaaatggggctcaa aaccaaaaacgaa >CEESA79FB gaaggggtgatttcattagatatttttaaaaattattccaaatttcacgc ataacagaaagaaaacaaaacaagaaggaatatcacaaaatgtttgatgg aaaccggtaagaagtgaggataataggcacgtnctgagtagctgatctat atagataaaatgtgaaaacaagttgaactaatctggcgtacgagaaaaga aaaggtaaatcgataaatatntatgtacaacgggtatagtggatcgtgag aaaagtgcatcgggacaagngacttagagntaaaaaacgtnaggcagagt tcactcaatanacancaaattttncgaaaaaaacatctatggattattca taaatgggncccttccgagtgt >CEESA79R attttagaaaagtatatcatcataatcaccacttcaaaaactttgacgtc ggcttcggaatttngacttctctttgggattatgttttccacacactcgg aatgggcccattataataatacatagatgtttttncggaaaatttgttgt ttattgagtgaactctgcctcacgtttttaatctctaagtctcttgtccc 
gattcacttttctcacgatccactatacccgtngtacatatatatttatc gatttaccttttcttttctcgtacgncagattagttcaacttggttttna acatttttnatctatataggtccagcttac >CEESA80F attgattcgaaataatttatcgtatacaacacaagcgatgagcatagaaa ttggaactcttttcattcaaaatttagaaaaaaataaaaagaagcgaatt aagcagaaattgatgcgagtncagtattatgcagattggagcaggcggca cgagtttaatactcttctccttcctcctcgtttcctccctcgttcgagtc agctccgacctcttcgtagtccttttcgagagcagccaagtcttcacgag cctcggtgaactctccctcctccattccttctccgacgtaccagtgaacg aaggcgcgcttggggtacataagatcgaacttgtagtcccaagcg >CEESA81F aaaatcgataaattcttcatcataattaatcaggaaatgtttgtnattga aaaaaaaacaagaaaaatggggcgtgtcgatgagaaattggggcaaaaaa aaaatcgataaatcgataaatcaagaggntctttgggcggaaaaatgaga ttttcagagagaaaaatggtgaaaaactaagaggtcagcgaaccgggaac acaagaaaaaantcaaaaaaaaantcgataaaatcgaatcatcgtccatt cggcatttncggcggcttttnctgggcctgggcctgagcctgagcctggg cctgtttgagcctgctggacttgggccaggnagcaaatttggctgtagac cgagcagtt >CEESA82F ggtgttgttaacagatttattgaaaacaaataacaagatctttagtcgaa gagaccgaagcccatgtcgtcatcggattcctcctttggctcctccttct tcttggtctcggcagctggggcggctcctccagcagctggagcagcagcg gcggcggctggagctggtccagatccggctccggaagagacagaagtgat gaggttcttcacatcaactccctcgagagccttggcgaagagtcctggcc agtatggctcgaactcgacgttggcggccttgagaagggtagcgatcttn tcgccggtgatggcgacctcgtcatcttgaaggatgagagcagcntagac gcaagccagtttcttggttcgaagccattnttcaatcgtaaagttncgcg gttttgcctta >CEESA83F gaatatttattttaaattgtgaaatgcaaattggtttcgttgaacttttc aagtgaaaatccatgcaataagagcgcaaaatcatacataatacagtgac gagaagcaatcgaaatatcacagaaaaagttaataagcgagatttttaga ttgggaatgagaaagtncttaatgggcttgcttcttggcgaggttctcag ccaatttntcgaggaattcaaaggtgttgaggtagtcggtacgagtgaca gctgaagcatttcctcccttcacgcaaatggcaagatccttggtgaggaa tccagcctccattgtctcaatgcacacagcttccaaattattggcgaatg tctcaagggcagagttcttgtcgagagtgggcacggtgagccaatccacg ggacccaggcgaagattggagggcaattgggattcgtgggaggttccttg tccccttttgggtgcattcctgtaaat >CEESA84F aaggaaaatagacagtttatattcggaatttataaaacaaatgtgataag aactgccggcggatacgnaaataccgaaaaaagtaatcaccgacgacacc gaaacggatggaaaatcgaaaaaaaataaataattgggggaaagaataca 
cacatcgacagangaccgcaatttagtgagtgatatccatggactcgacg acgtcgtacttgtaggtcttgtgtttcagcaccttggtggctccggcctg gaatttntagacgaggtcgttgggctctggtcgcttgcagaattctggca caattggccccttgaagacggtctcgcttggctcgatgatcacaagattc aaactgtttcctgaggcgttgtcaccgtgcataccggcttccagggcgcg ttggacaagcttctcggcttcaactttcgtcatatcaaccttaaaatcac gttccaaaatagtgatagcntgcata >CEESA85F gtattgtttgcaatttatttgaagacaaaaattggaaagaataattgggt agaattaaagggagaaaagagggaagaaaatagcttcacaagttttaaag tacaccagctaaccgaagaaagaaaanttgantgaaatatggttccttta tgaaaatctcttcgaaaaggaagatagttaacacaaagatggtccatcta gacaaaaaagcaggntcagcgactaanatgaaaancanctagaatattaa gaggttggtagagantgagaaagcnataaantaagggaataa >CEESA86R ggaccttnttgtnattaagccaaaggtgttcccagacgagcntggtttct tttccgagagttacaacaaaactgantgggcagagaaaatcggctacact gaggntcttcaacaggataaccactcgttctcccattatggcgttctccg tggtcttcatacccaaccacacatgggaaanttagttactgtggttagcg gcgagatcttcgatgtggctgttgacattcgcaaggncagtccaacgtac ggaaaatggcatggngtggtttctcaaccggggataataagcacgnnttc tgggatttccagcccggg >CEESA87F gcatggtaattgtcgaattttatttccaaacactcatgaaaaaagaaaaa taaataatacagtcaaatttttttncntgtagttggaaactnttaacgat aacgaaaattggtaaatgagaccacaagaaacagaggtcgataatttagg ggaaacaataggcattttactccatcagcacaatgccgggtaaaacgaag gtggaactcatatttagttgantgcataggagggtacatggaatcatttg gttcggccttcgatgtaggtagagacatcgatcttttctgggagctcagt gatgctgatatcgaagcggtcttgaacgctgttcaaggtcttggcgtcat tctcatcagaaacgaaagtgatggnccaaacccttggttcccaaaaccga ccagcacgtgcaactcttttgaaggntacgagtcgggaatcttcttggca tg >CEESA88F aacaataacaatttatttgaataaacaaatttaagccttagcttcggcct cggcaaggaaatctctcttgagttttcccatgaaagcacgcttctnggct gtggtttggaagcnaccgtntccggtcttggaactggtgtcgatccactt naggttgatcttctcgtgggcgactctnttggtttgggtgatgagcgact tgcggagggtgataagacgcttctttggtccgagaacggctccacgaagc atgatgtagtcctggttgacgataccgtatcttgggaatcctcccattgg ggtgatggtcttctgggtaagatcgaactcggtgg >CEESA89FB aatggtacattctatattgaaagaaatgctaaatagtttgtatgtacagg aaagtagccaaatacccataagcagagaaacagaagtggggaaaggaaaa cagacaagaaaaacagctagaaaggaaagtaagagatattaatcacaatg 
aaacgcggataacattgataagtgataatgttgataaactctgtgatgat gataaagcctacatacacaaacacacggatgaaaatactattcaaatgct caatgagagtgaccagaagctagaattgcggggacgacggctcctccaat ttattaacagaagctcttttgatcgtgtaacttngctcccagatgaatga ggaatttccctatttgaaggatggtgcagttgcatccacggcaagggga >CEESA89R tggaaatatccacgcacaaataggatattcactagttttggnatttgtac tcatgcttcttgttgatcaaattggaagtgtcacggtggcaagaaatgat agagcagggagaagccgaattggaatctctgccacaattgggctcgtggt acagctgcagcggacggtgtcgcattaggaagtgcttcagtcatcaacaa atccgatgttcaaataattgtttttggttgctataatgcttnacaaagca cccncttgcat >CEESA90F tgccgctcgtgccggcaaaaaaaaatcaatgggaaaatgtcatcataggt aatacaaaaaaataattttttgggagttttccagaaaaacgggcggcttt gaacaatgagaatttggagcaagaaattggtggaaaaaatcggcggtaaa aatcggaaaaatcaataatttatcgttaaaatcaataaaaatcaacgtcg actatgccgatccctgtccccccgatccctccgacgttccgaacttctcg accgttttcgacggcgatcccgttccgaagatcggcttctcgatcggtga ttatgtcgtgatgaagattgctctttttcacgacttttcgatcgtttttt gctctttttactcgattttntctccttctcatcgtctctcatctctngtt ctctcttnttttccgacgacgatttttccgattttttattaatcgccgnt ttttgaatctggaacggggtaaat >CEESA90R aatgaacaaagaggcccggaaaaaactgccggcctggntcctggaaggcc ttgaaaaggcganancgggagaagcaaaagcagtnggaaaaagaggaaaa gctgaagaaagcggaagaagaaaaagcccggcgaagagccgaagctggaa agagcaaatttgactcttcgtcggatgaagagagcccggaaaatgagaaa tttcctgttggtaatggaaaatctgaatatcaggaagatgataatgattc ggaagacgatttggaggagagacgagagcaatttatacgctgtgtgaaaa ctctaattgatgaacgtgctcctcgngtcttcgaatgacgtcatcatgcg tataatacaag >CEESA92F aaacacagacacgggatggaatattgaacttnttggtaaagtctcaatga gaggtaaaagtgcaagtttacatgtaagaaatcgttcatcattatccaga agcttctgggttagagtcgcagttnttggatatttntncttctgaggttg atagaagcntcgcacataaaatccgctcaagagaatgatttgggaaattt tgaagtgtatcactcgtggtggacggcccgacaaatgtttcaggttcttc aaatgaaaaaccagttggcattgatttacataatctnttgatggacgcga aaacttccgaagagttgtgcggtaaacgtccaaagttcgagtcttggtaa attaaaaaagcaatcctctcgcagagaccgagttttcacatcatatcgtc tttcgttttcgatgataatcaattatctctgcttgtgac >CEESA93R gaaagtcgaaaggtggtactcgtggtgaacaattcatttatgcggctgaa gcattcgattcgactaacaatgttccgataaaagtcggcgatctcacatc 
aaccaatactcatattattaaaaaaggaacagttgttgatgcgaaattcg cactggccgatcgttgtcatgtattcaaaaatgagattnatggaagtntc tntcaggcgacactntcgtttactgatcttacacagnataaggattcgta ttataagnttcaactgtt >CEESA94F gaaacagtaaatttatttcatagaaaantgaaaaaatgaataaaactata aacaantaaaancgacaaagtgggaaatatctatttagactttaccaatc gtgtacaattnctgagtgtagctggtccattccagtatcctttaatcaaa ccattccaatacagngctggcatgaaataccttttcatcaaatatgccca atatgtgggtttcgattgnncaantggagttgtctccatggctcctcgag gtccaaattctgcgaggttcacacgatttgtgctcacaaccagtgggcat gacgcgtancgatcggtactgcatacatggccgatttaccctgcatgact tgttgtcaaaatttctt >CEESA95F aagtggtcctaacaagtttattcatgttaagaaaccataaatataaagaa aatgaagtatctcatggaatttttntcaagtaaaatcgtgtttcgcctag aaatggaaggcacatnttgaaaaaaaaaatttgaaaaaaantcagcttag aagcactttcggtgaacaatcgatgggccggactcgtcgtactcctgctt ggagatccacatttnctggaaggttgagagggaagcaagantagntcctc cgnnccanacggagtacttgcgctctggnggagcaataatcttgatcttc attgtgcttggggccaaatnttgantctcctttt >CEESA96F atattggcattaaacatttatttttgacacactgaaatttaaacaaatgt atagtcaaaaaaaaacatgcaaataaattatgcttatacagatcagttgt gatttgaattcacaaccttcacagtggccaaccgatatatatatatatat ngnnccagatcagactagaatttggaatagaagcacctccaccttgtttt tatgtttgantttttctttttcatgatcatgttcatctgatcagaagttc atctttagccgctgttgagtagcaaagttcagcaatcgttctcaggtttg tagttgttgggccaggtagaatcggcttgatttgagtgatagttttaaca gcgtctaccgcccgtcccagtgtacaactngaatcaacgcttttgtgttg gangtattgcgga >CEESB01F aagtgcaatgtttgtagccattgaagctccttgtgtgagctcgttcactc ttgcaagaatgaatggggtgacactttgtgaagcgatgttttgttcaatg gcttcttgaacagcttggtcaattgctttctggatcgcatctccatcagc tgcatattttncaggaataggacaagccagaatagttccntaaggtaatc cgngagntttcnaggtttttaacaggtgaacaacttcttccaatgattca gtgcagaattgnagcttttctaactga 5.fa100644000766000024 10441613605523026 17506 0ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/t/data/dbfa>CEESB04FB tttttttgttagagtattttatatatttattattattacagcttacagaa ctttgatttgttttacagaaaaaaggtgcaaccgcttagacaaattcaat ggattatcattatttgaaactttttgcagttccttattttcaaaaaaatc ttggtttttggtttgatcagggtgagaaaggatttcgggggtcgaaagct 
agaaaattatcaattttttgtgattttcgattgttttnatgtttttcatt tatacgagatagcgaagaaagaaagtttggnggcaggatggtagcattga atattggtcgaatcgtttaggcggcttcttcatcgacagtttccttgggc ggtgagtgggcttgagctcccggnttccggtgagtaccgacgnacnttct cctcagcg >CEESB04R gccaagaccagaagtnaacgaattcattggagtcaacgtcggattcggaa tcgccatcgttttcggtgttgctgtctctgccaaaatntncggaggacat attaaccctgccgtctcctttnccttcctttccgtcggacaaancaccat cgtccaattnattg >CEESB05F atgaaacgactttggaaaaaggttgggtctggtgtgaagaatgcnaatta tctcaaaaagtctgtagaagaagtagcgaaaattgaaaaaggggaaacaa aagtcgtcccaccaaaatatccaactgaaaaatcacgagaagttagcgan gaaatcaaaaaagaattagcgntgaaaaatgaaagtttngtgggaaatat gactaa >CEESB06F ctagcattccatganctgtgccaagcattctatcaacaaagattgaaaac tattcgatctagaaatattccgaataactcgccggcgttagtcgtacgac aacaattcaaattggcgtttctgtcaganctcagacaggatactcataca gcattgagaaattatcgnttggcttatgatcaatgtcgggatacagttga ncaatgggatggagtcgatgtttttangtggagaagtgttgttggattgc ttanttataaggnacagcagggnattncttgaaaacccacgaattccgtt attagtactnaggggtgtcccnttgccgggaaatttttgaaaa >CEESB07F gaagaagcaacaaactttatagtaaaaaaatgacaaaagagtatcattgg agaaaagaaaacggataactgggagagcaagcanttaancaattttnaaa aattgaaaattaagaaaaggtaatagcaaatgggtggagaaacagtgggg aagagatttcnttagaaagaancaaaatttgtaaatcggaaggaaancan gntgaaaacggnggtcg >CEESB09F aaagtgtataaatttattacaaaagcttttncaaattacaacagaagaga gactggaatgcatgatgattggtggtaaaaggggaaaaaggntcttatga aaaattagaacaaatcacagaaaatttcaggttcggacagacagacaggt ggataagcnctcgtgatatatatttccagngngtaaatcagaaatagaca agatatgatgcaaaggagctctacggtcaagacaatagangtnccnntag ggtgcttacnaaacacttcgaattttgt >CEESB10R agaagccgcgtgttgttaaggaagaagttatcgagccaggttcacaatct gaaactcaaaaagaatctccggagaaagttcgagttgttgtaccgaaagt tgaagttgaaagatcaccgtcgccaaaatcttctcgtgatcgtaagaagg ntcganagaaatctcgtgagaaagatcgtgaaagagatcgtgacagaaga gaaggttcaaaacatcgtgatagttatcatggncatcggnaacggcagca gtttctttccagtgtacgacggtat >CEESB11F actttgtctatttattggtttatnctaataaaataaangcagtaacgntt caaaaacgctcaaaacagaaattaaacgaagcanatntttaagtgctgag attcataaagttgagatggaatgtnttgagtgtcgattgaggaattctta 
attaaccgacgtcgtatccnatcctatcgncatttncaggagcatctcat tgggtgtttggttgtccgagcgtgtggagcattgatttcncggcagagat ccacgagttgaattcattttttnagctttcgg >CEESB11FB actttgtctatttattggtttattctaataaaataaatgcagtaacgttt caaaaacgctcaaaacagaaattaaacgaagcagatttttgagtgctgag attcataaagttgagatggaatgtgttgagtgtcgattgaggaattctta attaaccgacgtcgtatccaatcctatcgccattttcaggagcatctcat tgggtgtttggttgttcgagcgtntggagcattgatttctcggcagagat ccacgagttgaattcattttttgagctttcggttcttcttccgtttttct acgagcacttctatccaatttccaaatcttctcgcagttcccacgngtna ccatttgaaatcattaggcacttctacggattctactatggtcggcccat catacttttcactcg >CEESB11R aaaaatgccactgtccaagagacgagcggcaggtagaatcaatggactga aaggcccacccgcaaaaaaaagaaaggaaattgaacgacatgaagctctt ctcatgcatccgttagcgtatactgcagaacagtacgaacaagttgcaga ggagtcgaagttctacaaaanttgcttcgaaactactgccgccgagaacg ttgaacttctaaaatccaacggtcaactcaaaaaagagcttggaattttt gaaaaaccaacanaggggnatcagttgtaaaggtccgattattggaagtt tcaaaattccccaaattcg >CEESB13F caggagtcaattgatttatttacagaaatcatttagcaaaaaaagtaaaa ttggaagaaagaaaaatgagagaataaatcatttagaagagtctaaactt gagattgtgaagaattcctgcagtgattgttttacaagtccttgattggg tcattctccttcttgatggagtaagcgaggtaagcgtagaaagcggtcca tacagcgaatggaacgacaaggaggccagccttcttgtcaatcttgtaga atgcaaaanatgcagctgcagcagtcaaactgacaacagtggtattcttc cacaaacatcccaagctccttcttcttgacgatcggaaattgtttgcaaa cagcgnagagttgacactngctncatacaaaccaaggagccaaacttttg tg >CEESB14F acctttagtttgactttattggtagaacctgagaacgagagaaaaaaaat aaaatatataagtnaagctcattggctagagatgaaatgggaaaacaaat aataatttatgcggttncttcagtgcttttcgaaagaggttgcatttcat gcgagctgcggtacgaatctccctcctcgatctcttcaatactctccggc agtggttttcccatagtttcaggcaaaaagagcaacgtcatcactgcggc gagcacagccatacacccgaccggaataatcatgaaaaccttcccaaact tgcttcaagcgatccacatggagatngttacgaagcgggcgatcg >CEESB15F gaaaaaagtncgttattcgactttatttncagaatttcagacaaatacaa aataaaacccgcaaggaaaaaaagataccagattagtcaaaaattgtaca aattgttgtgttagttgtncagaaagtnccggatatattgttgatgtgct cgacaatcacatcgtagaatgtgtagatacccngtgcaatcatcacaatg antattgtgattatcancagtgttattatcagcactctgtgtgactgtcc 
tcgttcgattctctgcatttcttttatggggaagtgacagaggacgcatg tcgaaacaacgagnaatcttgagganttttgaa >CEESB16F aatttaaaagtttatttatcggaaatgttgataaggaagcacgaattaaa attgaaagagggggcggttgaggggggatacaattacaccgngtatcttg tcaatgaaggttttcatgtcattagaatgagacttgataaacgataaaaa atgcatctgaataactatgggcaatatgtgtgagatgggtaattaacaat gaaacatgggngataaaacgancaacatcctaataaaananctcttaaat acccncttgaaaacatcgnncaaggcgactgantactngctaaatcgaat ccnatgggcaatcaagagtggatttgttttaccccgtcttggggtccgac >CEESB17F gaagaagcaacaaactttatagtaaaaaaatgacaaaagagtatcattgg agaaaagaaaacggataactgggagagcaagcaattaancaattttnaaa aattgaaaattaagaaaaggtaatagcaaatgggtggagaaacagtgggg aagagatttcgatagaaaganacaaaanttgtaaatcggaaggaaaacaa gatgaaaaaggaggtcgntagactttgngagatggttatctntcgtaacc tncgtgtctgatggtttncttaatatcccnttctttcaatcgtcagggaa cacgcacatcgtcg >CEESB18F gaacaggtcgtctatttatnctgtnaaaaaccgtgctgttgtaccgtttg anactgtaaaacagcataggtcagaagaaaacaacatcactatacaaaat aattttggaaaacgggttgagctaattnatttattggttggcctttctna gttgatactcgacgacgatgaatggngcccagaatccgnttg >CEESB19F ctnattaaaagctttattatgaatgtggctcaaataatgagcatgattca nagaaaaaatggtttaaaatgtcaattttgtaatgagaaaatgggggtca tcggcagtaatagggtacaacaacaaaagtgattgcnttaaacctcaact tcaaaccaaagatacacagagaacctagttatacatgcctagattactac cggantagtattgaccaaatacaagagangttaccantgaagatttgggt gagaatgggaagcataatgcagtcggctagagaagttngaactattacta gcaatgtacacggagagggtgaaggaaaaa >CEESB20R aatggtgaaggaaacaggntattatgacgttntcggagtaaagccggatg cttcagacaatgagttnaagaaggcctaccgtaaaatggcgctcaaattc catccagacaagaatccagacggagctnagcaattnaagcagatctncca gggatacgaagtattat >CEESB21FB cgggtaaaaatgctcctagataaaaaagaactgtcacacagttaggggat ttaaaacatgatcagacaagtaaaagcgtgagtaggcgtgataaatataa actttgaaatatgaaaaaggaaaatgcatttgtggatcattaaaattcta taaagttatgaaggaaaacgcgacaaaaatagtgataagtacggtattgg ttactcggaatgtacatcggcaaaaaatgcgacagtgacaagntccaatg ggaaaaaaggtaaaaaccaaaatganaccatacattggggcccaatattg gggaaaaattttgctaccaa >CEESB21R gttactttccaaaacggacacctcatcagcgaattctggaagctcaccga ttctctgcaaaaatgttccnggnttaacgcctcaacagaagcgaatgtnt 
catgagaatccgaatatcatcaagtatctgatttctggacttcgaagtgc tcttcacacttgcgagtacacatttnaacgagaagcatngaactntactt taacacttccttggggttggaacttcacctcttcaaattgcctctcgtga atcagcttatgtggtacgcgatttttggcggnccngtgntgttctcactc gttggctcgtg >CEESB22F aancttcaataaatttatcagaaaacaaaaataaaacaaccgcgtaaaaa aaagagaaaacacaaaaaggctccaattattcctaaaatnccaagattgt taatcancggtccattcttcgncttctccatcttctactacctcgttttg aactttgcctttngcaggtaccttctgatccttagcaacaatgcaattgg tgacgncagcattctctccgatttcnactccatcacaaatnaatagggtt gcttatgggggnaccatttccaatgacaacgcctttggcaatgatactct cttttagctttgttttttca >CEESB25F agaatgaaatctatattatttggaaaaaagtttttaaaaaatccagattg tgaccgaaaaaaatnattcgaaaagaaaaaaacacaaaanttgaggnaaa acatgangaaagantggcaaaaagttttttgactcgcaaagaatcaccta aacatttcaaatttcgtatgaaagtttgtncgctttaatgataactttta aaattcacattagggcgcactttctgggggaaaagtcgaaggaagaagaa aaacgtgattatcacagaaaaatcaatgaaatgaaatggnatttgaagag gattgttggaaggaaaccttgtggattttcaacaaaaaaa >CEESB26F gttgtcaagtatactttcagatattgttgagttaatggctcagtgggtgc tctcattgaaatcgggtgctctacgagtncggaaactgtctgtaagtatc tttccatgtcgttttccgcagggatttcgtcgtagttaaccttatccaac accaacatatcccaccactcaatatccggtaccccattctccattttcgc cgtgcccgttggtgtaaccatcgcaagtttcacagccgatgagattcccg tcgattgtgcagccgatgagacctcattctgaagcctttcgagctttgcc attgctctttgcttattcgcaagtttctcgaattctcccttctcatggaa attgaacccccggcgacgtcgatctngcagttctcggctgaatccgtggg atccagatactcggatgagtttctccggttccttttggattcttttg >CEESB27F aattttttaatctagtagttctaaaataaatctcagactgataactgtga ccaaatacataattactcatacaatactcaggtcttctattagaataata atctctaattactaaacgataactaaagaaagactctaataatacataaa taaataagnntagtcctgcagttctaataataganccataagnggcaata atatttcataccgnataaacatcagggtaatctaaatatttacgtgggaa cccgtgtagtcctgcaaaatgtagcgggaaaaatgttaaatttaccccaa taaataataaaataaatactgcagntatcataaggttatctaacacatac cctggtaataaatcttcatccatag >CEESB29F cgttcacatttnctttctctgtcggtcttaactcaaccgctcccccgtta caaaaatctcagtcaaaaaaaattaattttgacacgncatatttgtngtn cttgaggnccccctgtgactttncaaaacgntattttaattgtnctattt tgtgtcaactactgatgaaagtcattgaaaatgaactcgtaaatttg >CEESB32F 
atagaaaaattgtttattaaaaataaattncnctctatattgatactacc tttaaattttttacaaaaaaaacatgtctgtgtgtaggcggatacaggga gggggntaccatcagtaattggngcttaaaattccggaaaaaattgggaa agaaagaaaccngtaacatttcggatgtntgggcgggcgggtgattgtgg taagtgagaattggatattctgagaaattttgagccatgggaggnaaaaa agnncaaaaactgggtgaggntcgggggacattttttttgttgcaaaata gtccccctacacgtactcaaaaattnggaagtcaaaaaaaaaattg >CEESB33F ggacaaatgttcattatcatttacaacttcaatagggcaaatagaaattc aataataaataaataaaatcagaaaattgacagcttgcnctgaattagac acttctnctgactgagattctggtgataaaaagaggcgggatcctaaggt gctcatcgacccgcaggagagaaactaaggtcgattagtgaggaaataca ttttaaaagataaatcaactaataaagaggaaagatgaattgatgatgtt tggtgatttntttgaggattgagattggagatcgcaattattaatgncac gcatcggtttggaacgaatccgtcanttctccatcacaatttcgggagtt tggagaactgcaaaacagcccantntccaaatcctgctccctaaatacca ccgcagcagtttaagataagtgttggtaaggtcatccc >CEESB34F atntttatgattttattttaacgtgaataaacatcacaaaagtnagctta ctcaaggggtggggtgtggggcggctaaaccaaccactaacaagtaacaa aaagaagggtgacagtaagaaaaaaaaacaggagatgggtatgcttagca actgggggaacgtgctaagagcacttggcaatgaacttattgcttctnag cggaaacgagaaccgatgcagcttcgtcgaccttcgagcggaacaattca ctatcttgaagcatcatgatcaactcggaattgtcgatttcgagcatcat tcctgtgaattttccctgcggtccttgtgtcccgggtagagtttctcaat gagnagccgtagatacgntcaccaagaagttg >CEESB34R tctgagagacacgctgaattgaaaaagaagcacgagcaacacaaggctga gcgtatgcagaagtatcaaggagtcaatctctacgtcaaaaacctcgacg agactgtcgatgatgatggcttgaagaagcaattcgagtcttacggaaac atcacgagtgctaaggtcatgactgacgaaaatgggagatcaaagggatt cggttttgtctgcttcgaaaagccggaggaagctacaagtgcagtnactg agatgaactcaaaaatggtgtgctcaaagccattntatgtngctattgct caacgtaaagaagatcgtcgtgcacagcttgcttctcaattacatgcaac gtcttgcagcatgngaatgcacggcaaactttccagg >CEESB35F aatcaacaatttattaccactttcgttcaagaggttcagaggttgggatg ggggataatagctgcaaccagcattcacatataatatttnagatgcgggg aagaggattattggaaaaaggagtgaacgaaagtttcagttgaatacatt atttcgagttacagaacaatgaaaagacaacgaaaatggggggaatgatg attgatttagttggtctggcttgggcagtcgcgggagatgtgtccggtct cctggcactggtagcaacgcttctcctcggccgagccactntcggtgcac tcgngggaaatgtgcctcgtgccgaattcctgcagcccgggggatccact 
agttctagagcggccgccaccgnggttggangctcccagctttttgtncc cttt >CEESB36F cgtaaaaatttgtctttattttgagttcgtcctgatttgcagagctctta tactaaatgaattgaattgcaaacaattgcgaatgatgatatccacagaa aaaaggatagtgaaatggagattctttcaagtgggggtgggatgagaagg agatggtggtgggggaagggtaacaattttaatgataaactggaataaaa cttgactattggtgttggcatctnacgccttgtccgtnatgtcctccgtg ttgtncttcgtccgaatcttcattgtaagcgtctccgccacgtcctcggg ctgtacttcttctcgtcatattcccataaggggaaaccttctgcggcagc tggggggtgacaattancaacctttttgaggtaggggaaggcangctttt tcagaaacgggcataa >CEESB37F gcgaagaaaaatataaatttnattcaaaaatgattcagaatagaaaaatt tgaaaagtgtcaaaaataaatgtggattcgacaaaaaccccagaaatttc cagataaaaattaatttagaaacataatggtaattatagaaaattaacaa taattaaaagttattaggantaaaacaaattatgaaagantaaagttaag agaagtcagtgctagagctggatgcagatgctccaaaattgtcaagaaac tcccgagaaagacccgntgaaggagcaaaccatgaagaaaactggggcat cagtggattttgagctggctcgtgccgaattcctgcagcccgggggntcc nctaggttctagagcggccgccaccgcggtgggagctccagcttttgttc cctttagtgagggntaatttcgagctttggcgtaatcatgggtcatagct gttttcc >CEESB38R attcactcctttttgcttatacccccttcttgtgagtacatcccaccctg tagatgtgctccttgcttgataaaccaggtccgcagtccgatttaggggc tcgtgttctggaacagttaaggaaaccatgctcttgttgttgttgttagc tgttttcgctgctactagcactccttttttgtcctgagttgatcgtgttt gaagcggatttccgatctaaatttttataaattaaaaagaacctttttcc aacaaaaaaaatccaaaggaaaagagtttggaaattcttnggactctttc ttcggacttttaaactccaattttttcactcgactttcttaggaataatt tattctaggaaaaaagtacggtttttcccaacttttccc >CEESB39FB atctggacatcttttatttcttgttatatatacaataagacaagacaata caagactactgtgacagatcaatgggaatcgaggacaagcaagcacgacc aattcaaattatgtacaattcctttattattaaacaataaattattcgaa ggaagagaaaacattaaagtacttgtggtggctaacctctacggtatcct gggcatcagaagagtgagccttaaacggntcagctacatcagaagctttt tccttaacggagtccataggctccagaaattttatcaccgattggtctca gtgatagatttggacttgtccttcgccgagtnagcaagcgtcttccagca gaattnccttgggcatcggagggccttggtcccttgggtgggtattccca aggcattcagaangccttgtcccttgggttgag >CEESB39R caaccgggacagcaaccattgagatcgcttagcttgacgtctgcggccaa gtcacctgaggaggttgagccggagcaggattcgaagaaaggagagccac gtgccaaggctgcaggattcggaggaggtggaggtggtggaaagagactc 
ccagaggcaagatctncgccaatggtcattccgaaccaggtagcagcgat gccggttcagatgactggcttcgtncaacttgtngncaacancattctca gcgncattttcccataccaactctgttnatggggttgtccaatgnatcaa ancccttcgagatctttta >CEESB40F gcaatagaataaatttataagcaataagcagagcaagacatgaacagcaa atgacaaccggcaacattctttaaataatttttacagagagaaaatacaa tataancagacattttcctttagattttacgtttagtagcagtgaaacgt tcttgataagcatcatatcctggtaaatcttgaaaatctttcaattggaa tcgatttccgtcgtatgacatcacggaaattgttgaagattctngtaaaa tcgcgagtgaactccgacttctccaaacagttttcatcgagacgaagagt cctcagtcgtgtagttttggcaagatttgntggatttanggagctcagtc gattttaattcagacttnaactcgataactttcaaat >CEESB41F aaatatgatttgatttattatagttatatttgtgatgaaaaaagacatca tggtgagatgagattgataataaatatacgaaaaagttacaagcgaaaaa ncgaaatgtnctgtagaagttgattaattagatcatgaaagtnccaatga gagagtgttagggacatgaataacgggtaaaatgctgtttaanatcaatt atagtaagttttttgataaagagtagaaatatataatgtaattccncaaa atgaaaagaaaaganaaaccacaacanctcattaaatanttgcaaacgac gactcatctacattgtccacaattgcgggattttcgattggaatttggtt tttccgacg >CEESB42R cggcacgagatgatgggacctataaaagaacagttggttcattggactcg gctatcgcaggcagattagaagctgaaggatctttgaatctggaaactct tgttcttcctccaaccaagcctcccggtgatgatgttacaactgggtctc gagggtttntcactccaacacattnaacgnctgcactcgatatgtcattt ttcactccacctcgncaaatgcgaactttggctgatgctgttcgtgaagn tgcacctgttggaagtgatccccgaatttttttgaacctgat >CEESB43F aagatactcacattttatactcaaaaaaagggtggaaatgtgactataag anggtgataaaagaagggtgggtaacagggaaagaaagancacaacacga ggaaaaggagaacatggaatagatggntggngatgaataataaagggggg aaatcngtgtgtacactaantattttncaatattatttatcaacnctgat aagttacaa >CEESB44F gtaagtaattagatttattatttgaagattattagaatatttagaactat tattaaatctgaatgttgttagtaatagtaacgggctgatcgancagcgc gttgagtagctctgaatggaagacgagagacagttggtccctggtatact gtaggagaagatccataggntcccatggccagtggaatcaccattgggta tccagaagtttttgggtacattggatatccattgagtagtaatggttgac cacctcccatcatcatatcagatgcagcgaaacgaggcttatacatgtta gatccatacgtatccattcccatcatactgctaggaaagtcccatacggg catcttgttgcaattnccttngcatngttatgggatactctgaaatt >CEESB47F aacctaccacaacttcattagagcacgagaaaattacgagagacaagttg 
tgcggaatgggatggtggtggaacttgaagtttaaataaataaatgtttg gttggataacgggtagattaaaaatgagcagaacatttgaaacacaaata cgggggaaaacgggatgcgtatatatttaattagaccctggaagatgttg agctttgtggagtaccagatggagttctgcggcttgaagggccaacgaat cggtttgagctcactgaagtagctgcagatgctcttgnattganttgatg atctggatccaatggaacttccagctnggcgtagttctggccgatgttgc tctgctccgaattcgcgctcanttgcaatcaacggggggacagattggct ntcggagcataacttccctgg >CEESB49F gaaaattaaaaaattattatgcacaaagaatatacaaaatgcttaattgg aaaattagatcaattgaaattncagcaaaaaatacagaaaaaaaaatgca atggtttcagtaacaatatctacatatgcncacacggnttcantagaaat tttaaaaaaagatataaatctacaagccagtnctctccataatagtttgc aagtctctctgaagaattatatttttngaaagtgtctcttcaagcattct ttgcagctttttgttcatttcacggacatctgaatctcgggaatcttggc cangtgtattgagcagttttcggaatgacgcttcangtttcggttgacca gangaaggagaagaagnaccccaggattattnctgttggttgaaatccgg tgcccaggagggtttttcgcggg >CEESB50F atcaaagcgcgcttaaatgcgaactccaaattttattcgccaaaaatgct tgcctcagagcgctgtgtgattagtgaaataaaataactaaattatatga ttattataatgtgtataaaataccaacaagttcaacaaaaaagtgatcaa aaaatgagggcagatgagaaaaggaaaacaaagaaaatcaacaattggta aaaaaaaaggntgaaaacaattggaacatacagttttttagagaagaaac aattnttcgaatttngttctcttattatnctgtcctccgaaacttccacc atcgtatgancgtttgaggnctccacgtccgcgncctccgaagtctcctc cacggcca >CEESB51F aaaactctgaattgatttnttgaaagctgaaagactttggatttgtgtgc accgagagaaaagaaaactgaatacaaaaatatacacatagagatgaaaa gatagagaaaaatttnatgttttgattaactcctaaaantttnccaaaaa ancgggaaagagtgaattatggaaaggagaaaaaatgatagangataaca aagggacactgggagaaactttgttttcagaaagngaagaagacccgtag ttttancttgagtaataantancgttnaagcgtattancggcgccaccat tagaataagtcgcttcgatgctgaaacaggctgctctt >CEESB51R cgatttcgaatcatcttcgcaagacgctctggcgacaccaaccaaaaagt tctcttcccaatgggaaaaagacgtcgatgacgttgaaggaaccgccaat gagcttgttcgtattgacgaacgtatatcggatattacagcacaagccga tgttattcaagacaagatccgtgaaacagaagttggaagttcagaagaag aaatgttgactgcatcatatcttgagttgacaaatgaacggaacactctt gtacatcgacaggaatactataatatcattgagacaattcgtcaggttac ttcggaattgaccattggggaaacaatcatgaagt >CEESB52F gagcacagcactgacgagatgaagaaacttgttgaaagtttgagtgaggc 
gtgcaaaaaagcagccgatgagttcgacagtaacgagaaaaatggtgatg ccggtgcagcggaaagtgaaaagaaggacatcgaaagaaaattcaaattt catacatgtgacgttaatctgaagcaaatcgaacgaagtcatgctgagct gaaaccattacacgaaatactcaagtcagaagaaacgaaaacttcattca aaccaccagcaaatgctaaattacaaaanggttgggatgttgattggagt cgacctgatgactcggcattgctcctgggtgtctggaagtacggttacgg tagttgggaagcgataaaaaatggatcctactctttggattggcctcgtg ccgaattccctgcagcccggggggatcc >CEESB53R gctgtacagtctaaatccaacccgtggtgtgcgtttccagactaatggca agtttgtcatgccagccagagtaaagtcggtgacgattatcaactacgac aaggaatttaatagaaacgtcgntatgtttnccgaagggcttgccaagca ttgctccgaacaaggaatgaagtttgatagccgcccgaacagctggaaaa aagttaatctcggctcatcagaccgacgaggtacaaaagtggagattaga agaagccatttcgaaacggcgtttacggtttgttttttttggaatttatt t >CEESB54F acgaaggacttcatcgtgatttcgcctgtnttctctactcaaagcttcaa aagaaactcactcaacaacgcatctacgatatcatcaaggacgctgtggc catcgaacaagaattcctgactgaggcacttccagttgacatgattggca tganctgtcgtcttatgtcacagtacatcgagtttntcgccgatcatttg ctcgtcgagctcggttgtgacaagctttacaagtcgaagaatccattcga cttcatggagaatatctcgatcgacggaaagactaacttcttcgagaagc gggtttccgagtntcaacgtcctggagtaatggtgaatnnaagncgcgag acagtttcgatctttaaggctgancttctaaaggaaaatatttccaaaat ttctaatttctaaaccc >CEESB55F agaatttacccaaaatttattgatacaagtattattaaaatttggnggca aaatagaatcacgngaatgaaaaattgtgtcagagtacagtcaatgcaca gtcaattatacagaaaaggtaaaaatttgaggcgaccnattcagaaatct tcatcatcctcaaaatcgatatcaatagcattaacagagttctgaagctc gtcgagcccggtgacttcttcgagacgaccnggcgagttcataacgtcgn gaacaaacttccagccattttnctgaaacttgtgcagcgataggngctcc ncaagcaacacattgntccgantcgtttttacagaaaggg >CEESB56F gtttttataaaatattattcatcaacaccctcaaataaattaaactgttg cgatgaagtggaccagccatcgattgcnctccgactagttcacagtggtg gtttcggagagtttgaccaaaaaagacggccaaatatcacataaattagg acagggctcgggctagaacgagcagccaacgccgtcgggtagcataggaa acgacacccggcaacgntcacaactaagcgnccagtcaccaagcttgtnt ccaagcaacatcaactgtntccggttccactccagcgatttcacgttctc ttgaactctctcttcaaagtcctttgcaactttccctcacgggacttgtt cgctatcgcantcgagtcgattatttagcctnagatgaagtttatcaccc actttaggg >CEESB57F atatatatttcatttatttagatatattatggttatttacgggacgtcat 
ttaaagaacaacatttaaaagttaaagaaaaccaaaaaaaagaagaaaaa aatagatcattaattgtagagggagagatttgtttttcctattccttgga ttcttccacaacttctgcgtctttatcctcagactcatcaataacaattg gntctncttctgtagatggctcagcagcctgctcagcactcgctggaact tcttnagaaggttcttcagntgattttncagcggatttctcggatacttt gtcanccttctnatcggttttctccntttcg >CEESB58F aatgtgaacaattttttaaatgaaaaccaacaacaataacaatagagaaa tcataacaacgaaaaaacaaatcgngtaatttatagaatggtcttgttca aattgctgtatcctacaccaacagcggtaaccatagctttatcggcactt cctgttttgagatcagcttcctcttccttcaatgtttttttaaagtcttc cattttaaactcgatctcaaaaaactgtttgagatgcctgagtgcgtgta cactatacattggaagaggaccgtagaggaagtgtgaaacatctttagga cagagtgtcataaatgtgcttgcaagtatttgagcactactgtcaagggc acctcccatgtaaatttgttgaaggaggagcatgaccggcttcaattcca atgtcttctggga >CEESB60R gctcaagctcctcgacgagttcctcattgtcaaggctggagctgctgagt cgaaggtcttctatctcaagntgaaaggagattactacagntacctcgcc gaggtcgcttnagaggatcgcgctgccgttgttgagaagtcccagaaggc ttaccaggaggctcttnatatcgctaaggacaagatgcagccaacccacc caattcgccttggacttgccctcaanttctctgtcttctactacggggat ctttgaacacttcag >CEESB61F ttcatcttgaaaattttttttaaaaatgcacaaaaatttgantttttggc aaaatttgtgttttcacatataaataaaataattccgaaaatcgtaataa aatgcaacaaaagttattgaataagagattaaaagcaggaggcacaacat agtagtcatgaatcctctcaacacgtgcataatcagtagaaaaagaagaa gaagaagacgtgaaaagagtatatgtatgtnggagagacgcagagaagca cacaaaacgaattggaattatgatgatgatgaagaaggaagcgacgatga ttcatttgggaagagtgtagagcaatcttattgagcgcttggcggatgtt caacttggttcgattgcagcaaatcgttgatattcttctccaaatcatcg attctcgttgtcatatcatccgattcgccgaataatctgatnccgccatg tgcttggaatcgatcccgtgtctgt >CEESB62F tccgggtagagcttgtttattcacaggtgtacaataagaataacggctaa aataaatagcaaaaaaaatggttctgtgtgctttttgggaacaaattgag attatgagatgattttttgtagatttttgtgtgatttatgaacagaaaat gtaaatttgaaaatcgctattactggttacgggaaacaacgggaaaaatt ctaagaagaatgatggagtctcgggatatgaagagaaaatattacacaat aaattattaggaaacatgtttcataaacatcttgatctataagtggccnc nttggaaggacattttggttgggaggaaacgnatcggaattggtttgaca agaacccgtaaaagtgcaccancaatctgaggtg >CEESB64F ggggttttttgtaagattatttgaagaaagtacaataggaaatgggaagg 
taaaaaaattggatgagaaattatgaaatgcagaatgaatactgctttca gtaccaaaaagtatagccaacaatttttncnctgaatatcagagaatatt acgaccttggcatgcaagatgaagttagagagcttagtttttagcagttg gagcagaagtagaagcagcagcactctgagcattggtgagcactccaaag gcttccncgaccttncttctgaacaactcagcgtcttggagcatcatgat caactcggcgttgtcaatctccagcatcattccagtgattttcccggcct cccttgtaggttccggggaacatcttctcggtaangagcataggata >CEESB64R ctcgtncaatctcggcagntgtncatttaagancttggntcgtgtcattg acaacaagtccgtctacgacactttctcgctttttggaaacattttgtct tgcaaagttgtcaccgacgatgaaggaaacagcaaaggatacggatttnt tcacttcgagactnagcactctgcgcaaactgccattnagaaagtcaatg gaatgcttctttctgataagaaagtctatgtcggaaaattccaacctcgc g >CEESB65F aaaanttttggtaaaaatttattagatgacccagtatgttttgacacgaa tgcaatgagagaagngacttgcattgcttacacaacacaaggggaaatac aaaaaagcaagngatagcaaaataagtagcacagggcagantaccntctt aactgacaacatcgtaaacaaaaccgntaatcgccttattagtaaagatg agatgatatgctgccaagagccgttttgagaaagggggaaagagangtaa aaatgaataagttaacggtgtttgacatttaacaacaggtccntggaacc ccncntgcgttgaact >CEESB66F gaatgntttgtaaaaactttatcatgttttaaaaataagancatctgaaa ttggaattgaataaaaatacaactaacttataggaagactnctgattatg aaacgaaaaattctacacaagaaagatagcagagggagcagagcacaggn ncttgtncattttattaatgagcatttaaaagtgaaggaagtgggancat ggagcaaaggtaagaaacatttggcaggagtatttcnnttttaaaatgta agtaaacgtcngggnaaaatgagc >CEESB67R gttgattcggaaatcagtgaggaggaagaagaagaagaaatatataataa acaaacaaaaactaatgcctcactcgactccattcatagaaggaagtcta aaccccaaccaatgctggagttcgacgcggagacgcaaaaaatgtttgat gatgcgtttcaaagtgacaaaaaatctacaaaagaaaagtatccgttcta atttctgaaaaaacatcagcacactgttctattggttccactttcctttt tattccatctttattgtggttaattatcccatctacttctctatatttcc ccttgataattaaaattggttttaatggttt >CEESB68F aattactattttacatttttattcttaacacgcatgactgcctgaaaatc tcagtttcaacaatggaaaacatattttacggttacaaaaacaaataaat gttatagagcnctattctaatttnncatttttaaacattttaccngcaac acaattaaaaaagtgggtatcaacagttagttggctaccncagangtatc acacaaggntcgggacg >CEESB69F aaatatacaacctttattgagaagagaccatttatatacttgtaagcttc taggaaaatttnagatactaaagagaagcatagattttaagacaagcagt taactaggtgaaagtaggatgagacagcttaggccttggtggcgatgtac 
gagangagatcaacaactctnttggagtatccganctcgttatcgtacca cgagacgagcttgacgaagtgtgggttgagtgagatggatgctccggcat cgaagatggaagagttggtatcggtcgacaaagtcggtggagacaacttg atccncagtgtaagcnaggaattncctcccattcggtcccntcagngggc agnctttcattaactttctttg >CEESB70R aaacaactcgaaattnaggctagctctctccgccgtgtggctcttgttgg agttgccgtctccttcaccgccacattggtgtgcgtcattgccgccccaa tgctctacaactacatgcaacacatgcaatccgttatgcaatccgaagtt gatttctgccgttcccgatctggaaanatctggagagaggtttcccgcac tcaagttctcgccaaggtttctnggaggagcccttcgttcccgncgtcaa gctnggatacggagagncgcccgggagttttgaaaggntctttcaagttt gggacaaccaaggg >CEESB71R acgctttcaatcagaggtaacaacatcagatacatcattttnccggatcc actcgctcttgacacccttttaatcgacgatgaaccaagaaagaaggccc gtgccgctcgcgccggagcttcacgtggacgtggtcgtggtggaatgcgc ggaggaagaggaggtcgtggtcgcggacgcggaggaccacgcggaggcgg tccacgtcgttaagctatcatcgggtcatagcaaatttgagtatcgaacg tcctatacttttgtatcacgttcctcaagtttaattcacattttgttncc cgttaaaagtttgctgacttttgttttaaanctttttgaaattaatttan ttacggg >CEESB73F aaacttcaatttttatnctaattaatcgtcaatattcaaatncgacgaaa attttcagattaccaaatatttggaaatttggagaggattttntgtggga gagggatggggatagagtataaganttncgagacgcagggtgaatatttc ctttaaaacaagaaatgggggaaaaaaaggataatgtaaaantaacatgg atttncaggtagtnctagatgggggtgggtttaaggcagatttcacggta gcncagggtttgtccggntatttnagaggaggttcttgagaatgagaata tc >CEESB74R caaaatatagaagtcactgcatctnattctgcagcagaacatgaagtgtt tgaaggaatatcatcaaatattgctgggaaaggagaaaagttagaagaag aaatagataacattggtattgtaatgcaaccagagccacgagttgtccat gaagcttccgaagtatcagacaacattgaacttaatatcaaagacgacct aaacttgaaaagtagactggacaacttcacaagagctaaattcaggcaat caaccaccgtaacacctaatattgttgctgtggagccttctattganggt gttgaagacggttttagatcat >CEESB75F gtatcgtctcatcaactttatttttaggcatgtacaatcacgtaaaggac acactgatgtncattggaggaaatgtgtgagaatctcacactgcataatt tttnccggtcggtgatttataggtataaatatagantggcggttaaaatt caaaagatgcatttnaattaggtgaaaaaaggaagaagtttttgggctgc cagagatgatgtaaaaaatagacagagaccatatcaggacaatgtgtgaa gtgtgaaggggaaggaagcgacatgtcgtttagaaatgtaatagagaaat aggcaaactgaagggtaatttantcctcgcaccagggcattctctttgcg g >CEESB76F 
gacgctcttcagtaatttattcaggattctcagaccgccagatgaaataa acgaagaacgaatgtntggtggtttggaagaaatcctttttcaatttctc gagaaaaatcagaggaagggaggagggagatttggagacaaaatagttta aaggggatgagagagaattgaggggatcaatctacaccgttcgatttact tttgaattataactcgtgccgaattcctgcagcccgggggatccactagt tctagagcggccgccaccgcggtggagctccagcttttntnccctttagt gagggttaatttcgagcttggcgtaatcatggtcatagctgtttcctgtg tgaaattgttatccgctcacaattccacacaacatacgagccggaagcat aaagtgtaaaagcctgggg >CEESB77F caagaaaacgtttnttttatctatcaagtgtagcagaaggaaaaaagaaa cagagtatacgngtagtacaacaataaaacggaaaacagtaacaatagaa attgaaaacaaaaantttggcaatttagtcggtatggaagtnagtgctca tctncggtgtgcttctttcttttncttggaattnctncttcatctgcctc cccaccttcagctccttcatttnccggttgtngctgctcgcggagtcgct tcaggncgggagnanaaagttaagcccggggggcccaangggggggaacn nnaannannggg >CEESB78F attaaaattttacttccatttcagaaagaaagaaaccaagacgattaaaa agcactgtncaggaaagaagagccacatgagatagaagggacaaattttt naaaggttcgcaagtnttatggaaggatgtatggggacagaggtacagaa cagttccaccaatttgaactagccaaatttcaagtagggggcataacana tgtgattcgattgaggggcaagatacgnttgcaaaanacatcnagnaaga aagcaacntgggaagtgngaaag >CEESB79FB ggantaaaatactttcatcgaaaatataattgaaatgaaaagtaatctta aagaggntaatttacattgcaaaatacaaaacagtcgaacgagcacctag aacgaaaatggtccnggnaaagctcaagctcttctcggaaggancatcaa ggcattttggtaataaattgtggaaaagccatagcacatacactttggag cnaggagaaattgtgggaaaagagattttaataattctagaaaaaaagtt tacactcgagaaaaggcaatgaaaaaaggttttaaaagnagaaacaagct gggattatggtagggtaaatttacaaaactcgcaaataaagc >CEESB79R caaatnccgcttttaagtcataccgtnatttatacttcgagggaggcgtc tcatcagtttacttctgggacttggataatggaggtttcgccggaattnt nctcatcaagaaagagggagacggagccaaaaatataacaggatgttggg attctnttcacgttattgagatcacggaaagagcacgccaggctcactac aagctcacctccactatcatgctgtggctgcagacaaacaaaagctccag cggtgtgatgaacctcggtggctccctttacaagacagcatgagatgggt gcaccaatcaacggtcagaacac >CEESB80F agatcacaacgtatttattacttctcctncttttcaagaaaaatagtcca gacaagcctaatgagagcctgaaagcctggaaattttgtcttgatagtga tctattcgatttcagtgaaaaaaatcgcaatccgtacagatggtagaggt ggaaatncgcaaaaaacgccaaaatctcagtggaaagttggcaaatttnc 
gggaaatcgcaaatttcgcgagagaaatttgntactttnccccnaaaata gccnagnaacagggnaaaagatgg >CEESB80R ggaaaaacgggttttctttcggcgaaaattttgaattatttagtgaaaaa atagccaaaattctcaaattttgcacggttttttcaatttttttgcaaat tttggtgcattttccgtaaaaatcgacaaaaaattgcgaaacacattttt ccgagttttttttttgccgatttcgtgcaaaaaacgtgggaaaatctgca aaaatgcagaaaacaggnaaaaattgattttctgccactattttgcaatt ttcggtcgattttagtgcatttttagccgattttgactgaaaatttgcaa aanattgattgaaattaaattttatcggttttttcccgatattttgg >CEESB81F cttaaatcgtttattattaaaattaaaaattgttataataacgnaaagcc attaaatgaaataaatattaaataaagatagaccagaaccacaaatgatt gtaccaccgtatcaggaaaagagcactcaagaaaaaagtgaatttttggt tgaaatataattttaaacaacaaaaaaagaaacatttttgaaatgtaata ntaaattatttagaaactttccaaacangtttctggcatctgatgtattc tgcgagatgaagtgtcagttggtncganttcatacgttttgcattggaaa >CEESB82F aatgtgtaaatgtgttcatgtncttcatgaataaaaaatagagtgataaa cgattacatgagatgacagagtgntaacaggaaaaatgtgggtatttttn aaaaccgtaaaagtctaagggtcaagaaaatgaaanttgaaaatcctatc tgtncacgngtgcaaaaatatgtnccacg >CEESB84F aaaaatatgatttactaaatgattagaaagcttgcaataccgaaatacaa aatattggcgaaaatggaaatcccgagcgatcggtactttcaagaaaagg aataaattaagangaaatacataaagtcatcacaatagaagagaactaga ctgaaatatgaaaagaaatagagacaggaagcaagantttagaagaaaat aagatgaattttaaaaatgcgagangaagaaaattcagattctggtcacc gaaaggnaaatggaacanttttagagaaaggagatggctgggggatgagg ggaaactctatgcacaaacacaagaagaaaaaagcaccaacacgncacaa tattcaaatagaagtatatatctncttaggaaattaaat >CEESB84R ggcacgagctcattcgaaaaaatccttgctgaagagcgtgaagctgagga gaatctctaagatcacctcggccacttcaaacagtgtgacatcgacgttc gacaaatctttaattatttatttctagtagatatatacttctatttgaat attgtgtcgtgttgtgcttttttcttcttgggtttgtgcatagagtttcc cctcatcccccagccatctcctttctctaaaattgttccattttcctttc ggtgnccagaatctgaattttcttcttctcgcatttttaaaatttcatc >CEESB85F ttttttaaatatgtatattcattttcaataaagcccatttaatgagaacg caaaagtacaagaaatacagaagtcagtgcaatgagatcgcatccacgtt gagaacgtcacttagttggtcgaaatcgctgagatccttggcgcactcag cacatcctttagtccattgcatgcagtacacaggatgatagttcagatgc acaatctgaccatcggcttttcctacccaaactaaaactcgacagccttc cgtctcacaatatctcccgtatctcgcgnatttnttgaacgccttcgtgt 
cgaaacgncttttactgttcaaggggaacggttcggctttctgggaactc aaatcgcagtagagtntgctcggttttgaagttttggcgaccccactttg cc >CEESB86F catcacttcttcttgctgagccttaacggcatcttcaggacgagaacgga gtgcagaagtcaaaacgctgggagattcacactcgatttgcacaatagta gctgctgatttggcttgttgtcggattaaatcaacttctccttcagtatg agcaacttcgcacagagcttccaaatacgtcatcgcctgtncagtttggt tgcagaaacgcgtgagtttcgaaatttgagcgttgatttctcgaatcttc ttgtcaaattctacagaatttcggaggaactcatcaatttccataaccgc atcggccttttctttacacatttgctggcatcttcccattcgagtactac aaagattcgctaaaacgacccccttcggcttccatttttcccctcgtggc cggaa >CEESB87F aagacataatagtgctttgttataagcaattcatcgaaaatttagtgctc ggcaacagcttttcctttctcctcgtacaatttcgccaattccttcggac aagtatgcatgtatcccttgttagcagcgcaaatctccttcattctagtg tacggntcgtagtttgcgaaaaattcttcatatttgttgtgacgtggcca gacgtaaaatgcgttgaaggcggcagtcgagacgacagcgacagccagtg aaacgnnaactcgtgccgaattcctgcagcccgggggatccactagttct agagcggccgccaccgcggtggagctccagcttttgttccctttagtgag ggttaattttcgagctttggcgtaatcatg >CEESB88F atattagtaagatcatcaataaacaaacacaaataaaaataatcacacta catcaacaaaatgtcaatataaaatagcaaattccaaccccagatgatga ttataattaaaatgattttttagaagacgtaaaaaattaaatgctaagan caaaccaccacacaaggcatganttccgtgaaatcccgtagataaataaa aaatccttccaaatactccgtctgcaatagaaaatctagcttctatatac tctattaattganttcctgtaaaataagcngccaataaacatgttaaaat tntactattagtacatcttttatttctaggtnatctgtggatgtgctcaa gtttactgtaacaccactcctttaattaaa >CEESB89F actcaaaaattgttcattcgaatcctataaaacggcgggaaaaagtgatt ggagtgtgatgaaataacggaaaaaacagaaaaacaataaaaattactag ctattcaaaaaaaaactacaaaaaaccggaaaaacattaaaaaaccagag gaaaattaaaagaaaattatttagagacgactccaacttgagcaggccga attggacgttctttcaggctgtagccgatctttgtacacacttcgatatg tccgacaggttgtttggcattagcagatggantctggaaaacggcttcgt ggaggattngggtcgaacttttccattggttggatccacggtcacaaagg nccatgctttgcgaatggtcttttgcccatgaccagtacggggtcatttg aaaactccttcc >CEESB91F caaagaaataaaatttattttaggcactgttgagcaaccngagttgtgga ataaaaataaaaattggaaaattaaaattncaaaaaaaaaatcgaaattt ttttaatttgcaaaaaacccgaaagtggngaaaagaaatgngaaattnta actggaatggttttttncgttgaattgttgactaggatgacacgtggata 
cacatatcagangctgataaggttaacggancaggtgaagacttntggag accncggcggtgagccatttcgggt >CEESB92F cgaaaaatttcaaatttatatgatgaacttgtttgggtgtgaaaagaata ggaaggaaagcagggaaatgggatggagancaaacaaaaaagtagttttt tttgaagaataagaagnaacatttggagaagaagttgaagcaaattatgc acaggtatcatgtaatttncgnaagnaaaaaaaacacggngnaaaatgat ttagagacgntcccaagagatttcagcctct >CEESB94F gagagagaacgtcaagcttcaaagcttccgggcgtaaaatctccgataca aaggcatgagggtgacagtaccatagcttccattgacactccgaagttga tgaggttggcaaaggcatggggctttaatactaaatattccacatcacca atgtgttctccctctggataatttggaatgactggatgtgcatcaccatc accaccaccgnctgctcgtttgacacgtttctgaggtttgttccagaacg gccactgcggccgtccttccgttgccagttttcgaattgcatcttctatc ctagtatctctttcttctctggtcatattggcgatgggcaagtagcttgt tggggtatcagcatagacaagctttgactt >CEESC01F gattataatccttcactggaaataaattcttccaattgataatgactgag gaagtgagcccaatcgacgtcttttgctatttgcagtttggaaacattac tctgagtgcagaatgcatcggttttgtagtgacaagttttatggcatacc attttgcagtctctgcactgataagcttgcttggaaaaactgcttcgaat cctttgctggcacacattgcacgtggcaccgcctttnactttgacggcaa caaatgtatgancgttgtagatgtgaagtttctttcctttcctcattaat cgagatgcagctggctcattgagcattgctggagggatgaggaagtattt tt >CEESC02F agacgaaggacttcatcgtgatttcgcctgtnttctctactcaaagcttc aaaagaaactcactcaacaacgcatctacgatatcatcaaggacgctgtg gccatcgaacaagaattcctgactgaggcacttccagttgacatgattgg catgaactgtcgtcttatgtcacagtacatcgagtttgtcgccgatcatt tgctcgtcgagctcggttgtgacaagccttacaagtcgaagaatccattc gacttcatggagaatatctcgatcgacgggaaagactaacttcttcgaga agcgggtttccgagtatcaacgtcctgggagtgatggtgaatgaagccga gagacagttcgatcttgaggctgacttctaaag >CEESC02R gcacgagaaaatatttccaaatttctttagaagtcagcctcaagatcgaa ctgtctctcggcttcattcaccatcactccaggacgttgatactcggaaa cccgcttctcgaagaagttagtctttccgtcgatcgagatattctccatg aagtcgaatggattcttcgacttgtaaggcttgtcacaaccgagctcgac gagcaaatgatcggcgacaaactcgatgtactgtgacataagacgacagt tcatgccaatcatgtcaactggaagtgcctcagtcaggaattct >CEESC03F aattcaaaaatttatacagaaaacagaatgcaaagaaatctgtacgtgag cttttcataaaagcgcattcaacaacaataagttctacagatataaataa atatcgaaatctcttgaggggttggaaagggagaaaatgaaatgagggga 
tattgtaattacacgtcattgatttggcggaggggtttcatttgaaagga cattattaaagctctaattaaaagttttnctttaaaaaaaagtgatgatg agctgcagaaaaagggacttcccgtgagttttcagatgtcaaaaagttaa ggtcagaggagttcagaaaaatgcaattgggagggcccgaagtgagatgc atttttcactagggagtttcagggaaattacgg >CEESC04F gatcatcgcctggttgaagaagaagaccggaccagtcgccaagccactcg ctgacgccgatgccgttaaggagcttcaagagtctgccgatgttgttgtc attgaaggattcacaaagttcctcgagaccaacggaaaggagggagctgg agcttccgaggaggagaaggccgaggaggaggctgatgaggagggacaca ccgagctctaaatccacattccaatacagttcaacgcatcggggttccat ggacctgttgttaaatgtcaaacatcgttaacttattcatttttacttct ctttccccctttctcaaaacggctcttggcagcatatcatctcatcttta ctaataaggcgattatcggttttggtttacggatgttgtcagtttaagga tgggtattcttgccctgtggct >CEESC05F aacttgcttcttgtatattcagagtccgaagatgatccaggaactctgaa gatcacagatttcggattggcaactaaataccgaaaggatggagaggaga tcatgttgagcgaggattgcgggtctaagccctatgcagcgccagaagtc tgcacagggaacgattatcgggggccacccgtcgatatctggtcggctgg agtcgttttgatgacgatgcttgttggggagcactcttggaaagttgcaa ataaagaaaaagacgcggcctacagcaactggntcaatgcaaaagacgaa aaggcgaacctgtggaatgtgatctccggaccaacgacggcgcttcttcg caaacttctccatgcgaaccnccgaaaaaaggggcaacaatggcg >CEESC05R nngaaaaaagggcaacaatggcgaaaattgtaccggaaccatggntccgc ttcaattttttngccattgttgcccttttttcggggttcgcatggagaag tttgcgaagangcgccgtcgttggtccggagatcacattccacaggttcg ccttttcgtcttttgcattgatccagttgctgtagg >CEESC06F atcaatagctgtttattgataacatagtgaacagtctgaacagtttctgg ggagagatatttcacgaaaacataaatttttaagggaaaaatggggagaa aatgtgtagaaaaaataggaacgacacaatgcagagatcaacgntcatga gatgaaaacatacaaatagatggaccaaaatagctgaaaatttaaaaaaa agaggnaaaattacataattgcgcaactatttctgattgattcaattatt gaagacttttgatttttaatagccggtggtgctgattcggtctcccacca cggacgtggacgctctccttttgacttttnatatcaatgatgtcggtcct catcatcgnccctcangtggacccatgttggaatggctccagaggaacgc ttgttggg >CEESC08F aatatgaaacagattttattttagtttcaaatgcaatatattgcaattac aaccacaaaaggggaaaggaaccgtaaagtgttcgagaagtactgagact gagaagtggggggagaaacaacattaaataaatagaaaacaacacaagtt atcttatcttatcacaatatcatcgagtgcataaagctaatggaatgggg agatgttttacattgtttagagctcagagtgctcctcctcaatcgttgtt 
tctncagcggcttctggctctggttcagcctcttcaatatgctgttcggg ttcaacttngagcatcttggggnnacatcgagccgattgggcgaagg >CEESC09F agtcgctgttcttttttattcagaaaaaaaaagcttcgaacaaaaatggc aaacaaaagctggatgggatagataacaacaacaacaaaaagaaactaac aacaagagtaaatatgagaataataaaaaaatatggaaaaagagaactgg ttgataaaaaacaagatttgaaaaataaaatcaacaatttaaatctaaag ggttctagttttagaatgtattctggcgttttccaacgtttacgtttcca ggaagttttttcagtccttttgtgaccagagcattcgggcgtcggctcac ttctttncgttcagctgcatgagctgcttgtttcgtctttgccagcttct gcacgaaggatccttatcacggtttcgaaatcgtga >CEESC10F gaagttggtgttacaactttattatgtgtattcaaaagcttgggggtgtt tcgaatgtcgagatttgaaagggggaaaagccgtaacaaaacggaacatt gaattgtatgggtagagacgggaagttttgatgagtgggctcagcacaga aatttgaataactnctgcatttggttgatgggaaaaggggagtgattgat gaatttatagaaaaaaatggaaagaaaaacgatggtttaactagtcaatg gtaataaataaatcatatgacaaatagtattatttatcattttcaagatc naggtgaggggtgtgtggcggtaacgtgtcggtntgaaggagtttaggtt g >CEESC11F ccctgcgaagggactttgcaactnattctttaccagttgttcaaccatct attgagacgcaattcaatatgcgaaaactatatgggagcactaaagtgcc tttntagccttttgttgaccacaccatcaccgcgaacacgcaggagggac gtattcatacacggtggactctctttcgacattttccaataatcttttgc aagaaggttgtgatgttcgattgtcgaagttcatcactgctgtgtcagct tggcgaaatgtgcgatgagagaattgaaggccccaaatgaatgaccattc gttgttctacatttgagtttaccaccatttcaacaatggcggatgtgagg tccgttttcgctcgtctttgcggattcccggagcttttgaaatccgtcct ccgtaggcccatgcaactgccgtccctctttaaatgacagtaaacttag 1.fa100644000766000024 16645213605523026 17512 0ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/t/data/dbfa>AW057118 gcaatgtcacgttgtcaactcattgcaaacaaagactggaccaacattga acggagtgattggaagacaatccggacaagttgctggatttgactactcg gctgccaacaagaataaaggagttgtatgggacagacaaacacttttcga ctatttggctgatccgaaaaagtacatccccggaactaagatggtcttcg ctggtttgaagaaagctgaccaaacgagctgatctcatcaaatttattga agtggaagctgccaagaaaccatcggcataagcctctactaaataagaa >AW057119 test description tcatgttggcttctcggggtttttatggattaatacattttccaaacgat tctttgcgccttctgtggtgccgccttctccgaaggaactgacgaaaaat gacgtggatttgctgacaaatccaggcgaggaatatttggacggattgat 
gaaatggcacggcgacgagcgacccgtgttcaaaagagaggacatttatc gttggtcggatagttttccagaatatcggctaagaatgatttgtctgaaa gacacgacaagggtcattgcagtcggtcaatattgttactttgatgctct gaaagaaaggagagcagccattgttcttcttaggattgggatggacggat cctgaatatcgtaatcgggcagttatggagcttcaagcttcgatggcgct ggaggagagggatcggtatccgactgccaacgcggcatcgcatccaaata agttcatgaaacgattttggcacatattcaacggcctcaaagagcacgag gacaaaggtcacaaggctgccgctgtttcatacaagagcttctacgacct canagacatgatcattcctgaaaatctggatgtcagtggtattactgtaa atgatgcacgaaaggtgccacaaagagatataatcaactacgatcaaaca tttcatccatatcatcgagaaatggttataatttctcacatgtatgacaa tgatgggtttggaaaagtgcgtatgatgaggatggaaatgtacttggaat tgtctagcgatgtctttanaccaacaagactgcacattagtcaattatgc agatagcc >AW057120 aatctgtacatcttcaattgtggttcacttcttctatcgtcttgttcgag aaaaccacggagaaaaggagcaagaccgtggattgaaagacaccaaagaa accgccaaggatgtgctgggttttgtaaaaatgcttggaataatcctagc tatggttgtaggctttgccttgttggggtttgtcacgttttatctctatc agtatgcgag >AW057121 atgggcgctggtggctatggattcggatatatgggatccaacgcctcatc gtcgggatttgcccgcgaagattatgcccaaggaggaaatggtggaggac agcagcaaaaccagggatctggaggaaacaccaacccgggaggacaagtc ttcaaggcccgtacggatcaatcttgttacctcggaccataagaggcaag aactcagccaa >AW057122 gacaacttccatctctatcatagcattttgatgattagaacatgtcactc acaaaagatggttccgtagctgtgaccattctccaagcaatcatcttcat tcaattcggcttgtgcgttgcgatcacaattctgacaaccgtcggaatct catttgggtatccggtggcttctcattaccttatggctcttcttcaggca ttggttgcaattccgggaattgtgtacattgtgacacagacaaacatctg ggtggcagtctacatcagcttccaagtggtaaccgccgcgtgtgaagtct actggctcgtctacttgatcttcgacaatcaacccgctggatcttggatc gcacttgcaattatcacagccgtcaacattttggcagctgttgtcggtgt gtggttccgcaaaactgctttgaaccttccatgccttgataagaaaacga aaaaggctggtgatgcgaagaaggagaagcccanaatgaaggccccatca acttcaatgagtgatattgagaaaagtaaatccagagctg >AW057123 catcggatgaaacggatgatgtacttgcaaatgaagacgaaatgttcaaa acaaaaaaggacaaatacaaagtgatcgcattactcggtaaaggcggata tggagccgtatactctgtactccgcctcagcgatatggagaagtttgcga taaagtgtgagaaggcgactgctggaaagaaggttcttctgatggattgc aatgtgatgaaagttgcaactcagatcaaaagtagacacttttgtactgt actggatcgagctaacgtcaaggatcgtttaatttcattgtgatgaagct 
tatcgggaagaatctttgggacctgagacaggatcgtggcgatggaaaat tcacaatgggcacctcgttgaaagcagcgtcacagtgtcttgtatccatt gaacacttgcacagtttcggctaccttcaccgtgacatcaagccggtgaa ctttgccgacggacgaaaggaatccaacgagcatcacgtcatcttcatgc ttgactttggtctctgcagggagtacgtgaagcgagctgagggaaagatc ttcgagcagcccgtacaactgcaccattccgtggtactactcgatatgct ccattgacttcgatgctccagcaagatcaatcacgacaagatgacattga atcctggctctacatggntgtagaatggacttactgagattgacgcggcg catgtngaagctcacgatcgagagaggctctgcattacaacagtactacg tcaacaccg >AW057124 ttggatcaacagaaagtttaactcctcttctgacaactcaagagccataa agtccttatcttttccaacaaagaacaaaagtccatgaggcagctctgaa acgccacgtggctccgactgaaggattggctgttgctcagaatgacatac acaatttcctgttcgaatcacacggacaaatcttcgatatggcgatgatg tacgccgaatcggccggtttcaaacgagatcacctctcgtacgccgcgtt cggtttgatcgcattcttcttggttttcggatctgtggcacgtcttttgt gcaat >AW057125 aacaactcgactaaccgtctccactcttcacttgcacaaatcttcatgca accaatcaacgtcatgctcgctgttcttctcgccttggcttcatttgctc aaggaggcagatctgttgctccggctggtgcagtcactgaaccaacagtt actcaagctgttccagaaggatcaggacttagttcagatgtcactgatcg tccaaacatcgactccactgatgttgtatcaaatgcaacttcggtggaag atttgcttggaagttcaacaaatgcaaacaacactggtacattcaactct taggacctttgtaattgctccaatgatgattcttgctttggt >AW057126 gtgctcgacatcattgctcttctggcaattgtcgtttttcttgtcttttt gccggctgatagatgtctgaattttgaagcaagtgaacaaactccacgca acgtcgacaaaaagtgtcatcttcatccagtgcatcatccgaacttcagt ttcaccgaaaacttgaaaactaacgatcatttgattgatccaagtcatca agtcaccatttgtcaccgtttgaaatggtctcactattcattctgggata gcatgtatttgaatcttttgccggctctcacttgtgtttttctgattttc tacacagttcatgagatttgtgaatttgacgtgtttgttggacaggtaca aactactggcttggttattgtatcagcaatatcggtaatttactctatcg cagtaatcactcacgaaaagaatcgaattgagacggaatggccattgctc gctcatgcttatgtcctggagcaaaagaaggagaaacaaccgttaatgta ccaaaaacgtggcaatatagtattgctatgtgcattctgtcagcccttct caagatcggcagggttntggttcaacatttcattggtgataagacgttct tcttgatgacgatagccaaagggacgatcaaaatggtgaacctgtgctgt tggctggaagtgaagaagtgctgtaccactaccattctgaatgttccaga catccagaacttgaactgat >AW057127 aattccttttaatcgattcaagcacatcgtacatacgcgacaaaacgacg 
gaaatatgatgagcccgttggactatctgcatgtcactttgctgctcatt ggcccacttgccattctcatcggatgtggaggaaagaaaaaagcaccacc accaaagtcggcatccaagatgactgctccacctgctcctgcatcagcgc ctcccgctgctccagacgcagcccctgctgctcctgatgctgcagctgca ccagctgatgg >AW057128 gacgacgcttaccagtttttggctccgataactcgttcctccaccaccag tccaacaaccgccagcagttgctccaggagcatcacgtcccgcacctgct ccgtcgaatgagcttccacccgcgtcggaggcaaagaaattccaagtgca accggctaaaaaagcatcaaaatcgaagagcaaatcgaaagactcatcgc caagcaacggaaaggagaagaagaagcgcacaaagcgttcgggtggctcc aaatcgaaagaatcttccgagccacctccatcattgtnccgcctttaaat gccaaaaggcattcagtgcatgtcttgtccttatttcagatgctatcact ttgatgg >AW057129 gagatcgttggaaggttctagctatgaacgtactagtgtcttcagaggat ggtgttaaagaattcgaaaagattgttgtggaacctgaagatatcgaata tgttgagattccggccgatgccaaaaacgttgacttgacgcgtcaccgta tcaaagaaatcggtgattattcgtggctcactcacgtcgaacacttctcg tttcgttggaatctgatcaaaaagattgaaaatctggattgtttgacaac gttgactcatctcgagttttacgataatcaaattaccaaaggtgaaaact ttggttagcctcgtcaatttggagtcactcgacctgtcattcaatcgtat caccaaaattgaaaatttggagaagttgacaaaactgaagactctctttt ttgttcataacaaaatcactaaaatcgagggtttggatacgttgactgag ctggaatatctcgaattgggtgacaatagaattgcgaaaatcgagaatct cgacaacaatctganactcgatagattgttccttggcgctaatcagattc gtcttattgaaaatgttgatcatctgaagaagctcacagttctcagtctt ccagccaatgcgattactgtagttgataacattgcgggacttcacaactt ngaagagaattatctggctcaaaatggcatcaagtacgtctgtggaatcg atgagcatcttccttcttgaaatctggatttcaatcagaatcgtcttgag aggtcgagatatncattcaatgagacactacagacttttgggcagagaaa tagttgatacctg >AW057130 caaaagggctcggatagaaaaatgcccaacacttatcgaacttcggatga aattgtggctcagcgattggaggatggcagagatatgaatctgagtcttc aaaatatgaatctattcgcatctggagccttttcaaatgtctatcgtgga attgcacgcacagaatccaaccaccaaatggaaattgtcatcaaaaagac atggccacgtcataaaggatgtccattggaagtgaagattctcggaaaac ttggaaagttgaagcaccaagaacattgtccgccttctctttagttacca gaaacaacatgaaggtcgtatctgccttggtctaatcttcgagtacatcc caatgaatctccatcagtttctgaaggataacaatcgacgtgttgacatt atcgaggttaaactgattgtttggcagttgttccgtggacaagcacattt ggagaagtctgaaatttgtcatcgtgacatcaaaccacagaatttactgt 
acaatgctgaccactgtcttctgaagatttctgattatggatcatctgcg attgaatcagtgaagacaccacaacaaagctaccatgtcacaagatatta tagacccgcccgagttgcttctacgctccaaanactatggatgccagatt gtcactttgtcgtgtcgatgtgtctttggtgaatgctttaaggtggaatc tactggcaggcaatacagccagaatcagcagaagtgatttgtatggtcga gctccactgc >AW057131 tatctatattctcgactttggatttgctcatcagtacatgattaaggatg gaacactgaaacctccgtcagctcatccatggaaatacgtgggaagtctt cgtcatatgccacgtgccgcatattcgaaagtggaattctcaagaatgga agacttggaaatgtggttctatatgagtgttgagctggttaagggatgtc ttccgtgggctcatttgaagaaaccaaaagaagtgcatgactatcaaaag ttgtgccgaaatggccttcaaatgcgtgaaatgctccgaggtctttcacc agaattcgtcgacattatgcanataggtgacaaactttcattcaccgaca ctccgaactacacagaaatctacggacttctcaccaacgcgattctcttc agtggcaaaaatgaattcccgtacgattgggaggaggctgagatcaacga gttcaagaatccgcagaagccaagtgtggagcaggcaacctaatagctct ctttctatggaaataaattga >AW057132 ttcgataacttgctttattgtcgacaagttttatcgtaacaatgacagca ccgccaccgccactcgtggaacttcctccgggatctatggttgaaagatg gtcgattacaaagaagctgggagaaggaggctgtggagccgtctatttgt gcacggatgcaactggaaagtatgcactgaaagtcgagggaatctctgag gcaatgcaggtgctcaaaatggaagtgttggtgctcggagagctgacaaa gagaggaagtcgtcacttttgcaagattgaagacaaaggaagatatggcc aagttcaactatgttgtaatgacgcttgtcggaaagtctttgcaggatct tcgcaagggaaccgctcaacaatgtttgagtctggcctgttctctcagtg tcggaatccaatccttggaagctcttgaggatctccacaacattggatac ctgcaccgtgacgtgaagcccggaaactatacgattggccgtgccgagtt gaatgaacttcgtaaagtttatatcctcgacttcggaatggctcgcaagt tcacggacaacaatggagtgatccgcanaccaagagctgccgccggattc cgtggaactgttcgctatgcgccaatcgcttgccataagaatcangagct ttgaaganaagatgacgtgagttttgctctacatgcagttgagctgactg tttgacgtgtcccatggaagagatcacgacatgaacgcagttgacaagcc aagcnagcgatccgcatactccagaaagatgtt >AW057133 tcgctgtctttaaaggagaaatatcgattctataaaaatggaattcggag cgtcgtttgcttatagaagaattgacgcccacgttgaagcaactcttcaa gaagctgctcaagttctcgaaaagttggtaattgaaagagatgccgagtc ggccataacggtcgcatcccaaggaactccagatgcaataagtatcactc aagctgctcatgaaaccatttcaattgattcaattcttcgatcttcgaag caaccaaaggctaccattcgtactccacgtgactctaacaagaatcggaa attaactttcccaccagcccgaattgggttcaaatccgagcaattggaaa 
cttgtgattctggaataactgatagcactattgaccaagatccaccgacc ccggactccttgttcccaagtgccatctacattccggcaaagcagaagcc tcaaatgaccgtttcaataagtgcaacaactgcttcttcttgttccaaca aatcgctctaccgtaagcacattgaaaaactagcaattgagccattggag gacatcaaacacttgaagtgccgtggactgangaagtcanagccagatga ccttcttctaaagccacttacgatgcgggaactggnaaaccctaaatggg gttgtctccgagtcaatggtgtcggtcacctttatatgcatctacactgc taattcaatagcttgctatcacccaatg >AW057134 cttaccctcggaaaggcagaaagtcatatggacttactggaaatgtgaag aacgtggaaaagggtgtcaatgtgtggattgagttgttggaaaaaatgga ttttcgaaagacgatggaaccggaattcaaaccgattgacacaatgaaag cagatcttgagaagtgtcaagctttcaagaagaatattgatttttctcaa tcggagaatgttgaattgtacgatacaaatcgggtcaaaggaggtggaga agcggatttcttttatcacggaagcactctgactattccatcggtaacca acaaatcttacaatccttgtccaacttgcaattgtcgactcaccgcaatc tcttgaatcattctggataatgggtgcctcacaaaagatccaaagactct tcattcttcttggcgaagaagaactcgataaacctactctgagcgagtac tttccagaagacttcaaagagttcaagacgattcgtgtgaacaatcggaa gactgtgccgaagactgaagagcaagcaaacactcaattgtactatgaag ttgtgccaaaggattgtgcagaagctccatttgcaatgattgagatctgc gattcttggcctgatgcgaagattccaacgatgggttataacagaaattg ctgcgactgctgcagcgtgtctgaatctgatattgattgtgatgcttctt gtgctattgtgagcaactacngagccgctagagcacgatcatctacttgt gcagttctggctactgatgaagtgcacgcacgagaagccccaatattaaa gaaaatcgtctttagtacgttcgcacgcctgctgtattcgacattgctca tacatgtt >AW057135 tctcatcatgatcacggaagctgagatttgcttcaatcggatatggcgtt gacaattctggttccatggaactttcagctcaagcccccaattctcgatt ggggaattaatttacctggtatttattgattaactacanaatatcccgat tattttccagtagttgtactttgcttggggttttaactttattttattgt taaaaagggaaaagctggagcaaaatgcccattcactg >AW057136 atcgaacatcaaagcagtgaccataagtggaaagtgttgcgaaatatcta ctcgggaccgttttcagatgtttatgtcgttgcggatacagtaacgaacg aaaagtatgcgatgaaatgtgagagacaagaaggaaactctcgtccagtc ctgaagctagacgtgatggttctgatggcgacgaagggtcttcgaggatt tccgaactttgtggcggcaggcagaactgatgtttataggtactgcataa tgcaacctcgtccggaccggatcttggtcggctgcgtcgaacacgtncgg aacgcaagttctcccttccaacagctctccaaattctcggacaaactctc cgacgcctcgaggatcttcacaattgcggatggctttgtcgagaagtgaa ggcgccgaatttctgcattggtgtcggcgagaacgagtcgaccgtctata 
ttctggattttggattcgccagganatttgtagacaaggagggcaaaatc at >AW057137 gaaacttgtgaagaaggtctgcaatggctgcctcaaagaaattgatgcca aagcgaatgaagaagaagatgcaggtggagagaatggctcatgccagagc ttgcaaggttgcaaagagagaggctcgcgtcgctgaggaagcatctggaa aatcaactggtggatctactcgcggagccaagtgatagccgagccacaac acatga >AW057138 tcgaatcaatcgccttttgaatcggcaaagtagcaaagaagaaaaggaac agaagaatgggttccgagaagatgatcgagatttgtatcgacatggagga aggagaaccacttggtgcaactccaaatgacaagctcgttatcactaaga ttcaggctggaaccatttctgaaggaaaattgagaattggtgaccaagtt aaaaaagtgaacggacaaaattgcaaggattgtaacgactttttccgtgc gcttcgctttgctgctccatgtgcaaaaatcacggtgaaccgtgaccgaa aaaaaggctgaagagttagaagctcgtgttcatattcctgaggatcgtgc anagatcattcaacgccgtgagggatacgtttatgagttggcaacccttg tctgggttcaaaatggaccaaaacttggtttgggaatcaagcatttccag aaccgtgtgcttgtttcacgtgttgatcctggatcactagccgagaagtg tcttgttcttggagatcatttgtgtgatgttgatggaattccagtcagtg acaaggatgttgccagagatcttctcgtcaagaatatncaagagaaagga naagtcacttttgtcgtcgagcgccctgattcgatcgatgccaagcaatg ggcgatacaggcgttgctaccaatctcatgctaccaccatcagtccatat gaaccgagatgtgaaagcattgcttcgcatatcgtcaagctcttttcatg actgagcctncagcgaagagtgcattgtctacctgtccanaatgcttgcc gtgttctatcatcgagcaaactcaaaacttacgagatccggcatgatcat gacggaaagcttttcgcaaggtcagtga >AW057140 caattgagaacgctgtgcagacatgtcgatgaccaagaatcctgtgaaat caaaaccagccgatggaacaacgtccgctaaggattttgagaaccttcag agtgacttcttctccttcttgtacgctgatcatggaccattctacaaaga gaatgtgaaaaagttggaagacgcaactggtctgaaacgtgaaatgctcg catatgggctcatcgggctcaattgtgtctacatgattattggaagtggc gcccgagttgggggcaatttgattggagttgcctatccggcttatgtttc tgttaaggcgatccgaactgaaggaactgacgatgatacaatgtggttga tctattggactgttttcggtgccttttcaattatcgacttcttcgccgca atcattatgtcatatttcccaatttattgngttgccaaagcagcatttct cttgtacctctacttgccagaaactcacggatcccacgtcatttaccacc aactgattgatccatttgttgctcatatggagaagagcatgtccagaaag cttncagcanacgctggaactgttccaaaatgatcaggatcagccttgga tgcgaantaaca >AW057142 ttttcatgttcgccaaagcacagagaaaagaatagatggcagacaaatca gcatttgtgccagtcgatgcaatcggaaatcacaaaaacaccgatcttga cgttgatattgatgacgaattgttcggaaagaaaccaccaaaagccagtg 
caagtgcaaccaaagcagccgctccaccggctccggctccagccccagcc ccaccaaaacctgctgcagcacctgccccagccgctggaaagtaccaata caagaagtcgtcgacctatcagaagacctat >AW057143 gactggtgctcctccaacagcgaaataagcagactgggctccgccacctc cacctccacctccggctcccatgtagactgatgttcctggtggtggtgca gcaccacctccaaccttcgaggctccaccgcctcctcctgctggtcctcc tccaactccaaagtaagccgattgtgcaccagtattgctcc >AW057144 cgaaatgggagtcgagttatcgttggatccttctgtctgcccaatccaag ccaatggtggtgtttctacgcacaagatcattaatcactgcgacaaaatg ttggcctataagataaaatcttccaataactccaactacagtgtcaacac tatctttggacttattcagattgggtataccgctgatctgatcatcacta gaaagccaggaaagccgcaggccgacaagcttgttatccaattcgctgcc gttgagcagacttgccgtgatggctgtatttgccaaggctgtatttgcca attggactgccaactggagagtgctgtggagagacgatcatcaagctgtc agctgctgaatagaagaattatcacaataagatttgtgattatgaaaac >AW057145 gcgaaaacaaattcaccgcccgttccgttgtcctgtcttggtgaccggcg tcgtcgtcggcggagccgctcttctggcgattgccgcctactactactgg agccagaaaaagaaaagctctgatacttcatctgccacgtcatcggagtc caacgatgttgtcatgatgtcatcatcggagcccagagccgatggaggag ccgattcgaaggcaaagttcaatattgaggatgaaaatgtgagaagagtc tgcgagaagctgttcatggagcagatggatttgggggaagcttattttgg aggatgaagaaaccgaggagctcggcgcaatccacatggccaacgcaatc gtgctcaccggagagactgctcagctgctcaaagtgctccgcggctcgat ttcaccggctcactttgccaatattcaaaagtacctcccatcggctgact tgcgtgttcaccagcttctccaagacgagctcgccattgagactattgcc cagcatttcgactaagctcaacttctttntttttttt >AW057146 aatgtgtacagcggtggagtcactggagaaactcctgcctacttcaatga cttgcgaaaatgtggatgctggacagctccatctgtcaagcagttcgcgc aagatgagacagtctgcggtatcactgacaacagagatgttcatctagct ggaaatgtgctcaaggctgctgaagaagacgggaagatttatgctggacg attggtgcaaggaagcctgcaaatt >AW057147 aattccggcttggacacgctacggtactagttgatttggaaggtgtaaag tttgtgacggacccagtttgggctgatcgagcgtcgtttacgagttttgc tggaccgaagagatacaggccaccaccgatgaagttggaggatctgccgg atttggattttgcagtggtgtcgcatgatcattacgatcatttggatgct gacgcggtgaagaagatcacagatcgcaacccccaaatcaagtggttcgt tccgctgggaatgaagaaatggatggaaggccagggcatcggagtcgacg ggatcttcaccgctgtcaccgagctcaactggggagagagctcagaattt gtgaagaacgganagacctacaccatctggtgtctgcctgctcaacactg 
nggacagcgtggacttttcgaccggaaccacagattatggtcaggctgng cggtgatcggcgagaatcggcgattctattattccggagatactggtcac tgtgacggagagtttaagaagtttggcgagaagcttggaccttttgatct ggcagctattccaattggagcatacgagcccagatggttcatgatatccc agcatatcaatccggaagaggcgattgaggttcataaactcattcgggct aagaacagtattggaatacactgggtaacgtaccatatgggctctactga gtactacctggaaccacgtgacaagctcaaagagcttattgatgctccgg gagatcttangaacacgagttttgcacaattggaaatgggtcgatttggg aggcgtngatca >AW057148 agcagcaaagaagaaaaggaacagaagaatgggttccgagaagatgatcg agatttgtatcgacatggaggaaggagaaccacttggtgcaactccaaat gacaagctcgttatcactaagattcaggctggaaccatttctgaaggaaa attgagaattggtgaccaagttaaaaaagtgaacggacaaaattgcaagg attgtaacgactttttccgtgcgcttcgctttgctgctccatgtgcaaaa atcacggtgaaccgggnccaaaaaaaaggctgaagagttagaagctcgtg ttcatattcctgaggatcgtgcaaagatcattcaacgccgtgagggatac gtttatgagttggcaacccttgtctgggttcaaaatggaccaaaacttgg tttgggaatcaagcatttccagaaccgtgtgcttgtttcacgtgttgatc ctggatcactagccgagaagtgtcttgttcttggagatcatttgtgtgat gttgatggaattccagtcagtgacaaggatgttgccagagatcttctcgt caagaatatccaagagaaaaggaaagtcacttttgtcgtcgagcgccctg attcgatcgatgccaagcaatgggcgaaacagcgtggctaccaatctcat gcaccacctcagtccanatgacggagatggaaggcattgctcgcaatatc g >AW057149 ttatgagaaagtattcacatccacatattgttcggattattggtaagata atcgtcaaacatcttccaaaagttggtttggcagtagatgctcatccact aatgattgtaatggagatgtgcccacatggatcacttctttcatttcttc gtaagaacaaagggaaaacgacactttccgaacgtcttcgtttttgtatt gaatcagccgatggtcttgcatatcttgagaaaaaacaatgtcttcatcg tgatattgcagccagaaattgtttactttcgatcaccgatcaaattaaaa tttcggaattttggtctttcggatgacaaacgaactgaaatgcatgatga cacactcgataaggtaccagtgaaatggttggctccggaagttatgcagg ataaattgtattcattgaaaagtgatgtttgggcatttggagtgctcatg tgggagatatatgcagatggagctgatccatatccgggaatgacaaactt agttacgagagccaagatcttctgcgatgattacagaatgactnttcctg agactaccgcaccaaccatttctgaaatcgctttgaaacaaatgctggcg aaacttcccatcnatcgtgccacgatgaaaactgtgcatcataagctaaa gacctctcaantgccatggtcgatgtaggctcagtatgaaaattgaacgt >AW057150 tttcactcaaaacgttcaacaagaggcaaattcgacgaaaaatggacaca atggagcagcatcaggagcttgagcaagaagccatcggaccggctcttcc 
accgccgtcagctgctcaaaaatcggagctcttcgaggagcacaacgtcg agtacgagctgatcaacgggatcccgtgctaccagccggatcatgtggtc gacgggcaagttcagatcttcgaacgaatcggctacgacgacaaggtcgg cggaacgttcctcgggctcagcgccgacggaaaagagctcgtcgtccggn gtgcaccgatcgacgcgctgactcatgtggttcgtgccgaggcggcattc ctgtgcaaggtggaagccgagcttcaggactggcggctcttctcgcaagt tcacaagatctttctgaccgacgacgcgtggcacatgtcgctgtacttcc gtggtggaccgacgttggagcagtgctttgcgatgcggaacaagttcac >AW057151 tatccgctggtttgttgtactcattttcctcatcatttccaataaaaaga aacggtcgatatcatgtttggtaatgaaaagagcgaagaatccggaagtt ctgggtttggtttagctgaagttaaaaaggttttccagtggattctcgga tgcacttatgcgaagaaaacattcaaaactagaatttctgaaatgtttca ttttgctgatgctccgcacattgtcgtgtatgagcgaagtgaggagcgac catggtattcgatggttggaataataattggagttcttcattgtagttat gagtctctattacacacttgaagctggataccgttcaattcgaaaactca tgctttcaatgctccaagatccgcaacagagtactgtatccactccaaga tgtctatcaccaacaaacatgggatccaagaaatcttcgcctcaaactcc aaaaactccggatgtgattcgtcaaaaagttccgatgaatgagccagtca attgtgtgttcattcgaccagttattccaaanactattccagagggtact gtagccgtgccgaatttgaattcggaagaagatattctattggatagaga tcatcgtattggagagacagggacaaagatatatggaagttagaagagac acatagtgatagaagtggatagaaaaaataaacagaaactaaaagagtc >AW057152 gcaagaagaagtctgatacggcatccgttgttgctattccagaaggagac aatgagaaaggaaagaagatcttca >AW057153 aagaacagtcttgaccaacacgagatgtaaactcgattgacgctcctctt cgacctcttgttgtctggaagcactctgctcgttcaatgtggtggaaaga aaaagggagcaacttctgccgaaggaaaatcttcgacgatgggcccggct cctggaggagctcctgctgctgcttccgctcaaggagaacctgaagagaa ggagtaa >AW057154 atctacggaatgatgcgtcaagcttatggtgcatgtggatccaacgagaa tgctccgtacgattgggaaaatggaggtccagctgcctacttgctccact agaagaaatagacttga >AW057155 gacaagtgttctccgatgctccttcgtgccanatttctatattttcatca gcgcaccatgggtgttgtccgttactgttaagcggaatatgaccgagtat gagcagaaaattcatatcaacctactcaatgggatccgtcagaagaatgc gattgatgagcaagtggctaatatgcatgagctggtctacgatcccgctc tggaatccttatcatatccagaatgcgaaatctccaacgatgatattacc gtaaggaacaatgatggcgtatccacatattataatgcattcccaccaac cggacagatatttgggtgtactatgcgcaacacattgattccctgcatac gtctgccgcatgccatcatctcacctgcaaatccggcagttccaataata 
acagtgtcagcggatgtatttttggacctgttcgtaaattcagcagctcg gaagtggtgaaaggagagcctggttcacagtgcccaaaaggaagatcttc actggattt >AW057156 gtcgactgcaaagaagactcctgccaaaatacttagccgcagcagaagcc agcagaaggtgaaacgctcgatgagccgcaaaggttgaacggattg >AW057157 aaattccaaaccactgaaaccagttagtancaatgactgcacaaatcatg ttgccaatgagtgcaattttcgtgctacttggatccatagcaaattgcgg tggcaagaagaaaggagcagctggagccggagccggagttgcagccaaca gcccaaaaaaggcagatggaccgtcgaagaaggaaacgaagaaagaagga gatgacggaaactacgaggaactcgccgtcccacaatgatccccctgcat ccctgtc >AW057158 gctgggcggctgagctggatccacgtggctagcagaacatcactgactcg aatgttatcaatcgcgaggaatacgaggacgacgagtacggatcgccagt ataattttcataactcgttctacttctcgatagctcatcatagccaattt cgtgctataatcgtcttttttggcgcgggtttttcactcgtcttttcttt cttttttacttttactctgttctatactaatcgcggatatatat >AW057159 aacggtactcgatgagactgctctggatactaaagatatgttgatgaccg cattggcactcatgtcactcattggactctgctcaactcgtcgtgcattt ggtgtcttcactcttctcttcatgatgcatgcatttgtattctcagcttt ccatcttgcacatacanttgcactcttcattaaatcattcgattctccat gtcaatatctgaaaactccctcaactggaacacttaattcggatatctgt catgctgttaatggagttactctggtgtgtgcagtgatttcaatgattgc tactgctcttgccagtatggccgtcttcattcgtctcactacagtcgtcg tcaaaatttcgga >AW057160 tgaacaattgtcaaagagttcggatcccaacatctcctcgatgtacgttt tccatcaaggaattcaagtaaagcaggaaccaatcgatgatgaccaagag gaagagcaacaagtacaaaagcagcttgtattcaaaatcgagggctccga agacgaagaagctgtgaagaaggagt >AW057161 caactttatttatacacacaatacataattttcagagaagttttcataat cacaaatcttattgtgataattcttctattcagcagctgacagcttgatg atcgtctctccacagcactctccagttggcagtccattggcaaatacagc cttggcaaatacagccatcacggcaagtctgctcaacggcagcgaattgg ataacaagctaggcggcctgcggctttcctggctttctagtgatgatcag atcagcggtattcccatcttaataaggccacagaagcggtgacactgcag ttggagttattggacgattttatcttataggccaacattttgtcgcagtg attaatgatcttgtgcgtagaaacaccacc >AW057162 gaagagtgcagtagtccgattgaaaaggatcaacgtcgatccgccaactg gaaactatccggcaactggaggcaattcgacgcacaacatcacttccgaa tcggattcccgtcttgcattcaaggtgaagtcgtcgaacaacgagcacta ccgcgtccgtccagtttatgggttcgtggatgcaaagggaaagagcaaat tggacatcaatcgtcttccagggccaccaaaagaagacaaaattgtgatt 
caatacgcggaagtgccagccgaagagactgatccaatggcaccggtcaa ggcttgagctcaacaaggagaaatcattgtgaagctcatcgctgcttgaa tggaatgcaataactgaag >AW057163 atcacgaggctcaaaaagcgtttcgctcaattgaacacggtttcattatg aagttcgatgtctcttcgttgtgccaatttgttgttcaactcctccaatt ctgcaatgtttttattgttcttgcagtgatgaccatcacctcatgcggaa agaaaaaagcatcaaatagcaaggaaaattgcaaaaagagtttgcaaacc ggccctggagcagccaccgaagccggagctgcttcttcgttagctccggt tgacgcgaccaagcttgccacacctgtaccagcggcgccaaaaaaggaag aagctccacctccagaagagcccaagaaagaggagaagccaaaggagaaa tcgaagaagtcggcgaaatcgaagaaatccagcaagtccaagaaagacaa gaaggatggagaggaagagaatggatatgaaaactgccaggatatgactc cggatcagttgaagaagattgc >AW057164 tcatcagctcattcctcattcggtacatcaaatgactcgttttcttcagt ccaaagtgtgattctgccaccagttggaccatttggtcagaaaaagagaa cgtttgagtacattggtgtcacacttgttgtcgataagttgaaattggct gaatggatgaacgggatcgggaaattgtttggatccgcagagctccgtga taatgttaccaatcttcatatgcaattggtcccggtgattgatacgttta aaaa >AW057165 tgagaaatatgagctagctatgagaaatacgcactgaagatgaaggagga atgttatctggtgttataatggatacttctgatcattatgagcgtgatta tacaatggatcatgatgttggaccttcttcaatgaaaatgtctcctatac caccacctccgatcaaagaagaatcacctccaccaccgccac >AW057166 gaaagagtaaaaatacgaagaaagacggcgttgacaagaagaaaacttcg aaaacgaagaagaagtcaaatttgtcgatttccaaatcgcaaacttcggt ggatatgaatgaaaaggacaaatcaaaggaggcaaaggaagcaaaggagt tgaaagagaagaaggcaaaagaggaagccgagaagaaggttgttgccaca ccaaaaaaagatgcttcgaaggatcaggcaaagaaggaggaagatcctta ttagcaaaacgaaccaagtggaatggatgttcttgtgaaggaggacggca agaagacaaaaatggacgatggctacgaggatttcggtccaggcgccggt gctgctcaatgagcaaatggtcgagaaac >AW057167 ggggaatctggagaactgattacagtacaaacatcataagtcgccacaac aaccactgcacgacccattaagaatattgctgcaacaactaccgtcgcac ctctccaattgattgcggatgcttcacttgttgacaatgacctacaatcg aatcttgaagcaactggagtgtatgtcgatggaaagtggtggtggtgggc aatctacctgggatttgtcttgggcactctccttactttggctatcgggg gtggaatatgttacgtgttgcgacgaactgtttatggatattggtaccgc ggcatgtacagacgatatggatgtgatgtctctgcgacaaccgctggtct cactggagttggattcggagcaactacgaccgcaatgcagacgatttctc ctggaaagacgggtgcgacaacattgggaagtacttcaagtaccactgga 
attactgaaactactggaactactggatccacggcaaccactggaact >AW057168 tcccaacaacacctcaagatgaatgccatcttcactgccgtccttgttgc ttcaactctcgcctacactgcaatggcttggattggactcagcattgaag ccgccaacgaggatatgttctgaagtggtacc >AW057169 aattccttctactaaccctttcgactacaattaacatggacatggacaag cgatcatcggatttggaagctgctcttcgaattgtgctccagcagacttt gaacatcgttttgcaagcgcaggagaagctccccgaggcaaatgtggtac cctcaactccgcccacctcaccgagcactgatatcggcgaacaaatggca tcgttctggaatattccatcacccaaccctcctgcaacct >AW057170 gacaatttctcaagatggtcgaactgcaaatggagcaggaaatggagaaa atgaatgagattgagactgataaacttccgattgatcatcaattgagtga ctatcagaataacatcgaatcgggaaatgatcgtcaagttcaatcatgcc cagttgatgtgtctattccaaaagaagtcatgaagtgtgcaagctgtcct ttgctatgcttcaattgctcagttcaaatgcctgtctcacccgttccaaa caacaatcgaatcccgtaagatcaacgagactactcacttgatgatggaa ttttggaagattgttgcttcaaagtctgaagaggaaaagctcccatcact attcgaaaatgttgagggactgttttctgtcccattttcaactnttggaa cgtgggatgatgacaccctgtctggtgtcacatcgcttaattttgaaaag tctgatgaacaactctccgagcaagatgatgacaaaaccactgtttggag ttctaatttcccatcggctcatgttttaacagtctatgagaattctgaac agaagacggatganatggccgatgatgatatgtccgacacaacttcatct tttcttctactctttcacaacatgagtgctcaagtgccgcgctcatcttc tcagagagtcacttgcaagatcagt >AW057171 gtcgccagactttcagcccaatgactttgttattgattggaggtttgaaa aatggttgttctgagaatgaaaataaggaagaggggaagtttgagaaaat tgacaaagttttctttcctcccgagactgccaacaatactaatccagtcg gacgcctcattggtccacgtggaatgacaattcgccaactcgaaaaagat ctcggatgcaagctgttcattcgcggaaaaggatgtacaaaagatgatgc caaagaagaacgtcttcgtgaacgtgttggctgggagcatctcaaagaaa ccgattcacggtgatgatttcagtccgcttcgattcggaagaggctgcat ccgagaaactgtcatctatcaagaaaatgcttcaagaatttttggaacat actgactcggaactcaaacgctctcagcttgcaactagctgttatt >AW057172 aatgatcaacgtcgacccaccaactgttaatgttcctgcatctggaggta attcggttcacaacatcgtctcggagtcggattctcgtttggcattcaag gtgaaatcgtcgaacaatgagcactaccgtgttcgtcccgtctacggatt cattgatccgaagggaaaaaccaagttggatatcaatcgtattgctggac caccaaaagttgacaagcttgtcattcaatatgctgaagtgccagccgat gagactgatcctcaagcaccattcaaagctggagcacagcaaggtgaaag tcatcgtgaagatcactgccgaatgagaaaat >AW057173 
gctcgtacaagaatgctcgtcccgctgactgcaattgtgactacgtcgtt gccgatggttgccgccatcgctttttgtgccaagaatcgtaagacggtcc atgctaaaaacaagaataagaacaagagcagcaaatctgccaaatcgtcg aaatctactcgtggagcgtcgaagagtgggaaatctcgccgttcatcgaa agctaagcactccaagagatcgtcgaagtctagtaagaagggaacgtctg taaaatcttggaatgggaagccgcaagcgtggagggaaatcatcaaagtc ttcgaaatcgaagaacgtcaagactgctaccacctctggttctcaagttt caactgtttccgctgctactggtgtttctgataagcaatctaactcatcg aaatcttctcgtaagagctcaaagagttcgaagagccgtaagaatcgtcg acttgattcggatgcccagaagaaaatggagaaatcgggacagagcggca aagttgctcttattccagaaacgcaacacacaactggaagccaagctgcg catagccttgctgaagaagtcaattcgatcaagcactccaaggaaatgaa tgtggcttctgctaaactgctataccagacactttggcgagtcaatcana ttgtattgaaggatacttcatatgaacgtaagcttataagatcagtgctc ggatactccctttatcggtcaatccgctttatgatttg >AW057174 tccaaatgacaatcgacatgaaggtggtggcctgggaaatgctcaaaggc tcggatcagccggtggacttggtccccaagaaggtcgtggttgccggaca ggaaatcacagttgattccaagtcgaagaacgaaa >AW057175 atcaactaataagatgccctgtcaaaagaagtcaaacccaacggaattgc acatctccactggccgcgagatcgttcaacggaactttgtgttccgcaac accactggcaaggacttcctgctgaagttgcatgctacgaatgaagccgt caccttcccaacggaagttttccgttttccaccattggctcatcgtgcca tccagttccgtgtgaactcatcaaagctctcccaatgggacaagatgaat cttttgatccaaaggatcgtgttgccgatctatgcgaagagcctcaagca gttcattgatcagaagaaaactgcaggaactaaggagcaagaggcattct cattgtctgtcaagttcacggatcagttctcggctccccagacagtcatc aacttgccaggatatgccacgtgtatcgagtcgactgattatccggttga cgtggaagaattggacactacaactgcagtcaacatcgaaagagatgtct ccactgctgttccaattggttcaatgatgggatttgttgaggagtacaaa cgtcgtcaattgaacaaaggatgctgtctttcaactacatctttggaact gaaagcaaccggagagcagtcaatgagatctctcgtgatcagccgtcgtc gatcatctgcaagagctcaaggt >AW057176 ataactctcccaacaacacctcaagatgaatgctttctacactgccgtcc ttgttgcttcaactctcgcctacactgcaatggcttggattggactcagc attgaagccgccaacgaggatatgatctgaagtggcgc >AW057177 acggattgctcgcgagcatccggaacgtgcggtgactttgctcaaggcgc ttttcgctactgtgtcgacatttgatcaagaaggttatgtttgtgtggag gataagaagttcactgagaaacagtccaaataacttt >AW057178 tctcgatccttgccgtttttgtccaccatggatttgctgctgctgaagaa 
gagaagaatacagcttcagtcgtcagccctgctccggactctgaagcagc ccaacctgctggaaacggaaccgaaacaccaaaagatgaggtgaaggatg aggcaccaaaagaaggtagtgaaactgaagcttcaccagaagccaagaca aaaggatctatggtattccatgctcttggagccatttccacagttgttct cgccggcattatgtgaagaagtctgccgaaag >AW057179 atcaaccaccatgagttttgatgaaattgacatgaccttcggaaccaaga accgcgatcaaggatatgatttgctcaaagcgcgtctcgacaaaggtgat cgttcggtggaagtcttgtggagacttgctcaagtaattcatgagaagtc tgcatgtgttccaaaagctcaacgtaaggcaagtgtcaccgaaggactca agtttgctgaagaagctgtccagaaggatccaaaccatttcaaggcgctg aagtggaatgctgtgttgactggacaagcaaccgaatatatggcaaccaa aggaaaagttggattgcagtaagaagttcaaggaattgctcgacaaagct cttgctaaggagccaaaggatacggctctcctccatttgcgtggccgtta caagtactcggtggcatctctgacatggcttgagaagaagcttgctgcca cgttctatcagcaaccaccatcacattcctatgaagaggccaacgaggat ttccttgctgcttataaggtcaatccaaaatggatggagaacacattnta tgtgtccaaatgctacgtagcaattaaagacaagaacaacgctcgcaagt cccttaccgaagtgtgtgacatcgaaccgtattccgacgctgaacaagag tttgccgatgatgcgaagcagatgttgtctaagctttaa >AW057180 gaaaaaatgatttcggcgttcacttcattcgctgtatcctacttcgtctt ggctatttcgttttacattgaaacaactgtcagcttgttccacctcgctt atttctcgtacagaaatccggcagtttcgaaggatctcatcaaaactgca ttccatcttttgaagacttcctacgacaacaaattgctgacatttgccga aatcatcgagactacacaaaacagtatgatcaagccaatggctcatcaga ataaacaacactttttggaggaaaaccagcgtaccgcacagttgcagacg atgaaaacatcaactgcttttcgcgttaa >AW057181 tacaatggatcaaatcccaccatacgagttcaacaagtacgtgctctacg tccgtggtgccgtcatcgtctgtgcttcatttgagctcctgttggtccta ttcggttccctcgaagattgcaacttccttgctaagctcttctacttcat cttccttggcggagcagttgccatctcagctcacaacattggtctcaacg tggatggtcgcgaggagctcaacaaggttctttcatcgtcagagaatgaa gttcgtggaaaggtctgctgccttgattctggtgccagcgctctccggtg tcctcgtcttcttatgtgtctctggacatgcattcttctctggtgctgca ccgtcagcacaggatccagctgctgctccaccagcccaatagac >AW057183 tatacgtgcatgtggaggatgaaaatccgaatgagaatgagccggcacgt ttgagagctggattcgagtgggctgctgaaccggatgagattctcattgc aggtgttcccaccaagttcattatgtttggtttgtcgtgtttccttgtaa ttctaacactcagtctatggattgccagcacccattattcttactttatc tggctcgtgtttgccacgttgtacatgggtcttatcgtgttccttcccga 
atatttctcgaatataatcagtttggctctgaactttatttactggatcg cttactgtattttcacttttattggaattattcttgatgttgttaggaga ggggacagttgtagttccggtatgagcaaggaagtttgtgatgctaatcg tcacggatacatgttggctatatgcttcggatgtgctgatctgctaattg ctggagttatcatgattctgatgttccgcatgctcaattactattacatc aatcgct >AW057184 gacatgtcggttttcataagtttgtgttctgcttggccttattgcaatga tggcagcccaattcggactaaactcgggacttgggcttggagttggaccg gcgagagctaatgctaacctaaacggaggttttcaacgtggctacggcgg caatggctacggtaaccgaggtggatacggtcagcagggcggctattatg gccagcaaggaggttacggtcaacagggtggatatggtggcaatcagggc tactatgggcaacaaggaggcggntattgaggtggcc >AW057185 attttttcaaccgatcgtccaacaagagacaatttcctctgaaaacgcaa tgccgaacccgccaccgaaggaagacacctgggcgtttcaaccaattgga gccccattcccaccgagtcctgtgaaatgtatgggagaacagaatatgta tgttgctctttggtacaagcacggtaaaccaatccacggtcgctcatgga acaatggaggagttgttgaatgttcattcccatataagcaagctgaattg acaaccaagcaacaactggaaggacagatccaggttcttcaatacgtang agaccataacaatcaaggtttctggtacgaatggattaagtacaaggatc gtattgagaagattgacgataaacatcaacttgtgcgttgtggtgattca ttcccaatcttctggaagcgtgccgaaggaaatcttcttggttatgtcga caacaagactgaggaggcttggttctcgttcaatggaaaagtgctgaaac aagttggaccacaactcaatgacatgtacatcatcacccgtaactgcatt ggtgggccaccatcttgtgattgtgccaactggtgaagtggaccanaggt tcgtgtcgagagagatgaatggatggacattcgtgaggtgatgcatggca actcgtcacttgtcaagctcttgatagactcttgatacttgccagtgtca atcagatcatacgttgcactttgacatgccagagaacctgtcatggtcgg atggaat >AW057186 tcccaacaacacctcaagatgaatgccatcttctttgccgtccttgttgc ttcaactctcgcctacactgcaatggcttggattggactcagcattgaag ctgccaacgaggatctcatctagatcgatcggagaaaaccgccacaagaa attttg >AW057187 actaactaactacaatcaactctactatacttatggtcaagaagattact gtctacactgcttttggacaattcctcgagatgatcgagcgtcaagccga acagagaagggaaacagttccagtcctttgtccgatcgttgaaaaggctc agccaaggacagctttgaacaaggttcaatcttgcccagttgttccaacg actgcaagagttacagaagagatcaagaagagcattagctgcccattgtt ggctctccac >AW057188 tttctcggaacaactccaagcgaaaaaaatggcggacaagtcggcgtaca tgggtgctggtggctatggatccggatacatgggatccaacgcctcatcg tcgggatatgcccgcgaagattatgcacaaggaggaaatggaggcggaca 
acaacaaaaccagggaaacggaggaaacaccaacccaggaggacaggtct tcaaggcccgtaccgatcaatcgtgctaccttgggccataagtagctgct cgaataatgtgaagactcagccag >AW057189 tctttcgttctcgccatcaccgccttgcccgcgattgccattttctgcgg aggaaagaaaggagccggtgaatctaaagaaaagccaaaggaggatgtgt atgaggatttggcaccaggagataagaagtaatacttgtgaactgacaga atgcacaatcgagcaacttc >AW057190 ttttgaacaacaatactcgatgcccaagttaaatagaagaatcgttcgtg caagggattcaaaaggacgcttcctacccggaaagaaggccaagtctgtt gccagcaagtctcgttcgagatccagaagccgcagtgttgtgagccgtat gacgacccgtaccaattcgttgactcgtcgtcgttcatcgacgaaagctc cttcgtctgcccgtcaatccagatctcgctcaagatctcgctcaagatct cgctccaaatcccgctccacgtcttcccgccgttcccgctctcgctcagc ttgttgtgtctcgttcaagcgtggacgtcctgcttcgtttgctatgaaga gtcctgaagagaagacggccgcaaagaagacggccgcaaagataatcctt agagtagattagccacctggaatgagatacagaat >AW057191 ccaacaacacctcaagatgaatgccatttactttgccgtccttgttggtt caactgtcggctacactgcaatggcttggattggactcagcattgaagcc gccaacgaggatatgatctggagtggc >AW057192 gcacctggttgtgatattgaaatggatacgcgtactttgttgtggaatga ctacaatgctgcagttccaactaatatgtgggaaatcggaaaatgtacat tcaacttc >AW057194 tcgntctactgagaaggatgaaccgncagtcaacttcttctcctcctcat cgactcgatccttgccgtttttgtccaccatggattggctgctgctgaag aagagaagaatacagcttcagtcgtcagccctgctccggactctgaagca gcccaacctgctggaaacggaaccgaaacaccaaaagatgaggtgaagga tgaggcaccaaaagaaggtagtgaaactgaagcttcaccagaagccaaga caaaaggatctatgggatttcatgctcttggagccatttccacagttgtt ctcgccggcattatgtgaagaagtctgccgaaa >AW057195 gtagatgaacttgcaatcgttctacattttaatgtactcggatggcccac tgactcccttcgagtagtagttgttgaaaaccggagcttcatagttgacc ggaggagccgagtaggtcggtatcggagtggtgactggagcggccacgtg gatctccggaacttcctcctcttgctccacatcctttttacgagcgaagc agtacccagactcttcatcgtcatacatcttgtgaagcatcttcggagca acttttggcttttgctgaacttgttgaactggctcgacgactggaacatc tggtgccggctgctgctgctgctggatctgaagagcctgcagctgagcca tcaactgttgctctgtgaagtccggagctggagacgcaggtgcctgctgn gtcgtggtgtacttaacacggtaggacaacggaatatccgatggtccacc atagtagtccgagtgcacaatcggctgttccggagcaactggagccgctg cttgacgaaccggtgatgatgnggcagacgcagacaaggagagcaccgaa cggtagagacggatggcatcctggattttggatgagtcagagatcctcgc 
gattgggtgtagcgngaaggcagcgtggcagagaggagcttgctcgctgg ctct >AW057196 agagattagtagatggatgtgataactttcgcatatcatcatcgtcatca ttgtcttcatttttgtcgtcttcgtgtattgttgttccacgattaataat tttcaacgccttctgagttcccttttctaacgaaatacttgtattcttgt taccacgatacgtagagttgttcatagatgaggactttttcagtctggaa ctaggtctcgcttgcat >AW057198 ggataagaagattgtagtaattggatacattatcggaacgactgcagcgg tagacttggcagcttcaaatccggatagactagttggagttgtgctgatt gctccgttgaccagtgcactgaggatgttctgcaacaatcctgacaagga aacgacttgcattgacaaaatctgccacatcaacacccgagtgctcatct gccatggagaccatgatcaacgcatcccaatgactcatggaatggctctc tatgaaaatctcaaaaaccccagtgccaccactgatcgtccatggcgcca atcatcattcaatcattagcggagagtatattgaagtctttactagaatt gcaagcttcatgcggaacgagacccttctgagctgccgagccaatcaaat cgagtcgtcctcgtcgaagaaattcaaacatgaatgaatagta >AW057200 cnaatnccttcaaaagtcgccggttgctcaatcagcataatatgagaaac aaagggcgtgataaaatgatggctgatgacaacgtctcattcacggatgc aggtgatccaacaccaggcgctaaacctcaaggaggatcgtcggcaatgc tggatctacttggaacactgaacaagaaggaagacaaaaagaagaaggat aagaagggcaaaaaaggaaagaagtcgaagggaaagtcgaagaccaagaa agtcagaaagacagacaagtttgagtcgcaaaacttcttggttcgcatcg agggaaccatattttgtgctggaatcgttgtcggattggtggtgctgctc gtcttcgttgcagttgcaatcttcttcagcgtgaagtctggaggaaacat ggtgcactacatccatccatggtggggaggacttgaagaatcgtcttcta attgagagacgaatcgaaagaaatgaaaaagtgacg >AW057201 aattggagtttttcgacggataaagaagtttcagctggaatcgaaagatc taaacgatgatctcactggtttctgtgattgccaccgcctcggctactag caccgtgcttgccatgtgctccagcaaggatcgtcaagcggaccgtaaaa agactaagaagggttccggatccaagtcttcgcgtgtttcttcgaagtcc catagatcctcaaaaaccaacaagagatctggaaaatctggaaaatctgg aaaaactggaaagttctgaaaagtctggaaaatcaagccgtggaaaatca tcaaaatctaagaagccttccaaatcaaaggccggagttcttccaatccc cggagccgcaggaggtcccaagtctgaatcgaagaagctcagacgtgata gcacggacaagtcaaagtcacagagatccaaaagatcttcgaaatctaag aaatgcgacaagtcctccaagaagtgcgacttgaacaaggctaagaacct ctgcccaacggttaaccaggcagatgtttccgacgtgtccatcanatcgg actccggtgagaaatccgannagtcgaaagctctgaagttgttnccaacc agtcgaagacacattcccggagggacaggttctgctcagtgaagagtcga ctcttctgcatcgnccacaagctccattcgca >AW057202 
tctgctgcttttcatagctcaatgtgcatttggcggaccgacgaaactta ctgaggacgaacagtttgaatttttaaaacgagcaaataatttgatgcaa agcaaagctaaactcgacgcttatacttggatgaatttcgacgccgaact cgaggctgaaatattgaaaatgtcgtgcgatgaattggaaaaacagcgag gaattcgatttttcgagtcaaagtttggttaatcatgagattacaagtag aacaggagaaactggaattgcgtgcatcaagagtgattgtatgtacacag atctcataccagctgtgccgatcgctcatatgtgcctttacaatcgccca gtggtcacaacaaccgtcccaccatcaacgacttctggaactgacaatat tgcgagttttttctttggaattctcatttttggagttttgaacttgcctt g >AW057203 aattcccaggaacatcaaagagtcagattagaaaaaagaagagaaggaaa gcaaggagaccgaagttttcatcctcaagctcgaaaacacaacggtaaca ttcacaaaccccaagcttggcgatgaagtgaccattgcatcactaaacct gacaaatccgaccaaggatcggtatgcattcaagatcaagtgcacttcaa accagcttttcaagatcaagtcaccagttggttatatcaatccagaggag agcttaacaattccggtctaccactacccagcaactgtcattccggagaa taataagcactacttcgtggtctactacatcaaggctgcgaacacagtaa aggaaaagattccagttcgtgatttgtggaaagcggcagcatcatcagaa ggaacccgacgtgtcttcatagacttcaaaaa >AW057204 aacgaggcagcgaaaatttcccatcctctgccttcaaaaacttggtgact catccaaactaccgggattcgtctttcactccgttcgtcttgtttatttc cgatgacgttcctaacattcatgagtgcctcaaattcgaagagcgtatga gtgacattccaacgcagcacgtacttctcaaaaatgtcaaaaagatgcgt gacaacatcgaaaagaagtctcaaggtggaagaagagcatatgatttgac tcttgacaatat >AW057205 ttacatgtcacctgaaagaattctagaattcgggtataatttcaaatccg atttatggtcgactggatgtctgttatacgaaatggcagctcttcagtca ccattctacggtgacaagatgaacctgtactcactttgcaagaaaatcga aaactgcgaatatccacctctgcctgctgatatttattcaacgcaactcc gcgacttggtatcccgttgtatacttccagaagcttcaaaacgaccagag accagtgaaaggtctacagggttgccgaacacatgaacaattacttctcg ccttccggggaccaatcaacaactccttcaacgcaattctaaaaaaagct ataacatttcaatttcaaacattttctttaaaacgtagtgttcttgtatt ttcaaaaggtggaaacattcgtcaatgaccacgtgaatccgtgatgtgct aaattttac >AW057206 cagctttccatcttgcacatacaattgcactcttcattaaatcattcgat tctccatgtcaatatctgaaaactccctcaactgtaacacttaattcgga tatctgtcatgctgttaatggagttactctggtgtgtgcagtgatttcaa tgattgctactgctcttgccagtatggccgtcttcattcgtctcactaca gtcgtcgtcaaaatttcggataagagagttatggtcaccaagtcgttcat ttgacaactcaatgccacaattaatcagtagaaagtcaattgaatcggag 
aaagaagaagattgtttggagacaccaagaagaaaaatgggaca >AW057207 atgtatacttctatcagtttgactatcattactctgcaggattcggagtg ttccggtggcttcttccgtttttgggctccacacattgcacggaaatgag atatgtgctcggcaaaggaataatctcgaaattccgaccaaatgataatg ataagaagatgcttcacgttatgacaacttattttacaaattttgcaaaa tatggaaaccctaatggagaaaaccaggagactggagaatggcaaaagca cgactcggcacaccccgtccgccatttcaagattgatctggacgattctg aaatggttgaggactatcaggaacggagagccgagctatgggataaactg agagcattaaatgttagcagggctcagatgtgaaattgct >AW057208 aaatttcgaatcacatcaagacatgccgattgaagttacaatgtctacaa gcaatgaatccgcaaatcttctggtaccctcatttttgccacgtggcgaa gttctcggttggaatctgacgtcggtggtgaatccggtgactcgacgtaa agaatacacgtacacggtgtgtgtgcagactgtcggagcatttcagtgca ccgagcaagcgggcgttgtccttgtgaagtgtgatggtccatgtggaaac aagttgccaaccaatcatttgattgcattgggaaagtgtgatcacatgtt gtgtaaggcctgcttcggcattgtaaagaatccggatggatcctatggat gttcgaacttctattgctggtcggaaccacgtggaaacttccggaaggaa aaggccaactacaataaggtcatcaacaagcagatctgccgtgccagaaa gttcaaacaagacggggaagatttgcaagcatgcagcanatcgaatcttc ccaagactcctgctgacctatcggaccgtgagatgaactctgcgaaatcc tcggaatcggatacgtcgtcttnctgatggttcttcctaccaatccactg ttgactcgtactgactgccaanaagattctaa >AW057209 ttgtgtattcgaaagaaacatggcagaaaatagtgtaacacttctacaac ttgcccatctcggttacagcatgctcgctccaatggtgttcaccggatac gtcattgacaatgtggaggaaagaagaagactggtggatcatcgggtggt ggtggagctaaaactggaggaggaggaggagatggtgcaaccagtgcaaa gagtgataagaaagaaaaatcatcgagtgctccaaacccagagggaccaa aggctccaagtgataagaatgctgtcgcagggacacatgatccaaattat caaactcttgctggagttgatggaaatgtgttccaagagaaaggaaaagc ttctcctgttgctgctgctggtggagcttctcctgctgctggtgctccaa aacctggaggtcctggaatggctgccacccacgaccccaactaccaaaca cttgctggaattggaaacgattgtttcgacaagaaagaaggtgcaaaacc agcttgtggtggtgcggctccaggtgctccaaaagcctgtggtcctggaa tggctgctactcatgatccgaactatcagacacttgctggaattggaagc gattgcttncagaagaagtgattgtgtcaactcgtgcanaatcgggtaca tnnacgaaatatgtga >AW057210 aaaaatgaacgatcttgttattaatcagaagattcttccagatatttcga aatcgaaatgggatctcgacacgtattcaggtcgtgtaaaacactacttt gcatctgctaacccgatgacactcttcacctcatccaatacccaggaaat 
gtgcaggaaaatagttgtagactataaaaaggggataataaatccggaat tgacgatggatgagctatggagtgcaaagatcctctatgattcagtatat catcctgataccggcgaaaaagatgtctgtctcgggagaatgagcgctca gaccccagcaaatatggttatcactggaatgcttctcagctgctatcgta cctgtcctggtattatattctcccattggatcaatcagtcgttcaatgca attgtcaactataccaatcgaagcggaaattgcagaactaccaatcagca gctactctattcgtatttctgtgctactggagcggctacaacggcggctc tcggtctgaatatgatggtgaagaatagtcatggattggctggaagattg ttccatttgtggctgttgagttgcaaatgccattaatattccaatg >AW057211 gaataaattcatttttacactactcaacaattgatttgctgcttctttct ccacaatctacaacttctgaaaaatgatgaaaccaccgatttgccgccgg atgagcctgccaccagttctcgcaaagaaggaaggtgaacgaattcaagt tccagaaggtggagaattggtatgctaccgaaaccaagaagatcaacatc ttttcttaaaaatagaatactcgcgcgaaacgatgattaaactttcatcg tctgttttctgttatgcacgaccaactggcctcaaagaagtttccgatga gttgccggaactacttggactaccccattcgattccactgacagattctc cgccgaattttgaatgtcttcctgaaccattggcacatgtttcgggatgc aattcacatttcagtatgaaccatcctcctccacttctgagccctctacg tggccctgtaactgaagctgatgaggcgatgcgcacactcagtagtcatc gagaacttcagaagcggcttcaaaacatttcaatgcgaggagagttgagc ccttcaggatgcctggttgacagttgtgacacacctgcaattcgtcagat tcgcaatcctcagttcacttga >AW057212 tcgaagttactcggacttctcatcttcatcgctttttaatcaacatcatc atgtttggacgtttgaagcagaaagttaaggaaaagactggacgtgccaa ggcgacaactcttcccgcagaagtggacgatgcgatgggctacttcaaaa atctgacgccacgtgtcaaggaccttcacaagagcatgacaaacttggaa gatattagcaagtggcagaagaaggccagtttctctggcacccttgagaa ttactcgcgtctcggtgacaagatcaatgtgaaaccatttatggatgctg ttgatgctagaatgggtgccgaagctgatgccgtgaaaggggtcctcgcg atttgtgaaaaatacaagtcattctaccaaaacgagggaaaacttcacgc ggacagtatcgccaatttgaataggactcggctcgacatggacagtgcgg cggataaatatgcgaacaacgagactgaagttaacaagactcgtttggat acagtaccacggaatttgaagtggcttgtgagagaatgcgagaactggcg aacggaatcaagacaattgaatcgaaccattcttcctggcaagacgtctt atgaggagaataaagtgcngtgcgtanataa >AW057213 gataggtgtcacccccaaccgattgtccttcaaaaaattcgcattttcaa tcatcgagatgatgaggtgaaaatcgataatttttcggctcgtcgagctc caaaactccgcgacgatgatagtgacgagctcaacgtggatgtaccagct gaagatgatgatgatgtggaactcgatgacgtcatcgttgctcaaaatcc 
ggcgttttacggcacaattgaggcggagaagtgtgcggaacgagttgctg cacatctttcgatggcctgtgagaacatggaacgtctgcaatttgtgagc gaggccgtgtatccacagagtgctgatcatttgaagaaacttcaagaaat cgatgatgacgtcaaggatttcaattggcagatgagagagcgtcgtgtca aggcttcaaatccagcaggaacagccacaaaagttgcacacttcatt >AW057214 tcgaaacttgtgaagaaggtctgcaatggttgccgcaaagaaattgatgc caaagcgaatgaagaagaagatgcaggtggagagaatggctcatgccaga gcttgcaaggttgcaaagagagaggctcgcgtcgctgaggaagcatctgg aaaatctactggtggatctactcgcggagccaagtgatagccgagccaca acacatga >AW057215 gaaacttgtgaagaaggtctgcaatggctgccgcaaagaaattgatgcca aagcgaatgaagaagaagatgcaggtggagagaatggctcatgccagagc ttgcaaggttgcaaagagagaggctcgcgtcgctgaggaagcatctggaa aatcaactggtggatctactcgcggagccaagtgatagccgagccacaac acatg >AW057216 aattccgaattccattcgactaaccatctttttgcaaacttgcaccaaat caaccgacatgcaatcaatcaacatcttgttcgccatgctcctcatcttg gctccaattgtcaatggagacgatactgccgttgctgtgactgcaactga agtcactgaagatgcaactgaagtcactgaagatgcaactgtggctcaca ttgaaacaacagccgaagccgttgcagaagcagaaccagctacagaacca gttcaaaccacccgagctgttgaagaaactacacaagctgttgttgtaga atccactcaagagactgtaaatgctgtaaccaatactccagttgatactc cagctaccaacaacgtcgaagcaactactgaagcggcttctcgtccatcg ctttcatcgactgttgcatcaaatatgacttctgctgatgacttgctcgg agaaacttcaaccaatgccacaaaagctgcttacaacactggaaccttca ttgttgtcccaatggtcgttctcgctttgattcaatga >AW057217 gatgaaccgtcagacaactgctctcctcctcatcgtctcgatccttgccg tttttgtccaccatggatttgctgctgctgaagaagagaagaatacagct tcagtcgtcagccctgctccggactctgaagcagcccaacctgctggaaa cggaaccgaaacaccaaaagatgaggtgaaggatgaggcaccaaaagaag gtagtgaaactgaagcttcaccagaagccaagacaaaaggatctatggta ttccatgctcttggagccattnccacagttgttctcgccggcattatgtg aagaagtctgccgaa >AW057218 tctggaaagaggagtttcttctgaagacatantgattccttctgttcgtc ggggtgtcattccagtcaacactcttcgttaccaaatagaaaagcatctc gagatgtgtactccagcttctgaacaattgtcaaagagttcggatcccaa catctcctcgatgtacgttttccatcaaggaattcaagtaaagcaggaac caatcgatgatgaccaagaggaagagcaacaagtacaaaagcagcttgta ttcaaaatccgaggcttcgaaaacgaagaagctgtgaagaaggagtg >AW057219 tcccaacaacacctcaagatgaatgccatttacactgccgtccttgttgc 
ttcaactctcgcctacactgcaatggcttggattggactcagcattgaag ccgccaacgaggatatgatctgaagtggcgcc >AW057220 ctcttttctcttcgtcgccgnggatcaaaacgttgcccgacagggatgag ctcgaataagacctctcgctcaacttcgtcatcgtctgtcacatcatctt caggacatggtgcatcgagcttctccgaggattcgtctgttcgctctgtc accaacagtgttagaagtactagaagcgctggatctatcatgtcaatggc tagtgccgaggcaagtgtcgttgctccagatctgacaatctaccatggag atcgttagcaatcctaccagctcgctgacaaggggaaaatggtcgttatc aaccggaaaaatggggtgattgtctacatgcttcgttgtgtcgacggccg tcgtgtctacattgaganatcttccgaaggagccagtcttattctgacta atcaacgtggaaaagtgatcaaggcattggccgngcactactag >AW057221 atcaactttgtctcctcaaaaataagtctacaaatgatcaacatcgatcc accatccggcgactacccagcttctggtggttcatcaactcactacattg tctccgaatcggaatctcgtttggcattcaaagtcaagtcgtccaacaat gaatcgtatcgtgttcgcccagtctatggattcgttgatgcgaagggaaa ggctaagctcgaagtgaatcgtttggctggaccagcgaaggaggacaagc ttgtcattcaatacgccgaaagttcagctgatgagaccgatccgaaggct ccgtttgcggctggtgctcaacaaggagaagtcgttgtcaagatggttgc ta >AW057222 aacattcgactaaccatcatctttcaaacttgcaccaaatcaatcgacat gcaatcaatcaacatcttgttcgccatgctcctcatcttggctccaattg tcaatggagacgatactgccgttgctgtgactgcaactaaagtcactgaa gatgcaactgtggctctcatcgaaacaacagccaaagccgttgcagaaac agaaccagctacagaaccagttcaaaccaccgaagctgntgaagaaacta cagaagctgntgntgtagaaggcactcaagagactaaagatgctgaaacc aatgctncaagtgatactccagctgncaataacatcgaagctaccactga aagggcttctcgtccatcggtttcatcgactgttgcaccaaatgtaactc tggtgatgacctgttcgggaacctcaacaatgcacacaa >AW057223 tcgattctcgaaatggctccaacacgtacttcacgacgcagttcagcgaa cttttcattcgacgatgtaaatgttgaggaacagaggcaagcgtatctcc gctacgaacaggaattgaaggatctcgcacttgctcgaaaccttgagaat gaattgaactgggggccaaatccagccaacccagcgcctcctcaaaatcc tccacaacctgaaactgttcatatccaagtgaatcgagataatcctcaag ctcaacaacaaaatctgacaggaaccatagctccagctggaagagaaggt actggagtacaggtggctgctgtggccccggatcccaccagcgcagcgac tgggtcacaaggacaaccaactccccagaatgcacaaaatcagccaacaa tcagagccgcagctggaagagaaggttatgganggcacgtggacgttnta accgccgttcttgcaagaaacttgccgtcacaaggaatcaatgaaaatgc cataaaccctggatgatgacgaatctgaagtcaattgtcagactcttttg aagctgctccatcgaatcagccaagcacatcacaaggcacctcaaatcat 
gcaggctgagtggcttcacgatcgcgccacgggngaagtgaccgctgttt gggactcagtgaccaaacaagg >AW057224 gaatggatcgtagcattgatcgtcgtctttgtgggcactgtcaccaatcg attggaagtgaggcacttgtcgccatgaatcgtctctggcatccggacca cttcacgtgctcatcgtgcaaacgtccgatcaagcagacgttccaggctg ccgacaaccacgcctactgtgtccagtgctttgctcaaaagtacaacccg aaatgtgctggatgcatggaaactctcgtcgacacgtgccttcttgcttt ggaccgacactggcattcacgttgcttacgtgctcctcgtgcaatcgccc attgccaaatggagaattctatctggttgatgataagccgtacgacttgg attgtcactgggcaaagcgtctcgagaagagagagcacatggaacgtggt gaacgttaagaagaacgccgttaatttgtcgaacttccccactgtttttt tccttgtattcttgtgat >AW057225 caactcaagaagaagaagaagatttgctgtcaaatcggcatacatgtccg ctggaggatacggatctggatacatgggagcaaatgcttcatcatctgga tatgctcgggaggactacgctcagggtggcgctggaggatccggaaatca gaatcagggaggatccggtggaaataccaatccaggaggacaagtgttca aggcacgtaccgatcaatcgtgttacttgggaccataagtctcaactgta ttcgaccggcaatt >AW057226 aaaaaacttcatcaagatggctgcgaatatttatttatggatgtcgaaac gcaacgtcgaattgaatacacacgacagaaagtgatgagaatcgagcaaa tgaatgagcaacttcggaagttgcaagtcgaataagggtcgcgaggagaa gttggacagacttttgaaacgaaaggaatccttggaactggacgttgcaa gattgactgacgcgtcgatgagagctgagccagaggtcggagcggagctt cttcattctatcgaagagcccatggaagttgatatgatctacggtgaagc attccacgcaaagacttgtcaactgaaagtccttctcaacgagataattg ttcgaacaagcttcaacgagaaagcaatgtgtaaagagattggacatcag gaagctgaattcgagaatcgattgaaggaggtgatcagtggaagagctca attgacgctgaaatctgaagaagctcagaagaaatgtgaattgttgatga gagagcattcgaatgtttatgaagatgttcgagagatggaggataatatt gagaagtttgatgcatcgagatggttgttgaatgtggagaaaagacgagt ctctgatattctgatcgcaccaaagagagccgannatgggaatgagtgct gccagccatacattggtcttcngaagcttctcactttgagacttgcacat caaactgagtctcactcacgatcattt >AW057228 acgaaagagatggcggacaagtcggcatacatgggtgctggtggatatgg atcgggctacatggggtccaacgcctcatcgtcgggatatgcccgcgagg attatgcgcaaggaggaaatggcggaggacaacagcagaatcaaggatct ggaggaaacaccaacccaggagggcaggccttcaatgcccgcactgatca atcgtgctaccttgggccataagtggcggttcgaataatacaagagcaag tagtcaaccacccc >AW057229 atccgtccggaaaagctgcgaatgctgcgagagcttatggaagcgggatg aaaaactctgcagcctctttcaatcactcttgagagcagttcgataagct 
tttcgaagcttgtaacacattcaaggatcaaatcaatggaaacgtccttg ccaaattgatcagtttcaaagatgtggaatgcaaggatgtcgacactcaa atgacccagttgaagaaggctcagaaggattatgcgaataagaaatcaaa gatagttccagatcaagttcaggttgatgcagctgaagcaaacttaagaa gtt >AW057230 aagatcactctgatcgtcgtcgcaccaacaccttttcttcttcagccgat gaggatggagttccaaatgaggtcgccgactacctggtctacttttcccg catggttgacgaacaaaatgtgccggaaattttgactctctacgatcaag cttttccagatctcaccgagagattcttccgtgatcgcatgtggcccgat gagaatgttgtcgaaagaattattatacagtattg >AW057231 tctctggaagcactctgctcgttcaatgtggtggaaagaaaaagggagca acttctgccgaaggaaaatcttcgacgatgggcccggctcctggaggagc tcctgctgctgcttccgctcaaggagaacctgaagagaaggagtaa >AW057232 gaattcgagcagcaaatctgcctaatcgttgaaatctactcgtggagcgt cgaagagtgggaaatctcgccgttcatcgaaagctaagcactccaagaga tcgtcgaagtctagtaagaagggaacgtctggaaaatctggaaagggaag cagcaagcgtggagggaaatcatcaaagtcttcgaaatcgaagaaagtca agactgctaccacctctggttctcaagtttcaactgtttccgctgctact ggtgtttctgataagcaatctaactcatcgaaatcttctcgtaagagctc aaagagttcgaagagccgtaagaatcgtcgacttgattcggatgcccaga agaaaatggagaaatcgggaaagagcggcaaagttgctcttattccaaaa acgcaacaaacaactggaagccaagttgggtatagccttgctgaagaagt caattcgatcaagcactccaaggaaatgaatgtggctcctgctaaacttc aataccagacacttggcggagtcaatcaaattgaattgaagaatacttca aatgaacgtaaagcttataagatcaagtgctcggataactccctctacgt gtcaatccgtttatgatttgctgagcacgttctccgttaagatgatggtg agat >AW057233 ccggatccacagggaaagtactactgtattgtaggagctgatcgtgcgtt cgggagagaagtcgtcgagacacattaccgggcttgtcttcacgccggac tcaacatttttggaacaaatgccgaagtgactccaggacaatgggaattc caaattggaacctgcgaaggaatcgatatgggagatcagttgtggatgtc gagatacattctgcacagagttgctgaacaattcggtgtctgcgtatccc ttgatcccaaaccaaagggtcaccatgggagactggaacggagccggatg ccacaccaacttctcgactgccgaaatgcgtgctccaggtggaattgctg cgattgaagccgccatgacaggactcaagcggacacatttggaggcgatg aaggtgtacgatccacatggtggagaagacaatcttcgtcgtttgacagg acgtcatgagacaagttcggctgacaaattctcatggggagtcgccaatc gtggatgctcaatccgtattccgagacaggtggctgcggagagaaaagga tatnctggaggatcgtcgtccgtcatcanactgggatccttatcaagtga ctgcgatgattgcacagagcattctcttctag >AW057234 
gaaaattcaatcaataatcactatatcaatgttcattatcatcgtcgtct cctataatgagctgacagctgaggaaaatgataagaggctcgagacatgt ggaaatgagcatattgggaaaccatcgaaaaactcgataatctctcctgt ctcctggcttaccaaattgacaagatccgaaatttctgctccggcagtta taatatctcctcgacacttgatcacttcttcacggcttttcctcacaaaa tcagcttggaaaaatagcggagattcgattgattgtgatgacagcataaa gcacttggaagttccggtaaacaagctctcggatgtgattgagccgtgcc tttctcaaaaggagaattgctccccgaaagtgatgaattttgccagagca tatattctgaacttttgcaaatcaacattggtgcaaaaaagagtgtactc cttcccaatgattntggagcttgatgagaatttggaaggcaactcaagtt atccatgtctagctgatgaatcaatcanacttgccaaaggagatgccatt gancgcttatgatgacannacaattcgaatggagcatcgaanagtgatgt cggcgccgtgtagtccgatatctataccgactgt >AW057235 agttactcaagctgttccagaaggatcagtacttagttcagatgtcactg atcgtccaaacatcgactccactgatgttgtatcaaatgcaacttcggtg gaagatttgcttggaagttcaacaaatgcaaacaacactggtacattcaa ctctaggacctttgtaattgctccaatgatgattcttgctttggt >AW057236 ctcccaacaacacctcaagatgaatgccatttacattgccgtccttgttg cttcaactctcgcctacactgcaatggcttggattggactcagcattgaa gccgccaacgaggatatgatctgaagtggcgcccat >AW057237 ttttgttttggtttgatttgtccgtaaaagttttcacaaaatcatcaatt ttctgttgtttttcttgctccttggcttgtttcttttccttaactgactc gacgaattctccaacttgttgagaaacctccttccatttctcgatcgctg cttcggcttcctttttcttggcttcagctttttcctgtttctccttttcc aattcatcggctctctgcttaactgcatcagatacctttaaagcgtttcg ttcagtttcttctgagtctctttcccgtatttgtctcttgctgctctcgc attntcaaaattaactctcagttcttcttccaatgccttttccgtatcaa ttactgactggaaggtgtcagcaagggagtagtagcgaacggatgaaacg ctcgacgcaaaaaa >AW057238 gcaatgtcacgttgtcaactcattgcaaacaaagactggaccaacattga acggagtgattggaagacaatccggacaagttgctggatttgactactcg gctgccaacaagaataaaggagttgtatgggacagacaaacacttttcga ctatttggctgatccgaaaaagtacattccccggaactaagatggtcttc gctggtttgaagaaagctgacgaacgagctgatctcatcaaatttattga agtggaagctgccaagaaaccatcggcataagcctctactaaata >AW057239 tcgtcagccctgctccggactctgaagcagcccaacctgctggaaacgga accgaaacaccaaaagatgaggtgaaggatgaggcaccaaaagaaggtag tgaaactgaagcttcaccagaagccaagacaaaaggatctatggtattcc atgctcttggagccattttcacagttgttctcgccggcattatgtgaaga agtctgccg >AW057241 
atgagccaaagtgtgtagatgttgttgaaggaaaggaaagttctggagtg tgcaagacgaaaggcggagtctgtcgctttggtcattgctgcccatcact taccctgacaattgcaccatctggaaatggtactgagtcagcgacgccta ccttgggcccatatccatacttgactaaattatccgtgtgatgctaacaa acctatcccatctcaattcagcacctatgcattttgcgatcctgacacta atcgcgttggtattttgggcaaaaggcacttaactggagaagaacgtact gaggtgaagggatcggcatgctcttctaacaaagactgcaagtcgggaac tgtttgcgtgtatgttaatatcaataaacacgtctgctactaccatccgc tgaagaaaatcgcccgtgatgtcagtcaaccatggctctatgtgctcatt agcttcctcatttgcggttntatttntgtcattntggcagtcatgagctt cgtctgctaccgttcgaagtctgtgtttgacaagtaccagccaaagaaga atgcaggaacacatggtagcagacagtgatgggaagaaaggaaagatagt gggaagaacgagacgatacttanagtcaacgagctcccagtcaagaccag agacagagctgatcgacagcgga >AW057242 ttccacctggtaccactgtcgacactggaatcgtctctcccgaaggattt gatttctacctatgctctcactatggagtacagggaacttctcgccctgc gagatatcacgttcttctggatgaatgcaagttcactgctgatgaaattc aaaacatcacttatggaatgtgccatacatatggtcgttgtactcgttcg gtctccatcccaactccagtttactatgctgatttggttgctactcgtgc ccgctgtcacatcaagagaaagctcggtcttgccgacaacaatgactgtg acaccaactcgctctcttcatcacttgcttctttgctcaacgtgagaact ggaagtggaaaaggaaagaagtcacatgctccaagcgtcgatgatgaatc gtattctcttcctgacgctgcatctgatcaaatccttcaggactgcgtct cggttgcagctgactntangagtcgtatgtacttcatttgaagactcttc atgcagacggagccagagaaa >AW057243 accgccttcgtcccaaatgacggttgcntgaactttgttgaagaaaacga cgaagtgctcgtatctggtttcggacgttccggtcacgccgtcggagata ttcccggagttcgtttcaagatcgtgaaggtcgccaacacctccctcatc gccctgttcaagggaaagaaggagagaccacg >AW057244 ccgnctccactcttcacttgcactattcttatgcaaccaagcaacgncat gctcgctgttcttctcgccttggcttcatttgctcaaggaggcagatctg ttgctccggctggtgcagtcactgaaccaacagttactcaagctgttcca gaaggatcaggacttagttcagatgtcactgatcgtccaaacatcgactc cactgatgttgtatcaaatgcaacttcggtggaagatttgcttggaagtt caacaaatgcaaacaacactggtacattcaactctaggacctttgtaatt gctccaatgatgattcttgctttggtgcag >AW057245 cgaaaatcgacaacgagacgcagtcacagtgagaaaagttcatcgagagg ttccatgtcatcaccgccaaccagattctatccatctgaagattcagagt cgatttactcgactcgaaaatgctccaaaaggactacgacgactgctact acggatgaggagaagccgaacaacagctactacattgacgatatttatga 
ttcgactgaagaatatcaagtgacattcccgacggttgagctgaaattgc cacgtcagagaaagcattgccgcaagcgatcgaagagacaggatcaggca cagggagagcatgtgacaatcacgaaatgtgttgatagaagacaagtcta cggagagcccgataataagaacaccatatccgagcactctacgtacacct actctacccatccggaacgttgctctcaggccggccgtacttctcgttcg aacagctattctgacgccacagatgccacatatcggactgg >AW057246 gcttggttatccagcagcactactgcaattatgtttggaggtggagattc aaagcctatcgataagaagaaggaggacaagaaaggtttcgatactcgaa aattcttgattgatctggcctcgggaggaactgctgccgctgtttccaag actgctgttgctccgattgaacgtgtcaagcttctgttgcaagtacagga tgcttccctcaccattgctgccgataaacgttacaaaggaatcgtcgatg tcctcgttcgtgtccccaaagaccaaggatatgctgctctctggagagga aacttggctaacgtcatccgatacttcccgacccaggcgctgaacttcgc tttcaaggatacttacaaaaacattttccaaaagggattggataagaaga aggatttctggaagttcttcgccggaaatctagcttctggtggagcagct ggagccacttcgctctgttttgtctacccattggattttgctcgtacccg tttggctgctgatgtcggaaaagctaatgaacgtgaattcaaaggcctgg ccgattgtctcgtcaagatcgcanagtcggatggnaccaatcgactctac agaggtttctttgtctcggtacaaggtatcatcatctaccgcgccgctta ctttcggaatgtcgacacttgccaagatgtgttcactgctgatggcanga aactcaacttcttcgcttgcctggcttattgcctcagtngntacntnngt gatctgnatnctctnctatccatggnatcctgntcgtcgcgcatgatgat caagctggtcgcaagatgtctctaccagatactttgattggccg >AW057247 tttctctgaataaccccaaacgaaagacattcgatgttggacaagtcggc ttacatgggcgctggaggctatggatcaggctacatgggatccaatgcct cgtcgtcgggatatgcccgcgaagactatgcacaaggaggaaatggtggt ggacaacaacagcagaaccagggttccggaggaaacaccaacccaggtgg acaggtcttcaaggcccgcactgaccaatcgtgctaccttggaccataag atgatcgacactaggagagccagtagccaaca >AW057248 ttaaaagcaagcttcctcgtagctcgtcgtctggattctcgggaacagga tctgaaagatctatttcaagcaaaagatcgaaaattggaaacacttctgg gaaccgtaacaagatttcgcaaagtgctagaagaagacttttgcaagctg aaattgagagtcgacgcaatagatccaacaaatcaccaggttccagtcgt aaaagcattctgaagaaatctccgatgaaaaaaagaagatctctttcgag aaaatctttaccaaagaaagagcatattcctccagttcaaaagtttgcaa ttatcaagaatccagctgctcgtaatcaagttcgtggatttgttgctgaa tacgcacaagatgcagaatttgaagcttttgagcttttggttgacggaat tccatttgttgctctctctctgatgaatgctcatccagatcttcgtggaa gattcaaacctgttccaccaccaacaccgatgaagaagcatanctcaggt 
atncaatagcccacgaaacgntcctttgcgcgtgaatctatttccttcgt taannctcganaagtgttcgtttcactgantggaggatctctttntcgaa nagagctagaaaatcttcatccgggaatgcgattcatcggatcgtgagct tggagnagacagtatcgnatgccatgatntgacacttgtcgcagtacaat ggatttgtgggaagatgtgnaccatagtgtgaaaa >AW057249 tgactgtgaatgattctctgctacaccattttattatcatcatctcggta tccaattccatattcttatttcgagttatccagaaatggcgatccattac aatatgtggagt >AW057250 aattccttctacgcgaaaacaaatgcacctcccgttccgttgcccagtct tggtgaccggcgtcgtcgtcggcggagccgctcttctggcgattgccgcc tactactactggagccagaaaaagaaaagctctgatacttcatctgccac gtcatcggagtccaacgatgttgtcatgatgtcatcatcggagcccagag ccgatggaggagccgattcgaaggcaaagttcaatattgaggatgaaaat gtgagaagagtctgcgagaagctgttcatggagcagatggatttggggga agcttatttggaggatgaagaaaccgaggagctcggcgcaatccacatgg ctaacgcaatcgtgctcaccggagagactgctcagctgctcaaagtgctc ctcggctcgatttcaccggctcactttgccaatattcaaaagtacctccc atcggctgactngcgtgttcaccagcttctccaagacgagctcgccattg agactattgcccagcatttcgac >AW057252 accattaccgtcgttttgaagaatgatcatcgtcgacccaccaactggaa acttccctgcatctggaggtaattcggttcacaacatcgtctcggagtcg gattctcgtttggcattcaaggtgaaatcgtcgaacaatgagcactaccg tgttcgtcccgtctacggattcattgatccgaagggaaaaaccaagttgg atatcaatcgtattgctggaccaccaaaagttgacaagcttgtcattcaa tatgctgaagtgccagccgatgagactgatcctcaagcaccattcaaagc tggagcacagcaaggtgaagtcatcgtgaagatcactgccgaatgagaaa atagatc >AW057253 caaaagatagtacaaaaagtacctatgaatgtccaattcctcagggtgga ttctatggattagctgatcatccaaatcatgggcttattgcatcgatttt gaagaggaaacgccgcaaaatcaagatgacagaagacgtaaagaagatgt gcgggaaggttgaagcttacaagacatgctccgacaagctccatcaggca ctattgttcatgctcgtcgagagtccagaaatttcaaaggatttggtgac tcacttcaaaactgaaccaaaattatcgtatgctggaaaatatctcaaaa cttatgaggcaattgcaaataagggacgggataagacaaagtatgagagc ttggagccagcgattagtacgctttctctgttggatgcagaacgcgaatc tcgtgttcgcaagcagttggacaatttgaagccgctgacaaaattcattg gagaagaactattggagtacgcacggttgaggagagtgtactgggatggc ctggaagcctatgacgatgcgctgacccaacagagcnaagatcgcactga ggaagccgagcgaatcactgtcaacgctcaaaaatggagaaatgatgtcg ccagaaactgatggattcatcaaaaacgggattntcgacagagaccaaag 
cattgtgatgctattctgaaatccggatgaagcatcctctcncatcgcgc catgtcgaccacacaatcagccgaggntctgagaactgtggaaccccta >AW057254 cagcactgggtatttcaggcccgggtgtttctgcccaaaataccgcaact ggtggtaaagttggtgaaactagtgaggtcacaacacaagtgtttcaagc ttcgacatatggtgctgtcaaagcaccgaagattgtggctgatgcccagc aaggaaccaatagaagttcagaaacacttgaaaataaaatggtacacgat ttcatgtggtggattgggtggagtcgttggagctattcttttctctggaa tattcttctttttcggtgtagcatctatgcgtgataaagtctactttttg gaaattcctgacattgtacaaatgtttggtataccctcgactgtccagaa tgcatcg >AW057255 ttttatcaaagtgatacgctcaaacacgtgacaatttgaaaaaatgctta ttgaactcctcatatccatcaccgtgctcgcgtcgattctcgtctcgtgc aagaaagagaagaacgaagcggcaaaactgaagccacgtggcttcgatcg aaaatccagtggagcaccggctccggctggagccaacttggttaacaagt ctggcggttccggcggaggtagtggaggatctggagatgctaatagactt ctcaaaagagaaatcaaagaagtaaagatggaggagcgttcggctgacga taacgagacaatcaacgatgtaaagtccaactggggaactgtttgaaat >AW057256 aaaatcgatttttcgtcatgattcacgacacgtacaatccgtacactgct gttcagcgaagcccatcacaatggttcatctttgtgccatcttctgcacc attggataatactacaatcattacattgtcccatgatgctgtcaatttgg ctagtattccaccaacggactctgcggaaagtcttcaaagtcgttatgta gtcccggaagtaattccattcaaaagctcatctgaagttgacactcgttc ggctggaaatccatatttccatcatccaaacttcaccactccatctcaat actttgaacatattatggattctgaaggagaacaacacaagatggagaat aaaaacgaagaaaatttgctgactattatggatatcaagaacttngagga tccatttggaaattatgatatggatggagaacaggcaccaatggttagca tcaatccaccaaatgatgtgactgttaacaagaacgccttctcttctgac tcaattgctgatattcagaacatgaacatctcattttcgatcaaagcttt cactgctccagctatccccatcagtctcctgagggacacttantaagaaa tgtgagcncgaaatatgttcgagtgagcagactgatggaga >AW057257 aggtgatgtctacaagaaggcagtgcagttctactcaaacatcacggctc caagatcaacatctgtgcttgctccagtgatgtcgtcccttgaagtctac attaacaccaccacgacatctgcttttgcttcagctcagagcatcaaagt ggctgatattcttgaagaagatgcagatgcaatccgtgtgaagtcaatta gaatggctggattcattgcccaatgcatcatctttttatttgtctacacc atcgtcacgatggatgttgagatttggaaaattaatatggactggttgaa aattcaatattttcagcatttcgaagactccgctgctgaagttccggtct >AW057258 aattccgaagaacactcctcgattgcaaattgtaataaccctaaaaatga ctcatttgaacttcgagactcgcatgcctcttggaacagccgtcatcgat 
caattcctgggacttcgcccgcatcccacaaagatccaggcgacctatgt gtggatcgatggaactggggagaatctgcgctcaaagacacgaacattcg ataaacttccgaagagaatcgaggattatccaatctggaattatgatgga agctcgacgggacagggcanagggacgtgactcggttcgttacttgcgtc cagtcgccgcttacccggatccatttttgggcggtgggaataagttggtg atgtgtgacacgttggatcatcaaatgcagccaacatcaaccagccatcg tcaagcctgcgccgagatcatgcatgagatccgtgacacccgcccgtggt tcggaatggagcaggaatatctgatcgtcgacagagatgagcacccactc ggatggccgaaacacggattcccggatccacagggaaagtactactgtag cgtangagctgatcggcgttcgggagagagtcgtcgagacacattaccgn gcttgtcttcacgccggactcaacattgttggaacaaatgccgaagtgac tccaggacaatgggaattncaaattggaacctgcgaaggaatcganatgg gagatcagttgtggatgcgagatacattctgcacagagttgctgaacaaa attcgtgtctgcgtatcncctgatccccaaaccaaagtcaccatgggaga cttgaacnggagccgatgccacaccaactttctcgacttgcngnaaatgc tgct >AW057259 taaaatggatgtcaagcataatccaaaaaaagaagtgatcgaacgaagtc tgaatgcgactgaaaaagcgatcgaatcgattcgtgaaggtgacctagtc cgtcgtcacactgttttggagctccgtgacatgcaaatttcgatgagtga ggatttcgatcaaatccgtggaatgatgcatgaattggatcaccaaattg acaaggaacgcgctgaaaattcgaaatggatgaattggaaaattgagaag gccaaggcaactgctgatcaggcgttggcctctacactgatggtcaagga cgtccagctgttggaaaagaaggtcaatattctgaaggattctgtgattc aagttaacaaggcattctacaagtatgagaaggatgttgacatgaaagat ttgatggatcaagttactgacatggtgcatcgtacggaaaagaaggagca agatgcgttggagccccatgcgactgatgagcaagctatcgagaaagcct tccgtggagcaattgaaggcctttatggcctgaaatcttccaatccaaag gtcatggaggaagctaagctgttggctggagaaatgcgtgttttcagaga tgctngctgctacaagaacttncactcgatgatctcaaaagttgcgcctg gtaaatcggagtttccttgatacagcagctctgagactcatttcgacttc tgatcctctcgctgccaa >AW057260 ctttacggcggaaaaagtgaaattttcggtagtttctcgaaaaatatttt tcaattcaagcaagatgccagacgatgtgtgcgacgaaacattgaaaatt ggagtggttgttgggaaaaagtaccgagttatccaacagctgggtcaagg cggctgtggatctgtgtacaaggtggaggacatcgaagacaagacaaagc agtacgcgatgaaagtcgagttcaactcgaatgccaacgctggaaatgtt ttgaaaatggagggtcagatcctcacccatcttgtgtctaagaaccacgt ggcaaaatgcatggcaagcggaaaaaaggatcgatactcgtacgtggtga tgacccttctcggcgaaagccttgaatcacttatgaagaaacacggacca tttttcaacgtgtctacccaaatgcgcatcgggatttgccttttgttcgg 
catcaagcagattcatgatattgggtttatccaccgtgacttgaagcctg ccaacgtggcattgngaaataaaggctcccctgacgaacgctacttcatc gtgctggactttggcttggcacgccagtacatcacggataaggaggacgg aaaaaaagagcgtcgcccacgtgagaaagctcntcttcgtggcancctcc cgttattgctcgtagctatggcacatcgtttcgagccagggagagtgatt acctgtgggccttgtctacatngctcgcggangtggatgccagctggnct tgtctgatttggatgataaggcgaaatcnnggaaatgaancnaacgtngc cgacccagatctcttggcaaaagcccaatcaaa >AW057261 gccgagaatgtgctgggagcccctggagctggattcaaggtcgccatgga agcctttgacatgacgagaccaggagttgccgccggagcacttggactct cgtggagatgtttggacgagtcggccaagtatgctctggagagaaaggct ttcggaaccgtaattgccaatcaccaagccgtccagttcatgcttgctga catggctgtcaacctcgagctcgctcgtcttatcacctacaaatccgcca acgatgtcgacaacaaggtgcgctcttcatacaatgcttcgatcgccaag tgctgtgccgccgacactgccaaccaagctgccaccaacgctgttcagat ctttggaggaaatggcttcaactctgagtatccagtggagaaactgatgc gtgatgccaagatctatcagatctacgagggaacctcgcaaattcagcgc attgtcatctcgcgcatgcttctcggacatttcgcgcaaaatggaactag cagaatttaggatgtgccgtttttgagcaa 6.fa100644000766000024 3174013605523026 17466 0ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/t/data/dbfa>CEESC12R gcacgagtccatctccatatgccaccacaacantggtcctgtcgaaccaa caaccagcttggctcaatgacaaaatgcttcgcgcgccanaatgccaaca aatcccgtgccaccagagccaccggcgcgatatgcagatcataccgctgg aagacgatctcgatcgagccgtgcatccgatgggagaggaactctgaatg gcggactccatcaccggactagcggaagtcaacggtcggatagtccacct cacacagatgtgagctatgttcagcttcactcatccgatggaactggtag tagtaaggaaagaantngggagcggagaacaccaccgaataaa >CEESC13F cttgcttgaaaaatttatataaatatttaagagaagaaaaataaataatc gcatctaatgacgtctgtccttgtatccctggtttccattgactggtgca ctttcctgtctttgaggacatggacaatattcggcatcagttcctggctc tccctcctctcctggtgctccagcagaaccgttctctccattatctccct tgtctccacgtggtccacgctctcctggtgctcctggaataccttgagct ccctcgtgccgaattcctgcagcccgggggatccactagttctagagcgg ccgccaccgcggtgggagctccagcttttgttncctttagtgagggttaa tttcgagcttggcgtaatcatggtcatagctgtttcctg >CEESC13R gcacgagggagctcaaggtattccaggagcaccaggagagcgtggaccac gtggagacaagggagataatggagagaacggttctgctggagcaccagga gaggagggagagccaggaactgatgccgaatattgtccatgtcctcaaag 
acaggaaagtgcaccagtcaatggaaaccagggatacaagaacagacgtc attagatgcgattatttatttttcttctcttaaatatttatataaatttt tcaagcaag >CEESC14F aaaaaatgcgaagntccaacagttccatgctatcgttatctggatatttc aaatggattattccttggaggccgtcctggaacttcgaagcaaatcgaga aggccttctctggatgtatttctgatttgtcagtggataaagaagatgtc gattttncaacgatcaaagaaatgcacaaagttggacaagttcatgaagg atgcaagcatcgtaaagatttttnctcaacttcggatggacaatgctcgg ctacctcgaagtgtgtcaatcgttggggaggcagaatttgtagctgtccg caatcggttcattcgactggtgaatgtgttggagcacttggaactcaaga tttacgtgggcattctctatttgaagaggaatcatttgtttttgtaccag ccaagccaagtatctgtaccgtttgaagtttcatttgaattccggacatc tcgagctgatatgcaagt >CEESC14R gcacgagtttttttttttcactgaaatcgatcttccatcaaccaaaacat tccgaatgcatccttcaaaacgtgatggatgtccagttccaggtgcaatt ccaaagtataagctctctagattcatatcagaaatggaagctttagcttc tgcactgtagattccatttatactagttgcaactgaatctgcttcaaact tgatgacaacattcatccaatgctttgatgttacttcaggagctggcaac tcaacttcagaatctccaatgttatatttcaaagttccattcatcaactt ccaagttgtaatgaacacttctttgg >CEESC15F cnctttgtaataaataatttattatgccncgaaaataattnccnccaaaa tcaatctttcagcgggtgggtgtaatcattgggaacngggaagtcactag gaaataaggaaatagngaaatacaataaataaaataataataataatagg cgactatgattagttagaaaacacagctctgggaattgtttggaagtgtt gagagaaattnttgattttttacaaatggggaatatgattgaccgttgga ataagtnaaaatattantaaaaatagcgctgantgaaaacttaataagtg acagtgaaaaggatttgaaaagntaattaanccaactacg >CEESC15R gcacgagggcggacaacctcaaggcgctacaccgggacaacccgatcaga actttgactacatgttcaagctcctgataatcggaaattcatcagttgga aaaacatcattcctcttccgttactgtgatgattcattcacttctgcctt cgtctctactgtcggaatcgatttcaaagtgaaaactgtgttccgtggag acaaacgagtcaaacttcaaatctgggataccgccggacaggagaggtac cgtaccatcaccaccgcctactatcgtggagcaatgggatttcattctga tgtatgacatcacttaatgaagggtcttttaatagtgttc >CEESC15RB ggcggacaacctcaaggcgctacaccgggacaacccgatcagaactttga ctacatgttcaagctcctgataatcggaaattcatcagttggaaaaacat cattcctcttccgttactgtgatgattcattcacttctgccttcgtctct actgtcggaatcgatttcaaagtgaaaactgtgttccgtggagacaaacg agtcaaacttcaaatctggggataccgccggacaggagaggtaccgtacc atcaccaccggcttactattcgtggagcaatggggattcatttctgatgt 
attgacatcactaattgaaggagtctttttaatagtgttccagggttggt gcactcaaatcaagncatactca >CEESC16F cccattttacaaatttatccagaggaatggattttcaattaaaatcttga aaaaaactaaaaagtagagaaaattggaaactttggtgggtttaaacgtt aaaagagattaaatttaaaaaaaaagggagatcgantcgaataatttggg tggatgggatcattgtacaatataaatagaaaaaaaggaagagttcaatt gggatagaaaaaaaaagtgaatttttttttttgataaggtagntagtgtg ggtggtggcggga >CEESC17F tttcctgaaaccgtcagtcttacttctcgacgaaccgaccaatcatttgg atttggaagcttgtgtgtggcttgaggaggaactcgctcagtataaaaga acctngttggtggtttctcactctcaagatttcatgaacggagtttgcac caacatcattcatttgttccaaaagcaattggnttactatggaggaaact acgaccagtttgtgaagacacgtcttgaattgctcgaaaatcaacaaaaa cgntacaactgggaacagtctcaactncaacacatgaaagattacgtcgc gaggttcggtcacggttctgccaaactcgctcgtcaagctcaatncaaag tgaaaa >CEESC17RB gaaaattcacacaaaacactacacatttagtgatgtgacaacaacaaagg agggtaattggaaaaaaagggtagaaacaggaaccggaccaacaattgga ggaaaaccgacaaaaattgggtcaaagagagtaaaagatgaatgaaaaca agagaaaatataatcaaaatcacaggaaaatgnaattgaaatatcctana ttgaanatggggggnaaggtgaataatgngagaaaaatctcgggaaatca gttcgattctaatattagaattggcagattttcgatgttttcggggggaa atagg >CEESC18F aagatcaatgatatggaatggtggaatcgattccttgattccgatcctcc aatcaatactaaggaagtgaagccagagaactcgaaattgagcgacttgg atggagagacacgtgccatggtcgaaaagatgatgcatgaactgttgcag catatcatgcttttccttctcatcagacgttgcggcacgagctcgtgcna aattcctgcagcccgggggatccactagttctagagcggccgccaccgcg gtggagctccagcttttgttccctttagtgagggttaatttcgagcttgg cgtaatcatgggtcatagctgtttcctgtgtgaaattgttatccgctcac aattccacacaacatacgagccggaagcataaagtgtaaagcctgggggt gcctaatgaagtgagctaactcacattaattgcggttgcg >CEESC18R ctcgtgccnaacgtctgatgagaagaaaaagcatgatatgctgcaacagt tcatgcatcatcttttcgaccatggcacgtgtctctccatccaagtcgct caatttcgagttctctggcttcacttccttagtattgattggaggatcgg aatcaaggaatcgattccaccattccatatcattgatcttctcgaggggg ggcccggtacccaattcgccctatagtgagtcgtattacaattcactggc cgtcgttttacaacgtcgtgactgggaaaaccctggggttacccaacttt aatcgccttgcagcacatccccctttcg >CEESC19F gctctcgactccatcattccaccacaacgcccaactgaccgaccactccg tctcccactccaggatgtgtacaagatcggaggaatcggaactgttccag 
tcggacgngttnagaccggaatcatcaagccaggaatggtcgttaccttc gntccacaaaacgtcaccactgaagtcaagtccgttgagatgcatcacga ntctnttccagaggccgtcccaggagacaacgttggattcaacgtcaaga acgtctccgtcaaggatattcgtcgtggatccgtctgctccgactccaag caagacccantcaaggaggcttcgnaccttccacgnccaggtcatcatca tgaaccatccagggcagatctccaacggantacactccantt >CEESC19R tggatccccngggctgcaggnaaaaatcaacaagataaactcaacaatga agatttacttcttctttggtgcagccttttgggcggacttggtgaccttt ccagaggatccatcagacttctcaacggacttgataactccgacagcgac ggtttgtctcatgtcacgaacggcgaaacgtccgagtngagcgtagtcgg tgaaggattcaacacaaagtggcttggttgggatgagctcgacgattcca gcatctncagacttgnggaactttgggaagtcctcaaccttgttaccggt acgacggtcaaccttctacttaagctcgttgaacttt >CEESC20F aacaacaaaattgattttaattgaaggagaggtagagaatgaaaacttgt gaattgaaaagaatagaaacaaaaaaattaaacagttatttagggcttat ggcgaggggtaacgaatgagaaaagccctttaactgtgagataaagtaaa aagaagaatgaaaaatagaaacaaaaaatatttaacagttatttagggct gaggctaaagaatgaaaatccattaactgtgagagaaattgaatggaaga atagaaaaaaaacaagttcaacagttattggttggcccaaaatcaacctc gatgagcgcttttcggatggcttctgtgcgcacttcgagaaccttctcgt tttcgacgttttcgagagcctctttctacg >CEESC21F gaagttgaggccaacgttccaaatgttatcaatgagcagctgagccggag tattccgaagacatgcagtttgatcagtgatctttgtctgattgcagccg acaagatcggctagtgctgttgatttttgttttgctcgtttcggagaatc cattgaccacttgttgtcgagggatcctgattggagaattccgttttgga ataatcccttcgacgccggtgcaattaagtgagctacaatcgaagctgct ccagccgattctccaaccaacgatatcctcgccggatttcctccaaacga gaaaatatgatcccttatccaataaagtgcgagttgttgatcaagcattc ccatatttccaggnacatcttcatgatctagg >CEESC21R tcgtnccgaattcggcagagntncntatgtgaatatttgggctccggcgg atgcttacaatcttactgtacttgtatggctgtttggtggtggcttctgg tatggntccccatnactgttactttacgncggaaaagaactagcaacacg tggaaatgtgatagtagtgaacatcaactatcgagttggaccatttggat acctgtttctagatcatgaagatgttcctggaaatatgggaatgcttgat caacaactcgcactttattggataagggatcatattt >CEESC22F aatcacaatgattttatagtcgaaaagaaataaaaaaacgcattatgctg agggcttcgacatacaaagtggaaagggttgggggaaatacatggaaatt nccttttttttttcggaaaaacaaatttttgttagtatttacaattacat tttgttaccagtcagacaagttttttgagggaaaaaaatccaataaaaat 
gagcatttttcagaaggacgtataatgtacacgaaggtggtngtgtnaaa aggagacaacaaaagggaaaaattgcgggttaaaaatggccgggaaaacc >CEESC23F tttttttgttagagtattttatatatttattattattacagcttacagaa ctttgatttgttttacagaaaaaaggtgcaaccgnttagacaaattcaat ggattatcattatttgaaactttttgcagttccttattttcaaaaaaatc ttggtttttggtttgatcagggtgagaaaggatttcgggggtcgaaagct agaaaattatcaattttttgtgattttcgattgtg >CEESC24F gtcttattaaaagctttattatgaatgtggctcaaataatgagcatgatt cagagaaaaaatggtttaaaatgtcaatttngtaatgagaaaatgggggt catcggcagtaatagggtacaacaacaaaagtgattgcattaaacctcaa cttcaaaccaaagttacacagngnacctagttatacatgcctagattact accggantagtattgancaaatacaagagaagttaccaatgaagatttgg gtgagantgggaagcataatgcagtcggctagagaagttgg >CEESC25F taaaccaatatgatttattattaaaattttaaaagaacaaaaacatgctt tagaattccaaaaatgattttaaacaagtgaatgaaagtatcacaaatac gaaaagagaacccgaagaagagaaaagaagaaattataaaaaaaatattt tagagctccgacttttgaaggntcgaataccgtttatcagatggcttaag ctctttgaacactgatggaggtggtgttgtgtcaattggacgagtagatg gagcttgagcttcatgatcatcagtgattccacgtgcagcttttgccttg gcgagctcgatcattcgttggatcaaggttctcgtggaagtccttgtgaa gctttccagagtgaagatccataacaaactctctaagtttacctgggata ttca >CEESC26F ctactcggcgaccagctccaccaagggaaacctcttcttctacctcaacg cgttgatcatctccatcgccccgctctacctnttctacggagttcaccag atggagatccaagactcgcttgtcgtgtggggactctntgccgtcggcac tgcctacctcctgtccctggcctgcaaaaaccagaagtgccttctcaagc atcaaatcgtgatgaagcgcggntcagctgtggaacgcgagatcagcgga caatatgctgctgacaagaaaatgactgttaaggagaaggaggagcgcgc gcttttccgcaaaaacgaggtcgncgacancgaatncacctacttgtcgg tcttctacacaaantcgctctacttgaccat >CEESC27F tcgaaagtttctccagaatttcgatcaattcacagtcgattagactattc actgaagactatgtcattctctcaggattacggagcaatcagtgaagaag gaccaatagaagtaggatcaggaattttgaaggtggagtcaattgaatat atttttgaatacgatgagaacatggntcaagtgaaaatcaaatgtttgtt ggccccggaacttgtgagattctcgaatgataagtcaataatcagaagat attttcattattacctggccgcttcacaacgaattgttcagcatgtcaaa ctcagggaaagtgacagtccatttcgaaaccttcaaagcctaaaanttcc atccatttcttgggactctaaacggagctagaggatatcctcaatttcat caattt >CEESC28F gttgaaaaacatttcattgaaagatcgatttttggtaaagcagatcagtc 
aaatttgaatgcagtgaaatgatgatctgtggggctggagagagcgattt agtggcaacaattgaaacgnggtaacagggtgaaactttggtttgtgtca aaaaattaattagttaaagcaaaaaaatgtggaaatgtcgggggaacaat aaacatgttaacangantaaaaaaccgtggatttatggaatggctcttct aacaatgttgttgcggaagaattcttgcattctgtgaaagtttccatcaa cattgccaactatggctcgaatattcgcctgcactgggataataattgat tcc >CEESC29F caagaagacaatttgttttgattggaaatggacgaatgatgaaaggaaat aaacatcttttaaaactctacaagtatgggttttcttgaatatttctgga actaatgaatacatatttncagacaccttcaatcggaaaaagtcatcttc ggaccgtcaattctcgaaaccttcaacgatcttgaggaagctgctctacc aagatcggataaattttgatttttggttgagcttaggtttttagatgata gtattcagtttctaacggatattcacttcatgtaactattattgatntca tattttnatgttt >CEESC30F agaaaatcaacaacaatttcatttgaatgaggaagagagtaacataacca acaatgatgaataaaaaacaaagaaatgaacaatttttggggaggggcgc ggggaaacgaacaataatggaaatagaagaaaagagcaaagcctacgtgc agaatagagtgaaagcgggaaatatttctcttctgcgtctctttctgttt gtgtgtgatttagaattccatactatccgtctttcggccttcgaccacaa ctgaagtgataacatgtccgtcttcggtttccttggattcgagctcacgg cgaatttcaggcgagatgagatagtgtccatggaatctgatcagctcata aatagctgccttctgctcggcattcaaatccgttcttgtatcgctggcac ag >CEESC32F gacggggaatggggagcaaaacaaagaacaatttgatatactataaacca ggggactgggaaattgaaagcagagaaagttgggatcacagattatttta tcagtttaatggtacttgcaaacagatggcactgtgcatccgtgcttctt gatgatttcttgagcagcataagatccacaggtgactgaagcttcgactc cctttcctttctctcgtgccgaattcctgcagcccgggggatccactagt tctagagcggccgccaccgcggtggagctccagcttttgttccctttagt gagggttaatttcgagcttggcgtaatcatggtcatagctgtttcctgtg tgaaattgttatccgctcacaattncacacaacatacgagccggaagcat aaagtgtaaagcctgggggtgcctaatgagtgag >CEESC33F aataaataaaattattttattaaagtattctcaaagtcaaaatggcaaat aaagcttganccaaaattttgttcactattattattacaacttccttgct aatttaatgtctccgccggttcttgaatagaactgatttggagcattata tttttnagttccattggaagatgttgagaagtaggcagtgacatctggaa tgacttgaggagtttgagaagcttttggcattgatacacccattcgacgg ccacggtattcacagcttttnccgaaatatggacgttcacagcggcaggc aaagttatagatagctggatttgttttcggtcgtggtaaa >CEESC33R tcggtctaccacaacaagactaccaataactactggtagtacacaaactg gtgaaccgtttcggactatggcaatcaactgaaatggggtgaagtttggt 
aggaaaacgaaaaaaagaagagaaagaagaagattgcactgccaacataa gatggtcattgtgtggtaccatcaactactgctagtacttctgatattta taggagctgccaaaggagctcagtacataggttcgggagcctcccaaccc aaccgaacggatgttgtgtggatggttccctcttggacatgtaaaaatga ntattcaattgatgtagaaaagtatgggattctgcaaaattgaggtncag ca >CEESC34F ccgtagttntttttttttttttttanttttatgattttattttaacgtga ataaacatcacaaaagtgagcttactcaaggggtggggtntggggcggct aaaccaaccactaacaagtaacaaaaagaagggtgacagtaagananaaa aacaggngatnggtatgcttagcaactnggggaacgtgctaagagcactt ggcaatgaacttattgcttctgagcggaaacgngaaccgatgcagcttcg tcgaccttcgagcggaacaattcactatcttgaagcatcatgatcaactc ggaattt >CEESC35R ggagacaagggacgtgagcgtgacaacatcaaggaggatcagaccctcta ctacaccgtccagctcgtcgatctgttccgcgctgtgccaggagagaagt ggaccaccgatgagggaatcgttattgagcagacacacaagattgatgag gataagtgcaagaagtcgaagagtggagacaccattcaccaacagtacgt gcttcatcttnaggacggaactttcgtcgattcgtctttttntcgcaacg ntccattcatcttcaagtttgaataattaatgaagtcattcaagggaatn ggacattncccattgactggaattgttgcga >CEESC36F aagattttaataaaactttattgaaatttgctcaatatcagangtaaata aatcagtatcaggataaatngtgaacagttatatttgcttctgtaaacag ttgggatttgaattcagatgtaaatttacataactnctcgttgctgaact tnatactccaaaatccatgtatccnctttatgactgangacaatanccgn gaagttgtttatatgaatggncagttg >CEESC37F agtttgaatgtttttatttttttactttaaaaaaaaatttaatttcaaaa ataaaaaaaaacagtttgtggcaaggaaaagggggaaaaattttntgagt gggcgtggggaacaacttgagttttttgaaagagtcattttggcgggaaa agcggaaattnttgcgaaatatctacccgttactcgcgtggctttttgta gnctaaaactttagtagaggaggagaaagaaaaanctggggaaaaaaatt tgggtcacagnaaaaaaatgcaattgattagcaancgcaagaaggtgggt agagcgtgtgaaa >CEESC39F agaattttagtgttttattgaattgttgaatacaaatcaataaaaaaata acatatgancggttattatgactttctttcatatatatatcccatatatg ggtttaccaaaatgtgcacgaaatgaatacaaataaattatttaatcagt gtccatcttcgcctcgaaacggcttcccaacccggtttcacttccagcga ctcctctattcacactatccagaacagagttccagtcgacgccgaatcga tcggttctttcttggattcgaagcacaacttggagagcatcagtggcgac agtcttcttctgttgagcacacagctcgtcggcctgctccaaaatgttgt caatcagcttcccagcggccactgg >CEESC40R cagagttacaaataaaagcgggcaacaatgtcagaaaaatcattgcaatc 
gaaaatnctttctactgtatttttncttgcaactctaattgcatttnctt ctgccgatggatatacctgtnccggaaatacgctgataaatccatttntn aatctttcggagccctactactatccagggncatggcgagaaaacatgga accagctgantatgctccagntcaaaagtgtaactggaagatca >CEESC41F aagaagaggatatatttatccaaactgcaacaacaaaaaacacaacaata taacttgaaaaataaaatacgctcataaaaaaacaattttaaaaattaaa aattattccttgtgctcggncaattgaaatcctgccttcgtcgagttcac tgatccntttgctttctggcgagggccccaattcttccctttattgcatt ggnaccgtacaaggntctctttcttggnctgttcgatggctccngggtga agagtgaagttgactgtgttggtgatgggttntcccagaatgtgatatcc ngctgattctttggccagacg 2.fa100644000766000024 6050413605523026 17462 0ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/t/data/dbfa>AW057262 gacgagggtccgtctcggcgtggtgcaatttgctttagagagcgatcagt ttcaagaaacggtgacaatgaaattgatttggataaaagtttgagcttga agaaaccatctgcatcgaagaatttggccagtttgttggaaaagaaggaa gaggcaagataaactgatgtaacatcttccatgatcagttcaaaacctcc aacttctcctggcacctccgtctacatgtacaacactggatcagccaatt cgacttttatgtctgctaaagatttgcatanagaacgtgttgcagcttct actggtcccagatctgcttcaaaatctccaaaaggaacacttcgcgtgaa ggaaactaaagtcattcgtgaagttcgccaagaagatggaaagccacctg agattagtgagaagaaagaagaaactgtgaaagaggaaaaagtgaatctg agtgagagattgagagccagatcgagagcttcatctcctgcaactccaac tttgaagagaacattcaatcagacggatgagtcgaatattgtgactttga gtgcagtcaaagaaacccatcaaactttggaaatgacaccaatcatagtg aattctgaaacagttccttcaacttcttatggacaacgtgcttttccctg agagtgttcaattggatatgctcattgacaagaaacgacttgt >AW057264 aggctccgatcccaaagcctgaggaagatctcaagggctccacggatcaa agtaccactgagccaacgaagctcgcctgatcccaaagcccatcggaacg ccaggaaatccaacaaatgccgaattcattccgcgagttgcaggactcaa agttcaagccggctccaagcccaaagtctgaaaagggccccgcggagcaa agcttgtctgagccgtcgccggtccccgggaataaccgcaggaactctga gacatcacaagttgacacaatttccccggtgccaaccaagctcgttggaa cccaaagtccatcggaacgccaggaaattgagcaaatgcccaactcgttc cgcgagctgcaagactcgaagttcaagccggcgcaggctccaaacccaaa gcctgagcgcgttgagaggggctccgcggagcaaagcatgtcggagccgt tgtcgatttctagagttgcatttggctcgccgatcgctccgaaaccacgg ccatcgccactccaagctccgcttcttgagacgttggctactccaccgac aatcgacgctcctaccgctgcaatcgagacggcaatcgagagaagcgcgg 
aantttcgtcatctcactcggaggatccttccaactcactttttcaagtg tgcagnatgccgtaaggaagaatcgagtggtc >AW057265 ttcgctaatcattcccttgttctactgatcgttggaaggttctagctatg aacgtcaccagtgtcacttcagaggatggtgttaaagaattcgaaaagat tgttgtggaacctgaagatatcgaatatgttgagattccggccgatgcca aaaacgttgacttgacgcgtcaccgtatcaaagaaatcggtgattattcg tggctcactcacgtcgaacacttctcgtttcgttggaatctgatcaaaaa gattgaaaatctggattgtttgacaacgttgactcatctcgagttttacg ataatcaaattacaaaagttgaaaacttggatagcctcgtcaatttggag tcactcgacctgtcattcaatcgtatcaccaaaattgaaaatttggagaa gttgacaaaactgaagactctcttttttgttcataacaaaatcactaaaa tcgagggtttggatacgttgactgagctggaatatctcgaattgggtgac aatagaattgcgaaaatcgagaatctcgacaacaatctgaaactcgatag attgttccttggcgctaatcagattcgtcttattgaaaatgttgatcatt tgaagaagctcacagttctcagtcttccagccaatgcgattactgtagtt gataacatttcgggacttcacaacttgaaagagattt >AW057266 gaaagtctttggggagatgacgagcctgtacggaggaaggagaagatcag cactcggttgttgggggtccaattgcgttcagaatactcttgttttcagt cattttgtgaacagccatactgacggtattccacgtattactgattcatt ccatgacgacaggctgccatatgtttgcacgatggatgtcacgatgattc cgatcctt >AW057268 ctactacaagctcggtgttggaatgaatgagtggaagaaccctgagcacc ttgccgagcacatcaatggagctgcttactccaactttgacattgcttac tatccatcggagaacgagcggttcactttgtacactccagaggaattctt gctgtatgttaagagattg >AW057269 aaactcgaagaacagtcttgaccaacacgagatgtattcacttttgacgc tcctcttcgtcctcttcttctctggaagcactctgctcgttcaatgtggt ggaaagaaaaagggagcaacttctgccgaaggaaaatcttcgacgatggg cccggctcctggaggagctcctgctgctgcttccgctcaaggagaacctg aagagaaggagtaatgaaca >AW057270 tcgctcgcttgcgtctcttgctcgccgcccgtgcacttgaatgcacagcc cgtcttcagaatgttactgttaagggagttgccgtgcgcaataagaagag attggcaaatgttgaagttcaactctatgagaaggacacccttgacccag atgatcttttggacaccaagaaatctgatgctgaaggagaattcagcgtt tacggagaagaagatgagactcatgctattgccccataccttttgattac ccatagctgcaacccatctaacccaattgtgtccgcatcgcaagtacttg gtgccagaggacaagatcggaggaacctacgacatgacctacgtcaccct cgacatcaaggttcacggagagaaggaaaaatgccagtaaaaagtgcaaa cttcctggattttattgactatctaaatatatattttttctatatga >AW057271 aagcgatcatcggatttggaagctgctcttcgaattgtgctccagcagac 
tttgaacatcgttttgcaagcgcaggagaagctccccgaggcaaatgtgg taccctcaactccgcccacctcaccgagcactgatatcggcgaacaaatg gcatcgttctggaatattccatcacccaaccctcctgcaacc >AW057273 gttgatgatctataccactggaaacaacaactcaagtgagcttgtggatc caatgagcattactctctgtgtactctaatgtgcccaccaacatgagaat tgccaaacaccaccatgcagagttgactggcatctgctcattttgtacct gcttgccacggccacggatactcaaactcacgtcaatgcatactagctaa ctcttgctaccaggactcaatctgtttgatgccgaatgaacaactagtgt tcactccaggaatgttt >AW057274 tctcccaacaacacctcaagatgaatgccatctacactgccgtccttgtt gcttcaactctcgcctacactgcaatggcttggattggactcagcattga agccgccaacgaggatatgatctgaagtggcgccc >AW057276 tcgactaaccgtctccactcttcacttgctcaaatcttcatgcaaccaat caacgtcatgctcgctgttcttctcgccttggcttcatttgctcaaggag gcagatctgttgctccggctggtgcagtcactgaaccaacagttactcaa gctgttccagaaggatcaggacttagttcagatgtcactgatcgtccaaa catcgactccactgatgttgtatcaaatgcaacttcggtggaagatttgc ttggaagttcaacaaatgcaaacaacacttgtacatccaactctaggacc tttgaattgctccaatgatgattcttgctttggtg >AW057277 tcaaagttcataaacggatcaatacttgcaaatgatggcaaaatactttg gcgccacagatgcattcaatgcaattgttcaaaaagtcgacgaaacactt attcaagcagaatcccatcttcgtaatcttcatgaagatacagtgggagc aaagccgtctgatagtttgccggaccgcactatcgttccgtccccatctt ctcaatcggaacgttcatgctccccggagcctcgtattgttgctcctcaa ttgtctgcatactctggatcatccgctgcgtcttcttcttccgtgaatca tattgatgtgaagagcaagtcgtatttggcattggataagaagaaagcac tgatcatgacttcgctcaagtcaaagagagttatgaacgatagtgatgtg acaaaagttcagaaattgatcgatgacttgttcggaaaacaaacttcttc gtcctcatcttccatgtccatccttc >AW057278 aataacctctcccaacaacacctcaagatgtttgccatctacactgccgt ccttgttgcttgctctcgcctacactgcaatggcttggattggactcagc attgaagccgccaacgaggatatgatctgaagtggcgcccatc >AW057279 tcgaagtgatggattctccaacatcaccattgacttcttcaaatagtgga cttatcactgttctggaaagaggagtttcttctgaagacacattgattcc ttctgttcgtcgtggtgtcattccagtcaacactcttcgttaccaaatag aaaagcatctcgagatgtgtactccagcttctgaacaattgtcaaagagt tcggatcccaacatctcctcgatgtacgttttccatcaaggaattcaagt aaagcaggaaccaatcgatgatgaccaagaggaagagcaacaagtacaaa agcagcttgtattcaaaatcgagggctccgaagacgaagaagctgtgaag aatgagt >AW057280 
aaccctttcgactacaactaacatggacatggactagcgatcatcggatt tggaagctgctcttcgaattgtgctccagcagactttgaacatcgttttg caagcgcaggagaagctccccgaggcaaatgtggtaccctcaactccgcc cacctcaccgagcactgatatcggcgaacaaatggcatcgttctggaata ttccatcacccaaccctcctgcaacc >AW057281 ggacaacaagactgaggaagcatggttctcgttcaatgggaaggtgatta agcagctcgggccacagctcaacgagatgtacatcatcacgcgcaactgc atcggaggaccaccacattgcccatgtgctgtgtgcggagctgctccacc accaccaaagccagtgccacgtgtcgagagagacgaatggatggacattc gtgagggagatccatggncgactcgnccaacttgtcaaggctcttgacaa gactctggacacccttccaggagtcaatccagaccaatatgttgctctct ggtacatgcaaggtgaacctgttatgggtcgtgtctggaatgaaggagga aaggtggctgccaacttctcgtggttcaacaacgagtattgcaagaatgt tggatctatccagcttctcatctatcttccggacagtgttcgtggttntg actatggatggatcccattcccggaggctgctcagtttggagacaaagct tggcatccagttcatgtcaacaaccacaagggagatatctncgttggagt tgttaacgttgctggaggaaagcagattcttgccaggggtgattgtccgt aacgagaagtatggttatggataccaaagaaaggagcattcttgcaa >AW057282 agtacgctctttaacccatgaattgttgtgtccgggatgtcgtcaatgtc tgagtcggatagttatcaatcgagccaattgacaagcgagcccgatcttg tggcgctggatgccaagataatggcggtaatggatggaactgaagagctg gagagggagattggcaagatgatggctctgcagcaagcgattagtgatta caagaagagccatcaacatagaactcaaagaaccaaggagaagcttacgg atatgtctaaagacccttacaatcaccgggaagaacattgttaagtcgtg tgaagagcggctcaagcaaatctcggatgtcaccgagccatacatcaaca gcggaatctcatcggtggaagatcacacctcctcggttgtggagcgctgc attcaaatgctcggagcaatttctggacttggtggggcaatgaagaaaac ggagggattgtcttcgaaagtgctcgagcatcacagaaagctcgcgatca tggagcacaagcaagccgatgcaatttctcgctacgagaaagccgtcgga gttcagaagctcgtccgcgagcatccggctggagacccactgcacgagc >AW057283 atcgatcatagtgcaaaatggagatttaatctaattgcattcaaaatgaa gagcaaaagaaggaaaaggaacttattgctgaaaaagcagcaaaggctgg atttgtcaaccgactttatgtgaacatcggacaaaaagttggagttgtcg agctgactaagttagagccacgttttgagaggaacatcgataagctcacc tcctaccacaacttaatctacaaaattgtgaatgtaatcgaacttcaagt tcaattcatgcccaaggcaatggcaaagaaagcagttttgtgtgctcccg gcgagattccatgggaagttcttggaggatggttgaattatttgggaaaa tatcagtttgatgggcaacattctaaaatgctggaaaaatacagtagcgc ctgtggaagaattgctcaaaaggagatccaggtgcagaaaagaactcgtt 
ctcatctgattaaaaagatgcgtttgtacactggagaggaaagtgagata ttgaattcgaacgtagaaaatttgaataacttgctccatgcaatcgatga ttctcgtcaccatgtgaagtcttcacaaaccacgaaagaggtgaaagcaa aaggcgaaacgtatcgtaaagccatcaatgctttcaacgaaacggcaaat gaagttcaagcattgattgacgaagttgcaatggtttgggtgccc >AW057285 aaaaactacaacttcggagtaatggaaaatgagaaaagtaagacggagag tttgaagaaggacgaaatcgatgaggcaaactcagaatcttcaaaagtgc cactaacaattgatccagaggaagccaaactcccaaatgccggcggaaaa tcggagcatatggtggtcaacttcacttcaaaacgcatggcgatcaaagt gagatgtggcaatgcactatttcgtgttgagccaactcacatgatcatcg agccgaacaagtgccgccaactgacaatcaatcggatgcccggaccaatt caaaaggataaagcgatcgttcaatacctccaaattgaaaatgatgtgca agatccgaaggctgcgttcaaagcagcggacagtgctggaactaagattc cacacttgaagatcaagctggtggccggagcaagtggaggtcgtcagatg tcgagagaggtggtggatgagtagtttgggaaaaaaac >AW057287 gaaaagaacatgaaaataattctcnggctgccgctagaattcagcaagtc attgcgaatgcagccggaattccatcatgtgaatatggaggaagtctctg gtataaacaagaagaaaggaataaactgaaggaaaccctcaaaactcagc atgatatttctggaagcagaaataatagtgatagtgggatatctggtgga ggaggaagcagtgataatttgagcatcgacgatttcgaatctgtttcgga gaaccagtgtgaggaaaatgtgataccggctatgaactgaatttgcacat tgtaatttttgttattaaatcatattgaaaatt >AW057288 caaagttgctcaaaattttcattgccggattgaccttttagctgcattgc tgacaagatcaaaagaatggtcaaaggaaaaggttcaaaaagatcaacac caagtttgcgcgccaagaagaaaactggcacggatagacagaagccgtct gtaaaacaaaatgcatctcaaaactcaaagaagagtagcagacagaaaaa gacccccagtgttggaaaagaacgggaacaagcaacggataagaaacgag aaattgagaaaaaaccacaggaaaagactgctttggatgagcagcaaagg aaagctcaaacggagactatcagcaacttggaaatccttccggacaagaa tcctgctaaaatggatgacggttatgaagatttcggtcctggtgcagctg ctcgctaagt >AW057290 tcgccatcaccgccttgctccgcgattgccattttctgcggaggaaagaa aggagccggagaatctaaagaaaagccataggaggatgtgtatgaggatc tggcaccaggagataagaag >AW057291 cgatcaaccaccatgagttttgatgaaattacttgaccttcggaaccaag aaccgcgatcaaggatatgatttgctcaaagcgcgtctcgacatgtgatc gttcggtggaagtcttgtggagacttgctcaagtaattcatgagaagtct gcatgtgttccaaaagctcaacgtaaggcaagtgtcaccgaaggactcaa gtttgctgaagaagctgtccagaaggatccaaaccatttcaaggcgctga agtggaatgctgtgttgactggacaagcaaccgaatatatggcaacgaaa 
gaaaagttggaatgcagtaagaagttcaaggaattgctcgacaaagctct tgctaaggagccaaaggatacggctctcctccatttgcgtggccgttaca agtactcggttgcatctctgacatggcttgagaagaagcttgctgccacg ttctatcagcaaccaccatcacattcctatgaagaggccaacgaggattt ccttgctgcttataaggtcaatccaaaatggatggagaacacattttatg tgtccaaatgctacgtagcaattaaagacaagaacaacgctcgcaagtac ccttaccgaagtgtgtgacattcgaaccgtaattccgacgcttgaaacaa gagttttgccgatgatgccgaagcagatgttgtcttaagctttaa >AW057292 tttgcgcattttgttgctgctcttctagcttacgtttaatctctctctgc ctttctgcctccttctttgcctcaatctcgagtcgttctttctcttcaga ttctcttctttttctctcgatttcagccaactcatcgcttttctgctgtt ttttgagatcagcaatcattgaatcaactcgtttcacataatgctcataa gcatttt >AW057293 aatttcaaagtaataatgcaaaaagtttgtgaaaaacggatcgatagcgt cgtggcgtcggctgtcgagggttctggaatgttctacgcggtgagattcg agccgccgccgtcgaaaacttgctttgacgtgactctaaagacgtacgga ccgccctacaccgagtacatcgtaaccgtcgcaattccgccgaaattccc gttctcgccgccggcgatcacttgcaaaactgataaaaatatgaaattcc tgtttttggaggaaaatcaatggaaaccgtctaccggaattgttgcagtc cttatcgaagcctgcagtgtgatttcgcgtcgagacctggtcccccgtgc accggttcttccacgcatccgtccaccacaagcacgtacgccaactagtg cttcgcctgcaaagtcgccaca >AW057294 tgagagatggtaaaaccggcaaatacttcttcttcacttatcactaaaag agttttgactctcggtaataatgtcacaattgatatatacgatcatcatt attatccaatgtggttttggattgtaatttctgttggatttgtcttctgt actctgagctgtgctgtttggtttatgtgtgctatgtggagattgaaaaa gggtaaagaatgtaaccatccatcgtttgaagcgcgtaccgttgtgacaa aagatggagaagagaaaccggatccccaaatggctcaaaaatcagaaaag acttgtaaaaaattaggtgcattgggtgaagctgaatcgttggccaagag cttcaaaagtatcagatcgaaaaagtcgatgaagtctacaaaatccaaga aatctgaaaaagatgtaggacatgatgatcataagaaggaagatgttcat ggtgatcaaaaggatgataataaagatcgaaacgatggaggacgagattc tcatgttgttcagatggaacataattctgaagaggaacacgagccgagtg gatttaaaaagctgggcaagtctttcttcaacttcaaaaagtagcaaact ctcttcaccttcacccaaagaatagc >AW057295 atcaactgacaccaacacatcctgcttaatcgacatctatcatgtccagg aataacactcaaatgcacatcacaagttctcaattagaagacggatttcc atcgattaccaacaattttctgacagtgacagtgaacttcaactacgatc catcgaatccttctgagccaccaacaaaagtcctggagaagatgagtgat ctgattggccaacagattgcgaatcttcaaaagggaaaggcaccaaaggc 
aaacgacgacaagagcaaaggttccatgcctaccgttgaattctctcgga cccagtccatgactactcaacaaagtttggaggacgatgatactcaacga gaaaatgttcctcttgaaaagaaaaagaaggggaaatacagttcggagta cgctaatcttttggtggaaaagccaactcgttaccgtttggttccatcga aaaatgtgaaagtagttccagaagatgagctcccgaagaaaaagtttgac aaggatcggaagagaagagagtacgtggaaattggaaaactgtattctaa agaaccaattattgatgaaagtgaagttttgaaaaaggagaaagggaata caaagagaagaagca >AW057296 tccaaatgacaatcgacatgaaggtggtgtttcttttaaatgctcaaagg ctcggatcagccggtggacttggtccccaagaaggtcgtggttgccggac aggaaatcacagttgattccaagtcgaagaacgaa >AW057297 ttaaagcactgtcaactgatacgatttcattcaatacctggaaaagcgat ttccttccatatgaagaacgttgtctgcttgagttatgaccgtaatcatc ctattcaatcaattcgaaagttccggaagaaaatgagcagcaagggagta atggagcaatcaactcagccgaaggacagcacaaaggaactt >AW057298 taaccctttcgactacaactaacatggacttgttcaagcgatcatcggat ttggaagctgctcttcgaattgtgctccagcagactttgaacatcgtttt gcaagcgcaggagaagctccccgaggcaaatgtggtaccctcaactccgc ccacctcaccgagcactgatatcggcgaacaaatggcatcgttctggaat attccatcacccaaccctcctgcaacct >AW057299 cagaatccccatccaactaccaccggctggcttatcaatggacgagctgg aagtattggttcaacaagcagtcgctggtcagaatatggtcatcactctt ccggttccagcccacaagaagttgattgtcgagcagatcgttgtgaagtg cgatgaacatgttatcagcctgccagcactcattgtcaaacatcgttagg ttcttcagtagccgaactcttgtcgaactctctttctctttt >AW057300 gtagaatctaggggaacggttcgttctgctccaatcgcctgccaccgtat caccgaaatgggaccgaaggatgactgagagagctggttctacttacttc tcgatctaatcgtacctagcggtctaccttggcgtaagctctctgacaag catgaggtgctcaaggagaaggaggaatgccgaaaagataagcgttcctc gatgttcgccgggctcacacagactgattatcctagtacagtttgtgact acattgacggaagagcttaccaggatcgtgtggactaccagttcatctac aagaatcttgcagaggcttgcaaggtttgtaatctt >AW057302 tcgttcctcttctcgaaggccctcgtatacggattgttcggatctgctga gc >AW057303 ccctgtcaaaagaagtcaaacccatttgatttgcacatctccactggccg cgaaatcgttcaacgcaactttgtgttccgcaacaccactggcaaagact tcctgctgaaattgcatgctacgaatggagctgtcacattcccaacggaa gtttttcgttctccaccactatcgcatcgcatcatccagttccgtgtgaa ctcatcaaagctctctcaatgggacaagatgaatctttcgatcagaggat acgtgttgccgatctatgcgaagagtctgaagcagttcattgatcagaaa 
aaaactgcaggaactaatgagcaagaggcattctcattgtctgtcaagtt cacggatcagttctcggctccgcagacagtcatcaacttgccaggatatg ccacgtgtatcgagtcgactgatcatccggttgacgtggaagaattggac actacaactgcagtcaacatcgaaagagatgtctccactgctgctccaat tggttcaatgatgggatttgttgatgagtacaaacgtcgtcaattgaaca aaggatgctggttgtccaactacatctgtggaactgaaaagcaaccggag aagcagtcaatgagatcttctcgtagatcaagccgttcgtcgaatcgttc tgcaaagagctcaaaggcttgccgtgttcaagcctaat >AW057304 tcggtggcgagtgcgtgagttggtcaataaagttcagtcgactaaagcaa aaagtgaaagaaaagtacgggcagaagcaaggcgacagtgcttcccgccg aaatcgacgaggcgttgatctacttcaaatctctgaagccccgcgtccag gatctctacaagcacatggcaaatctgaacgacgtggcaaattggcaagt gaaagccaacttttcaggcccactggagaactatgctctgctcggagatc gtatcaacgtgcaaccgttcatcaattgtgttgacacgcggatggaggca gatgtcgagtcgatggataaggggctggcgatttgtgagcggtacaaggc gttcacacagaacgagagtaagcttcacacaaatacaattgccaatttga ataagacacgtctcgatatggatagtgctgcgaacaagtacgccagcaac gacactgacgtcaatcggactcggtttgatgatgccactcgtgagtttga ggtggcttgtgatcggatgcgtgagttggcgataagtattcagacaattg aggagactcattccatgtggcaagatgaattgatgcgggagataaaggcc ggaatgaggaaaccgaattga >AW057305 tactgagaaggatgaaccgtcagacttctctctcctcctcatcgtctcga tccttgccgtttttgtccaccatggatttgctgctgctgaagaagagaag aatacagcttcagtcgtcagccctgctccggactctgaagcagcccaacc tgctggaaacggaaccgaaacaccaaaagatgaggtgaaggatgaggcac caaaagaaggtagtgaaactgaagcttcaccagaagccaagacaaaagga tctatggtattccatgcctcttggagcccattccacagttggtctcgccg gcattatgtgaagaagtctgccg >AW057306 ttttttgagattaatttaatttattccacagtaaaagttactcaaagagt ttcatagccgatggtcttgaaatcgcattcttcatgactggggatggctt cttctcacgttgctcccgcttcaaatcaccgatgattccccacatatcga aaaccattccatccgg >AW057307 ttgtccattacgagatgtattcacttttgacgctcctcttcgtcctcttc ttctctggaagcactctgctcgttcaatgtggtggaaagaaaaagggagc aacttctgccgaaggaaaatcttcgacgatgggcccggctcctggaggag ctcctgctgctgcttccgctcaaggagaacctgaagagaaggagtaa >AW057308 ttggaaaagaaggagacaaagaagaaaggttagtataacaagaagagcaa gaagaaggcgaagaagggaaagaccaagaaggttcgaaaagcagacaagt acgagtctcaaaactttctgtttcgagtggaaggagccatgttctgtgcg ggaattatcgttgctatgattatgctgttcgtcatcattatctacggaat 
aatcacttcaagtcaaactggaggacagttcaacagatacatggccccac tattctgattggatcaggacagagaaatgtcgcaagagacaa >AW057309 acggctacgagttctttgcctccaagaagatggtcaccattttctcggca cctcactactgcggacagtttgacaattcggctgcaacaatgaaggtcga tgagaacatggtctgcactttcgtcatgtacaagccaactccgaagtcca tgcgtcgaggataagctctgcaaactgtcaccaccatccaaccaaccaac caa >AW057310 taccgtcttaccggccgtggctatcttctgtggaggaaagaaaggaggaa cgaaaggagagaagaaggattcagtgtacgaggatcttgcttgcagagac aagaagtagttggaacttcatcgacaccaccaatcaacaaaagacgtctt caatgctttttcatcgtcttcttcta >AW057311 gaattccaagtttgagaaaatgaactgcttattctttccgctttcctcct tgttgccatctttgtcatatccgatgctgccgttgctcaacagcaggtta aggacggagaaaaagttgaaatcgatgctttcaagggagccaaggcaatc aagagaaccgttgccggtggagatcaaatcttccaccttgacggagataa caagggatcatttgttgatgctaagggaaagaagattgagtcaaccaatt atgaagctaataacggaatccttatcattaagaagttcaccaaggccgat gttggaacctactccgagcacccagctaaaaacaccgaaaccaagcacgc tgatggatccatctccgctgttccaggactcactcttgatatctccctgc aataaacaaa >AW057312 gcaagtatatccagcggtttgactctgtcagaccttatggctacttagag aaatcatcgagcgtttgccactctaagagcctacgatgcgatgaggatcg ttgttcgacatgattgtgctcctagagctccttatgaacacgctcctcga caaattggctatgatgcacctgtttatggatctcacatgcacgcagcttc tgtcgattacctactaactcgacctgttgccggtgccaaagctcttgacg ttggctcaagaagtggatatttgacagtcagtatggcaatg >AW057313 tcgaagaacgttgagacgatgactgcaattgattcagtcgaagtgatgga ttctccaacatcaccagtgacttcttcaaatagtggacttatcactgttc tggaaagaggagtttcttctgaagacacattgattccttctgttcgtcgt ggtgtcattccagtcaacactcttcgttaccaaatagaaaagcatctcga gatgtgtactccagcttctgaacaattgtcaaagagttcggatcccaaca tctcctcgatgtacgttttccatcaaggaattcaaagtaaagcaggaacc aatcgatgatgacaaagagcgaagccaccactccttcttcacagcttctt cgt >AW057314 cgtctcctcaagtccgccgtcgcccaatcgaacaatgtcgtatgcacaga gtttctacgccgatcagaagaaagtcgagaagccagcggagcaagcttcc tctcctgccacggctgccttccccgctaccaccccaatcgctgaggatcc tctgactccatcccaaatccaggatgccatccgtctctaccgttcggtgc tctccttgtctgcgtctgccccatcatcaccggttcgtcaagcagcggct ccagttgctccggaacagccgattgtgcactcggactactatggtggacc atcggatattccgttgtcctaccgtgttaagtacaccacgacccagcagg 
cacctgcgtctccagctccggacttcacagagcaacagttgatggctcag ctgcaggctcttcagatccagcagcagcagcagccggcaccagatgttcc agtcgtcgagccagttcaacaagttcagcaaaagccaaaagttgctccga agatgcttcacaagatgtatgacgatgaagagtctgggtactgcttcgct cgtaaaaaggatgtggagcaagaaggagaagttccggagaatccacgtgg nccgctccagttcaccactttccgataccgacctactcgggctcctccgg tcaactatgaagccttccggggtttttttccc >AW057316 catcatgttccttcgcaccctcgttgcccaattcacaagtttctgccatc agctccatcaccttgcaactctgtcaagagttctgtgctggtgtcaatgg tggtgaatcttacgcattctgctctccatggatcagttttgccactcaca gaaacaagacttgctacaatctctgtgttcataactgtgctgctgtctat gatggttcctgcacaactgataaagacttcagatgctgcttgaaaactac tccagccaagaaacaagaattcaagatgagtggttgcaacaagccttaca acaatctttaaatgagttctctggtt >AW057317 tctatgcaaaggattgttttaacattggatgggcagcagataaatcaaga cgactggacatgatgtgcgtatttaacgatgctctagacacgagtggatg ttgttatagagacacctcaaacttctgttcagaggggatgtcagtgttgc catctcaacgatgtgatacccttgatgactgtaatatgcgaacaaatcaa actgcgcaaagatggtgtgatcccgtttcaaaatanttgctgtccgattt gaaaaaggaacaagcacttttatgcccggacaacagtacagctttaatga atgaacatcattgtattaactacgacgaaaaagatatttggagtggaaag tgtaagacaccgaacggaatttgcaaatatggacactgttgcccatcaaa taaaactgaaaaattgttacctggaactccatatcgcactcatcaaaagt gcactaacaaaacaattattcgtgatgatcaacgttttggatactgtgat cctaaaaccggaagggtattcataatgagtgaactcaattttcacgggca gagaaacaaggaactctcgtcatactgtaatactgcaagagattgcggtc ggtcgtttggaatggataacgtatgtgttcgaatgaataaagaacgctca atgctttcttcaa >AW057318 gcgactgcgttggatcaaaccgacatggttcaaattcccaacactccaac attggttgccgaggaaaatttgaaccacaaacgctctaaagcaaatctcg tggtggctcaagagtctgtcgcaatggagcacatcgctgctcatcagctt ccagctcccgagccacgtcatcgtggaccggcgattaaggataagccgga gagaaaggatcgtcttccgacggttggagaatattttgaaaatgataaag gagatcgtttcattttgcgtcagaagctgggcgatggtgcaatgggacat gtttttctgagcatttttggtggcagaagtgttgcaatcaaagccgaaaa gtattcaacagggatgcttccaatggaaattaaggttttgttgagtatca gacgccacaatggagttcatttctgtgatatcattgattatggaaccatc cgtcgtgaatacaactacatgataatcagtattcttggaaaagatctcta ccgtcttcgtgccgaacaaccgactcgttcattcactctcaatacgacta caaagattgctcttgaaactattgaagctattgaagagcttcacaatatt 
ggatacctgagccgtgatgtcaagccaagcaactntgctccaggacaacg cgacaatggacagcataagacaattttcatgtttgaactttttggg >AW057319 tcagttctcatagtttcttgagtcattcctgggtgtgatttattaaaaac atcctcgcatcgtggaataattaaacatggagagcaagccagttgcgaca aaccaaaatacggaattggagaaggcaaagctgctgaaaaaaaaaactcc cgaagaattggcagcactggcaagcaagaaggtattctcaacggaatcag ttgaagaaccagttccagttactcgtcgtccaagtgaattgtcaatgggt tcattgaataatcaattgaaagaagttcaaatgggtaaaagcattcaatt cggctgagaatcatgcatataaaactgccaatgacggagagaaaaaggac >AW057320 aattccaatttttcgaaacgatggcngtcacatatgatacacttcgtgca gaaattgaagagaagaaagaagactcagtttccgaaaggacaactcgaaa tggaaaaatactgaatgcaaaggatgaaccagagtttggaatgaatatta ctccaaccacgttgttcttcaagtacccaattgggggaattggttattca ttctttacagtcaccaacacaacatcggagaaataagcattcaaggtgaa atccagtgacaacacttggttccgttcaaaaacccagcggtcggttcatt aagagtggtgaaaaagttcatgtgagagtcacattcaatagtccagacgc aggaaaagaaagacccgaaagaggctcagacaacaaaaatcatgttgcaa tcttccacgtggcagctggtgacgcgaagacgtacaaagaagcgtttgcg aagaaagcggcggacggagttcatcatttctttgcaagac >AW057321 ctggaggagactatacacaaagagcaggaaatgattcaaacaccggtgat atctgcctcatgacgttccaagcggtggcaactgatatggatgctgctcg tgacttctgtaacatcaaagctccgtggcggcttagagaagcaaagattg ataaatcacaagacagtattccggtgattatctgcgacgttgaagctaca ttcacttgcaatgccggatggattcaaatgttcgggtattgcttcaagat gagtgaggtccatgatcgctacacacgtgaaaaagctgagcaatactgta aagatcaagctgggccaagttttcaaggagaaattgccggcattcatcac agatacatcttgacgccttggagaagct >AW057322 tgccaaaaaatcagcgaagaagttaacaaatgatgacgtggatgttttga cgaatcccagcgacgaatacgtggattctttcatgaaatatcacggaaac ggtcgagcagtattcaagcgtgaagatttggctcagtggcgcgatagctt cccggactacaaattcaaagtgatcagcttaaaaggagcgccgagagtca tcgcaacggcacatttgtgcacatttcgcctcattgatccattcatccat taaatccgataatggtcattgggttccgtatggatagatcctggatttag atctcccggaccggcaaaacttcagaatgatatgtgtcgagtggaaatgg acagggaagatgataatattgtcttacagattaaccaaccggtcaaaaat ctctggcacatgctgtccaagcggtaagaattcttgtatctcggacacaa agccggcgacgttggctacaagacattctacagtgcacacgacgtcgtgc tccctgagaacttgcatttgtctggaattacagtgatagatgctcgagat gctccgaagagagatatcataaattatgatcagacactccatccgtatca 
ccgagacaagtacatcatctatcatatgtacgatcggtatggatctcgtg aagtgtcgtatgatgatgtagtgaatcgtgtatcgtcatctggcctagtc tggtatctatgatcatcagatcagatgtaatttatggcctaattatgctg tatatacacgtgtcgctacagcaatcgtt >AW057323 gtcggatccggctgcaaccagcccctgttcatccaatctccaggttgtgc aggctgtggtggcacccagaagttcccagtgatgaccttcaatggtgtgc tcgtcggagaaattgtccgcctctacccgggattcatgcaggagatgttc accgatgcggacacctatattgttcatttcccaatggacatgccgccaat cttgaagcttctgcttgtcacctcggttttcctgatcgacttcacctact tcgaggatcgtaaccaggatcaacatcgtaacggcggaatgccttaccga acgcatagcagcttctaaacttataatg >AW057324 ggaattgacaatcaggttcaatcctcacgcgatgatcgtctggtgtggct gacgttctgcccaactaaccttggttcaactgtacgagcatctgttcaca ttgctcttccaaaactcagcgctcgttaagactccaagcgattctgcgat aag >AW057325 gtgaatggttccaaggagacggtatggttcgtcgcaagatcctccctatc gag >AW057326 ctggatctatcatgtcaatggctagtgtttatgcaagtgtcgttgctcca gatctgacaatctaccatggagatcgtaagcaatcctaccattttcgctg acaaggggaaaatggtcgttatcaaccggaaaaatggggtgattgtctac atgcttcgttgtgtcgacggccgacgtgtctacattgagaaatcttccga aggagccagtcttattctgactaatcaacgtggaaaagtgat >AW057327 tttttacgatacaaaaatcactttactcatttacaaccaaaacagaaatg atattctcgacgagaccagactttcccttgaaacctaatcctcactgaca ttgcttgtcttgaaggtgatggaagtggtttcacggctatccaaaggcat tgacactgacttgctgttcttgttctttctgattttcagactaccctttg aagatctatcagtgttttccgacttttcagttggagacttttgaacgatt cgatggaagaagacgagttggatttggaggcagctctggatttacttttc tcaatatcactcattgaagttgatggggccttcattttgggcttctcctt cttcgcatcaccagcctttttcgttttcttatcaaggcatggaaggttca aagcagttttgcggaaccacacaccgacaacagctgccaaaatgttgacg gctgtgataattgcaagtgcgatccaagatccagcgggttgattgtcgaa gatcaagtagacgagccagtagacttcacacgcg >AW057328 taacgtgtaatattttagacgcaaattaaccacttccaattctggaatat ctggaatcattcagaatggtagttgttccagcaacttcttcacttccagc caacagcacaggttcaccattttgatcgtccctttggctatcgtcatcaa agaagaacttcttatcaccaatgaaatgttgaaccaaaacccttccgatc ttgaaaaggggttgacagaatgcacatagcaatactatattgccacgttt ttggtacattaacggatgtttctccttcttttgctccaggacataagcat gagcgagcaatggccattccgtctcaattcgattcttttcgtgagtgatt actgcgatagagtaaattaccgatattgctgatacaataaccaagccagt 
agtttgtacctgtccaacaaacacgtcaaattcacaaatctcatgaactg tgtagaaaatcagacaaacacaagtgagagccggcaagaggatcaaatac atgctatcccagattgaatagtgagaccatttcaaacggtgacaaatggt gactagatgacttgtatcaatcagatgatcgttagatctcaagttctcag tgatactgaagttcga >AW057329 atgctaaagttcaagtattagacggactcagtgggatgatcatcagacgg attatcaagtacaacacgcgaaagctctgtctcattcttctcttcttttg gtggacggagcatcgaatgctcacggcactgagttccatcaagacgagta atttgttgcaagagatgaataacctttccacgataagtatccactgatgc aattggattgaaatcgaggtaaacactattgagttgtggaagttcaacca attcatccataatgctccagttatccaacttatttcctcttgcccaaaag tctgtaagtgtcttcaattgatggatattctcgaccttctcaagacgatt ctgattgaaatccagaatttcaagaggaagatgctcatcgattccacaaa cgtacttgatgccattttgagccagataaatctctttcaagtt >AW057330 tgtatctcgagctgctcacattcatcgatgaaacatcttctgcagttgga actgcgaacatatcaaatacatcttctgcctgattcttggctgattccac tgccaggaagattgcacctcttagcaattcaccatagacacatacacacg accatgtgacaatcttgcatccatagtctatggagcctagaagcaactcg ggcgctctactatatcttgcgacatgatagctgtgatgtggtgtctccac tgatgcagtcgagatgacgcatatcacgagtctcagacgacagcgcctgc attgtcagcagatctgtgctcgatgcacg >AW057332 tatttaatttcaataagactgccacttctcaaattgaataggactgccca ttaatgacgagttttgccaaattcttcttcttcatggacccatttaagat tccagacaaatcaataacatgatcaccgattgacaaattggtagaaaact ttgtgcgattctcaatgttcaattctggtactcgattgcaattgttttgg aggaattggagcatcatcaatacatcgttcacgttttttggacttttcac tattttttttcttttgcttttcaagaggctgaagccacagcttgacgcgt ccatcagaggaaacaaactccattggttgataggcaagactattggattt agcagttgtagatgcatatgaaaggtcgaccgacaccattgaactcggag acaacgccattttaggttttccagttcccgcatcgtaagttgcttttgga acaaggtcatctggctttgacttcttcagtccacggcacttcaagtgttt gatgtcctccaatggctcaattgctagtttttcaatgtgcttacngtaga gcgatttgttggaacaagatgaagcagttgctgcacttatcgaaacgtca tttgagcctctgcttgccggatgtagatgcacttgggaacaggagtccgg ggtcgtgatcttgtcatagtgctatagtattcagaatacagtttcaatgc tcgattggaaccttctggctgctggaa 3.fa100644000766000024 12052713605523026 17505 0ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/t/data/dbfa>AW057334 tttcatccatcaaaacatttttattcaatctagacagtgctattcatgac 
atcttgctgaattcttatcatcacttgaccttgcgaagagcttttccttc atgatcatgtccgatctcgtgagtttgagtttgctcgatgatagaaacac ggcgagcatttggaccggtagacattgcactcttcgctggaggcttcagt cctggaagagcttgacgatattgcgaagcaatgcctttcacatctttcgt tcatttggactgatggtggttgcatgagattggtagccaacgcctgtttc gcccattgcttggcatcgatcgaatcagggcgctcgacgacaaaagtgac ttttcctttctcttggatattcttgacgagaagatctctggcaacatcct tgtcactgactggaattccatcaacatcacacaaatgatctccaagaaca agacacttctcggctagtgatccaggatcaacacgtgaaacaagcacacg gttctggaaatgcttgattcccaaaccaagttntggtccattttgaaccc agacaagggttgccaactcataaacgtatccctcacggcgttgaatgatc tttgcacgatcctcangaatatgaacacgagcttctaactcttcagcctg tgtatcgtcacggctcaccgtgattattgcacatggagcagcacagcgaa gcgcacgngaaaagtcgttacaatncttgcaattntgtccgtgcactttn ttacttggcaccagtgctcatcttgctcagaaatgcttcagcctggatct agtgatacgagcttg >AW057335 ggcggtggttgatcgggcttaatcacattgttattcgcatccaaggctga tccctgatcatttggaacagttccagcgtttgctggaagctttctggaca tgctcttctccatatgagcaacaaatggatcaatcagttggtggtaaatg acgtgggatccgtgagtttctggcaagtagaggtacaggaaaaatgctgc tttgcaacccaataaattgggaaatatgacataatgattgcggcgaagaa gtcgataattgaaaaggcaccgaaaacagtccaatagatcaaccacattg tatcatcgtcagttccttcagttcggatcgccttaacagaaacataagcc ggataggcaactccaatcaaattgcacacaaactcggcgccacttccaat aatcatgtagacacaattgagcccgatgagcccatatgcgagcatttcac gtttcagaaccagtgcgtcttccaactttttcacattctctttgtagaat ggtccatgatcagcgtacaagaaggagaagaagtcactctgagggtctca aaatccttagcgacgttgtccatcggctggtttgattccacagattcttg tcatcgacat >AW057336 ttttttccataaaatacttcattcttaataaaaattcaaattcctcgtag tttatcactgatccgacgcctcccaaattcgacccatttcaattgtgaca aaactcgtgttcttaagatcttcccgagcatccataagctctttgagctt gtcacgtggctccaggtagtactcagtagagcccatatggtacgttcccc agtgtattccaatactgttcttagcccgaatgagtttatgaaccctcaat cgcctcttccggattgatatgctgggatttcatgaaccatctgggctcgt atgctccaattggaatagctgccagatcaaaaggtccaagcttctcgcca aacttcttaaactctccgtcacagtgaccagtatctccggaataatagaa tcgccgattctcgccgatcaccgcccagcctgaccataatctgtggttcc ggtcgaaaagtccacgctgtccccagtgttgagcaggcagacaccagatg gtgtaggtctttccgttcttcacaaattctgagctctctccccagttgag 
ctcggtgacagcggtggaggatccgtcgactccgatgccctggccttcca tccatttcttcattccagcggaacgaaccacttgattgggggttgcgatc tgtgatctcttcacgcgtcagcanccaatgatcgtatgatcatgcgaacc actgcaaaatc >AW057337 ttgctgaattcttatcatcacttgaccttgcgaagagcttttccttcatg atcatgtccgatctcgtgagtttgagtttgctcgatgatagaaacacggc gagcatttggaccggtagacattgcactcttcgctggaggcttcagtcct ggaagagcttgacgatattgcgaagcaatgcctttcacatcttcgttcat ttggactgatggtggttgcatgagattggtagccaacgcctgtttcgccc attgcttggcatcgatcgaatcagggcgctcgacgacaaaagtgactttt cctttctcttggatattcttgacgagaagatctctggcaacatccttgtc actgactggaattccatcaacatcacacaaatgatctccaagaacaagac acttctcggctagtgatccaggatcaacacgtgaaacaagcacacggttc tggaaatgcttgattcccaanaccaagtttggtccattntgaacccagan cagggttgccaactcatanacgtatccctcacggcgttgaatgatctttg cacgatcctcangaatatgaacacgagcttctaactcttcagcctttttt cgtcacgggtcaccgtgatttttgcacatggagcagcanagcgaagcgca cggaaaaagtcgtaccatccttngcaatttgtccgtcacttttttaactt gtcaccaatctcaattttctttagaatggttcagcctgaatctta >AW057339 cttcctcaaccgccggctcctcttcctccgcgatcgtcggaatctcatcg gcacggcacgccacaaaacgtggaagctcatccggcgtcggctcccgctc gaaatccatattctcctcgtgatccgtcacacctccgagctccttgtaga tttgctccgtgagccctgccaacttgacatattcctccttctcgacaacc gagtattccttggcgataaccacttcggcaatcttgtggaacacctctgg aagctccgcgaattccttgctcttctggaactcggccttcttgactccag cctcttcgcggctcagacttccccatgggagctcgccgagcaccaggtga accagctggtagaaccatgtctcgagatcctgccgggcgccaacaagacc atcgtcactgcacggagcataatccaagcagccagcgtagctagcaatcg gagctccatcatcaccagaaatgttcttgaccaaagacgagatgtccgcc atgaagaggtgacggcttgccgcatcgtagtggaagctgntgagatccat gttgcggacgaggtagccatgcttgtgagcacatcngaacacattgagca catcctcgggccagcgttcagccgncccaagcgtgacttgntccgcatcg caaagcactggctcaacgtcgtccacccacgaagacagcnacatgtgcca cgcgtcgtcgtccagaagatctggaactgcgagagagccgc >AW057340 cactgcattcattttcagatgcagccaataaatgtgaatcatcaataaca ctatcagcatcttgctcatttccaacagtctgctttgctcctaggcttgg ctcacagtgagatgatgcttccgaggcttcttgaacatcagtctcactaa acgatggctcaatttcagctaaactagacatgagactttctgtaaccatt gacattaccgaaagattctcatccgaagtggctgccaaaatattagaatc 
tccaagttgggcttggataatatcttgaatgttcccaaaaaaatcattcg aactaccgtgaagatccatgccagagtcttgctaagaaatggcatcacat gcacgagcatccgacctatcttcatcacgattcctgcatccagcactctg ttcacaacccgtcattattggatcacattcgagtgtaagagtttgcacat gtgcatattcgaacatcaga >AW057341 caatacaaagaaaggttgtcagttatatttcaagtctatttcttctagtg gagcaagtaggcagctggacctccattttcccaatcgtacggagcattct cgttggatccacatgcaccataagcttgacgcatcattccgtagattgct tgatagttgggtgcatcatagggaattcggaatt >AW057342 tgccttcttcaacggtccatatgccttttttgggatgacggctgcaactc cggctccgactccagctgctcctttcttcttgccaccgcaattggctatg gatccaaggatctcgaaaattgcaatattagcaacatgattcgtgcaatc attgatactaactgaattgagaggcttggaattgt >AW057343 atttattaaatcaattaaaatcacacaaagatatcagcttgtcccatttt tcttcttggtgtctccaaacaatcttcttctttctccgattcaattgact ttctactgattaattgtggcattgagttgtcaaatgaacgacttgttgcc ataactctcttatccgaaattttgacgacgactgtagtgagacgaatgaa gacggccatactggcaagagccgtagcatcaattgaattactgcacacac cagagtaactccattaacagcatgacagatatccgaattaagtgttccag ttgagggagttttcagatattgacatggagaatcgaatgatttaatgaag agtgcaattgtatgtgcaagatggaaagctgagaatacaaatgcatgcat catgaagagaagagtgaagacaccaaatgcacgacgagttgagcagagtc caatgagtgacatgagtgccaatgcggtcatcaacatatctttagtatcc agagcagtctcatcgagtaccgctcatgatcctgcaactttgtatgcagt tgagtagagaccactgactaggaagatgacagtgatgacagcacacagtg gtgcttcaatgacgataagcataccat >AW057344 atacataaacctgattgacacggtaaagggaggtatccgagcacttgatc ttataagctttacgttcatttaaagtattctctaattcaattggattgac tccgccatgtgcctggtattgaagtttaacaggagccacattcatttcct tgaagtgctggatcgaattgactgcttcagcaaggctatgcccaacttgg ctatcagtcgtttgttgagattctggaataagagcaacttggccgatctc tcacgactcttcattctattctggcatccgaataagtcgacgattcttcc gctcttgaactctatagctctacgagagacttcgatgagt >AW057345 cacagttcaatctaagcttgaatagttgatccaccacacggagttgcttc aaggcacacattgacatcacttttgttctttttattagcctgaatacggc aaacctttgagctctttgcagaatgattcgacgaacggcttgatctacga gaagatctcattgactgcttctccggttgcttttcagttccaaaagatgt agttgggaaaaccagcatcctttgttcaattgacgacgtttgtactcctc aacaaatcccatcattgaaccaattggaacagcagtggagacatctcttt cgatgttgactgcagttgtagtgtccaattcttccacgtcaaccggataa 
tcagtcgactcgatacacgtggcatatcctggcaagttgatgactgtctg gggagccgagaactgatccgtgaacttgacagacaatgagaatgcctctt gctccttagttcctgcagttttcttctgatcaatgaactgcttgaggctc ttcgcatagatcgggcacacgtatcctttgatcaaaagattcatcttgtc ccattgggagagctctgatgagttcacacggaactggatggcacgatgag ccaatggtgggaaacggaaaacttncgttgngaagtgacggctncattcg tagcatgcaacttcagcnagaaagtccttgccagtggtgtgcggaacaca aagttncgtntgacgatctcgcggccagtggaatgtgcaattccgtgggt tgacttctttgacaggcat >AW057346 ctagaacgcatttgtggatgctatttaaatgttctggctagatttctcgg gggaaactgaataaactaaagcttatcaacatctgcttcgcatcatcggc aaactcttgttcagcgtcggaatacggttcgatgtcacacacttcggtaa ggtacttgcgagcgttgctcttgtctttaattgctaccgagcatttggac acataaaatgtggtctccatccattgtggattgaccttataagcagcaag gaaatcctcgttggcctcttcataggaatgtgatggtggttgctgataga acgtggcagcaagctttttctcaagccatgtcagagatgccaccgagtac ttgtaacggccacgcaaatggaggagagccgtatcctttggctccttagc aagagctttgtcgagcaattccttgaacttcttactgcattccaactgtt ctttcgttgccatatattcggttgcttgtccagtcaacacagcattccac ttcagcgccttgaaatggtttggatccttctggacagcttcatcagcaca cttgagtccctcggagacacttgccatacgttgagctcttgaacacatgc agacttctcatgaaatacttgagcangtctgcacatgactttcaccggac gatcacctttgtcgagacgcgctttgagcaaatcatatacttgatcgcgg ntcttggntcccgaagtntngtcattttcatcatacatcat >AW057348 tagattgttcagatgatccataacacgagttattcagcttttggctcatc atctttgcaggtgagaatagttccctgtgacatcttatatgatccatcct cttgcttcttgaaatccataataagctcatcattctcttcgaaaacgaat ccgtttgctttaacatgcattgaaaggttatcctcacgagtgtatgcttg aatcaccttcttatgtcagatcagttgggaatttctctttttcaataaga gcatgcatgttgaatgcaacaaattgacccttgtacaaacgattaccgca catgttgaagatcatgtgtggcagaagccagttggtgagatcatggatat cttcactggttctcacgtgcattgtatcggcggccaactttgtggcgact gtctttgccaattccaagtctgcagcatcacgagcagcgtcacgtggctc cgcggtcaccgggatcagattactggtatcagcctccctttacacgaatg acataatgccactgagggcatgaacacgaaagtccagaaaagctgcat >AW057349 ctcttctagcgattgatgtaatagtaattgagcatgcggaacatcagaat catgataactccagcaattagcagatcagcacatccgaagcatatagcca acatgtatccgtgacgattagcatcacaaacttccttgctcataccggaa ctacaactgtcccctctcctaacaacatcaagaataatttccaataaaag 
tgaaaatacagtaagcgatccagtaaataaagttcagagcaaaactgatt atattcgagaaatattcgggaaggaacacgataagacccatgtacaacgt ggcaaacacgagccagataaagtaagaataatgggtgctggcaatccata gactgagtgttagaattacaaggaaacacgacaaaccaaacataatgaac ttggtgggaacacctgcaatgagaatctcatccggttcagcagcccactc gaatccagctctcaaacgtgccggctcattctcattcggattttcatcct ccacatgcacgtattccaaatccacggatgcacaactgtctggattcat >AW057351 ttttttttactagaaaatatctttttatcaataaacacatattcaataca gttcggaatggacagaattaaccatccaacttgcatcctggtttagcctt acggcacaagacagtagtattgtcagcacaagattttgcagaagcagtat gctcttttccttgataaccatatccgtacttttcattacgaacatcgacc tttgccaaaatttgctttcctccagaaacattgacaacttccactgagat gtcacccttgtggttgtcacatgaactggatgccattctttgtctccgaa tttggcagcttctgggaacgaaatccatccataatcgaaaccacgaaggt tatcaggcagatgaacgagaagttggatagatccaacattcttggtgtat tcgttattgaaccacgagaagttggcagccacctttccaccctcattcca tacacgacccatgacaggttctgcctgcatgtaccaaagtgcaacgtatt gatctggattgacacctggcaaagtatcaagagtcttatcaagagccttg acaagtngacgagtttggcatgcatcaccctcacgaatgtccatgcatgc atctctctcgacacgaaccttttgtccacttncacagttggcacaatcac aagatggtggcccaccaatgcagctacaggtgatgatgtacatgtcattg agtagtggtccaacttgttcagcacttttncattttgacgagaccaagcc tcctcagtnctgtngtcgacataaccaagaagaatttctttcggacgctt ccagagaatggggaatgatccacacnacgcaccaagtgatggtnatcg >AW057353 tcgaacaatgtcgtatgcacagagtttctacgccgatcagaagaaagtcg agaagccagcggagcaagcttcctctcctgccacggctgccttccccgct accaccccaatcgccgaggatcctctgactccatcccaaatccaggatgc catccgtctctaccgttcggtgctctccttgtctgcgtctgccccatcat caccggttcgtcaagcagcggctccagtttgctccggaacagccgattgt gcactcggactactatggtggaccatcggatattccgttgtcctaccgtg ttaagtacaccacgacccagcaggcacctgcgtctccagctccggacttc acagagcaacagttgatggctcagctgcaggctcttcagatccagcagca gcagcagccggcaccagatgttccagtcgtcgagccagttcaacaagttc agcaaaagccaaaagttgctccgaagatgcttcacaagatgtatgacgat gaagagtctgngtactgcttcgctcgtaaaaaagatgtggagcaagaggg agaagttnncgagatccacgtggccgctccagtcaccactccgataccga cctactcngctcctcggtcaactatgaagctccggtntcaacaactacta ctcgaaaggagtcagtgggccattcgagtacattgaatgtcgaacgattg cagttcatctacg >AW057354 
ttttttttcactacaccaaaatttttattcaaatcaatgcatcatacttt tcacacacgatcaaaaaatctttttcgagaaactttttcttttggagtca gatgatggaggtctaggcacaatctagttggcaaccaaaggaatgacggt gagctcgcgtggtggatcactctcaaagactcccctggcacaattctgct cagcagaagcattggatgtgagaacaatcaagtgatcagtcttctcgact ccattatgacggagcacatcgatcgaaagcttgtccccgggctcaacaaa tccgaagactgggttgacgcggtagagaaggttgtcggaagttttgacct taaatgcctttctcgacttggtgttgttggcaatgctgactgtctgaact cctccggtcgttgcgaatggaagcttgttgggcgatgcacgaagagtcga cgtcttcactgaagcagaaccctgtccctccgggaatgtgtacttcgact ggttgggaaccaacttcagagctttcgacttttcggatttctcaccggag tccgatttgatggacacgtcggaaacatctgcctggttaaccgttgggca gaggttcttagccttgttcaagtcgcacttcttggaggacttgtcgcatt tcttagatttcgaagatcttttggatctctgtgactttgactttgccgtg ctatcacgtctgagcttcttcgattcagacttgggacctcctgccgctcc cgngatgggagaaactcngnctttgnattgaaggctnctagattt >AW057355 tcacataatttcgttattgcaccgattttgcacgagttgacacaaatcac ttcttctggaagcaatcgcttccaattccagccagtgtctgatagttcgg atcatgagtagcagccattccaggaccaccggcttttggagcacctggag ccgcaccaccaccagctggttttgcaccttctttcttgtcgaaacaatcg tttccaattccagcaagtgtttggtaggtggggtcgtgggtggcagccat tccaggacctccaggttttggagcaccagcagcaggagaagctccaccag cagcagcaacaggagaagcttttcctttctcttggaacacatttccatca actccagcaagagtttgataatttggatcatgtgtccctgcgacagcatt cttatcacttggagcctttggtccctctgggtttggagcactcgatgatt tttctttcttatcactctntgcactggttgcaccatctcctcctcctcct ccagttntagctccaccaccacccgatgatccaccagtcttcttctttcc tccacattgtcaatgacgtatccggtgaacaccattggagcgagcatgct gtaaccgagatggggcagttgtagaagtgttcactattttctgc >AW057356 tttcctctccaatatctgattctgcacgtcaacctccaactttctcacat gaattgcacttctctgcgggaaaagagcacatcccaacggtgtcatgaag taaagaccaatccccgccaaaaatgtctgaatcggcatctcggagtactt ttggactagtggacgagctttgtagtaggcggttcgcgtgaatcgattca tgatcaccgggctcagaaccatatcaggcatagccatgagaattcggctc agtgtgacttgagcaattgcgagagccgccagctgcttggattttgcgac gagatggtcgtcctcatcacatagctcaattccctcagaaagttcgcatt ggaatattaatggcatttgcaactgcaacagccacaaatggaaccaatct tccagccaatccatgactattcttcaccatcatattcagaccgagagccg ccgttgtagccgctccagtagcacagaaatacgaatagagtagctgctga 
ttggtagttctgcaatttccgcttcgattggtatagttgacaattgcatt gaacgactgattgatccaatgggagaatataataccangacaggtacgat agcagctgagaagcattccagtgataaccatatttgctgggtctgagcgc tcattctnccgagacagaaacatctttttcgcggtatcaggatgatatac ttgaaatcatagagatcttttgcactcatagctcaatcatcgtcaaatcc ggatt >AW057357 atgtcagtatttagaatgttttcgaatcaagtgaactgaggattgcgaat ctgacgaattgcaggtgtgtcacaactgtcaaccaggcatcctgaagggc tcaactctcctcgcattgaaatgttttgaagccgcttctgaagttctcga tgactactgagtgtgcgcatcgcctcatcagcttcagttacagggccacg tagagggctcagaagtggaggaggatggttcatactgaaatgtgaattgc atnccgaaacatgtgccaatggttcaggaagacattcaaaattcggcgga gaatctgtcagtggaatcgaatggggtagtccaagtagttccggcaactc atcggaaacttctttgaggccagttggtcgtgcataacagaaaacagacg atgaaagtttaatcatcgtttcgcgcgagtattctatttttaagaaaaga tgttgatcttcttggtttcggtagcataccaattctccaccttctggaac ttgaattcgttcaccttccttctttgcgagaactggtggcaggctcatcc ggcggcaaatcggggtttcat >AW057358 cgcaacgccactttaatctccttcataagaccgtcttgccaggaagaatg gttcgattcaattgtcttgattccgttcgccagttctcgcattctctcac aagccacttcaaattccgtggtactgttatccaaacgagtcttgttaact tcagtctcgttgttcgcatatttatccgccgcactgtccatgtcgagccg agtcctattcaaattggcgatactgtccgcgtgaagntttcccctcgttt tggtagaatgacttgtatttttcacaaatcgcgaggacccctttcacggc atcagcttcggcacccattctagcatcaacagcatccataaatggtttca cattgatcttgtcaccgagacgcgagtaattctcaagggtgccagagaaa ctggccttcttctgccacttgctaatatcttccaagtttgtcatgctctt gtgaaggtccttgacacgtggcgtcagatttttgaagtagcccatcgcat cgtccacttctgcgggaagagttgtcgccttggcacgtccagtcttttcc ttaactntctgcttcanacgtccaaacat >AW057359 aaaattaaatcaatatggttgacgagaatgagaacgacgagttgctcgta tctgacggaggcgaaatgattgacgtcgagacttgatgatttccatgggt gatacgtcaggtgtggcgatccagcgaaggcgacatagtggacacgaagg ctgggtagaggaacggtgccacctttgaacacatgttttgcatccaataa cgttcatgcatcgattacagccttgtgggtcaactggagtctcaaaacag acggtacactctccccgtgcgccgatcggtgagccaactcgagctgcatg atttgaggtgccttgtgatgtgcttggctgattcgatggagcagcttcaa aagagtctgacaatttgacttcagattcgtcatcatcaagggttatggca ttttcattgattccttgtgacggcaagtttcttgcaagaacggccggtaa aacgtccacgtggcctccataaccttctcttccagctgcggctctgattg 
ttggctgattntgtgcattcctgggagttgttgtccttgtgacccagtcg ctgcgctggtggatcncgggccacagcagccacctgtactccagtacctt ctcttccagctggagctatggntcctgtcagatnntgtgntgagcttgag gatatctcgatcacttgatatgacagtntcaggtgtggaggattttgaga agcgctggttggctggatttgcccccagtc >AW057360 taaattaaagagatgaaagctctttctcaatagccttcgagatcaaatcc atcaagttaattccactaatcggaagctgattcgcatgaaccttgcacgg ctgtttaaccgaaatcaacgagtttgaatgctcaatcgggcccaaattct tcttcttgttcttcgggcatccttccttatgaaccttcaccacttttcca atctcttcaacatcatcaaaaactacagtaggaatcggagctgtgccaag tttgaattgatgttccggagtaaaatgatcgtgagtgagacttccagttg atgtgacaaggtcttcaagtgaagaagcttccgaagacacaatgtatggc ttgcggcactcaattcccattttcggctctcttttggtgcgatccagaat atcagagactcgtcttttctccacattcaacaaccatctcgatgcatcaa acttctcaatattatcctccatctctcgaacatcttcataaacattcgaa tgctctctcatcaacaattcacatttcttctgagcttcttcagatttcag cgtcaattgagctcttccactgatcacctccttcaatcgattctcgaatt cagcttcctgatgtccaatctctttacacattgncttctcgntgaagctt ggtcgaacaaattatctcgtgagaanggacttcagttgacaagttcttgc gtggaatgcttcacccgtgatcatatcaactcccatgggctcttcgatag aa >AW057361 gacttcaagattgatgtcgaatcgattgtttaggcagccaccaatgggac aaccatcatctcacgatgctccgcttggttagcaaatgcctcatgaggat tagtagagccgttggcattggcagtcaagagcacaagtttgtcagttttc tgctctccattcaatctcaacacatcaatcttaacggaagaacgtggctc agcaaatccataaaccggattgacacggtaaagggagttatccgagcact tgatcttataagctttacgttcatttgaagtattcttcaattcaatttga ttgactccgccaagtgtctggtattgaagtttagcaggagccacattcat ttccttggagtgcttgatcgaattgacttcttcagcaaggctatacccaa cttggcttccagttgtttgttgcgtttttggaataagagcaactttgccg ctctttcccgatttctccattttcttctgggcatccgaatcaagtcgacg attcttacggctcttcgaactctttgagctcttacgagaagatntcgatg agttagattgcttatcagaaacaccagtagcagcgggaacagttganact tgagaacccagagtggtagcagtcttgactttcttcgatttcgaagactt tgatgattnccctccacgcttgctgctttcctttncagattttnccgacg ttcccttcttactagacttcnacgatctcctggagtgcctagcttcga >AW057362 caaatgggggtttttctttgttctggatatctcaaccgacatgattctag aagagaatgctctgtgcaatcatcgcagtcacttgataaggatcgcagtt tgatgacggacgacgatcctccagatatccttttctctccgcagccacct gtctcggaatacggattgagcatccacgattggcgactccccatgagaat 
ttgtcagccgaacttgtctcatgacgtcctgtcaaacgacgaagattgtc ttctccaccatgtggatcgtacaccttcatcgcctccaaatgtgtccgct tgagtcctgtcatggcggcttcaatcgcagcaattccacctggagcacgc atttcggcagtcgagaagttggtgtggcatccggctccgttccagtctcc catggtgacctttggtttgggatcaagggatacgcagacaccgaa >AW057363 atttgaatctcttctggctgctccgttgtttcagcgccgcttggagatgt acttggaggctctggtgaatccgttggctctgcagtggacgctgttgatg gttgaacggtagtagtcgtggaagcttcagaacaaattccagccaattcg cacaacgctggatacattgctgagctcattttgtagtaccacgggtattt ccggcttccaatagttgctgctaaatcgaatccaaccatccaaactctgc cggaatcattttttgtgagaggtcccagatcgaatcccttttcggtattt ggaacagttccgtatagaatatccggactaacaccggcgccgacattcac ttttcgatgcttccattccgaattgtttgtacatccataagcgtcaatgg catctcctttggcaagtttgattgattcatcagctagacatggataactt gagttgccttccaaattctcatcaagctccaaaatcattgggaaggagta cactctnttttgcaccaatgttgatttgcaaaagttcagaatatatgctc tggcaaaattcatcactttcggngagcanttctccttttgagaaaggcac ggctcaatcacatccgagagcttgtttaccgaactttccagtgctttatg ctgtcatcacaatcaatcgaatctncgctattnttccaagctgatttgtg gagaaaagccgnngagagtgatcagtgtcgagagatatataa >AW057364 aatatctagttcatctgagagctggtggctccttcaccagtagttccgct tgtcgaatcagcctctgtcttcttggtctttgacttggagctctttgact ttgaagtatcgtctcccttctttccacctttctttcctttctttccatca ctgncttttcccttttttcctcccttcttctttggcttgtacttgtcaaa cacagacttcgaacggtagcagacgaagctcatgactgccaaatgacaaa aataaaaccgcanatgaggaagctaatgagcacatagagccatggttgac tgacatcacgggcgattttcttcagcggatggtagtagcagacgtgttta ttgatattaacatacacgcaaacagttcccgacttgcagtctttgttaga agagcatgccgatcccttcacctcagtacgttcttctccagttaagtgcc ttttgcccaaaataccaacgcgattagtgtcaggatcgcaaaatgcatag gtgctgaattgagatgggataggtttgttagcatcacacggataattagt caagtatggatatggggcccaaggtaggcgtcgctgactcagtaccattt ccagatggtgcaattgtcagggtaagtgatgggcagcantgaccaaagcg acagactccggccctcgtcttgcacactccagaacttttcttttcttcaa cacatctacacactttggctcattataccatgtgtttgatagtcc >AW057365 agagagaaaaagagaaaaaagtccagaattcgagaagttgagagaaaaat cgaagcccatctagttttgtggaatcggtgtcaaattggggcgagcttgt gagagcacgtggtccacgaacgagcaggtgacgtcacacattagacggcg 
cgaacggcagtatctggcagcaaactcacagtaggaggcaaaaatgatcg catccattctagtgaacatatttggcaattgacataaagaattgctcaac aacgacggcccagtagcgagcccacggatagctgagcctaacgacgagag caatgacacgtggaatcagaatttccaagttgcaaaccaccaagtagcac agaatgatgccgaacacgagcttggtgaatctcggattgttgcggaaaaa gttgccaatcgacgtggcacagttgccgagagtttgaagacgattct >AW057366 gatcattaaatgaacttttggatttcatcgtagatggcaagaacgagtgc tccaccagttccacggaatacatttgacagggctcccttgaacatggcgg acattccttcgtttttgatgatcttgacggcacaatccaaagtattcttg tagagaacatctttgcgaccagactgcatcatcatgcgacgacgaacagt atcccatggataggagaggattccagatccaacagtaaccacttgagcaa tagcccaggcagcgaagaagttgagtttcttgccatcagcagtgaacacc atcttggcagtgtcgaacattccgaagtaagcggcgcggtagatgatgat accttgtaccgagacaaagaaacctctgtagagtccgattggtccatccg actttgcgatcttgacgagacaatcggccaggcctttgaattcacgttca ttagcttttccgacatcagcagccaaacgggtacgagcaaaatccaatgg gtagacaaaacagagcgaagtggctccagctgctccaccagaagctagat ttccggcgaagaacttccagaaatccttcttcttatccaatcccttttgg aaaatgntnttgtagtatccttgaaagcgaagtcagcgcctgggtcggaa gtatcggatgacgtagcaagtttctctcagagag >AW057367 acagagatcaatcatttttcaacactatgtgcacgatccttccacaaatc attgtactggcgaccagtgtcaatcatggcatccgatactgtctctccaa gctcacgaatcgatgaatcgacatctccggatgaagattttctagctctt ttcgaaaaagaagatccctcatcagtgaaacgaacacttttcgagttaac ggaagaaatagattcacgacgcaaaggacgttttcgtgggctattgatac ctggagtatgcttcttcatcggtgttggtggtggaacaggtttgaatctt ccacgaagatctggatgagcattcatcagagagagagcaacaaatggaat tccgtcaaccaaaagctcaaaagcttcaaattctgcatcttgtgcgtatt cagcaacaaatccacgaacttgattacgagcagctggattcttgataatt gcaaacttttgaactggaggaatatgctccttctttggtaaagattttct cgaaagagatcttctttntttcatcggagatttcttcagaatgctnttac gactggaacctgntgattttgtggatctattgcgtcgactctcaatttca gcttngcaaagtcttcttctagcactttgcgaaatcttgntacggttccc cagaagtggttccaaattttcgatctttttgcttgaaatagatctttcag atccctgtcccgagaatccagacgacgagctacgaggaagcttgctttta >AW057368 ttttaggggtttcacagcagtcttcagagcctcggctgattgttgtgggt cggacatggcgcgatgggagaggatggcttcatcacggaatttcagaata gcatcacaatgctttggtctctgatcgaaaatcccgtttttgatgaaatc catcagtttctggcgacaatcatttctccaattttgagcgttgacagtga 
ttcgctcggcttcctcagtgcgatctttgctctgttgggtcagcgcatcg tcataggctttcaggccatcccagtacactctcctcaaccgtgcgtactc ccaatagtcttctccaatgaattttgtcagcggcttcaaattgtccaact gcttgcgaacacgagattcgcgttctgcatccaacagagaaagcgtacta atcgctggctccaagctctcatactttgtcttatcccgtcccttatttgc aattgcctcataagttttgagatattttccagcataacgataatttggtt cagttttgaagtgagtcaccaaatcctttgaaatttctggactctcgacg agcatgaacaatagtgcctgatggagcttgtcggagcatgtctttgtagc ttcaaccttcccgcacatcttctttacgtcttctgtcatcttgatttttg cgcgtttcctcttcanaatcgatgcnnatagcccatgatttggatgatca gc >AW057369 tatatctgtttgtccatttgctttcaatctctcggctaaatccctccgta cgcaacattttcagtccattcaaattgttgacgctcgtacggtgacgggg aaattccgattccagtggccacatctttcgaggatcccgagttgaaatca gtgtattgagacaagttgcgcatgttgcgaagttctgcgagagattcggg cgagtagtcacggaggacaagtgagtttgtaagcatctccttgagctcga attctccatcagtctgctccaactcgtacattgattcggngctcaacatt ttcttattaagtgttccgtcaggagacgtggatgggatagctggagcagt gaaagctttgatcggaaatgagatgttcatgttctgaatatcagcaattg agtcagaagagaaggcgttcttgttaacagtcacatcatttggtggattg atgctaaccattggtgcctgttctccatccatatcataatttccaaatgg atccttcaagttcttgatatccataatagtcagcaaattttcttcgnttt attctccatcttgtgtgttctccttcagaatccataatatgttcaaagta tttgagatggagtggggaagtttggatgatggngaatatnatgttccagg cngaacgagtgtcaactttagatgaagcctttgaatgcaataaccttccg gnactacataaccgactttgagacttttccgaaagtccgttgctggatac tagcca >AW057370 tcgattccatttccataaattgtctcacgatcatcacattttggcgtgct ggcagtgggcaagtctgaatctcacgggccatgtagagcttgaagaccgg aacttcagcagcggagtcttcgaaatgctgaaaatattgaattttcaacc agtccatattaattttccaaatctcaacatccatcgtgacgatggtgtag acaaataaaaagatgatgcattgggcaatgaatccagccattctaattga cttcacacggattgcatctgcatcttcttcaagaatatcagccactttga tgctctgagctggagcaaaagcagatgtcgtggtggtgttaatgtagact tcaagggacgacatcactggagcaagcacagatgttgatcttggagccgt gatgtttgagtagaactgcactgcctt >AW057371 ataagctttctgcttgcaaaggatttttcttctcgcgagcctcggatttc gctctctacgactgcagcgagaagtacggtgtccagatagctgatccatt tgatgaagatgctctccgactattcaacgatcttccaccaaaacaaataa ttgcgccagactcgaaattgttccgttcaagtcccacaaaagccggaaaa 
tgttccgaccacgcgatctcgttctgccaattatccgaaaaggatatgcg agttcttcaattcggaatgtcgctgaaagtattcggacgcggaggtcttg acacttcttcacatgagactaccgatttttgatcttctccagcagcttct ccagaatctatccaatagtatttctatcatgggacgtgtatttctcttct tgccgggtgtaaatttaattatttttgaaattaattatttttccataaaa ta >AW057372 ttttcaaataaaattttattccgttcaattcaaatgggggtatttttttg ttctggatatcttaaccgacatgattctagaagagaatgctctgtgcaat catcgcagtcacttgataaggatcgcagtttgatgacggacgacgatcct ccagatatccttttctctccgcagccacctgtctcggaatacggattgag catccacgattggcgactccccatgagaatttgtcagccgaacttgtctc atgacgtcctgtcaaacgacgaagattgtcttctcaccatgtggatcgta caccttcatcgcctccaaatgtgtccgcttgagtcctgtcatggcggctt caatcgcagcaattccacctggagcacgcatttcggcagtcgagaagttg gtgtggcatccggctccgttccagtctcccatggtgacctttggtttggg atcaagggatacgcagacaccgaattgttcagcaactctgtgcagaatgt atctcgacatccacaactgatctcccatatcgattccttcgcaggttcca atttggaattcccattgtcctggagtcacttcggcatttgttccaaaaat gttgagtccggcgtgaagacaagcccgggtatgtgtctcgacgacttctc tcccgaacgcacgatcagctcctacgctacagtagtacttntccctgtgg atcccgggaatccgtgtctcggccatccgagtggntgctcatctctgtcg acgatcagatattcctggctcatttcgaacacggggcgttgtcacgg >AW057373 agacagttcaatcaatggggaaatacatgtagaaattaatgcgatccaga tatagacgaagcagcggaagaagacgactttccattcggtttggccagcg aggaggatgcagaagtcgaaaatgagctctcagagctgctggaatccaag gaaactcccgatttaccaggcgcaacttttgagatcatcgagtggaagtt cttgttagcagcagcatctctgaaaacacgcattttctcagccaacagct tagcttcctccatgacctttggattggaagatttcaggccataaaggcct tcaattgctccacggaaggctttctcgatagcttgctcatcagtcgcatg gggctccaacgcatcttgctccttcttttccgtacgatgcaccatgtcag taacttgatccatcaaatctttcatgtcaacatccttctcatacttgtag aatgccttgttaacttgaatcacagaatccttcagaatattgaccttctt ttccaacagctggacgtccttgaccatcagtgtagaggccaacgcctgat cagcagttgccttggccttctccattntccgattcatccatttcgaattn tcagcgcgttccttgtcaatntggtgatccaattcatgcatcattccacg gnantgatcgaaatcctcactcatcngaatttgcatgtcacggagctcca aaacagtgtgacgaccgactaggtcaccttcacgaaatcgatcgatcgt >AW057374 ccgagcgggtttgtgttgaagtcatcgacggtgaaaaaatcaacttcctc atttgatggtggtgtgacgccttttctggggacatttccaactgctggaa 
cacttggcttcttctttctcttttcgggctcccaatgatatgggtcggac cacttgtacttcgcgcttttcatggcatcattcaagagattgaaaagctt ttcgtagtcggggcggtgatagaattgcgtggcccgaacgatcttgacaa actccagcatttgaattgggctttttgcaaagagattctggtcggcgacg tgtcgcttcatttccccgatttcgaccttatcatccaaatcagaccaagc cagctggcatctcaactccgcgagcatgtagatcaaagcccacaggtcat cgactctcccctgctcgaaacgatcgtgcatagctaccgagcaataacgg gaggtgccacggaagagagccttctcacgtgggcgacgcatcttttttcc gtcctccttatccgtgatgtactggcgtgccaagccaaagtccagcacga tgaagtagcgttcgtcaggggagcctttatttcccaaatgccacgttgca gggctcaaagtacggtggataaacccaatatcatgaatctgcttgatgcc cgacaaaagggcaatcccgatgcgcaaattggtaagacacgttgaaaaat tggtcggttctttcatagtgattcaggctttcgccggagaggtca >AW057375 tggacaataagacggttgaaggatggttctcgttcaatgtcattgtgatc aaacaagtcggcccacaggggtatgaatggtacatcatcactcgcaactg tatcggaggatcaccacactgcgaatgcgagaactgt >AW057376 ttgtccatttgctttcaatctctcggctaaatccctccgtacgcaacatt ttcagtccattcaaattgttgacgctcgtacggtgacggggaaattccga ttccagtggccacatctttcgaggatcccgagttgaaatcagtgtattga gacaagttgcgcatgttgcgaagttctgcgagagattcgggcgagtagtc acggaggacaagtgagtttgtaagcatctccttgagctcgaattctccat cagtctgctccaactcgtacattgattcggggctcaacattttcttatta agtgttccgtcaggagacgtggatgggatagctggagcagtgaaagcttt gatcggaaatgagatgttcatgttctgaatatcagcaattgagtcagaag agaaggcgttcttgttaacagtcacatcatttggtggattgatgctaacc attggtgcctgttctccatccatatcataatttccaaatggatccttgca gttcttgatatccataatagtcagcaaattttcttcgttttattctccat cttgtgttgtctcttcagaaatccatatatgttcaaagtattgagatgga gtggtgaggttcgatgatgggaatatatgttcagccgacgagtgtcactt >AW057377 ttcagctcaagtatctagtaatagtccagttatttttgctcctcgtcttt tttctcttcatctcccaaaatgtcactggcttgacgagcaattgttttag tatgactagtagtatttgcacttgtgagattccattctgagctcttttcc cacaacttttgtcctttggatgtgaaaccaagaatttcaacttctggatg tctctcagcaagaagcatgaaattaacattaagcacaagtcgtttcttgt caatgagcatatccaattgaacactctcagggaaagcacgttgtccataa gaagttgaaggaactgtttcagaattcactttgattggtgtcatttccaa agtttgatgggtttctttgactgcactcaaagtcacaatattcgactcat ccgtctgattgaatgttctcttcaaagttggagttgcaggagatgaagct 
ctcgatctggctctcaatctctcactcagattcacttcttcctctttcac agtctcttctttcttctcactaatctcangtggctctccatcttcttggc gaactttacgaatgactatagtctccttcacgcgaagtgttccttttcga gattctgagcagatctgtaccagtagagctgcacacgttcttatgcaatc tctacagacataaaggcgaat >AW057378 tttatagtcacatttttattcaaaaaattcatagggctcaacatgatgat tgatgagacacgcgtgctgatgagccgcattcattgctccgagcattgga ttcattccggtcaagtaggtgtctggactattttcgagtggcgtgaagaa ggtttcggggagctccaagatttcatagctccctccagtgacgggatcca gcaaattcggcgggttatacttattgaagttgaccggcaacaccttcgtc gccgggatgtttgccttgatagatggatcatagacagcagcatccgaatc aatcaatagaccacctcgatccttcttacggcatctgcacactttgaaaa gtgagttggaaggatccatcgagtgagatgacgaaaattccgcgcttctc tcgattgccgtctcgattgcagcggtaggagcgtcgattgtcggtggagt agccaacgtctcaagaagcggagcttggagtggcgatggccgtggtttcg gagcgatcggcgagccatatgcaactctagacatcgacaacggctccgac atgctatgctccgcggagcccctctcaacgcgctcaggcttatggtctgg agcctgcgccggcttgaacttcgagtcttgcagctcgcgacacgagttgg gcattgctcaattcgctggcgtatcgatggactttggtttcaacgagctt ggttgcacagggaaattggtcaactgcgatgtctacagttctgcg >AW057379 aacctaaaaatcttcatctagttgaataagggcgaaagtttatagtatta gacggactcaccgggatgaccatcagacggattatcaagtacaacacgcg aaagctctgtctcattcttctcttcttttggtggacggagcatcgaatgc tcacggcactgagttccatcaagacgagtaatttgttgcaagagatgaat aacctttccacgataagtatccactgatgcaattggattgaaatcgaggt aaacactattgggttgtggaagttcaaccaattcatccataatgctccag ttatccaacttatttcctcttgcccaaaagtctgtaagtgtcttcaattg atggatattctcgaccttctcaagacgattctgattgaaatccagaattt caagaggaagatgctcatcgattccacaaacgtacttgatgccattctga gccagataaatctctttcaagttgtgaagtcccgaaatgttatcaactac agtaatcgcattggctggaagactgagaactgtgagcttcttcaaatgat caacattatcaataagacgaatctgattagcgccaaggcacaatctatcg agtttcagattgttgtcgagattctcgatggtcgcaattctattgtcacc caattcgagatattgcagctcagtcaacgtatccaaacactcgattttag tgatatgttatgaacgacaagagagcctcagcttgtcactcgtcaaat >AW057381 taccaaaaaactcaaaacaccgtcagcgacttgttgcatgcagaaataaa tcagaaatgattgccacttatccctttgcgttgagatgttgtggcgcctt gtaagccaactcgtgtacttctacattgtacagcttgacaacgattccgg aacgagcagccgtgagccaaatagctgaagctgagagtccgatttcgcaa 
ccaatgacaacggtgtttcagactttgattcccttggcggcgaaggctcg atcacaatcagccttcttcttgaatcccttggcagcacaaacttctacga ttgtgaagagcaattgatcctt >AW057382 cagcttgtcccatttttcttcttggtgtctccaaacaatcttcttctttc tccgattcaattgactttctactgattaattgtggcattgagttgtcaaa tgaacgacttgttgccataactctcttatccgaaattttgacgacgactg tagtgagacgaatgaagacggccatactggcaagagcagtagcaatcatt gaaatcactgcacacaccagagtaactccattaacagcatgacagatatc cgaattaagtgttccagttgagggagttttcagatattgacatggagaat cgaatgatttaatgaagagtgcaattgtatgtgcaagatggaaagctgag aatacaaatgcatgcatcatgaagagaagagtgaagacaccaaatgcacg acgagttgagcagagtccaatgagtgacatgagtgccaatgcggtcatca acatatctctagtatccagagcagtctcatcgagtaccgtttatgatccg gcaactctgtatgcagttgagtagagaccactgactaggaagatgacagt gatgacagcacatagtggtgcttcacttgacgactagcatcccat >AW057383 tatcatccgtagttgattgatatggtgtgtacttatccgtcgagcttgta tccagccttggccttacggcagagaacaatggtgttatcagcgcagttct tggcagttgcagaatgctcctttccttggtatccataaccatacttctcg ttacggacatccaccttggcaagaatctgctttcctccagcaacgttaac aactccaacggagatatctcccttgtggttgttgacatgaactggatgcc aagctttgtctccaaactgagcagcctccgggaatgggatccatccatag tcaaaaccacgaacactgtccggaagatagatgagaagctggatagatcc aacattcttgcaatactcgttgttgaaccacgagaagttggcagccacct ttcctccttcattccagacacgacccataacaggttcaccttgcatgtac cagagagcaacatattggtctggattgactcctggaaggtgtccagagtc ttgtcaagagccttgacangttggcgagtcggcccatgatctccctcacg aatgtccatccattcgtctctctccgacacgtgcaactgccttttgtggt gtggagcagctccgcacacagcacatggggcaaatggtgtgtcctcccga tgcagnttgcgcgttgatgatgtaccatctcgtttgagctgtgggcccga gcttgcttaatcacccttc >AW057384 catgaactctatctggacatctctcnttgagcaattctcgctgtttccgt tattgccctcaacgagaatgaaaacaattatctgcaagagttgctcgatg ctggaatttcccaggaaaccgctaacaagcttgtagacatcacagccagc cacaacaacgatggagaaatttctgagaaatcaggaaaaactattttcca agaaatcatttctgagactgatgcagctatcaaacaagcaccagctaatg atcagcaagcctacaaggccttcgttgaaagcaaggcagctgaattcggt caaccagacgagatttccattcaagttgaatctgattccgaataattttc taaaactcaagtatcgtctgtattacaa >AW057385 gagatggaattgcctttcgccgaccatcggcggagcctctgcgctttgtt atgaaggacagagccaaccagctagacgcactcaacaaccgatgggccag 
aacccccaggcaggtgcaagttacccacccaacatgaaccctgtgacaaa cggaccacgagcttggacactgaagcctggtggatacatacaatgggccc aggatcccggtagcatcaaatcacctaatccaccacgaggactcgtctac tatcagcctgagaactatacatacaagcctggaaaaggtggaaaatatat gcatgcgtgccctatcctccccaacgcaaacgaacgactctcgggtcaaa gcgaccagtacaccgtatccgacagaatcacggtacactggagcaagact gcggcatacagtctaagagtctatgtgctctatccggtgcagacagagag cgaggaggacatcttcctatccggcactattgctcaagtcatcgcgaacc tttatgacctcaaggaattacgaggaaacaagccgccaggtgtgagagca cacagaccgatgctgatgccgcaatagcgtagngcggatccacttagaag taggaagcgc >AW057386 ggataaacttgacatccattttgttcctctccagctgatgctactgatgc tcaagccgctttcgctgcagtgactcctgctggaactgtcaccattccaa tgtcggccaccgcctaa >AW057387 ttgaatgtgaataacgatattatctctgattattccaccagtacttatga gccaactagcttcactccaacttctggctccttcacccacacttcaacta gcggtgaaaatcttcatacaatgccaaatgagttgcagcctttgccactg tgtgatatttcatcggaatacgagtcgaaaaccgagccggacactggaaa acaccaggttgacgtgattgccgaggcaaatgggatggagatgaagctca atatgactggaatgcttaaagggcttaanttgaact >AW057388 aattccaccgcccgttttttggtctctcattgcggagagcagaattcaaa gagccgtacaaaagaagacacaccgacgccaacaattctgatatcatgag ttggaattgccttttgccgattttcgtcggattctttgtgcttttttatg aaggacagtgccaaccagctagatggactcaacaaccgggtggccagaac ccccagccaggtgcaagtttcccacccaacatgaaccctgtcacaaacgg accaggatcttggacacaaaagcctggaggatacaacaatgggccaggat ccggaagtttcaaatcaactaagccaccacgaggactcgtctactatcag cccaagaactatccatacaagcctggaaaaggtggaaaatatatgcatgc ggaccctatcctccccaacgcaaatgaacgactttcggctcaaagcgacc agtacaccgtatccgacggaattacgggagactggagcaagattgtggca aacagtaaaggattctatgtgcttcatccggtgcagacaaaaagcgagga ggacattttcgatccgtacactattggtcaagtcattggggtacctttat gaggttncaagaatttggaggaaaagannagccggcaagtggaaaagtac accgattgagctgaatgggcaataaagattccggaccatttgaagaaaga aaagccg >AW057389 aattccgaacaactcagatgctgccatgatagaagacgataaaagtaaac gagagaatgttccattgacagttgtctactcgaatccatcaatttctgta tcaaagaacttttccactacgccacttgctgagaatcaagctggagcatt tgataagctggatgatgatgaatttgagaagatgacgtcagatgttgacg aaaaggagattctgaagttggcgccacgtcttcttaaagtagccaaaaag 
cacgcggcaatcgagaaatgtctgacacccagagagaacgaagtacttgc caagttcttctcaggaaagcagaagctggattcaaatgtattggctgtct tggattctgcattggataagattattgattatttgcaaaagaataattgt gcggttgatgaggagacaaaggctgttatgaagaagagggataagctgaa agcagcaatgatgaaagagttcttggtgtcaccccaatatcttccaaaaa catggactgcaaaattcaatgaatggaaatctgaagccgaaaagcaaaag aatggaatcaactggttccgtgttttcttctcgtatccaaagcacaaatc atttgaagatggaccagaagacacttttggaaaatttccgccgtagnaaa tcgtggtattctgantgggacttattctgggacccggagattgtaaaa >AW057390 acgaactgatctctgtgatcatatcataataatggctacaatttgtgagc ttgtccagttgccagttggaagtgaatgtggaaaatggacaattttgaag aaactcggagaaggcgcatttggtgcagtctatcttgtcagccaaaaaga aaaacccaaggtggaatacgcgttgaaagttgaagcagagtcggatccat tgggcttgctgaaaatggaagtggctgtgcttttggaagtgaaaaagcag aaaatcgttggacgccactttttggagttggctgacagaggaaacctgcc acaaaagttcaattacatggtgatgacgttggttggaaaaagtttgcagg atctccgcaaaactgctccattcaacaaattctcaatgggaaccgccatt tctgtagccagacaatcattggaagctgttgagga >AW057391 aattccctctgaacaacctcaaacgaaagacaacgatggcggacaagtcg gcttacatgggcgctggtggctatggatcgggctacatgggatccaatgc ttcttcgtcaggttatgcccgcgaagactatgcacaaggaggcaatggtg gtggtggacagcagcagaaccaaggatccggaggaaacaccaacccaggt gggcaggtcttcaaggcccgtactgaccagtcctgttaccttggatcata agaaaatcgacacaagaagagccagtcgccc >AW057392 tcaccgccgaacagctcaccgatcctccacaaatcccgacggccttgtcc aactcggtcaacactgcgattggtggaactccatccgactttgagtcgaa ctccgggctatctgacacctcggcaggatcgggccgcgccaactcggccg tttccgatacgaccacagcaatgtcggcgaacgtctccggagattattat gaatgatcttcgaaaaggaagcctgtttggccgacagatagggagtctgg ctaccacggatggagggccccagagtgttactatcattactttctcgaac actactaacatgtcaactgcttcccagcccaccagcctcgatgacaaatc gcaaaagtcgcagaaaactggatcaatgaagactggaattccgatgagat cgcctggatcttccatggctggcacaggtgcgatgtctcgtaaaaagtcg tcgcaaaagcagatggatgctctgaagatagagcaagtgccggctgctcc cgatctctcaatataattcaatacatctaagatatcgaagagtcgtaaag gc >AW057393 tcgatcaaatcaattcgaaaaaatcatgccgtctttaaaaggaggatgat gtaatgaaaaatgtaactttcgctgaaggcaaaaaatttggtgactggaa aatcggcaaaacgatcgatgaaggaggatttgggaaggtttacattgcaa catcaatcagcgatccaaagaaagtggctgctttgaaagccgaatcaaat 
gaaatcgaaggaggatctgcaatcaaattggaggcaatgatcctaaacaa actgaatgccaatggacccgttccccacattccagtcgtccactaatgcg caaaacgaaagctctactgctacatggtgatgacgttgttggggagaaat ctacgaaaactgaaatccacaaatctcgtagtcaacaatggattctcccg tggaacgtggagccgaatcggaattcaatgcctgtatgcattgaaatatg tgcatgacaatggatttattcatcgagatgtgaagccacaaaacttcttg ctaggaaatgagacggatagtgaaagagcaagaattgttcatatcttgga ctttggtcttgcgagacctttcgctgtttttcatgcccgagagaataagt ggatcgcacgtagagctcgtggaactgcagagnttcgtggaactctccgt tacacgtctccgaatgttcatctncgaaagtaacaaggacgggttgacga tgtatggtccctgctatatgtcatcattgagctcaacggtgataagctct tccatggcaaaccgattctcaacgtcgacgtgtggagcaaatgaagctga acttgccggcgaaggtngtctgtcaatatgccagcctgtttgataagtga tgcct >AW057394 agacaagtacgagtctcaaaactttctgtttcgagtggaaggagccatgt tctgcgcgggaattatcgttgctatgattatgctgttcgtcatcattatc tacggaataatcacttcaagtcaaactggaggacagctcaacagatacat ggccccactattc >AW057395 tggcctgtgctaaacccgcttccaaccacctattcaattcctgtacgatc atccagactcctcatttcgaatgatggaattctagaagattgttgcttta aagtctgaagaggaaaagctcccatcactattcgaaaatgttgagggact gttgtctgtcccatctttaacttttggtacgtgggatgatgacaccctgt ctggtgtcacatctgttaatctagataagtctgatgaacaactctgcgag cgtgatgatgactacaccactgattggagtgctaat >AW057396 ttgtctgcaatggctgccgcccagaatacgctgccaaagcgaatgaagaa gaagatgcaggtgtagagaatggctcatgccacagcttgcaaggttgcaa agagagaggctcgcgtcgctgaggaagcatctggaaaatcaactggtgga tctactcgcggagccaagtgatagccgagccacaacacatg >AW057397 agattcaaacgccagagttggaagttccgtcaatcaacttggattcaagg gacatgggggaggacttggtccaagaatcattatggctggaactttgact gctcttcaatggttcatctatgattcgatcaaggttgcaatgaatcttcc tcgtccaccaccaccacaaatgccagaatctttgaagaagaagcttggaa ttccgggaaccactgaagttgctccagtcgctgaaaaagttgctgctcca gagaaaaactcaaaatgtgagaaacccagaaag >AW057399 tctatcgaatttattcaagaaaaagttcggaactttatgataatattgca tatcttcatcatccaagtggtgtcaccgttgtcgttcttcgaaatattcc tgaaagtgaagttgtcgaagtggattttggaacgacgaagaagcacggcg cagatcgttcaacaaatcaagtttctggaaaaggaaaaaaaggagctctt attctacaacccgactcaaaattgtgtacattcaaatgcaaagatggttc agagcctgtcctgagagcaggtcgtcgtggaactcttgttgagatgaacg 
atcgcctgaaaactacaccagattttattagaacagcacctgataatcag ggatttatcgcgattatcacctacggagccggagtacgtgaaactgaagg aatgggagatgaccttcctccgaaaagacttttctta >AW057401 taaatcatgttcaagtacgttctcctcgtcaatancttcatcgccctcat cgagatggcttcagccgatttttcgtgctatttctcggattccatctgca aatccatcacctgcagaaactgcaaagtcgccacctgtatcaccggagac tgcgtctgcaccctttgctaattttttgaataatttttttatcttttga >AW057402 tctgaagcncttcatcagatcgtctaaataactgccagttgaacaattag ttgatgactccgggatgctcatttcgcgttgcgtcgaga >AW057403 agctatggcttcctcgttcgacaaccaaatggatcaggatggaatgtgct ccgtgtactctgctcagccatcggagacaaattgctccatcaatgaggtg cttgccaaagaaattatcgctgtcaatgagacgccagatgatcaagctga ttcttcgatctacccaatcccaaaatcagaaacaaatgtgtcagctagcg aagggttccagccatgtcaagatatcaatcaattcaatttgtccggttac tctgctccaaaatccgagaccccagtgaccatgaatgagaagttcgagcg gtgcanagacttgatgaacgttcttgactactctgtctactcaatgccac catcagaagcaaatgttacaatgaatgttgcaagcttctcggagtacact gcccttgcctcggagaccaacgtcacaatggctgatgttctcaagaacgt tgctcaggatttggcgtcagagcacactgcaaaatcagctcatccaacat tcgacaccaccgcctacgtagagcgccttcaagccgagnctcggattccg gatagcaaagtgattggattggagtgcagcaatttttcgaatgcgaagat cattgattcaattgaatgccttcatcagttggacanattcaaaccaatcc ngtggattcgatcagaatnctgattttggaaaaactgcccaatctgtgga aagctacctnnctgcgtcatncaatgcagntcatncatcacagatctga >AW057404 agcttccggtgttcgtcaggtctccaanctactcgattaaaatttatgga gcaattcaacatgacttcggcatgtctgaagatgaatcagtccaatatcc ttctccgatttctccatcctcggattcttcaagtgaaacaatcacccatg tctcccctggatatctccttcaggaagtcttcttgcagagagcagtcatt gaagaacaaataagattattgatggttttcatgaaattcagaagctcaa >AW057405 ctatccgacagcgtcttttgatggagcagagtctgtcgatatttctgtgg actaaggaaaaagtccagtgagcacacacttgtcgttggcatcaaatcca actgaattcaagttcatggatgaaattgcacttttgaagcgtggccgcat ctacaaagacgctccaaagcatccgtacaatcgtcgtggccagcaaccaa tgatgaagaaaggtatattgtgtgacttgggtaatttcatcagcttcttg cctgctagaagtccatctatcatgacgttgatgggcggagttccagaggc cgaaaaagag >AW057407 aaaaataagtctacaaatgatcaacatttatccaccatccggcgactacc cagcttctggtggttcatcaactcactacattgtctccgaatcggaatct cgtttggcattcaaagtcaagtcgtccaacaatgaatcgtatcgtgttcg 
cccagtctatggattcgttgatgcgaagggaaaggctaagctcgaagtga atcgtttggctggaccagcgaaggaggacaagcttgtcattcaatacgcc gaagttccagctgatgagaccgatccgaangctcccgttgggggcttgtg ctcaacaaggagaagtcgttgtcaagatggttgctagctaagaa >AW057408 ctgccgaagtactgtctacaatacgaagattgattactctgatatgaaag attccgttgagctacttcgtcaggtgcttgtctacaatccgagcagacga ctctgtggaatagaatttctcacgaatccattcttcaccgtgttgttcaa cgagaagactgtgcgtttcaataaaaagaagatccaatgcgtgtcagctg tcgatctacaagctgtgaaatcgggagacgtcacactgacaaatgagtct gtagagcactccgacctaatc >AW057409 aggtgatgtctacaagaaggcagtgcagttctactcaaacatcacggctc caagatcaacatctgtgcttgctccagtgatgtcgtcccttgaagtctac attaacaccaccacgacatctgcttttgctccagctcagagcattcaagn gggctgatattcttgaagaagatgcagatgcaatccgtgtgaagtcaatt agaatggctggattcattgcccaatgcatcatctttttatttgtctacac catcgtcacgatggatgttgagatttggaaaattaatatggactggttga aaattcaatattttcagcatttcgaagactccgctgctgaagttccggtc ttcaa >AW057410 gactaccgtggcaaaagtcggatctaaaggttctaggatctacaaagaag acgattcaagctgtccgaccaacggtccgcacccaaggagcggttactcg ctctcaggctgctcttcgtggacatatggggatcactgactcttcgactt cgaccagttcatctcgaattccgaaggagaagttgaaaaagaaggcatca tctcgtagccgttcaagatcccgttcgaaatctactcgccgctcacgttc aaagtctactcgctcacgttccagatcgagatctcgaagccggagtagca ctcgtggaaagaagcgtgctccaaagaaggcagttaccacgaaagccgct cgatctatctctcccgtcaaagtgaaaaagacagaagccatcaaatcgcg tggaagcagcaaaaccgcccgtcgtgtgtctgcggctcacaagtaaataa tcgtcgcttcttgatggc >AW057411 tcgactaaccgtctccacttttcacttgcacaaatcttcatgcaaccaat caacgtcatgctcgctgttcttctcgccttggcttcatttgctcaaggag gcagatctgttgctccggctggtgcagtcactgaaccaacagttactcaa gctgttccagaaggatcaggacttagttcagatgtcactgatcgtccaaa catcgactccactgatgttgtatcaaatgcaacttcggtggaagatttgc ttggaagttcaacaaatgcaaacaacactggtacattcaaactcttaaga cctttgtatttgctccaatgatgattcttgctttggtgc >AW057412 ttcgaagaagcccaacaactaaccccaacaataatgtcttgcgttcaaca acaacgacgctccaccggactccgcatcgccgagcgattgaacaagtaca tcactcttccaaatcgccacagcttcacagtggattccaaggatgtgttc caacgtggtcaggtgctcagctacatccgatccaaagcacccttcctgct cgaacatatctccgaggcaaaggagcgattgataaccgtgacgtcacgcg 
gtttgatgatcatttacgagaatgacgaccacggatttggtgattgattt gcgatcggccaggaatgttctgtgcactgctgatcgttcgaaaaagcagc gtcatttccgctgtcacatcaaaatccgcatgcaacgtggcaatgttcac ctctttgtcggccataatgatgttcacaagtggacgtgcgcgattatgag agccgccggaaagtgcttgccgtctacagagccccgcgatgatggaagcc ttaatgtcgcgatgatgacggcggtggaggattctggaatctttgaagag atgtcgtctacatcgtcagcatcttatgactatgatgaggatgatgaagt cgacgaggtgaagcccactgccatcgagcaaataccacttccccatgtct ccgtnctgtctcttcgncagaaactcgagaaggagctcgtnctgaagcct aatgagcaagtgctcgccgagcaacaacacacagccaactgcgatgagcc aagctntgtctcttgaatcttctgcaccccaagagcagttntcagcgatg aaactgtagtctgctggagcccacgaatgccagatagaggagagaatacg aggcnca >AW057413 ttatcgaaacttgtgaagaaggtctgcacattctgccgcaaagaaattga tgccaaagcgaatgaagaagaagatgcaggtggagagaatggctcatgcc agagcttgcaaggttgcaaagagagaggctcgcgtcgctgaggaagcatc tggaaaatcaactggtggatctactcgcggagccaagtgatagccgagcc acaacacat >AW057414 tcccttctcgccatgtttgtggctcaggaagtcgccgaggaagctcttac ggatccagaagcagctgaagcggataatgccaaaaacaatgcaacggatg ctcccgctgatgctacaccgggatctggatcagatgctccagctgctcca gaaggatctggcgccgaagccgaagccaccacagcgaagagttctactgc tgcagtgaccatcattggagcaatcgccgtttttggagttgcccatctcc tctgagcattcttatcacttc >AW057415 ctattccctacctcgatcaaccatgggtttggacaaaggaaagcaacaaa ctgccaagaaacctggtttcaactgcaaaccgtttgagttcgaaatctct tcaacgaaatttcaaattcccaacgacaagccattgaagtacactttgaa atgcactgctgatgagaaacaggatgttatcattcaagttcattcggttt tcttcgaaattgttggtgcaagacgtaagcatggagtcacttctcaagag tttcatgttcttgggaaaaaccgacgaattgacctttcgaaattggtctg aaggatttgaccaatgtttcgtacgataatctcgagcatgcgcaacttgc gtctgccgctggttttatcacgttcacacactacaagtcttctagagatg atgattccgatgctttatggggtcccaacaagaacttgctcatcgtcatc actgacggaatgggagaagtaggattctggaagaagaattgtcgttgaaa ctgaaccgatgagagaagatgtgaagaagatttgttgtgaacttgaagag a >Contig1 gcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagc ctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcct aagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaa
gcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagc ctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcct aagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaa gcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagc ctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcct aagcctaagcctaagcctaagcctaagcctaaaaaattgagataagaaaa cattttactttttcaaaattgttttcatgctaaattcaaaacgttttttt tttagtgaagcttctagatatttggcgggtacctctaattttgcctgcct gccaacctatatgctcctgtgtttaggcctaatactaagcctaagcctaa gcctaatactaagcctaagcctaagactaagcctaatactaagcctaagc ctaagactaagcctaagactaagcctaagactaagcctaatactaagcct aagcctaagactaagcctaagcctaatactaagcctaagcctaagactaa gcctaatactaagcctaagcctaagactaagcctaagactaagcctaaga ctaagcctaatactaagcctaagcctaagactaagcctaagcctaaaaga atatggtagctacagaaacggtagtacactcttctgaaaatacaaaaaat ttgcaatttttatagctagggcactttttgtctgcccaaatataggcaac caaaaataattgccaagtttttaatgatttgttgcatattgaaaaaaaca tttttcgggttttttgaaatgaatatcgtagctacagaaacggttgtgca ctcatctgaaagtttgtttttcttgttttcttgcactttgtgcagaattc ttgattcttgattcttgcagaaatttgcaagaaaattcgcaagaaatttg tattaaaaactgttcaaaatttttggaaattagtttaaaaatctcacatt ttttttagaaaaattatttttaagaatttttcattttaggaatattgtta tttcagaaaatagctaaatgtgatttctgtaattttgcctgccaaattcg tgaaatgcaataaaaatctaatatccctcatcagtgcgatttccgaatca gtatatttttacgtaatagcttctttgacatcaataagtatttgcctata tgactttagacttgaaattggctattaatgccaatttcatgatatctagc cactttagtataattgtttttagtttttggcaaaactattgtctaaacag atattcgtgttttcaagaaatttttcatggtttttcttggtcttttcttg gtatttttttgacaaaaatttttgtttcttgattcttgcaaaaatttttc cgtttgacggccttgatgtgcactaccttcgcttaaatactacattttct gaaaatgttataatagtgttcattgtttcatacaaatacttatttaatag tatttctggttatataatttgtataaaaagtggttgacataacaaggctg acgaaactttgtgatggctgaaaatattttcctagctttattgattttta tttatacgtgtttgaataacttggccaaatcgccgagaaggaatagaata ctggacgacattgtacatattttccaaaaaatcagaaagtagatgacggg accaattctttctgtcaggttttacaaccgcccagtgcgtctacgtcaca tgttgtataaatggttgtaaacaatatgcggaaacaatcaaatgcattcc cataaggcataatatagaggctacaggcaatgagtatcgctctttgcttt gtttaaagggggagtagagtttgtggggaaatatatgtttctgactctaa 
ttttgcccctgataccgaatatcgatgtgaaaaaatttaaaaaaatttcc ctgattttatattaatttttaaaatccgaaaatccattggatgcctatat gtgagtttttaaacgcaaaattttcccggcagagacgccccgcccacgaa accgtgccgcacgtgtgggtttacgagctgaatattttccttctattttt atttgattttataccgattttcgtcgatttttctcattttttctcttttt tttggtgttttttattgaaaattttgtgattttcgtaaatttattcctat ttattaataaaaacaaaaacaattccattaaatatcccattttcagcgca aaatcgactggagactaggaaaatcgtctggagatagaacggatcaacaa gattattattatatcattaataatatttatcaattttcttctgagagtct cattgagactcttatttacgccaagaaataaatttaacattaaaattgtt catttttgaaaaaaaaataattaaaaaaacacattttttggaaaaaaaaa taaataaaaaaaattgtcctcgaggatcctccggagcgcgtcgaatcaat gtttccggaactctgaaaattaaatgtttgtatgattgtagaaccctttc gctattgagatttgataacttttaagtaataaaattttcgcagtaagaca ttaaaacatttcacaattaagctggttctgaactgtgtgaagtatattga aaaaaactaactgatacaaaaatataattttatgatagttttctggatgt cccaatataaacgatgtcaattctgcgacatgctacagtcatccacgaaa gtaacccgaataccgacaaaagaagaggaacgccaactttggatagacgc tctaggggctgattttggtcggaaaatagtcgggaaaaaatagaggacat tacagatgaggatgaggatgaagatagaaatttgccgacaacttcgtcat gccgctgatttttttgatgttctacgcttaaattttcagcgaacgaacta ttttttatattttgattgtttttaaataatatttgccataagaaattctc acttttccaggaaacgtcgtttcgccgcgattttcctcgtctccagtcga ttttgcgctgaaaatgggatatttaatggaattgtttttgtttttattaa taaataggaataaatttacgaaaatcacaaaattttcaataaaaaacacc aaaaaaaaagagaaaaaatgagaaaaatcgacgaaaatcggtataaaatc aaataaaaatagaaggaaaatattcagctcgtaaacccgcaagtgcggca cggtttcgtgggcggggcgtctctggcgggaaaattttgcgtttgaaaac tcacatataggcatccaatggattttcggattttcaaaattaatataaaa tcagggaaatttttttaaattttgtcacatcgatattcggtatcaggggc aaaattagagtcagaaacatatatttccccacaaactctactcccccttt aacaaccacccgaggatatattcgacaaacgatctatctactaggaataa ctcgattattgacatattatagacttcttttagtatttgtaaaatagagg atcagacccaaaattcagcccgcgaaggcatgacgtcagcgcgaggcagt agtttccagaagaactctgtcgtctaccttaatgcctcaaatgcgaaccc gcttcggccatccttctcgctcagagaatggattagagttctcatcaact cctctgtctaattttcaactgcggcggttggcgaccggtattaccgcggc gaccgacacctcccgggttccgtcgatcgctgtctgttgtgtgcgccgcg actccgcccaccggtggtaactttttgtgggggaatctttgtttttggtc 
atttttcagcgcttttcagcgattattgaccaattttgaataaaattttc aacagaatatcatctaaaatattgcttaacatttatttaacagaaataac gtgagcacgcatgtaaaacatgaaattttcgggaaaattgcaattaaacg aataaaaatcgatatttaaatcaattattggtgaatccggtgtgttgagg cttcaatgcatacatttttactggataaatctcctttgggaatccggttt gcagtgctttcgagaccatgtccagttgagaatcggcgaacgctttaaga agctcgggctgaataatgaattgttttaaaaaatgtttagtaaaaaattg ttttcgtgcaaattgtcttcgatattatccaaacgtgacgttttgcgatt ttcgcgctaaaattacagtaagtggggtctcgacacgacaatttttgtga aatacaaacgggcgtgtgtctttaagaagtactgtagtttaaaaacttca tttctgtggaattttcatatatttttcatagtttttctctttaaataaat cacttttcaacaaaaaactatgagacaatagtttgaaattacagtattct ttaaaggtgcacgcctgctcgaatttcgcaaaaacgtgtcgtgtcgagac cccaattacagtatttttgacccgaatatcgcgaaatttcgagtctgggt gaaaacattgaaatttttggcaaaataaaagaaatatgtcctttttcaga atatattttctaaatttcgagacgaaacaacaattttaaatgaattttaa ttttaaatattaaatatttcggaatttggcgttttttatgcatgtcgatt cactaacgattttgtactacacgtgggcaagtttatacagtttttggcta aaatttgtgaatttgaaccgtttttcggcgaatatttgaaaaattggcaa aactggttcaaaaacaaaaattttttaaactgtacaaactgtccaaaaat tcgtcgtaaatcgacacacccttctcattttttcaaaattttaattgttt tcgaatgttttttttgcagaataatttgtaaaatgagccttttgtgaatt ttttttaatttcaaagtttttattattttttctcaaaccagcacctctgt tctcgtccaactatgatcatcatcgtcgaataaccgtttctcgtgatttg tcacattatccttgagcacaatacatccaccaggtttcagtcctttctga aaatgaaaattaattttaaaaaaattgaattattttaaatgaaacagttt tcagagatttctcaacttttgagtccaccaccaggcctgcacgtttttcg ggttttatcttttaaaaaactgaaaaatcgaaaaatttcaatttctgttt tgtggtcaaaattgtaattacaggtaagcaaatagtttaattttaaaatt gaaaattagggaaatgaccggacataagtttaaaaacccgattttttcaa taaaaaggaaaattgaaaatttaataaaacaggttgtaaatcaaggagat cgtattgattgaaaaaaaatccgaatgttccggatttttcagtggttttt tttgaaagaaaatcgaaaaagtaaatgtttttaatttttaaatttaaatt tttaatcggaaaaaatgtacgaaattgactttttaatgtgaaaaattgtt gttttaaaaaaaaattttaaccgatacagattttctagactcagtttttt cggttgaatattgttttttactattttttcattacagaaagaatccaatt ttatttcgcttaaaaaataaccggagcatcgaaaatatttttttttctgt tttactcaaagcatttcaattacctagaattttgtttaaaaactacatgc tttatttatgaacgtaataaataagaccccctcttatttataaactttca 
acatattttcagttttcagtgctatctagtgcttaccgcacatcttttaa agaaatcaaccaaatcctcatcaaccaaatgccctgaaacccattgaatc catatcaaatcataacgtcgttcgggcggtgcaaacgtctgcagtccttc gacgaatttatctccaattcgtggatgttttccaatatattgatcacttt tcgtgatcaactcctcgacgacgtcttccatatcaactttcgagaagaat ggcattaagagatgctttgtaacacgtccgatacccgctccgcagtccag tgcatagtcaaagtagccgaatagattctggaaaatatttataaaattca aagttggcccaggggtgaccggcaatttcaagcaaatcggcaaattgtca attttctgaatttgccgaaaatttgacaaaaacgacaatttgccggttcg ccgaatttaccttttttaaatttaattttcaattcaggcaaactgacgat tttccgtttgccggatatcaatttgcaggaatttctcaaaggaattttta ttaagacggaaacacagtgcttttttgaattttttttcccgttttcttca gatatttttatagaatttactgacttttcagaatagatgtaggacaattt tgttgttttaaaaattgaaattctgaaatttccaacaaaaaaacatgtgc aaacccacaagttggcaaaaatattttgcatttgccgtttttcccgtttg ccgaaaagtctaatttcggtaattgggccatttttcgaaattttgagcca cataaaaaactttgaaccatttttgagaagtattattacgacattcgttt atttgagcacaatttgggcctatactttcaaaatcggggtttgaaaaccc ctatatgttcgaccgaatgttaatctcataaaaatttgatgaaaataaaa ttttctacggctcataaacgtatagcccccgtcagtctcaaaatttatac gatagacactttttggcgtttatcgcctatattccgtcaaaaaccattat tcatcattctttcaatgttgttttttttaaggctaaaaaactttcatgca aatttgttagccgtgtcgtggtttatacgaaaatttcagaatttataaaa taaaggaaaacgaaaatgtttctatataccctatttatgttctctgattc cgaataccaatgtgaaaaattcaaaaaaaattccctgattttatataaat ttttgtaagcgacaaaaattgtcgtttgaatttcacacttggttacaaaa atttatgaaaatgaggaaaatttgttttaattttttcacattgatattcg gaatcaggaaaataaatagggtctatagaaaaattccgaaccttcactcc ttctctgagtataataaatttaaaataaatacagaaaatttcagttcaga cctcattaaatttgggtatatttctaggatccgagtttttacaccagatt tacaaacttttagcctttcaccgcctttttatgcgcatttcccatcagtc aactccaaaaaaatcgcaacttttgcctcatatttcaagaatattcccct ttctctccccattgaaagtcattttcgaaacaagcggaagattcgtcata tgtggtaatgtgtggcgtgcgttggcaaacaacaagaaagaatcattctc tgaaaacaaaaagcgttttgggtgccaaagtaatattgaaaatctgccgt gttttctcattttccatcaaaagaaaatgagaaaaaagtttcggcgtttt atttgatttccgggaaagaagactcggaaaaagatttaattgaatttttc atagcaaacctatattgcaacaactttctaaaaggtcagaaattgccgcg tagcctagaaaattggaaaactcttccagctggtattatttcagacatgg 
tgcatcgaaaattcgaaaattacagaaattaacattttggagcatctgcc agaaaattgagattacagtaccccacttctgccgagaaattcgaggtgga agaggtcttacaaaattttcggtcacgtgaaaatgggaagcgttcaggct ccacacgacggaattcacctagttttcaggtgagaagatatcgtacgagg agaattgacctccaaatcctgatcgtgactacaggtcgtcgttcggagct gtggaagagttttgaaaatcttcgaccatgagagaaatagacaggacgac caaacattttcagtggaagagcttttcctaggccatcaggatgctatttc gacagagctgagttatcctcaagtcgttacgaggtgtggaaaagttttcc aaaatccccgaccaggtagaaaatgagcacaccgattaagtttctccagt ggtagagttttttctaggccatcattatgctatctagaaaaaagcttcgg ccatggggtttttaggccgtctatttatttctcataactttctcagaaat tcgtctatttctcagaaccccccaatgataagttttttgcaaaaaaagtt ctgctttgctcatcagccgtaatcaggtgacctcattaggcctacccaaa cacagatttgtcattatttttcagacaaaaaacacgaaaaaaatcttcac gcatggggtgataacctgattttaaatcttactgtgccggctggcgcggc gagcttcgatcactgagccgaaagattatgaaaactatgggaatgacggc gtagcctagaaatcgtccaggcagagattctgtctaattttcgagcatat atctcccagttttgttattaatttaagtaaactcaaacctagaaacaagt aaaacgggagggggggggggaatatcagaaaattaaatcttgcgacactt ttccattgatactttcaaggtaatgcccagaggtgtgcggcaaattttga aacttgcgcatgccgcctttttttttttctagaaaacagtcagaattttt tgtcgaatttgttgaaaattcgctaatatactgtgagtttagaaaaaata acgaaaaaactcggaaaaggaggaagagatctgaaatatgtagatttttt tagaaaagaccagaaattactgaaaaattggcatttttcgtcgaaacccc aatatactaaattattcggatttttagaaaattttcaaattcaccataca gtgcattttttcctacttctacgactttaaaggggggagcatttatgcgg aagggtcttgccgcgcatttagtcatcatttttagcagtttctgtgtaaa attcgcgtagatcacatgaagatcacaaaatatttatcccatatttcgta tttctgttgctttttcacaaattaattgtgatctacgcgtgatctccgcg aattttgagcagactttgttaaaaatgatgactatgtgcacggcaagacc cttccgcataaatgcgcccccctttaaagtcgtagaagtggaaaaaaaat gcactgtagcaaaaaatcgaacatttctgttcgatttttgaatttctcga aattttttaaaataatttttaaaataacatttttattttatttcgaaaac taccgattttagaaaaattctaaaatttcgattttttttgttgatttttc gatttttaaaataaaatttcataattttttaaaccgatctttcttgcttt tcctgaaaaatcgatgatttctatacctttttcttcagtccttcaataaa tcgtttcgacgccgatatgtcgggcgcgtgaagcgcttcgaatccgccga gcattccgttgacgtcctggctcgcgcggctccagtattcctccgcctga aagagaatagttgaaaacattgttttgagacttaaaaattttttttttag 
tttttttcaaaaattcttacatgttatagagtttttttttcaaattttca gcttttttcagaaaaacttagtattttcgataattttaaataaaaaagtt ttttttcaaaaaatgtttcggttttttttttaatttttggtctaaaattc tccgcaaaagatttgcgtgctggccgaactttttgattttgtaccttttc ataaacatcttcaccattgtgaattctagaagatgatgaagagctcattt ttgatgttgtgacagctgctccgagcaatctggagacttttgtgacgaaa agacgagaggtcacggatatgatgatgatactggaaatgagatatttata tttactagttcatcgggaaaattattacgagaaagataaacagacatgtg cgtttttttaatggaagagaaacacaagaaaaatctggaaaactaggcca cggctatcagtgtcgatttacggcatacggtctcgacacgactatttttg ttaaatgtgaaggtatgcacctttaaagagtactgtagtttgtaactctc attgctgcaacatatttgacgctcagcgaaaactacagcaattcttcaaa agactactgtagcctttgtgttgacttacgggctcgattctcgaaacgaa tttctgctcgaattgtgacagccatattcaatttggtatagtcttttcgt attttttgccatttttctgttttcttctaatatttaatctattattaaat tatgtccgtaactccctccaaaattagaactgcgaccgaacagagattcg ttccgccccatattccggccaatcagatcgagtaggcggagttcgaagtc gctgattggtttgaaaagtcgcggaaatttgcaagttttaaggtagcgaa aactgatgactattgtagcgcgcttgtgtcgatttacggaatctcgattt tcaggaatgaatttttaattacattttttcgctcaattaatattctaaat aaataaataaatgatttgaattaatttaatttcattcgagcccgtagatc gacacatgtgctacagtaatcattagttttcgctacgagatattttgcgc gtaaaatattttcccgtaataactctactccgacaaacattacgacctcc atggaggcctccaggtataggtgagactcttgtatttccaattcagagac aatgcgtcactggaagagaaaacgaagcggaaaaaaaaacacggaaaccc aaaaatagtgtttgccccgctctattcttctccaataatttctgtgtcta attttgaaagactccacctgtgtatgccttctcgacataaaccccccccc ccccccctatcttacatggtactgataacactttcagtctttcacacttt tggcgcgcaacgccgctcttttttcgcggcgagctgatgacgtcatcaat ttttcatcgcttttgattatcttcaatgttctagaagggcacataggtca tccttattttttccttctctttctcgtgacggcccttgttgcgcatgccc gccccctagagcagggcgtggcctgaacggcggctccgagagctactcat tcttgccgcgtcaccctccagcgccacccaaacttcttcggttctagaga tcgagaagaacgtatgattttttaaaattataattgtttctttcgaaaaa aaaaatttcatttacagtaagccaaacatacacaatcaacatgaaactcg taattctgctatcttttgttgcgacagttgcggtttttggtgagtttatg ctttagataatacttttccgccaaaaatacagttgccggtctcggtatgg caatatttttgttaaattcgaaaagcagtgagtaatgtagtttcgaattt tcgtttctgcttaattttcatcaattcatcgtttttctcacgacttcttc 
tttatgaaaaatcaatgaaaattctgactaggtcagcttaggggtgaggt acctagagacgccacatatgccaaacggaagctgagatcattggctacaa gaatatgctttcaaattctgcaacggacctctgggagtctggaaattctt gtctgaaattatgcttttgaatgctcgaaagtggtaagaatttagaattt attacagaaaaacgtttaattaataaaattagttttatacttgaaacaag tactgtatgcactgtatcaaaacacattttcatcttttctaggtattcaa cttcacgtttttctgtaataaattctaaattcttaccactttcgagcatt caaaagcataatttcagacaagaatttccagactcccagaggtccgttgc agaatttgaaagcatattcttgtagccaatgatctcagcttccgtttggc atatgtggcgtctctaggtacctcacccctaagctgaccattccctagtg agcaaacaaaattttgaaattacagtactatttaaaggcacattgatttt ttgggtcaagcaaaaatttgtcgtgtcgagaccggctacggtattttcgc gaaaaatcgcaaaatcttgcggctgggatatacttgtgcgaaatactttt tgcattaattttgagcaaaattattttttttagactttttgaaatccaaa ttttttggattgcgaaaaaaacctgtgtccggttgtttcattaggccaac aaagttcctggaacactgatgaaaaccatgatagaggcggagcataatat cgatttttcgtactttcctgtatttcttcttctatatggccgagtagaac aggattaggggtaaagtcaaaatttttctcatatggatatcatatggata tcaaaatttttctcatatggatatggagaaaatttttctcatatggactt tgaaagttgaatcacttgacatctgggaaattagtattccaggcgtaagt cggatctgttagaaacggaatacttataggcttcgtgaattaggtagact ttcaattaatctgatccatgggagtcagacgcggtttccaggcctgacgc ctgcctccaacttgcccgcctcacgccggtctctcgcctcatttctgcac tgtgacgagacagacgaaggtcgccttctggcgcccgcatggaaatccta cgaatatgtcagcttctgatgggactccgtaaatcgacacacaggggtac ctcagacatttccctcccccttacaaattgttaggacaaggagggggaat tcatctccactcgagacacacatatgttgtcgtcagtgaagtgtaaagat ctaaacgattgcgtgtatgaaaaagcactctatgatcacctttttcatct tcctacaccctttttaggtgtggtgcccatcgagcactcacgccaggcag ggagagcaccggtccctgactaatgggattcgaatgttttagaccggaaa taggagcgatgaaagagcatagaaatgatcatttggaaatcacgtttaat taggttacggcgaaaatttgcaaaaaagagcaggaaacttggctcaaatc cttcgaaatataacaactaggacttccatgtaggcgttaaagcgccctgt ctctcaccccaatccgtaccttaagctgaaacaaacgtgaacttttttca tttcttaaaggagtatcgtcaatgggaaaattgttttaaaatgtagtatt tgtacttcaacttccaattattgcaaaagaaaaacggaaaaaatccgtta acattcagcattttaagtcgaagaaatctttaaaatttaactagagaaat cctaggccacgacgctcattcgaattttaatttgttttgatattgtattt tgaaaaaaaaacttaatacaattccttcttcccagttttctataactttt 
tgagaaaaaaacgaattaaattccgaaaaaactacatttaaatcaatatt ttgtttacgaatatggcctagaaatcgcgtggtggcctaggattcatttg cgcgcgaaattcaaattccgtcactttcgtcgatttcaacggctaaatgc tgaatgtcaacggatttttcccgtttttcttttgcaataattagaagttt gagtacaaatactacattttaaaacaattttatttttggtattttgacga aaaattgatttattggtttttttggttgtttgggaccaaaaaatccaaaa aaaatgtttggcgtgtctagtttcgactcgagactattctgtattaaaaa tacattaaaacatgtattttaacacagttgtgacgtcataaatgtatttt gatacattttgcaacattacttaaataaccccattaaaaattaacctaag catcaaaaattttttggtttttttggtttttcgaaaatttcaattttttt tgttttttggttttttttggtttttcaaaaacttcaattttttgtttttt ggtccaacatttttttttggtctcagctctgctgcctaccctagaagaac taatagcgcttcaaaaactgatgaaaacgttcaaatttgtcgaaatatta cgaaaatttgaaaagttggctcaaatctagattgaatcggccgattttcc acaagtttccaagtttccacaagtcgccacatatcccgagaaaaatcgat tcaaattgtttgaaaattggaatactgcgaattttgaaccaaatttccct ggcttctctgttgaaatacttgaaaataccgcgaagcaaacaaaaaatct aattattacgtgaacacaaaattctgaaaatgcgtatatattgcgcaaca tatttgacgcgcaaaatatctcgtagcgaaaactacattaattctttaaa tgacacgctgtatgtggtgatttacgggctcaaaaaattattttcgaaaa tcaagcccgtaaatccacacgtagtaattatataaagaattactgtagtt ttcgctacgagatattttgcgcgccaagtatgttgcgcaatacgcaaccc catatgttgatatatactgatgtgaggataaaaaacaacacaactttcag cggctccatcggctccggcaggtctcgaggagaagctgcgtgctcttcag gagcaactgtacagtctggagaaagagaacggagttgatgtgaagcaaaa ggagcaaccagcagcagccgacacattccttggatttgttccacagaaga gaatggtcgcgtggcagccgatgaagcggtcgatgatcaatgaggattct agagctccatgtaagttagtggtggtggccggaaaagagaaaactcggcc aagctgctcggagtttttgaatttttgataatccgaaataaaaattgatt gctcgaaaaggaacaatcttttggaaaaaaacgaattttgtcattttttt cagcaaaaattgattttcgaatttttccaataaaaaatcgataatttctc cccgtgcagtggaaaacaaacaatatttttttgttgatcgttctcttcca aacccggaataggtacacacattcctgcgtcatcccattctcttatcaca cttttttttcgaaaataaaagtgtagagacggaaaagtgagaaaggagtc aattttatgcgaaattttgcatgataatacactcaaattaaaaaaactgc gtggcgtgcactgcagaaaacctcatatttaggccccgcctttttctcgt ccactcacggagaaaaggcaaaaatttggggaccaaccaatatcaggccg ccgacatcctacgggttccgcgcgccgctatgtttaactcgctgtgggtg tggcgagctgtctccgcccgctgcgagttaaacatagcggcgcgcggaac 
ccgtaggaagtcggcggcctgatattgttggtccccaaattttttccttt tctccgtgagtggacgagaaaaaggcggggcctaattatgaggttttctg cagtacacgccacgcagtttttttattttgagtgtataggtctcgattct cgaaagtatgacagttatttaaatgatgaactcgtgatgactgttaaatt tttggaaatttcgggggaattatatcgatttttcgataaatttacaggaa aaaagtccaaaatctaggtattccatggtaggcaggcgcgatttcttgac gcctgcctggaatctgtccgcctcacaccaaaaaatgtcaatcattttgc tgaaaaccaaattaagaaatgaaaaagtgcacttagagatgatgacggag gtcgccttaaggtcagacaggttaaaaaaccgattttagttgagttttcc cgaaattttctgaacaaccgaattagaaatatgctgcttgtcatttttga gtaaaaattaacgaaaacttcgaccaaaaccacgaaaaaaatgaagaaaa taaagatttttcgagaaaataacaacaaaatccagcaaatagtgaaaaat agttttatccgagaaaaagtagtttagacgctatgaactctcgaaaatca gattttttcaatctaaaagccataaaattatcgattttttaaaaattctc actgaaaaccggcgaatttcagtgctccacgcaatcgaagcccgcttggc cgaagtgttgagagccggagaacgcctcggagtcaacccggaggaagttt tggcggatcttcgtgctcgtaatcaattccaataaatattctttgcccta aatactttaaattatccatctgacaactaaaatttcggttcttcttggct tcttctatttgtgaaatggtttattttcccccgaactctcaaaaggttta aatattgttcgattacccctttttatcaattattttcttcaatttcttat ttatcattatttttctaaacgaagacggatgtgattttaaattatgttaa tggactattttacaaactgaataaattcagcatgttggcaggttttttca gtagtttttgagtgaaaatagaggtaaaaagacagaaaatcaataaaaaa tgaaaacaaaactatgaaaaatggttgaaaatcgagcaaaaatcgttcaa aaaaaaataaattcaaaaaataattgcgtcgagaaacgcgtcagtagccg ctctctgcgtctctcacccttcagcacgcggagagagccacgagaaatgc gcaaaggctaaattcggcgcggaaaatcatttttcaaaataaattcgacg agaaaatcaatacttaagtaattatcgattttcagctcgttcaaaaaatt ttcagaaacgttttagtcgtttaaaggtttttttaaaattaaaatcgtcg gaagtaaaaaaatagcgcggatggaaatctacggagtgcggagcgaacaa acgcgcggtaattcaaatgggtagaatagtcaaaattgaaaattagccag catcgaccgatttttttaaaacttaatggattttttcgtttttcttttgt ggtatttcggcatttaggattagatagcacattttaaagtaaaattccca tccaagctactccaccttctccagactgtacagttaaaccaatttgaaaa gtgtattgtatcccgtttttttttctgaacaattttgaaaatttttcgtt tatccaggatacgataatcatgattcaaattcgttaacaaaaaatgaata tatgagagcgattaaagcatttgtgtcggaaaatatgggttaaatgggga gaagggggcggacatttggatggggtacaaaaaaatatgcaaaaaatggg ctaaaaacaatattttcaaattatgcccgacaaaggttcaaaagtcaata 
tatagaaatgagaacatgagtattatgccacgtggcgggaaaaatatgtg gaatgtaatacgatgagatccttgtgaatacaaagcttgtgacgacgtgg ccgagaagaactttttaagccaacgagaaaaaaggggttcaaggccgaaa ttttttttgggccacctattaagttaaattgaaaatttaaaaaaaacaca gcggatccaattatttgccgagttttgacttgagctcggcgcgatacgtg tcgattgactgaaaatattgtttttttttatttccgaataaaaaatggtg agtacctccaaaattagcttttcattgtccatatagaactttttgatttg ttccacagtttttgtggccatcaactcggcgatcaactcgaaattgtcct tgtaccagtggaaacctgaaggaatttcggatgtttttgcttaatcataa tcataataatcttaatcataagacttggaaaatgcgaaatttttcgagaa tattcaatttatcttcagattttattgcaacaaatcgattttcaacataa aattaatttttccaactttttttcccaatttatgagagtttaaagattgt tttaaagcaaaccgccaactttacataaaaaattaaaatattgtgaaaaa aatgatgaaatttagcagattttctgataaaaaattgaatttttttggat tcgcgcttcaatttcacattgttcttttagaaaagtcgaaattttatatt tccaattttcagatttaaaaaaatttaaaaaggaatgaacttttccaaag aaaaactgaatataaccagaaattgtgatttttcagcatttttttttagg tttgaatttttttttcatgattaatcacgtgaaaagtcaattttaccgca aaacatttaaaaaatcaagatttttcaattttctctgaattcctgcagat ttttcgatgaaaaattgaattttccttggaatttatatttttcgggtatt taaagtttcggatattaaaaaaaattttcaattttctctgaagttatcga taaaaattattttctgcaaaaaatctactttttttcgttgaatattccgg aaaaaaaatcagaatttcaaggcacatttccttttctaatctaattcgaa taattcaatattcttttaaaaattcggggtagaaaaggaattgtaccaat ttttatttttaaaagttaatttttctaattttcaaaattttcttgaattt tcgaattacagattttcaaaaaaattttttttgtttttttttctcgaaaa tttgaaatccatacatctaatagcattcttcttttcctcaggactccaac cataatttatcctgacttttccagatcgattgccatttgttgcagtagta tctagttcaggagtaaatctctcgaatcttcccttcaacgccatcatatc tttcttccaatttgcaatttctccttttggtacacggctgtatgtcattg ttgcacggaacatttgttgacgggcttcttcattcagaattctggaaaaa ttgatgttgtgcgattttttttggttaaaaaaaacaattttcgtaagttt aattaactaatattttaaaaaatctctcattttctgaggcaccacggatt caagatctggtgggattccggatctggcaccgtgccaacgcattaaatgc aatttttctgaaaaaagggcaacgaagatccgatttaaaaaaatttttca attatttttcaaaattttcactaactataagaaattagagatttttcaca aaaattccagttttctgttagaatttgaaaaaaaaattgaatttttccta aaaaatttgtaattttccgatatttcaagctgtcaaaacctaaaatctga aaactgaatttttaaaggaaaaattttgagcattcttatcaaaaaattgt 
ttcaactttttctcaaaatgtttcaacctttttctttctaaattctgaaa agcatatctcagcttttgctaaactatttttttcctcaatttttgagaaa attaaaatataatatataatatagtaaatattgcttattttctaataatt tttggtatttctattctttcgttttttttttcaaaaattccaaatagttt taaatgttcatattattttttttgacgaaaataaattttaattttaaacc ggaaaattgtttcgtaactttttttttcaaaaaatttgaattttcgacat gaaagatgtaaagtgtaatttaaaaataatagtgcaggtattttcagttt acagcaaaagtcagtttaaaaaatttcgactggttttcaaaatgagtttc cttattttttacacgtagaactttttttattttccgattttttttgttgc gcagaaattttttttccgcaaaatcaggaaaaattcagaaaaagacagtc aaaaaattgtagatacaattttttgactgtctttttctgaatttttcctg attttgcggaaaaaaaaatttattttttcatgaataaaaatcgaataccc atccaattccacaaacttactcgttctcctccatacatttcgtttgttta actctccaaacaagtggaacacacatatgatgttttctcttgatattatc aattaatgccagtgcagccggtgtatcgaagcaccgtgtcattctgcacg tattctcatcgattggatcagcttcaatcgattgctccacaatgtagggg cctgatggtttacggagaaggcagtcgtctggagaaaaatagaatagaat aatgatttttaggttattttacgtttaaaaatctaatttttaagacgcgt aaacgttgagctcatttataaaaattcggcaaaccggcaatttgccgaaa aatttcggaaaattgtcggtttgcacattttttcttgaaatttcagaact tcgatttcaaacggcaaaattgtatacatcctatcaaaacatcaatcttg aaaagccagtaaactctatgaaaatgtctaaagaaaagaaaacggtaaaa aaatacagttttaaatgtttccgtcttattaataacaaaattcgacaatt tgccggaattgaaatttttttttctccaatttccgaaaaaaacccaccga ccaccataatatcatcgtcttcttctttttcttttccaattccaagccgt ttgatcgcttttccgttggctggctccatgagctcaagatatccgtatac ataaattttcatgtctgaaagaaaattcaaatttcttctggaatcagtta ttcgaaactaacattctggacataaaactcgttgccgtcgttttgtcagt gcacggaggcttgccggacgtggaacacgcatcaaacggaaataaaggat acacggtttacattcgtgacgcgacattacacgatttagcttaaaattgt gaaattaattttttttaatagctctttatttttttgaaaatttctcccat gctttttccattttttcaacgagtttccttattttttgtccatttactgt aagttttttttgagaatttttttttgttaatttaacattttattagctca aaacatttattagcaaaaattttattagcaaaaaaattttttaatttttt taaattagctcaaaattctcgaaattttaaatttttagggtaaacaatat aaaacttagggagttttgagctataaaatgataaattgattttaaaaagg atgaaaaacttattttaaaaaaccgacaaaaatcgacaaaaatgaaggga acaggcagcagcttagccccatgcttagccagcagccccgtagcaaccca gtatcaataatatcccgtgccaattttcataaaactgaatataaattggg 
ttgatgttgctaaagggctgcgaaaaactgacctgggatgaagctgggct gcaaggggctgcgaagtgctgcgagggcaaagcgctacagtgctaaaagg gggctgagcccagaccctcaggaaaaaactcatactcgcagcccttcgca gcccacatttgcgctctgatcgcgtgctatccgcgcgcacagaatttcga aagtattttccaaattcggaatgcgcgcggagcagacgcaattagagcgc ggatctggcacgtaaggaagaagtgtgactggagcacgaaccagtaatct agtcgcgccccgtccgcgctccaggaggagcgatttgccgagcagttcag cccttcgcagccctttagcaacaaccaaatttatacagttttatgaaaat tggaacgggatattattgatacgcctaagcagccctattaaatagtgatg agggcgtaaatgaaattcgccatttccagctaaaatataaattttttgaa ttttttaacattgatattcggaatggattcagcagaaaatttgaagtcat ttgaaaatattttccagatttcggtactccacttttaaaattgaataaaa ctgtagtctttattcaatgtttcttcaaaatttaaaaagtagaatataac tgtgagaaaatttccaaaattgtcaaaatttcaaatagctgaaatatttc acggcccggcggggggtacatggatgagaattctctaccgtattccaatt tggctgactgcgtgctcaacgttgaatactcagtgtaaactttcgtacac cgttgcgtactgcacagcgcgcattttaattgacgacatttagcaaaaat tgaacataagatttttcggaattatgaagctcaattttcacaaaaataat gagttttttgtagaatttatgaaaaaacgtgaatatatagattttttgtt catgatattcaagaaaaagcgatttttagttcttcacagaggaatcctct cgcatttcacttgctcatgatgttttttgctccactttaggacgataaaa atgcgaattgttgataaaatgaatgaataatataaaaagtgcaaatatga cttcagcaagtgttaaatcccaaatttttcctgcgattttctgctagatt cctggttttgagtaaacagtctgatatattcatgattataatgataacaa taacgaacataataataaaaatggagagcacagagaaacaacaaattgca aaaacagcaactgatatcagaattaacgacgaccacggaaaccgcctcgg tctccacctcgcccaccacggaagccaccacctctgtcgcgtcctctgaa tcctcctcgatctccaccgaatccacctctaaatcctccatcgcggtctt ctgatctaccacggaagcctccacctccaccaggatctgttgaaagtcct ctgaagcctcctcgatcgccacctccacggaagccaccacgatccgcgga ttttcctctatagccttcgaggcttcagttgtaccccattcttcgttggc acgcttcagatctctacaaaaaaaacaaattagaagcattcaattatcga aatgtgtacctatcccgatttatcgcaatctgtctattcttctccttctg attctcaacttctttaacttgtccagtagcggcagcttgcttacgagcag cattttcccgaatcgccttcacctctgcctcctcagcatcctgttgctcc ttgacaatcgtaagtcttcgaatgacacgttgctcactctcctgctcacg acgctttttcatctgcttcttcttgtttatagtcaccgcattatgcttgt gatagagaacctctccctcatcgatttcttcttcaattttgacgagttcc agggtcagtcgggtccgatctcacgaagacggacgttgctattctggcca 
attccgcagtcacgtccttcataaatgtcttgtggaagttcttcttgctg agggggctgctgaaaccaatgtcggcatgatgagagttccggtcttctga atccatttcctgcgtgggctgtggcgacgagctgcacgtctgaaaatcaa gtttttgtaatttttgggcgcatgatatggagctgaatcattcgatttta gaatcagcatgcttttattcatattttaggatctttttaaaaaatctgga ccaacagttttcgaaaaaatttaatttttgttcagaaatgtgaatattca ctaaatcgaaaaaaataattgcaaaatccgtcagctgaacattcaaaact tatcaatttgaaatcagcatatttcagtgtataattaaaaaagtttcaaa aattctgagaccaatttttattgagaaaaataatttttcgctcgaattat tgaattttcactaaatgcaaaaaacagtaaacttgggcccatgctacaag cctgaatctttcaaattaagaaccagcatgattttttcaatattctagga cgtttaaaaaaaatctggaccaacagtttttgaggaacgtaattttttat acaaaaatgttctgatttttcactaaactcaaaaaaatagtcaagttggg cccatgctgtacacctaaatcattaaaattcagaaccgccatgtattttt tcttaccaaaggctctttaaaaaaaatctggaccaacagtttttgagata tttagaaaaacaactcacttttcgacgtttttcgccttttcgtggctcac ccggttgatttttgcggcgatttgtggtctttcgctgaaaatattatttt tatttcaattattaacgaagaaaacaagaaaaaacgacgagaaaacatca aaaaaacgcgaaaaaacatcgaaaaaccaccgcaacctcatgaacaaaaa aaaagcattgcagccgcgggactagttttcgcaactttctaggccatgtc ccgttcgccgtgccgtgtatttgtttaattccctttttggaaaaagtcaa catatttttctaacaaatcgtttttctattaatttttttctaaaactcac aatcaacagatcactttttgcattgcaattctcacaatatcccgacggaa ccctctccaaatgattgacctctttgaatagttcatcataagtgtcggtt tcattcaaatgcacattaatcattgttttatagttttgcacttttttcgt gttgtaatagtattggataatggaagaaagcgagcgttggggcatctgca aaaaataatgaaatttattttctttttatgattaaattaaattttcaaaa attccctttttttgacatatgcacttacagccgcatgaatcttcttgaac cgttttccgaaatgaaagaagcaagtggagaaaagactaatttcttctgc cgtccaatcatcatgaatttcttttcttctcatcgcttgaaccatcgcag cgtcgaaatcatttgactgtttgttcagaatgaacagagcctgtaaaagc agttagtttttttttcaaattcaaagtacatttccgaaaaataaaaaaaa ggcttgattttttaaaatctcgaatttttattatggtcaattgttatttt ttccagagaaaaactcattttctcccaattttcagacgtttctctctaaa tttggtgtttttccaatcgtaccctatctataggtaattgatatcgtcca gtagcttctgaaatgtattctgtaagccgattctcgttcatttcgtctgg aaacgcccaaatttgttgatctctgcacggttctttttccaattgctctg cagttggctgtataatcgcctgatattcggttcccacgtggattagattg tcgacgttggaaagtggatttgctggaagaaattgggaatttttcaaggt 
tttaagtggattttcaagctatttataaaagcatgaaaaagctcagaaat gactataaaacctttttttacgtcgtatttttttcaatgaaattacctac ttttaattaattgttcggcttaaaaccagaaaattgtttcatatcgattt tcccggtgaaaatcgaaggaatcgtcgcattctcaaagttttttcaccga tttgtttcaattttagcacaactaaatggaaaaatcacaaaaattccatt acagccgattttcgtgaattttcctacatttcgaactaaaaattgtcctt tcttctgtttaaaccggaaattctcttttgaaaaaccaatgaaaatttga attttctgggcttttcttcggaaaattattctcgaaatttatcaatcgat ccttgggctttttttgttccgcagaggctggcggagtttacaagcgtacg aagtggttcaacttttatataaagctttataaatgggacatagatgaata tttcgaatgctaaatgcaaaaagaatcagtaaaaaagcgcgcagccccgt ccttctctgacgaaaaacgccgtttaaggatcgattgctaaattttggca gtagttagaagtgtcaaaatttctgccggagagtcgtcaaatttcactga aacgtaacccggtaatttccacaattaatggtcgatttttcgcaaaaagt ggtatgtttgtcaggatttattagaaattgtggctgtccagattttaaag agtatttttgggcaaaaatgtcgaattttctctgaaaaagttcgattttt atcgaaaattcagattttttagatagttttcatcgattttcccagttttc agcctgagaactttactaacagaaagatgtgtcatgagcaccactttcat gatgctcacgagcttcagcttcttcatcttcgtcctcttcatcctccaaa tcttcatcctcatcgcccattgattccccagacgttgtttcgcgttttct catggatcttataggacgagccatctgaagtttcaattttagcttttaaa ttcaattttaccgcttaaaaatcgataattctcccgtactctgctggttt cttcttcttgttccgcctgctcctctggatcatcttcctccattggctcc ggcgatgcattcaacatattcaagccttcgtctgaaatatctggccaatt tatagaaaaaccgacaaaataataagcctcactttcttttcgagaggcgt cttcgtcagatgacgtgtacgaatccattttctggaatttgaggattttt gaatgtttttaaacaaactttatagagaaaacattcgaaacactagaagt tatgttgaaacacgagaaaattttttaaaaatccatgagaaaaacagttt tgaaaaatctgtttttggaggctctccggattttgaggaatcgtcacccc ggagacgcagattctccggtaatttttcattcatatttgagtttaagaac aaaacagtttaaaaaaatgtttttagtatttgaatgaaacttataatgta ttttttcttccattaaaacttaaaaaaaactacaaaattattatgaatca aatttgaaaccgtgaatcaatctccgcggaagggcgagtctatactgctg caagcgcactctatcgcaaatgtacaattggcggtttttcaaacaggaat taatcggattctcgtagtttattttggatttcttttttcgggaacatatt ggtgtttttgcgttcaatattcaaatttagaggaaaactgcttcaaatat ttaggtaaactcttgaaaccgctgaaaataggcaaaaataattatttttg tattttttaggctactttctatacttttgcgtaaatactatagtttttct ataaaacacccattaaaattatttttataaaatgatttttccaataaaaa 
taaaatgcgcaaaatgattcttttccagaatcctatatgcgcctttaaaa tctctcggattactgtagtttcaaagaaattatcctttatatttttaatt ttaaattttttcctgaatgtcaaatattaggggaaaaattataataatat gtgctttattcatatgagtgtagaattagtgaaaaagaaaaaaaacatgt atggactgtaaaattggaattttagcgagaaaataaaaataatatgcaga aaaaattaaaattttcaggaaaaaagtcagtaaagccatcaaaaactact cgattttgaaggaaatcagcaagaaaaattagaaaaaagtatttttaagt tggaaaacccctgcttgaatttgtacactaaattgggcataaaagcgtac aaattcgcaaaaaccggtaaaaatctggggatcgtgatggatggagtgtt ttgtgaaaaaatgcagcgaaaaattgagtagacaatttcaaaaatgtcga tttttgaaatttgtgacgaaaaaattgaacaaaaactgtttttttttgga attttcaacaagaagttttataaatttttttgtttaaaattttgaatatt atatgagtttggtttcacttaacagaacaattcgaacaaaagtattctag aaaggaaatgtgcgctccagcacactatttgcccgtggagcgcacttgtg tgcacgaacgctagcgagaatgtgtggtagaaagggagggaataggaaat attaacaaaattgggcaaaatatgtaagattcggagaaagaattggagaa aaatatgtatttcgagctccgcgagctgatcaatccaaaggctttctcca tccttttttcgagaggcacattgcattatagttacacacagcacgtgtat aatggaacattgaagcctggaaacgagccatcgctaccatcattaccacg tggatctgaaaaaattaaagtttgatgattcgaaaattttctggaaaagt tatgattgtgagataaattgaattctttgaaaaatcaaaattcaaaagct tgtagaaaattttatatatttttttaagcgtattttttccgtatacattt ccaaatttttttgttacccaattttaaagattttcttgaattttaaaatt tctttcagtaaaaactttttttcaactttttgattttttttccgcatttt ttaaaattttattcagaattattagattcttttgaatttaacgaattttt ttcgctaaaaaattgttcgatttttcccgaattaagaaaaatattatttg gtttttgaattattttcctgatttttttcgattaataaatttgtaaaaac aattttttttctaatttttggttttgatgattgtgttttttttctgaact ttacagttttcaaagtttacaccgaacttccacattaaaaaattctgata caaaaaagtattcacatgatttttaaaatttaaatatttttcaaaaaaaa taatatttaaactgtgtttttttcggaattttttttcgattttttccgag ttttttttggaattttttcctttctgctccaaaaatattcaaattcaatg ttgtgtagaaattttattcaaaaaaagtgttcaacttctgagtctaaacc ttttccgaatccttaaatcctggcagagctctcgtgaattcagttgtcaa tttatgtggatagcaagctgccagtttaatgaaagttttagttcctttgt caagtactcgattaattttcgaataatcataatcatcgactcgaacacca tataatccttgagtatagttccaaattgcttcacggaatgcagcagtgtc aatttcattctgattcacggcggctggtggttctccgtctccagatgcat gggatgagccggatggcctgaaaaattaattttttggaattattatattt 
ttctgtttttgaaatttcatgcatctcgaatattttaacaaaattaccaa attcaactagatttcttacaactttcactgtgtcgatttacgggttcgtt atacgaattgaatttgtttatcgatagaatattaaaatttagctaaaatt gagaagaatataagaagaaattaatttttttaatttcaaaaatcgagcca gtaaatcgacacgagcgatcgacacagtagtcatttaaagaccagtttcc gccacgaaatatttcgcgcttcaaacatgttgcgtagtacgtattctcaa aattgtgcgttcacgtataatatttatgcgaatttttggtctactttgtt agagaaatcatcactaacatattgccagtaagagtccgaatatgatcgaa cattcgatcaagccgtgacgtcagtgtatccgtatactcattcatcgtat tataaacatgatcccatccaaattcttcaactcggaatggcggaatatcc ttttcaggtcgctttctaaaatcaatatatccaaatgttcgatgatgcga gtaaattggataattacacggcggctctttttccataatatcttctccat tttcatcgatatttgcaagaagaagtacaggcgagtagtttttccgattg gaactatatgttgctgcaggagcactaattaatgattcaatagtttcagt agtcattgcacacatcttcgctggtggcctagtttgtcctttttccgtct ttttcagctcactgatcaaatattcgacttcagttggccgacgatctggg acttttcggaaataggccgacattctcgcctcccaatagtcgagatcatc gatattaaggaaatcaatctcatcttgtgtcaaatcaacacgacgttcca atccaatacagcatataactgtgcacattgcgtgagtcattgacattatt ccgacggcgtggtggagagagcaaaccgagaaaaacgcaggaccaccgtc tggcgtgcggcgagcgaagagcacctggaaattttcaaattcttgagaaa aacctaacatcgttgttatacgttcgttctcttggcattggagttggcag aatttgttttgaaaaaacgttgttttttttttgaaagaacatttttttat tacgggaccatgagatcatgagaattcctatttactggcgcgaaaatatt ggcaggccacggcaacgagagagcatatggcaaagagagacgcatcttat tttgtcttgtaatttttttttaaaataatttacaatcccttttcaactat cgtgattgtaaaatattacaaatttcagaatttcgctaccaaattattac tggaaaactaaactctgagaatgcgcattgagcaacatatttgacgcgca aagcatctcgtagcgaaaactacagttattctttaaatgactactgtagc gcttgtgtcgatttacgggttcggtttttgaaataattttcttttcgaga agtgacagtgatattccattttccttcttttcttcctattattttatcat tatttgcttaattttaatattcaattcataactaaattactttaattcat ttcgagtagacattcaaagaattccggtagttttcgcttcgagatatttt gcgcgtgaaatatgttgtgaaatacgcattcttagaatatggtgttcccg taatattcagaaaagaaaagatttccaagaactttctgaagatttcaata tttgcaaaatcagaaaccagttctgaatattctttatttttagaaatttt tcaaggttttctaaataacttttctaaataacctaccgtatttcttctat taatatggctgcaatactatttttcgatggtcttcccgcttgcaatacta ttagggagtgcaagtctaatagggagtgccatactattcttcagaaaatt 
tttctgtgttggggcttactagattctacttgaaaaaactccaattttat ttggaagtatagaaaatttgattgaaattgcaacaaaaaggtacaataac ttcaatctctaaaaattttgttataaactgttgcaaaataggcaaaaaat gttattaaaattttaaaattagtaaggagtgtttgcaacaaaaaaaagta ggtgcaagactattagggagtgcaacactaatagggagtgcaatactaat tttcggaaggtctccgaggggcaatactaatagggagtgcaaatctaata gggaggccatattaatagaagatatacggtatatatagctttgaaaaatc ggaaaatgcctaatttttactttttgaggtttgaaaatctctaaaaattc aataaaatttcaaattaccgctagatttttccaatgaatcatccatggtc tatgacagagcattcgattcaaataatccaattttcgaaatttcatgtat gaccaatcaatgcccaacaaccacatttgttgtccacccttttccagaaa tttgcgacgatgatgatccataagtgataggcatctgtgacgtgatgcag ccattagtgcaagataatgacgagccgaagctggtagatcacttatatca acgaacatatggccataacttcctgtcatatgaacatgtagagttgggtg tttacatgtgaaacggaataatctggaaacgtgagggaaattagttcgag acggggaggggcaggttggcggtgccaaccgacagccgaacattggggtt tctcagctggtagcgccagccgacagtctactgcagtactgcagataaat tttcgtcggctgtcggctggtgaaaattttcatgaaaatcaataatttta aagaaattgttgcaaatttttcccaaacttgaccaaatttgttggctggc tgtaccagccgacacccgaaatttagaacattgattagaggctgcttggc agaaataattttaaattcagaaattcaattcgttttcaaaaaatattttt taaaactttaccgatcaacttctggaatcggatcaaaattgagccaatcc atggcttttcgtcttttagttgtagtgtgcattgtgtagatctttttata ttgctgcgaggtgagtaaatgaagaattttcgcgacccgtttctgaaaaa actcagttttctaaggaaattttgaaaataaattcgagaaaaagaaactg agtcagcaaaagaaaattggaaatgtctgtctggaaatattcgaatatta tattcaaaagttttcaaaaaaacaacgaaattacaagcaattgtgatcag aaaccgcggaaggaactggacgaaaaaaattatctttgagacgaatctct ttgcatctttgtgatctaaaagattaataaaggttgtcatcacatttttc gagatttgggaatgtgataagggtgaaaaatggagattaattgtggtaaa atgaggaaaaacctaatttttggtgagaaaattgtggaaaaactataaaa gaatctttatggagtttaaaactcaagtttttcacgcttttccgcactgt gcggaacgttttttgagagaatttggccgaattcggtgattaaaaaaata atttcaaaactttgcgcctcaattgtgatgtattaccgtactctgttgcc attccaccaaaatttccttcattgttttgccatttttctgcataataact gttctgggtttttttgcttcatgtgcccaaatgtacgaatttccctaaaa attatacctattttttcaaaatttttaatcgctagaatttttttttctgc attttctttaaaaaaagagatttctcgcaagtagaaggagaaaaaatgtg tggctatacttcttcttaaagaatgcacgactagccatagctcaagcccc 
ctctggaacgttccatcttcctcccattttcccacgttcaagaatcatca gcttcttctccctcagcttctcttcttctaaaaccacaactagacaaatg ttcttgttttccaccctatttttcacataaaaccgccgagaaacccgcta tcacagactcaatgcgcaccggaggggctctttgtgtgtgtgtactgatc tctgcgttatattcgaacaccggcgcacactcggattgaaccagaggggg ggggggaggggggggggggggtgaaaaaagagaaatactctgaaattcca taaaatctagaagaagaaagaaaacaaaggaaaaattggacattccgaag tcaggctaaaaaatctcataaaacaaaatctattcgatttgtgaccattt tcatctatctctctcaaaacccgaataaacaaagcctcccgtccccaaag tgtgctctcatgctcttctggagccttctagactgtctgtagagcctaga gacagcggaattgcactgaagtgatggagagacgtagagaaaacgcctga agaaaaaaacgaacactttggtggaggaggagatggcttccctccaaata aacaacaatttctatcgtttctctgtgattgtgttctcttctatgtatac tgttacgatattgaacaggaaattaaattgagcactctgaatacataata cacaataaataaatacaaaaactatagtttcagcacaaaaaattcgaaaa aaaaacgattttttttgtccgagaggagtatatggcctagaaaaagaaaa ctcggccactctgatgcaataaatttaaaaaattatggccgaattttaga tttctcaggccaatttgatacgtttctcgaaaagccataaattagtcggt ttttcacgggcttcttgccttcctcattgcatttttcgcgctccattggc aatctcctgctggacaacgcgtgggaaatcgtgtgccccacacgggcaaa tacattttgttttacaaagaaaaccgtgccgcgacgcgacacgcaacgag ccgtaaatctaccccagatatggccgagctcaaatggcctaacctgtcaa aatcttccacttcaaaatatgagggaagccagaagcgcgtgttgtttctg aaaaaaaaacccgcctaaagttgatttaaattatcgtttttttggaaata ataaaatcgatgaatttgtagattttgataaatttccgataaaaaaaaaa ttttaaaagaggaaaaaaaatgtttcttcgccctttagtaccaaaaatac gcccaactaaccaaatcgttctttcaatcttttttaaatgtttgtgcgtc tataattgtcgcttcagaaaactacacaaaacacacacacacacaaggag aagaaaagaaaaaacgtgttccatgacctgccactgggatcgatctgtaa aagaattggggaaaattgaggtaaactggttttttatcgggaagattttt tcggaaggattgagatgaaagttcgaaaggtaattggcaaagttgaaaat tgaaaaattcgaaaaaaatctcaattctctgctgtaacccccaattttgc gtcatggcctagagtatgcagcgtggcctagaaattcctaacgtggccta aaagatcacggcggtacctatgattttctagcgtgacctagaatatacca gacctagaatttgatagcgtagaatttcccagtatatcctagcagtctta agtgacagtttctcagtacgtccaagaattcgtcagcatgacctaggatg ttaaagcgtggcctacaaattttcagagtcttctaggatattccagtcta aaaattttcagtgaggcctgaaatcatcgcgtgtcctagaatgtctaata attgcaaaaaaaagatttgaaaactagtatttaccctaaaattgcatttt 
gagcattatttttaatctagttttaaggaaaaaatcagaaaaaataaaca ttttttgattaaatcttccgatctacagatagaaagtgtgcaagaaagaa tgcaacattgtgctcggtggagcaagaagataaaagaaagagaaagaagg tcccccacccctccagtggtcgaaacaatgataaattggacaaacggagg accaaggggccgggcagacacaagagagagagtacgtgaactgaggaggg tgtgcagggaaaaatgggatgggggcaaatctagttcaaagatgagacac ttttcaggatctttgattctgagaaaaattttgaacaaaaagaatacttc aataatttaatggcacatagaaatattttcagattgttcttcaaaagaaa aatatttttatgcccggaaaatttatttattgcatttcttccaaaacagt ggccggtctcgacacgacaaatttttgttaaatgcgaagaggtgtgcgcc tttaaagagtactgtaatttcaaactttcgttttaatatttacttgtggg aaaacattaatgcttaacgaaaaattacagtactctttaaaagcgcacat cttttcgcatgtgacaaacattttcgcgtctcggtgacaacttttaagtt aaaggcacatagaacttttctgaagaattttatttatttttctgaaagtt aattgctacagtatcctttttcaagtcgcaccgagagccaaactgtagca aatcatcaaaaaaaagtcgacaaaacgtgccgaaatcagtaaacttgaga gctttaaaactctattatcagttcttcgccaacaaaaaaaaagagtaccg tatcaaaaacgaacttcgacttttttggctctcctgcatacggacatgat tctgattgacagttttcatgtttttttttgggagttttatttattgtgca tttaaaaaatcgtatagtttgatgcgtggcctagaatttgccagtgtgag cattaactctccacggtagccaagaaattttctacggtggcctaaaaact gccagtgtagcctaaaatattttattgtggcctaaattttccaatggtct gttttttttatagttgcctagaatttcttttcgtgacctagaagcgtaca gagtggtggcctagaaaacgattcatggcagagttttgaaaaaaaaacga aatttcgagaaacaagcgaacaaaaatcgtctgtcgaaagagtatttcga atgctggggatgcaaatcagcaaatcattcaaaaaaaacttttgtgataa gaaatcaaactgataagccagtgtcaaagtctcgaggattaaaaatagca tttcaggtcggggtacggtagggtttttgtagaaattaatgcaaaatttc agtgggaaacgagttcgtggcctagaaaaatcatgtctgaaaaattgcaa atgcgctcccccgaaatggttaaaaattttcaattgatagcctatttgaa gtggcggcctagaatatcaaataatggcctagaactcaaattggcggcct agaaatcaaactaatgacctagattagggcatcttgtaggcagcttagat cacctattataggcaggtgtaggtaaaattgtagacaaatgtaagtttct ttgaagataggcgtaggttcctttgcaggcatacatagatcatttattag gcagatgtaggcctgattgtaggtacagtgccggccaaaaatatatccta tttttgacttttgataaatttacaaattttccaaacgagcacaactttaa aactagaaatgttatcgaaaaaagttcaactcatgtatgtattgcccata attacgtctactcgtattcaattgtttgttgtttactagtgtcacgacaa caaatacagcggccgacatctcgtaagcccgtttttgacaacgtttactg 
attcggccgtatctcgaaaactaatttttttctgaaaatgttgttaaagt gaaatagttttcatgttatttgttatcatttgtgtttattcactttgttc tgaaaaatccagtaaaaaagttatgggagtgcaaacttgtcgctcactgc cactcacccgctacaatcaaaaatcaggttacttatagttagttctaatt ttttttttgtagagcattttttagaaataacacatgtaaaatcacaatga >Contig2 cctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcc taagcctaagcctaagcctaagcctaagcctaagcctaagcctaagccta agcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaag cctaagcctaagcctaagcctaaaatagtgactctggcagttctctaaaa taagtgactctggcagttcaccaaaaattgtgactctgaccgttcaccaa aaatagtgactctgaccgttcaccaaaaatagtgactctgaccgttcaca aaaaatagtgactctgaccgttcaccaaatatagtgactctgaccgttca ccaaaaattgtgacaatgaccgttcaccaaaaattgtgactctgaccgtc actatttttattgaactgccagagtcactatttttagtgaacttccagag tcacaatttttagtgaactgccagagtcactatttttagtgaactgccag agtcacttattttggtgcactggggtgggtcacgcccccagttctcagtt atgggtactctgatccactcgggacccactttatcgtgttccccgtgcct catttaccctagagcttcctcctttacctctcctctcgctatctctaaca ttccaatggaaactcctatttgaattaccgccaccgatgtgcccgacgcg acttactgttagcccttgttttgcacaaatctgttggcttccatatttaa aagttaattaatgacccaatgttctttttttctctaaatctccacaagat gttctgttttccctactggacactatcgttcactgcgtctcaccaattca cattgtctctactttaccttttttgtcatagtacacgttcgccaacggtg tcgacggccaaatgctttgggcagcgtttgctttttttataattagtttt attttattaaaacaatagctctaaagtttacaagtcatttgttataggct aaatgagttatgtctaataagtaatttgaactagatacttccgtgtaagt gacaatgtatcggaaaagtcctcaaagtgcgatgtagaagttcacatgta ctttgtttggcatgttagtaaaagagccagtatgctgattcattttatat tctatatactcatgtaatatgcccatgtaaggtttaattccaaaaatatg agcgtgttctattttataatattttactaaaatacctttcagttaattgc actcaaatttgttgttcttcattctctcgttatgatttaatcttattgcg tcaaggtcattattttaggtccattagttatcgatctgaaacatgttgtt gtatttttctattcttgtgagctcaggacacctcatacaactccagagaa aatgtgtctcattattcttgtcttttttcaagatctaatcaattttctac attaacgacgtttttgtcgttctgcttctttttttcgttcgtttgtctcg tccatcagctgtccactcatttctctcccactcactaggcagtgctttgt ttggttccgattggcagctggctgcagggcctgcatctcttctatgtctc tcatttacttgcattcttttcttcgttaatttttgttatgatatttaaac gggaagaagagtttgtggttcttctttttataatcactaaaacttttgga 
taagtaacaattttctgataaaaatattttcacggcgaagaaaaaagaaa aagaagagtagtttttgcacgttttcatataattattttcgttgatcaaa tgttcttctggagttttctaataaatttcttatcgactttttttcagaaa tttttctcaacttgtcatgtcaatggtaagaaatgtatcaaatcagagcg aaaaattggaagtaagttctttataatttcatttatatactataagtttt ctcgatcacaggagaaacaaaaacaacagacaacacaaaaaacaataaaa caatattgctctagtaatcaatagtgttgtaaagagggaagaaaattgtt atctgtgtagcagtcaacgttgattgagatgttgtgtttgactatagagt tgaaaataataacttcaaacttgcaagtcatgacttatcaaacactgccg gaacttattctggatcaaaggaaagttgtccaactgtagagtcatgtttt tcaaaagaaaacacaatttttaagtataaatattttgaaaaagtatgttt tagaagtatgtcaaattaaaaaaaaaatccttggttaaaaaatgattttt ttggatatatgtgtatttttaactaaaaatatatactttacatatatatt ttggcgcagttatttgatctataaatcaaactttttgatagacatttttt tatatttacaacaactagggttgttatgaaaacgcctattattctacaaa ctaaattattttaatcatacattccccactatctaaaaactaatgcaatt ttcagattttgtcatgtaaatgggtaggatgtctcaaatcaacagaagtg ttcaaaacggttgaaaagttattagatcatgttacggctgatcatattcc agaagttattgtaaacgatgacgggtcggaggaagtcgtttgtcagtggg attgctgcgaaatgggtgccagtcgtggaaatcttcaaaaaaaggtattt ttaatttaatgtgcattttataatataaattcttcagaaagagtggatgg agaatcacttcaaaacacgtcatgttcgcaaagcaaaaatattcaaatgc ttaattgaggattgccctgtggtaaagtcaagtagtcaggaaattgaaac ccatctcagaataagtcatccaataaatccgaaaaaaggtattcacaatt tgcatgatattgttataatctaattttcagagagactgaaagagtttaaa agttctaccgaccacatcgaacctactcaagctaatagagtatggacaat tgtgaacggagaggttcaatggaagactccaccgcggtaagtgtgtttct ttaaaaattacttccttttttcaattgtttgaaattaacaagaaacctgt tggagcgtatttctgaacttttaaatcgaaaatatcatttgcaaaaaaac ttgaaaattgagaaacttttttaaaagtggagtagcgtctgcgggttttt ttgccctaaatgacagaatacatacccaatataccgaatataaccgtaat aaaattatgcgatttttatttttatttttcatgaatgttaggggcaaaaa acccacatgcgctactccgccttaagaagaatcagctgtgagcactatcc actatacattggaaatttacaaataaaatagagattaagtaatataattt ttaagggttaaaaaaaagactgtgatatactatgatgatgggccgaggta tgtatttccaacgggatgtgcgagatgcaactatgatagtgacgaatcag aactggaatcagatgagttttggtcagccacagagatgtcagataatgaa gagtacgttgttttgcaaattgattaaaagtggagtagcgtcagttaaaa actctaacatgtcttaggtttttcaaaagtttggtcaaagttttggcaaa 
ctgccaacttcttgaaaacttcgttaaaaaaattcttgaaatgatttgaa aatttgtattatgttattctcttatttctgcactattctatatggcgcta ctatacttttaattgatttcttgaaagcagttcaataataattaatttta gagtatatgtgaacttccgtggaatgaactgtatctcaacaggaaagtcg gccagtatggtcccgagcaaacgaagaaattggccaaaaagagtgaagaa aaggctatcgacacaaagaaacaatcagaaaactattcgaccaccagagc tgaataaaaataatatagagataaaagatatgaagtaagtcgaaattgac aaacagtggtttttgtttaagtttattgcgaaatattcaaaattagacat gttaaaattttgcgagataatctaaagattaggtatacagattttttcat gtaaagttacattcatcaaaatttttgtgttcaccaaattagacaaaaaa tgttagttacacagtatatttattttttatatcaataaaacctttttcag ctcaaataaccttgaagaacgcaacagagaagaatgcattcagcctgttt ctgttgaaaagaacatcctgcattttgaaaaattcaaatcaaatcaaatt tgcattgttcgggaaaacaataaatttagagaaggaacgagaagacgcag aaagaattctggtgaatcggaagacttgaaaattcatgaaaactttactg aaaaacgaagacccattcgatcatgcaaacaaaatataagtttctatgaa atggacggggatatagaagaatttgaagtgtttttcgatactcccacaaa aagcaaaaaagtacttctggatatctacagtgcgaagaaaatgccaaaaa ttgaggttgaagattcattagttaataagtttcattcaaaacgtccatca agagcatgtcgagttcttggaagtatggaagaagtaccatttgatgtgga aataggatattgattttataacgtgtaattgagttttggccaaaaaggta tggaaaggtggctgtttagttatatatttttctattatttatttgaaaca tgcaaaattgaagtgaacaataagtgatgttcatggaaatttaaactgtt ttatgatacttttttgagaaattgaaaaatctgttcattttagaaacaat gtccacatggttctaagagctaaaatttttattttcatccatttagagta ctttctcttttagagtacggccccagagcgatgttagaaacctgagatcg gtcaacacagaccgttaattttgggaagttgagaaattcgctagtttctg acacgaatttcagctaataccaaataatgtgcaattgcattttgcatgtc agcattcagcattcatacaaaaatttcaaagagccaacttttcatacgtt tatggtcaacactgtatgtgttacattgaacttttttaaattgtattatt tcatatttgaatcatttccatgccattttcaaatctttttttaacaaaaa tttagtttcaaagttttaaatttaggtgaaaacttgctacaaaataacac attctttagtcgtttcaattgctatctatccgcagactgcaattttgttt tcccacaaccattcacacaataaataagggtataaagttttgttcatata acacatttcaatactaacatttcaattttgaacaatttttctaaacttat ttcccttcgcccaaacgtcattcaacattctttgtacaaaacattaccat tatagaaaatctcatttttccactatttcatttatttttattgttccgcg ataaatataaataaacatttacgtgttccgagttcaaagttttaccacgt ttcagaataaaggaatcggaggggggggggggtgaaaaaatcatttcaac 
aaatcagaatttctcaaaatgtgagttttatcattttcattgttagaatc acgattagcttatattgaacaataacaataatttaatcccattcattcca tatcttctcatgatgaaaaaataagcattattcgttttcttttgacgcgt ttgatagggttctgccagcgccgaccaatattcttctcaattagattttc cagaactgcactaactgaaaattgttttagtaaatagaacaaaactgact attagttcaatattataattttacttccctattttctgacagcattttgc aaatactctttacagtcattcttgtattttgacaacaaaattcgaattta aaatttcatttttcttttaaaatatcagtacactcctggtagaacaaata ttttatttcaatgtttatgtatgagtagaaaagttaaacagaactattat ttggcatcaatttctgttttttttttcatctaaagtatggcatttattag caaaaaatacatgcttgaacaggaataactattaatttcctatagcgacc caaaaacaaccaaaaattgttttaaataatttttttttggtcgacttcca tagttatgagtggcaaaaactgagtaattgtcaccttttgacagtaaata aagaaattttcaaaaaatttttgaaaagttttattatgctattcgatcat tttggcaccatgtaggctttaacaccccactggcgctactccgcatttaa aagtgatttataaaaagtgtaaggaatactttgctccaattttcgtctag tgcatggcgtacccatattttgttcatgttgtgtctcttagcctctcctc ggaggcgagtttccatagctacccacaaagaatttatttttctatgtatg cttcgttggctccaccttcattttctcagtcgtttctcagttttctcatg aattctctttttgttcgtttttagaatagctatcgttttaccacgaatgc gctttttcagttattaagaaaacagttttttgtagtacatttttcatagt ctgaactttcagcatggaaatagtaaaaacaatcattccacacaaccgtt catacattgaaccgcccataccatctgccaccgagtggcttgcaagtaag ttgttttgtctaaagacataatttctgaaagaaatctatccgatgagcgg aaaataaacagaagttgtttttaagaagactttgatgcaattacaaaaat atatggctatgtgcgtaaaaattagacgtgactcgtcataaagaccatat aggtctacaaaggcgtttcgagtcttataaaagttcgttcaaactttttc ggggtcttgttaaaagcacaccaacaatacgttaaaggatttataaaatt gcaagtattcgtctcaattgcaatactttggaagccgaattttggcaaga tattggtaaaacaggtacattacaatagctatttttggtaaaatgtacta ggtatcttgtaatgagttgcgcttgcctcataggcatacatctaccgtat attctttactagtgctgcaggcagcactaatttctaggcccttttttaat gcagtactattagagactgcagtactactggagatgcagcgctaatagag aatatacggtatgtagatgcctagaacgctaaggttgttttattcaaata aatttcttagaaaggctaatattatttttaaaatcaagatttaagttaat taacacgacctcgtattctatatttcagatatgtcagttatgcatgtgtc atgtttaaccattgcttgtgtatttgttgctatcacatttttgtcatcat ttttccatttattttttgtattgaaatatgtgtctaatgagaggatacgc aatgatatgtacgcattgatatttatgtttcccgtgagtttataaatgga 
attatgaagagtttggaaaatatattttagataaccacatttgcaagttt agtaggaatgttcataccaagagcggctattttcctttatgcagtttccc ttgtgtaagtgtaccttaacaccacaaacagaaaaatgaactttattgat ttgttccagatatttcatgtttaccctcttcataatggtcactcttttat tcaacatttttggaggtcggcaagaaatgagtgcctacttgctccaaaga aatattcgagttaatttcactgttccaccactctgttttttcaagttcct gccaactgtagaaagtacagagtgagctcgaaaatcttatttttgtcttt ctaatatattttccagtcaaaacctacgtcgaatcgagtggttggttttt caaactcctattatacgaactcttttagagttagtctcagtcgttgtgtc tatggaacaagaaggaagacgagaaagcgtgtaagaccatgaaatgtttt ttgtgtgcgcaacgtagtgcgaaaagttagttatttggagaggaagttat ttgaaaagaatctttagaagatattgatattttatattctgtatgtaggc gctgtggcaaattctcgacttcacgggagaagcgcaaattttgtgctggt ttccataatgcgctcgaaaaatgttgtatatttacattattcaaatctca ccgattaaagttgaatttcagttggttcgtattctctcaactcatggcac tactctcgatgtgcattgccttttatggttgctacgttatggttcctctt ggaagagaaaagcatgctccataccgattcgattttctatttcgtacatg tgacattgctcagtgtatttacacaattcaaaagtttgtctttgagtttg ctgctgcagttggcttgatcacttcggatcgataccttccagctgctgcg aaggcattgtgtaagttcagaagtttaaatggagataggaatgtttcgat tttcgtttataaaaacacctgttaatattaaattgttgacgtcatagctg accattcatgcttctttgtctgattgtaattatataattgtaaaaataca aacaagttaaccgaaaaagttctctgttacgtttgcggtaaattagaact ttatgttgaaagatatcagcttaaagttaaaattttttgaaaattaaatt ttggctagaaattcaatttttctgataaaaagaaagtttggtgggaaatt taatttcattgatagaaaaactgaagtttgtgttttctgagaattaaaaa acagcgggcaatgaagattatgtctttgtaaatttcaatttaaaaatatt ccagggtgggcatcgtttatgtgcacctgggagatgatgctcctctctgc tctttgctcatattgtttgcgtccagccaagtgtaaattttttgatcttt accctggaaacgatatgccagctttatcggcaagagatggttcaaactcc cgtgttccgtcattttctcgacgtctatctattgaatatgagccacgaat tgctggagtgatgttggagccgccgtcaagaagttcactgagcattactc ctagggataaaatagaagacccaacaaccgtatcgtattttgctgataat tttgattctttatcgcaaatccaaggacaataattgaacttcctgccatt atttcattgctcaatttacactattcttaactttttgacatgaattaaat ctttttaaactgagtctaagcgtattattgtatcgtattttccccttctg atgtattcatttacttgtatttttgaaacccatcccagtgatacaactac ccaattttcctgtgccatgtttcttgaaacaaatcaaatgtgataaatag tttgaatgcctttatgtataataaaatcaatttttcaagccggatctccg 
tgtttgccattttgattacccagcgagcctgaaagatttgaaaagttata tgagcagcatgaaaagcctgttctttggtgtttaaaggtagcacacaaac caaaatttgttagcgcaaatttaaatttctatttcagttaccatgagtta gtctaacaaacatttttcaagattatcagacaactgataattttaactca ataagcatgattttgaacaatttcctaactggcgttacttcaccttgaat aagattgaagtggttattggttcttatgtacagtgcttatatatttagtt aataatactaactgaatatataagcactcgctaaacttgcgcacaaactt gcgcttagatttctcggcgttttcctgtgcagtctttcttcttcttcttc gattactggcatttcttcaacgaaccattttacattgttggcagacgggg aaatctgagaaaacattaattatttttgagaagatttttcaaaattaccg ccgattttccttcagcagaagcagtaattttcagaggttcttccggaatt tctatttttactgcctgttgttttgctattcgaatacgttcttgaacaag tgcaggcacatttatcacaacatcatctgtgttttctattggctcaattt cacgtaaatctgttgatgggcgacgatggttaaggttcaaagctgctacg cctaaaaagtttactttctcgatataatgttttccaaaatattttgtatt tactcacatacactagctctcctcggttcagcttctgatttataaaattt tgcgatagctgcagaaattttgttacgcggttgttgccacatatgtcgtg gcaacatcgaagcgttactttgcgttcttgtattttcagttgaaaaagac gatgcatcatcatcaccaaacgatctcgatcggcgtcgaacatacgaaat caaaccattgtccaagtcagcttcgttcctgaaaatcgtccacatttcaa atttcgaatcagatcatattaattcacatatttatgcaacttccttgcat atgagaatgagctttccgtgttgaagttggcgtgtccattgtttgtggtg tttcatcttcccaaaacggatctcttttaagtgtcggataccgcccgtag gcttgatctacgactgcataccctacctgcaaaacttctaatagttttca atcttattcgaggtgaagaataaatatctctttaacctggagattccgct caatgatccagtttgtttcaaaatcttcatcgtcttctccaagaggattt aacattacttctgatacttttagccagccgactataaatacaatttgtag agatgtcattattggaaaataaatgtctatcttccactttgcaccaggaa tgttattgttgttttcaagaaactgtcttccaaacaatgccaatacaaaa taagtccgaacagccaggttaacaacttgagtataaaccagaggaatcgg caccatatcgaaaatgaccagatttagaatttttgtacggaattctctca ttttctgaaatttggttgacattaatcattgtcagtactttactaacatc aatcaaatcaacgtagagataactgtctgcaataagcccttcatccttag caaccgttaccaatgagaacaaccattgaattggttgccagtatttgctc tgcgggctcgtgatggcatcaaattcagtcagctcatcttcagttaataa tccagcaccaattaggtgttttatggtaggaaatcgtcgtctaattgcag gtgaaacatcacgaaacaccatcacttgtgcaactatcatgtaccgcaca cagtttcttcgaattaatcgggctttttccgatgtaccacgaatatattg agcaattgtcaaagcggatctacaaaattattaagaatgcaggaatattt 
ttgaacagaaacaaacgtatcaatccatccaacgttatcgaacactttgg tccatcgattataaacaattgatacataaaaacctaacatgaatgttacc ggtataaatactgaaaatgtatcaaaaaaagtgcacagctgttcgaaaac tctgaaaaaagcaagttaagctcaataagaaattaattgcaacaaactct ctctgtgctttgttaagcaatattctataaatcactgataaaatagaata acacaaaagccaaattagtaactctgaccatattgatttccaaaccgatc ctttccatctcaataacacttttatttgagtgaacagacctgacgtggca acatctagagaataagcgacagtcatcagtcaaaaaagtatgtgtagcaa caaactgaataaaaaatattatttgtatcagtcggtgggtgagcgattag tatcaagtagcacaaacgctaccgtttcagatttgcatattttattgtta taggggttattcaggcataggtcggttgaattccgactttttattcacat ttttccagaaacaaatcgattctcctaattttatttttatgctttatctt tttgaaaatctggcatcactgtttgcggaaaaaaatataaacaagaggga atacagtttgtgggtattttgcttacgttactgatattatcgccttttaa tctatattttagtaatttatcttgcgtaaataccaaaatatggattaaaa ggagataatatcagagaagttaaattacagtagctgcgacaaagaaaagt ggccaaaatttctgattttagccaaatttggctttttttcgaaattttga cccgccataaaaaatttagaataattttataatttttttacagttatgct tggtacattgagactttattctatcattcaaaacaaaaaaataccacaaa tgcttctccaactttgagaattgtaaaattttcaataagccaaaagtcag ttactggtacctttgcacctatcagtacttgccatcaaaagaaatttccg agaatgttcgcatttcggagtgccgtaaaacttgttcctgagagatatat atcgtctcatcaattcggtatcagtcaacctcccatattgtgtcatccga tattcaatctacaacaaacgtcatgttgtttgttttcaaacaaagtgtat taacattggactttcagatagggttttctgattctttaaccctctaaaaa accatttccctcatttccataatatttattctattttatgcttaacaaat ttacacgagtttcaaactatttgattgttcatcaaaaaaaatcccaaaaa ctgttttgtttttatatattgaactcaacaacataatataaaaactttca aatcgtaaatcatctaagaaaagatcacatgaagtgagtagatgatagag aaccagttcttatttttatgtttccgttacttttttgttactaccactaa taacttggcatttttcaatcaatattttttacagaatgactgtaacttat tcactcgatgttgcttcttcttcttttttctgcttatacaaactactatt tcgatggaaggtgtgctaaacattgcagatattttgtttcattaaatttt taggggtcaatctggaaatcggtgtgggctgagcttgtagtttggctttg tctttatgcagtgcttagtgttatttatcgatgccttttaacaatgaagc aaagagcgtaagtactgttttcaaaaataaaccgggagtctgactttcag aacgttcgaagatctttgtatattttttgatacttattccaatttcattc caattacattcatgcttggattttatgtctctgctgttttcacacgatgg tggcagattttcgacaacatagggtggattgacacgtaaatgacttagtc 
gtattacgattataatatctaaataattctaggccttgtctttggataac tcaatatatcaaaggggaaacagagcgagcaaagtgtgtgagaagaaatt gtataagatactcaattcttacacaggctatggtaaatctgtgtgtaagc ctaactaataacaatagtctttttgaaggtgtaccgtgacgttgcagcaa gcgttcgcaaacgtttccccactttcaatcatttagttactgctggcttg atgacagaaaaagaaatggccgagttcgagtctatccctagtccacacgc aaaatattggcagccaatgcattggttgttttcgatgatcactttagcgc gagacgaaggaatgatttcaagtgatatcatatatgtagacttgatggag gtacataattcagaagatttttaaagagtaataaataataagtttcagaa aatgcgccaatttcgtgtcaacattctttcattaacattatttgattggg ttcctgtccctcttgtttatacacaagttgtccatcttgcagtacgatcg tatttcctgatagcgttgtttggtagacagtatctccatccggagagcaa ccgtttaaatgactttaagcaaactattgatttatatgtaccaattatgt cacttctccaatttatatttttcattgggtggatgaaagttgccgaagtg cttctcaatcctttgggagaggatgatgacgattttgaatgcaattggat acttgacagaaatttacaggtaaacgattaacataatcaagatttattat tattatttaatacgtttattgaaaagtgaaaatgatagaaaaatttgatt atttaattcaattttaagttagaaaaatatcctacacattttctgaagaa gtgtcgtaaatggggaaactttttaaacatgtacgttccaaacatgtgcg ttccaaagttccgaaaaaaaatttgtgtagtaccaaatattaaagaattt tcttcattcttaaaatagtcgtcttgatatacttctgatatgatagtaag tattgaaacattaactacactttttcagttattttttattcgcgatacca tccatttaataaaataagggagttcatctacacctgtgcccttctatcaa acttgattgaagtatattatttttaggtgggattgatggttgttgatact gcatataaccgttatccaactcttgaaaaagatcagttctgggaggacgc aattgcggagcctctttacactgcagagagtgcgatgagacctctgaatc cacaagtcggatcttgtgcggatatgtaattgagtgaatttgttccaaac aattgattttcatgttcaggccaaccgaagaagagcctttcatggttcgt ccacgaagacggacgctgtccagaatgtcacactgggatggcgacatgga agatactgatgttgttccggttgtgggtctgaaacacacgcgtgataata gtaattatgcttctggcgaatctctagcattttcgaatagctttgccaat ggtggtaggaaactgagtgagatgtttcgaagaatgagagctgggagcag aattggtgataggtataggaaacgcaactcgtcagcacaagactttgaaa atggaatggcaaagtttgtttcatgaaaatatgaatgttatttattattt cggctatttacagaaaaaacagtattgatgaaaatgcagatattcacagt aataggctcgatcaagcatccggtacaccaaaatcaggaaggctttggag ttcgatgcctcaaacacaattggaagaaatgcttaaggtttgtttattca ctggatttattcactggcgctttcacgtaatttcgagactgcaaatttga acttttaaaacaatattcggagaagaccattttcgagctaatctagcgtt 
agttttaaaaaatgttttaagtttttttttcaatcacaatggtgattgaa aatctcagaaaagtagagtacttgcgttcatacttaacgcatttctcata tttttatagaattacacgccgataaacaaatagttaagacatttcagaat aaaaactttaactctcctgtcaaatacaacactgatgggatgaaagaccg agagcttcaaaatccaacaccaatcactgatcacattgatttgcctttgc atgtggcgagtagtcaatcatggtttaacgaaagtttaccagtaatcaaa gaggaggaagaagctaaaagaaaatccaacacggatacaggtagacatag tcaattttgtcaaaaaaaattaaatgagtttttcagagtctccaaagtct agtaagcattcaagtatgtcaatcagaagatcggaattgagaagatcatc atcttcaggtagtgatctaggcaagtctggaaagcgggagagaaagaaga gcgagtgatttttgaacagtatgataaaatattttttgtttctcttttca ctctaaactgaagatccctttcatttcatttttacatatttattatattt taaatttcaaattgcttaattaattttctattttttaataaacaattgtg taaatatatatattttttaatacagtgtgggaaagttctataggaccccc cctaatttgaaggtttgaggaacttccgaaaatttttttgaaaaactgct aatgccattcgtttttaaattgaaaaaaacctatatacatttttttccag aagtttatctcaaaaactgaggtcgcgctggaaaaaacgtcaaaatccag tgtgaaacttctataggaccccccgttttttttcacgatttttactaaaa tcaacagattttggaatttttgacaaagctcaaatcaagtttgagttaga aatgagttcatataagcagttttgactttaaaaattaatacgaaatgttc tcgtgggatctccagactggttctgattcttccgatctttgatgttcaag tctgtttcaagcttcctggtgctctcggtaatgccaaaacttgataaact ctctttaacaagttcctactaaaattcctagcacacacaccataaacatt tttacgccatccccaagaaaccagtcagaaacagcgtattaacaagttgc agttatttttgatcaacaacagaacattcatatactaaaatcaagaaagg atcaatagttaatcgggtttccttgtgtgcggatgatctcaaacagtctg tcctccattgatctgaccaaacttttcagctggttgtccggaatagactt ccaagcgtcgagaattccttgcttcaacgatgcaactgttgggtaagtct tgttctgagcatacacgatacggacaagaatcccccacaaattttcgatt ggattgagatcaggacttcgagctggccaatcaagaaggttgatcttctt gagcttgaaatagtcgcgggttgagttgctcacatggattgtcgcattat cctgctgaaatctaaagttttttctggagtagtgacgaagatatttggag agctccagttccaagacgttctgatagtcagtgctgttcatcttgctact gacgaactgtatctcaagcttcttcttctccgtgaacgctccccaaacca tcaccgttcctcctccaaaattacgtctcgaaaaaaccattggttccttg cgcaaatcgcgccaatagtagcggcaaccgtcaggcccatcgagattgaa tttcttttcatcggagaagacaacctaaaacaatgatcctaattattcac tcttgcttttttaaattctcactttactccaattcgttcccatattgttc ttagcaaattccaatcgcttgagtttatggtctgcagagagtaacggagc 
agggcgaagtttctgacgaacgattacaccagatcgtttgatgacattga ggatggtcctttttgaagcagacaattgaagctcattgcgaatatctctt gccgtcttacaggagttggaggcagcacgaatcacatttcgttcgtcacg cacggagagagctttgcgacgaggagctcttttagatgtaccgtagctca ccggatccttcagatacacgcgaatacagtgtcgagaacgggaaattttc ctactcatttcatgcagggacacattgagcaatttcataacatccagctg agcgcgttcagtgtccgaaagggcagatcctcgaggcattgcaagttaga ctgctttcgaagtaagctttccagcctctatatgtgtgccacaacacatg ccacaattccacatttaataattcacgcaaaaaatagtaaataacatctg tgagggacaatttaacttgaaatattggtcccatggaaccttgtaatcaa agaaaaacgatttgattcctgataagccttccattgtttcctgctgcata ttttgccaaatcagcttgactacacagtcgaaacatctaaagtgcgtgct aggaattttagtaggaacttgttaaagagagtttatcaagttttggcatt accgagagcaccaggaagcttgaaacagacttgaacatcaaagatcggaa gaatcagaaccagtctggagatcccacgagaacatttcgtattaattttt aaagtcaaaactgcttatatgaactcatttctaactcaaacttgatttga gctttgtcaaaaattccaaaatctgttgattttagtaaaaatcgtgaaaa aaaacggggggtcctatagaagtttcacactggattttgacgttttttcc agcgcgacctcagtttttgagataaacttctggaaaaaaatgtatatagg tttttttcaatttaaaaacgaacggcattagcagtttttcgaaaaaattt tcggaagttcctcaaaccttcaaattagggggggtcctatagaactttcc cacactgtatattgcaaatacatgacataaatttagatgcagggcaaaaa ctatagatcaaaattttctattgcactttttatgtataatcaataaaaat tgaaaaaataaaaaactttgtaagttgatgccgaaacatttcagtttcta ccaaaatcgttcgattttatactgatcagttttgatcactttctggtaaa tttcgaaattcgcgtttttttcagttttagagttagaatattagttacta acaagtttagcaattttgaaggatttcttcaaaaaaaactgctcaaagga ctttgctgcaaacatactcaaatttgcagcaaagtcctttgagcagtttt ttttttgaagaaatccttgatttttccttattttctccttattttctaat tttattttctaataaatccttattttctaatttttcgtaaaaaattatta aaatttcaaatttttggaaacaattgtttttttttcagtaattgaccata tttttgaccttcttgtacgtgaatgctttcctttcctctattagggtgtg tgactgcgtgtgtgtgtgagagtgtgtgtatgtgtgtacgtgcgtgtgtt ccctggcgcggtggtggtgttggccacacggccctgcgacccccataaaa actcggttcgatagagagacacacgggaatgtgagagagtatgacgattc gagagacgcagacgcacgaggagaaacacacgtcacgcgaaacacgttcg cgtcgcgtcgatgagcgcgcgcacacgtccacatcgttgcctggatgagt gggtttttggtccgcacacacgaactgtttttttttaattcttgtcttcc ctagtagtgaagagttttccaaatttccaagtatgtagttttaagtttct 
gattaagaaaaatattattcatgtgttttgaaagtttgtcagaaaaatca atatataatatttttagacgccatgattttcaaccaaataataagtttaa ttttttttgtgaatctaacttattgatttctgtgtaatattttcaatcgg tgtgttttttactacattgatatctacattgatatatctacattgataca ttgatatttttcgtaaaaaatttttttactccattttactacattttact acaatttactacattgatataaatgatgatttttcagatgaaaagaatac tctcagatggagtcaatgagccaaaactatgcaaattcataaaagaagaa tcaccacataaagttaaacaggaaccatatgatgatgaagaccttgtaca tttgggatccgaatcaattccatcaccaacttcatccacttcgcctccat ttcctacagaacctgcggttcaaacaattaaacttcccaaatatatggag gtaaccatacacttttcatatatgtgtaacatggggcggaaatgtgaact gttcatcgagaccgaaaaaataatattttcaatgattacttttggtaaac ttttcaaaacaaaattggcaattttttcttacaacttcaaattgttattt atttccgattcatctttataacttcaataatttatttgagaattctattc aatgctattaaagtcaaaaatttgcgaaacgtggttttgcccgagttctc tgaaactttctgaatttggaggagtatagaaaatggttcgtaattttttg caaaaaagtttcaaaaggctgattaggccacgccctttttagagagttac tcgtcttctaaaaagtgtcactggttttcttgattcgttttctctaatgt taagtacataatgacataaatcaaacaaaaaacaatacagtgttcttggt aaacgagaaactgagtgcattttttaaaaaaatgtgaaaaagtattggta aattgctaaaattttgaaaaatataagattttgaggaaattcaaagcaat gtcgcatggtccgacctcaacccctacattggtccgacccctatacgagt aattaaaataaaattaaagtataaaaaatgtaggaaaaaaaaattttttt ggtcgaattccaaacttatgagtggaaaaaactgagaaaaaaatgcggat ggtgctctcttttgttgaaactttcaacgaaagcctttaaaaaaccgctg aaagcgtccaaggaatgtctaaaaattagaatgccgctgtgaaatttagt aagcgatcaaaaaagttaagcaatttactcaaattatttcagttaaaatg tggtgccctcgttgctcgacttcacactgaattgtttatttgtcctggaa ttcgagaaaaatgcatcgaagtactaggtcggtcagagagtattacataa gtgtcattgattattcacagactgtcccggagagttattgacacctgttg agtttacaatcaaggctgaaaagagcaagcaaaaagactggaaaggagcg ataaagcataatggaaggatgttaaggtatgcttcttgtagttttcaact cttaaaaacagaacaatttaacagaacactcatggaattcaaacaattgg atttttataatcatcatgagatgtgttcattcaagtgtcattctcgaaac tatatcacaaaaaacggtggatctgttccaaaacttccaccaaaaaatgt tcaacgtcgtcactcttctgcatcaacaacatcaaacgtttcacaaacag cgattaatcaattacttcaaggagagctgattaaaaatccaaattttcta gctgcgttcgctgctcattgtactgctgaaaatcagaaacgacaagaaga agctgagagaaagttgcaagaaaaacaaaacgccatcaagtgtctgatgg 
aaaccgactcggtcacgttctggaatcaaacgatacaatcaaaaacgtct actgtcgttttggatcgaatttctatggagctcggttcgttggctcagaa tctgatttctggtcgtgattttgcggcgagttcgtctaaaatcatccagg tacttcaagttctcggcttgtcggacaccgtttctcgggaaatgtgtggt caattcattcttccatcgtccgtgtcaactaatgttgatggtaaggaatt cgatccttttaattctatctgaagattagttaaaagtggagtaccgtaat ctcctttttaagcccaacatgacccaacactactgaatttcgcaataaaa ctttttggaaatttctcagaaaaaagttatggcgattcaaagttctgcaa aaaaaagactcaatttcagctaaaatcacaacttttaccattttctcctt gtcgcagcttctcgaatttaataatataatctttcagggcaaagctcatt agacgcacaactaccggttcaacatttgccgtcaaaagaaatgaaagccg ttgacccaatcgaaaaatcaccaaatgataataacaatgaaactctcagc tcttctgagaaactcgaactcatgatcagaaacgcgctctgatcgaacat tcatactctaacacattcctcttcacatctccagatgattattcatgtcc atcattttatcattaaatatctcattctatactctcttctcgctttattg tttctcgctcactccctacccaccattacataacatctctgaaatttcaa agttttgacattcttggctgtgccttttcctctcaatgatattttccaaa ttccattatttttcccccgcctttgattgcttttattggttacttgttta ttggtatacttctcggtattctttttaatgttctgcatgtttcatatggt ataaattgcatattacgttagacacaaattactgcaaactaaactcagtt tgtctcaatggaaattcgtcgaaacacatggtgtcaagctgtcccattac ggtttgatttacaaaaaatgtagatcaaaacaaaatcggacagcccgaaa ctaggtgtaaatatacttataagaattcaaaaagaccgaataacataata aaacattcctaagaattttagattttctaaaatttccagtcatagttttg gcaacttgccgaatttttaaaaagtatgagcttttgagaggatgcagaat gtttttacacaaataattaaaaaaaaaaggaaagcataaaaattttagaa tttttttttcggtagacttccaaagttatgagttacaaaaaatgagtaat tgtcgctttttgacagtgcatttaaaaacatgcaatttaaaaaaaaactg tcagctaaggtgccgactgtcagtgacagtctgtcacttgttggtaattt tttaatagattctagcttacattggtattattctgttctaaattatttgc tcaaatgaatcatcattctcgctgtgttgtcattttatggtacttgtatt attatcattatttagataatgaatatacacatattggatagaacattttc caaaatcagaacaaagcctcattacaaacttcaaatttcatttttcaaaa ctttgaatagaaaataaatttgggtaaatagtcagtaatcacccaatccc ttaacataatatccacattatcgagctagtgaagctgtttctctggcagt gtccaacccacttcttcgtcttcacctcattccttttcaactccgcccct taaaggaagtactcgtccatagcgcataggaacaggcaagcgataatgtc tgtgtctctatattttcacgcactgtctagtgccgcatccgtatcctcta ggacaccggtgccgtggcgtctataaaagagagtacgggtgtcttacgca 
gttcgtatccgattttcagtccagtgtccaaggaagacaagccgaatgtc ccatataaaccgcattcttatctattcacccattccaccattatgttctt gctttgttcccatttctcgtttttcgttttttttctcttaaactttagat attactatgttaataacccattattttaggcagtcacaacctaaaaatga tggagacttcggagcacaaagagctccgacgtgtggcgtttttcgccatt gttgtatctactgtagctgttattgcagctattgtaattcttccaatgct ctattcatatgttgctggtttccagagccatcttatcattgaagctgatt tctgtaaggttagtataaaacaaacatacgtattctattatacaagcaac gcttttttagactcggtctcgtgacatgtgggcccaaatccatgacatag atggaccacacctattccatcgtcagaagcgtcaatactcttcaccaaac ccaccagctgccggtggatatggagctccagttacgaactccgagccagc tccaacttgctgctcttgccaacaaggaccagccggaccaccaggaccac ctggagatgacggaaatggcgggcaagatggtgttcgtggaaacgatgga actgacggaaaggaaggaagccttttggaaagtgctattgtaaatgaacc atgcattatctgccctccaggaccaccaggaccacaaggaatggcaggag ctaaggggccacaaggaccaaagggaggaaatggagataatggaccagat ggaaaggctggagccaacggaatgcaaggaccaccaggaatgatgggccc accaggaagacaaggagtaagtggaccaaagggagctccaggacgtatca atcaaatcaatggaccagctggaccagctggacataagggagtccgcgga ccaccaggaccacgcggagaagctggacttgatggtggaaactctgaagg accacaaggaccacaaggagatgctggaagaccaggaccagttggagagc aaggaccacaaggaccagaggtatatttatttttattcgatatattcaag gctttacatttattaatttcagggaccacaaggaccaccaggagaaccag gaggctgtgagcattgcccaattccaagaacaccaccaggatattgaacc tgtactttttctcattaatttcgaattcatccgcccaaataattgggtgt ttacaatacaatgaattttttcattttaattcacagattataaattgcaa aatttttcagtatttgtcttattattactggtacagagagtgtagatagt tagagagtgccaggcatccgggacccaatggggcacatcaaaggctccca tcgatcgatatgcctaacatgttgaaaaccgattaaaacctcacgtttga atcccctctaaaaactgaatgtgtgccaacacagcgtcattgacgcattt acggtgtcttgacgcgatacgcgttttcaatacgaggcaaactcaaattt attattttcattttcaaaatatcaatttgttgaaaactagcaactactac tcatctcttcactcgtcattatggttaaattgcgcgatgaacagaagaca gagcttataaaacaggaaaaaaagcgccgacgcattgctcggttacgaca ggtatcgatttttctttcagttctcaaaaatattaattattattgacttc aggttcgccagcaaagtgcagcaaatgcaaagattacaagagacgtggtg aatcaacggaagcaagagttcattcaagaaattaggttgactttttcaaa tttaaaattataattgaaaatttatattttcagggaagaattgcacgagc aggtggatgcgctaatgacagaagtaactgaaaaatcgttaaaattacct 
gtctccgcaaaaaggaaaacttctacgcctcgctcacttgtaagtgttcg ttaaaatgatatttgcaaaaaaaaaccacgaattttcagaaaaagtcttc tcgttgtcgtgaaatgacggaaagcgatctggagttggcgaaaaaaagaa atgcagatgcaatgaagcatcttcgagaagcgaagaaaaagaaggaaaag gagcaagaggaaaagttggcaaagaagaaagaagccgcacgaaaagctaa tgcaattatgagaggagctcagatattgtaataataaacttttttttcat ttatgagtatatattgagatcaaacaaagaaagtgacaagaaatcgatat ttttaaacacaaaaaaaattaaaattgaattcctaccgatcacaaatggg cgaaagttagatgaaattagttttcacaagtgtatcggttgcggccccca tagtttattcttcgtggtggtcgttcacaaacgtcaaacgtcaatttcaa gcaagaaatttcattttcataaagaagccatggcagatagtcgtctctga aatattttacaaatttttgaatttcttttccattgaaaagttgttatttt ccgctgaaaaattcaatttttaaaaaaaaaacaacattatttgcaaaaat taaacttttttatttgaaataaatttttttctgaaaatttgaaaaatgca aaaaattcaaaaatttaaaatttgaaacaaattttttttcgaaaatttca aatttccgtgtaaaattaaatgaaacatgttatttttcattgaaaaatta tattttgcgcgtagagcatgttgaattggagcacacttaagtgtgctcca aatttgctatttttttcttctagatgcaccggtgcaccatgttaaaaatg cacttttttggcaaaggggaactaatcgatacattttttaaaattagata ctgtgctaaaattaaactttttatttgaataaatttgacattacaaaaaa aaaattttaaaattttaaaattgaaaaaaaagtttttgaaattttttttt gaaaattttaaattttcgcgtaaaattatatgaaaaatatgtttttttct attaaaaaaaaccgttttgtgcgcgaagtatggcgaattggggcacactt tattattgtcacgatgtaccatgtttaaaatgtaaaaatcgatttgcata ataaaggtggagtagagtcttttaagaattttgattttaataaattaggc tgtagggactgaatataactataaacaatttaatacaaaatttctgaatt tttatgatttttccaatttcgcaaaaattcaaaaaactagtcttactatt tttgaattcccacgcaaattaatgatcattgttggtttttcttgtttttt tttctttaatattcaatttgatgtttcagttcaccaaagtttcaagacat ttctagctaattagcaatattaaagaaaaaacaagacaaaccaataatca ttaatttgcgtgggaattcaaaaataatttaatatgaaaatgactcactt atgccacaaatttttcctatattctatgtaccactggctaaacttgtcaa attggccattattaacatcccaggtacgacgacgctcctccagctccttg gctacggtctcttcaatttgttcttcttttagtactcgttttcttaaaac tgatccagttgcgttgaaggctaatcggattttggcaattactgagccct aaacgatttctaatttaattttcccgaatttttagaactgacgtatccaa cttgttgaagacttcctagggcactcttataatctcctctatcaccttga aggaaatccatctcacaatcaccatccgcatcaaatgctgaccaatcatt gaaacagaactcctaaattttcaagtcaccattttaataagtaatttttg 
aaaaagacagttggagaaattcaaaatctcacaattatggattttttttc gaaagcttgcagtttgcagaaaaattgctggacaatttcttgatcatttt caagaaatttattatcttaaaattcacatcaaatctctttgaattaagag gtgggcggcaaacgatttttccggcaaatcggcaaattgtcgaaattgaa atttccggcaaactgtcggaattggaatttcggtcaaaatcgatttgccg aatttgccgaaaattatcggaaaattgtgattttgtacttttttcttgga aatttcagaatttcaattttaatcggcaaaattgtacacatcctataaat gttgctacatctattctgaacagtaagcaaattatatgatattattaaag aaaacgtgaaaaaattttcaaaaaagcacagttttaagtttttccgtctt ttaaaaaatccctcgaaacatttccgacacatggcaaatcgacaatttgc caaaaatgaaaatgaacggcaaaacgaacggcaattgccgcccacccctg cttcaattttttaagtgtatacctttctgaaagtataaaacccaggtcta tgcggacaattgatcccacgcaattgcccagccaaactattagattggtc ttcttcattctgctccaaattggtcaaagattcacatacatcacaaaaat aacaattgtttccatagccaccacatccattattactcgcgttttgacac ttcatgttggaaaagtcctttacattccaccactgatattccgctttcgc aaaaaaactattgtcaagcgcttctggtagatctttaactttgacattct taatttccaaagtaaagcatcctgggaatcgaacaacgtctggctttata tgaacttggctcgggttaatatccaatattttttttgctccgggtcgtgc ttaaaaatagcttctttgtgaatcaaacaattatggataacaaaaaatgg aaaagtacatactgcaacttgcgtcagcccttccgggaaccagccaaatg tcctgatagttgcatttgtggttgagcttcttcctagcatcggtgtgtct gactccattcctatactttgcacgacatgcaggatccaatatatcgcagt caagttgggcaaaaatcgataatataaatgagaaaataatcagaatggat atgttgaaggtgtaattcgaagatcgcatactaaatgtgaatatgtgtat gtttaaaatggaattttcaatacaattcgaataaaaggaaagtaaaatat ttgtagaagcaaatcaatttctaatgataattgcgtcacagtgtgtcctc taaagtgtcttctttcttgtctgcattttctcttatgcctctctttgtaa caagattggccgtacatgcccctcttccgacgcacctaacctgttgacca tggagaccttggaggtcatcattttttttgtgcgaaatttggcatttagc aagagaagggatcccttatgggagagaaataaagagtaaagataggcaat tatgctttgtaccacgagaaaaaaaaactcatttgacacatgaaaatctt ttgaaaatggaacaatcttatactattcagtacatgtgctccatgtcgta aagcggttttttacagtttcttgaatgaaatctcacgtggtgtcaggctg tcccttcgctctactgcacatgaaaacttatgacgtcacagcgaacttcg aataattgtttttcaatttaagagccgattttcgtgaaattgttttatca ttttttgaagcaaaatgcaataaaaacacaattttattttaaacattata tttaaaaattatgaaaatcgaagcttaaattgaaaatcaattattcgaaa tgcgatgtgacgtcacaattattgcaaaaaaacatttttcccatctttcg 
tgtgcagtcaagcgatgggacaacttgacaccacgtgactatacgagttg gtgattagaatttcaaatacaaaaacaatttaggaaaatactctgaaatt aggaactttagcaaaaagaaacattttaaaactattgctcgaggagtaca cgagctgtggaaatcgacatattttccaatttattattacggcaacaaaa aattctgatacttaatgcatattgcacatcatatttgacgcgcaaaatat ctgtgtagcgaaaactacagtgactatttaaatgactactggagatcttg cgtcgattttcaaaagaatttctcagtgacagcgatattacattttcctt cgttttttttttgtattactgtctcatttaaattaaataatatattgctt tcaattcattaacagaaaatcgaacccgtaaatcgacacaagagctacag tagtcatttaaagagatactgtaattttcgttacgagatatttagcgcgt caaatatgttgtggaatacgcattctcaaaattttgtgtatatcccgtaa taattgcaaaaatacacttcaattttaagaaaatttgaaagttgttcata aattggcggcattttttttttgagaatcctacagacaaaaaaaacgtgaa attttaaagaagcggtgtcggtgaatagagacgaagagggattaggagat aaaaataattgatcgaagcagcagaaaaaccgattttcttcattttcttg cccatctctctctctgtctctttttatttcgaaaaaagacagtcagactg gatttgagtggaatagaggggaaagggagggacagttctacgaagcgaaa cgaaaacgattatccatctctgttggcagtctcatatggaaaaggttgaa aattgaaaatattcgggaacaagaagaacacaataatatttatttcgaca tacaaaaggattatatttcttttttcaacgaagaaaaagaagttgaagac aagaagagagagagttagtactctgacacgaaaagggtaaaacttacctg aaaaggaaattggaaaataggggggggggggggggggaataaaataatga ataaataaaaaacatttaatgctccatagatctatcgattttcgactgat gtttatgcttttgattgaatgattcgatgaactgtgaagatgattgtccg tttgactcggccttatatttttccggtaccgggacattgaggatttcagg taacatccatggtttaagccatccgactagaggtgattgtccaggttggt atacaaccgaattatgatagaagaggagtccatcgagacggaaactggaa caaatattaaaggtgtagtagggaaatttgactttttgagggaatttcag ggaatttgagtgcatttgcagacccaaaacggtccaaaactaccaagtta aattaaacgttgtgaaaatttctcaaaaaaatgttacagtatttcttcta tattctattcgaaaattaaattttttgaaattttgaaactttttgaaaca aagcaaaaagcgagaatttgaataaaccgatttcgcaatttttggttttt ttatttattattacctgatctcgtttttcataagctccgccatctgatct tgggagcacggacagctgggaatcggtgagaaaatattcctgaagccagg cgtagatttcgctagttcaggagcctcttcaagtttagactttagcataa attggcggaaatcgtatggactctcaacatattcgtgtgcattccatgaa agaagatcgagcacatagtatgtctgattcgagtagatgcagtcaagaat tgtccaagcctgatctataaagtgagcacattgaaatcgattttttattt atttctaaaaacacacatacttttagcacgagtatttccaccaggaaggc 
gagactgaaatcgcgatacttctcttccgcctttgttatatgcaactgtg aagccctgaaatacaagattatcccgggtttttaataattaaatcattaa aaacaaactcgtgatgcaacgacaagtgtccgtttgcccactggagccat aaccatcgtccagtcggaagaaagtgattccggaatatccacaagccact cactcaacatcattttgtcggcatatctttaaaaattgaaatatagtttt tttttagtttaagaattcgaacttttaggaaattacttactttttaaatt ttccagtgtgtttttcagcagtagttgtctcaattttttgttcatcgtct tcgtcagaagtaacgtcatcaaaagctaaatttcgtagtttcataaatgt gtcaaagcggccatttttttgccgctcaagcgtttcttcacgacgctttg cttgttgttctgcagcttttgtaagatttttgtactgcgaatacctcgga tgttcacctgaaaaaatcaatgtttactttttgtgtttaaaaaaggagtc tcaccaagagctagaggatcgacctgaaatccactgctaagctggtctgt caacgaatctaaatcgtcccccatttgtgatactggaaaaaataatttta aattagtttaatcaaacttttaactaaatgagtataagaaaaagctttta caacaattaaaaaactgtaaaagtcgaaaaagttgctctgttgatttacg ataaacatctctcgttgcgggatagtggagtgtcgttgggggaaaatatt ttttcgatttcgaaatcgatttcaacacattcacagcgagaaaaaacaat tgcatataatcgattttttatattatggcctgtgtagacatatcaccact atcatatgaatctgtaaaatccgttgtcaaatatttgagcctgaaaaaaa ggtatactatttaatattgatatttaaaaaaatactggttcattaaccgc acggttaattaaccagtatttattacgggaacacaaaattctgagaatac gtattgcgcaacatgtttatatctcgtagcgaaaaccacagtaattttgt aaatgactactgaacgcgtaaatcgccaccagcgctacagtagtcattta aagaattactgtgattttcgctacgagatattttgcgcgtcaaatatgtt gtgcaatacgcattctcagaattttgtgttcatgtaatttgaacacattg aaaaacaaaacattttcagggagcaaattaatttatcgattccggcatat cataaaacagccaaacttgcacctttcaaactagaaaacgttgggttatc agatttgtacattgaagtcaatgatagaacgtggactttgagtgaaactg acatcattggttgcactcttgttttgtatcgctcaaacggctacaacttc cagcacattatttcaataactcctgaagttgcgttccagaagcttatatc cgaatatttgagaaatggaacacgtattaaaaaatttcacttgtatgccc ttccaagtctgctcaaaaacgttgagcttcatgtgtcagagctgaaaatt gacggatgcgactatgaaacatttgcaaagttccgtaaattcattgatca taaccaaaaaatattaaaggagattgggtttgtggtgcgtcagaatactt tatggatgtttgatgaacaattggtacagaacacttcaagctttccgcct tcaacttttgataaatttcaggtgaaaaacagtaaggagatttatctcca gtttccgtattataatccggtcccgttggacgatatgcttcttcggctga aaaacaatcatgtacacattatctatctgggcttcccgattgaaaaacta caagaattggctatggtgagaaattttgtaatattatactttggccaagt 
ttggtgcggatctttttgattttccactcagaaatacggtacccggtcct gtcacgaacagaacaataaggtctgcgcttttaagtacgtggtgtcagag tgccccatattgatttgatctacgcagatctatgagaatcgcgggaattt agatgcagatttcttaactggttttgaatggttaagaacgtgctaacgtg aaattttttggaaaaaaattcccgcattttttgtcgatcaaaccgcaatg agacagccggacaccatgtgtttaagtaaagcacagtatttaacccacta ctagttaaaaaaaaaatataattcagaactggatcgacactaagaagcca attggtacctcactcttcctggttgttgagcattatacatatgttatcga cgtgtttgagcatttaaccaaatactctaaagctattccttcaagactca ctgatcttgggtaaggtcttagaactctataatatacaaataagcttgac tcattcagaacttccttcttttcacattgtgtaacaatgcatattgatga aagttctgaattggtgctatttggaggtccaatggtaattgaatggaaac cgttcaaatggacgttgagaatgacgatcatgaaacgcggatcgacaatt tcaaaatgatttggaatcctccaaaattctgattttcctcaataaacatt ttaatattttatgttgcaatgtgaatacaattttccttctctgtcaggat actccagaaggaatgtatggaatcttgatgttcagaacatcagcatccct tgatcaaacatcgaatctgttattacactaaagtatgtgttcagtgaaag tgctgtaaaagatgattaacgtcctcaagaagcaaatgctggccgatgac agtttttttgaagattcagctaatttcattagatattctagttgattcaa tatcttgaagaaattttttcgcgagatgatgatgttctttcaagttaaaa ttttgataagtttacgttgtcaaaaataacaacagaaaactacagctggg agtaacattgcatataaataacataagtattaatttataattacactata aaatagttttaaaattgcctcagaacatgtaaacagtctaaataattact ttttagattattgtagataacaagaaaacaagtacaaaatcagtaagtat catgtcattacacatttgctagaaccagtattttgaacacacttgcggac acaggtcaacacacttgttcttagcacccgcctacagcttagatgttccc tttaaaaggaagtacttacgtcgctacaagggcagatttctccctacttg gctatatctctttgttcaatgattaaacgtcttctctgctttctcttttt tggcttaactttttcagctccggtgccattacccctagaagaagatttta atgttttccatacaaaacaggaaaaccatggtacgttttctcatcttttt tgaaatttgtatcaattaacagagcaaagattccttcgtgttgatcacaa tagagacaaagtagtctcattcgacgaatttcttcacatggaacttgcat atgttgatgcaaagaaagaagagtttgatacgttagataagaaccgtaag tgatgtttaaagcgttgagaaattagcaaactactttagatgacgggaaa gtgagccttgccgagtacgaagagcatttccacgaagcctcaagcaagaa tgaaaaatctcggaccgcatactttgccaaagttttcgaagatttcgatg aagatttcaatatggcattgagtcgtgaagaacttgaacgtgtgcttgct gaacgattcttagtcaaaccaagagaaaactttcccaaattgtttttcaa gtttgacgtagacaagtccggaggacttgatctgactggtgagctaagtg 
ctcgttttatacatctcgaagtttttttttacagaatacatgaaatttga tgctgaattcccattcgatcaaaccgatccagttggtggaggaccatcca aatctaacaatcaccacgatcaaatgcacacagaagtccctcaagatgct gacgctgctgcaattgcagccgtgttggcacaagcttcaccaacactgaa taagagtccagtcggtgttgctccaccccaaacgtcacctggattgcatc cagtggcaccagggttattccagacggcggtgccaatcaaaaaagtttga ttctaacagtaacttttacaattgtaatttttaattttctctaatactgg tatcacaaatcactgtcattcaccatatcaattgtttttgtaaataaagc tttatagttaaaatattatatacaaaaatgcattgcacaaaaggctgaaa tcagtgaactttagtatcagaggttctcgacataatctcattatgtacat caatcttactcaatttttgaccttctttgctgtgacggataccataacaa atgtagattacaatgccaatcgccaaccatacaaaaagacgaatccaagt cattgagtttagatacaccatcatgaagacattgatgagtagaccaaggc atggaatgaatggaacaagtggcaccttaaacttttgaacttattatttt ttgtaaaatgtttctgttcaccttatacgtagatgttgacttattctgtt catggcccaaaataaaaacgaatgataacagtgaaaaggcagcaccgact gtcaataaaataataccaccagcattgctaaatattcctgttctgaatgg aattgccaggcaaatataaccaaaaattaatcctgcgacagctacgcgaa ttgaaatcccctctgagaaattttcccaaacaccttgaaatggtacccac gacttcaggcatccgccttaaaatttttattttgttaattttgctaaaca cgtgtgtgttgctttataaaatcttaccattatcataatcagttgcacta ccatctacaagatgggattgatgccgcaaaataataacacagattgagac cattgaataagccaacagagttccaattgacaaaaagtcgacaagcgcct gcagatcgaagaccagtgccaaaattgcgtttattatagtaaacacaata gttgcattgagaggggtttttgtttttgagtttatcaccccaaaccaccc gaagataagaccatcatcagccattgcatatacagctcgtggaagagcga atgatccagtaactagattgttcaacatacccgctagtgcaccaacactc attataatcttggcaacagttgctcctttcatttcgaatgcagcagcaaa cgcggcatcaggatcaactaagtcgtatggtatcatcagcgtcaatgagg cgcccataagtacatatatgacacttataattgccaaagaagtgaaagta gccagtgggattgttctatgtgggtttttagcttcttctccagcagtggc taaagcttcaaatccgataaatgcgaaaaagcatgtggaagcgccagaaa tagctccttgtattccatacgggaaaaattttgaacgaccatcctggtat gttcctgaccagagtgaaaagtctgcataggttagtccacaaataatcac aaacgccaaaacctgaatgaacattttgtaataggttttgacactttaac ataccgccagatttaaaaatacaaaactagtattcacgttggcagagaac tttgatcccattgcaaccgccactgctacaagaaacaataggaagaacgc taaaaagtcagggtacagtgcaaaaaatccttttccctgtaagttgaaac gcgtattagcatatctgattcaaaaatggcacgaacatcactcaatcgtc 
caacagtatccaatgtccagttggacacacttttagatacaagattgtcg aaataagcagaccacgaacgagcgacagcggcgtttccaatcatgtactc tagtggtactgtccatccgacaatgaatgcccaaatttcacccattccta cataactgtatgtgtatgcacttccagctcggggaaacctggaagtttat cagtttaaacatgtaaagttaaaaaataataaaataaaccttgctccaaa ttcagcatagctgaaagctgacaaaagtgcagcaaagccagagaaaataa aagatagtataattgctggtcctgcttgatttcgaacaacggatcccgtg agtacgtatattccagctccaatcatgtgtccaatcgcaatgaacattac gtcaagaattgttaaacatcgtttcatttgtgattcaagatgtgatcctc cgtcaaatgtctttttacgaaacaagacatcggctatttgatgtaccttc atggttgttttttctgaaattatatatttctgaaggtgaaaaaactattc atacaagaaagattaaaaatatgaaaaattcggtatacttttatatggta atattgttttagttttaagataaggcttctggaattgtgaatcaaaaatg author-mojibake.t100644000766000024 35313605523026 20406 0ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/t#!perl BEGIN { unless ($ENV{AUTHOR_TESTING}) { print qq{1..0 # SKIP these tests are for testing by the author\n}; exit } } use strict; use warnings qw(all); use Test::More; use Test::Mojibake; all_files_encoding_ok(); author-pod-syntax.t100644000766000024 45413605523026 20735 0ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/t#!perl BEGIN { unless ($ENV{AUTHOR_TESTING}) { print qq{1..0 # SKIP these tests are for testing by the author\n}; exit } } # This file was automatically generated by Dist::Zilla::Plugin::PodSyntaxTests. 
use strict; use warnings; use Test::More; use Test::Pod 1.41; all_pod_files_ok(); bin000755000766000024 013605523026 15324 5ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4bp_seqfeature_gff3100644000766000024 453413605523026 21147 0ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/bin#!/usr/bin/perl # AUTHOR: malcolm.cook@stowers-institute.org use strict; use warnings; use Carp; use Getopt::Long; use File::Spec; use Bio::DB::SeqFeature::Store; #use Carp::Always; my $DSN; my $ADAPTOR; my $VERBOSE = 1; my $USER = ''; my $PASS = ''; my @gff3opt; GetOptions( 'dsn=s' => \$DSN, 'adaptor=s' => \$ADAPTOR, 'user=s' => \$USER, 'password=s' => \$PASS, 'gff3opt=i{,}' => \@gff3opt, ) || die <features(). END $ADAPTOR ||= 'DBI::mysql'; $DSN ||= $ADAPTOR eq 'DBI::mysql' ? "mysql_read_default_file=$ENV{HOME}/.my.cnf" : ''; my $store = Bio::DB::SeqFeature::Store->new( -dsn => $DSN, -adaptor => $ADAPTOR, -user => $USER, -pass => $PASS, ) or die "Couldn't create connection to the database"; # on signals, give objects a chance to call their DESTROY methods $SIG{TERM} = $SIG{INT} = sub { undef $store; die "Aborted..."; }; my $seq_stream = $store->get_seq_stream(@ARGV) or die "failed to get_seq_stream(@ARGV)"; while (my $seq = $seq_stream->next_seq) { ### 20100725 // genehack # Try to call a gff3_string() method, but fall back to gff_string() if $seq # doesn't support that. Note that gff_string() is required per # Bio::SeqFeatureI, while gff3_string() is not. Currently, only # Bio::SeqFeature::Lite implements gff3_string(). if ( $seq->can( 'gff3_string' )) { print $seq->gff3_string(@gff3opt) . "\n"; } elsif ( $seq->can( 'gff_string' )) { # since we intend on getting a GFF3 string, make sure to pass the version $seq->gff_format->gff_version(3); print $seq->gff_string() . "\n"; } else { confess "sequence object $seq does not support gff3_string() or gff_string() methods!" 
} } exit 0; bp_seqfeature_load100644000766000024 1614013605523026 21255 0ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/bin#!/usr/bin/perl use strict; use warnings; ## Used to output the 'usage' message use Pod::Usage; ## Used to parse command line options use Getopt::Long; ## Used to create temporary files, if necessary use File::Spec; ## BioPerl! use Bio::DB::SeqFeature::Store; use Bio::DB::SeqFeature::Store::GFF3Loader; ## The available options. Note, these defaults are 'hard coded' into ## the USAGE POD, so if you change one of the defaults (you shouldn't), ## you should update the USAGE. my $DSN = 'dbi:mysql:test'; my $SFCLASS = 'Bio::DB::SeqFeature'; my $ADAPTOR = 'DBI::mysql'; my $NAMESPACE; my $VERBOSE = 1; my $FAST = 0; my $TMP = File::Spec->tmpdir(); my $IGNORE_SEQREGION = 0; my $CREATE = 0; my $USER = ''; my $PASS = ''; my $COMPRESS = 0; my $INDEX_SUB = 1; my $NOALIAS_TARGET = 0; my $SUMMARY_STATS = 0; my $NOSUMMARY_STATS = 0; my $FTS = 0; ## Two flags based on http://stackoverflow.com/questions/1232116 ## how-to-create-pod-and-use-pod2usage-in-perl my $opt_help; my $opt_man; GetOptions( 'd|dsn=s' => \$DSN, 's|seqfeature=s' => \$SFCLASS, 'n|namespace=s' => \$NAMESPACE, 'a|adaptor=s' => \$ADAPTOR, 'v|verbose!' => \$VERBOSE, 'f|fast' => \$FAST, 'T|temporary-directory=s' => \$TMP, 'i|ignore-seqregion' => \$IGNORE_SEQREGION, 'c|create' => \$CREATE, 'u|user=s' => \$USER, 'p|password=s' => \$PASS, 'z|zip' => \$COMPRESS, 'S|subfeatures!' => \$INDEX_SUB, ## Any good single letter choices here? 'noalias-target' => \$NOALIAS_TARGET, 'summary' => \$SUMMARY_STATS, 'N|nosummary' => \$NOSUMMARY_STATS, 'fts' => \$FTS, ## I miss '--help' when it isn't there! 'h|help!' => \$opt_help, 'm|man!' => \$opt_man, ) or pod2usage( -message => "\nTry 'bp_seqfeature_load.pl --help' for more information\n", -verbose => 0, -exitval => 2, ); ## Should we output usage information? 
pod2usage( -verbose => 1 ) if $opt_help; pod2usage( -verbose => 2 ) if $opt_man; ## Did we get any files to process? @ARGV or pod2usage( -message => "\nYou need to pass some GFF or fasta files to load\n", -verbose => 0, -exitval => 2, ); pod2usage( -message => "\n--fts requires --create\n", -verbose => 0, -exitval => 2, ) if ($FTS and not $CREATE); ## POD =head1 NAME bp_seqfeature_load.pl - Load GFF into a SeqFeature database =head1 DESCRIPTION Pass any number of GFF or fasta format files (or GFF with embedded fasta) to load the features and sequences into a SeqFeature database. The database (and adaptor) to use is specified on the command line. Use the --create flag to create a new SeqFeature database. =head1 SYNOPSIS bp_seqfeature_load.pl [options] gff_or_fasta_file1 [gff_or_fasta_file2 [...]] Try 'bp_seqfeature_load.pl --help' or '--man' for more information. =head1 OPTIONS =over 4 =item -d, --dsn DBI data source (default dbi:mysql:test) =item -n, --namespace The table prefix to use (default undef). Allows several independent sequence feature databases to be stored in a single database. =item -s, --seqfeature The SeqFeature class used to create feature objects (default Bio::DB::SeqFeature) =item -a, --adaptor The storage adaptor (class) to use (default DBI::mysql) =item -v, --verbose Turn on verbose progress reporting (default true) Use --noverbose to switch this off. =item -f, --fast Activate fast loading (default 0). Only available for some adaptors. =item -T, --temporary-directory Specify temporary directory for fast loading (default File::Spec->tmpdir()) =item -i, --ignore-seqregion If true, ignore ##sequence-region directives in the GFF3 file (by default, a feature is created for each region) =item -c, --create Create the database and reinitialize it (default false) Note, this will erase previous database contents, if any.
=item -u, --user User to connect to the database as =item -p, --password Password to use to connect to the database =item -z, --zip Compress database tables to save space (default false) =item -S, --subfeatures Turn on indexing of subfeatures (default true) Use --nosubfeatures to switch this off. =item --fts Index the attribute table for full-text search (default false). Applicable only when --create is specified. Currently applicable to the DBI::SQLite storage adaptor only (using the most recent supported FTS indexing method, which may not be portable to older DBI::SQLite versions). =item --summary Generate summary statistics for coverage graphs (default false) This can be run on a previously loaded database or during the load. It will default to true if --create is used. =item -N, --nosummary Do not generate summary statistics, to save some space and load time (this is the default when --create is not specified; use this option to explicitly turn off summary statistics when --create is specified) =item --noalias-target Don't create an Alias attribute whose value is the target_id in a Target attribute (if the feature contains a Target attribute, the default is to create an Alias attribute whose value is the target_id in the Target attribute) =back Please see http://www.sequenceontology.org/gff3.shtml for information about the GFF3 format. BioPerl extends the format slightly by adding a ##index-subfeatures directive. Set this to a true value if you wish the database to be able to retrieve a feature's individual parts (such as the exons of a transcript) independently of the top level feature: ##index-subfeatures 1 It is also possible to control the indexing of subfeatures on a case-by-case basis by adding "index=1" or "index=0" to the feature's attribute list. This should only be used for subfeatures. Subfeature indexing is true by default. Set it to false (0) to save a lot of database space and improve performance. You may use --nosubfeatures to force this.
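The case-by-case control described above can be sketched as a small, self-contained Perl helper. This is an illustration under stated assumptions, not the GFF3Loader's actual code: parse_attributes and should_index are hypothetical names, URL unescaping of attribute values is skipped, and an explicit "index=" attribute is assumed to override the ##index-subfeatures setting (which itself defaults to true).

```perl
use strict;
use warnings;

# Hypothetical helper: split a GFF3 column-9 string such as
# "ID=exon2;Parent=mRNA1;index=0" into tag => [values].
# URL unescaping is deliberately omitted in this sketch.
sub parse_attributes {
    my ($column9) = @_;
    my %attr;
    for my $pair (split /;/, $column9) {
        next unless length $pair;
        my ($tag, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        push @{ $attr{$tag} }, split /,/, $value;
    }
    return \%attr;
}

# Hypothetical helper: decide whether a subfeature is indexed.
# Assumption: an explicit index=0/1 attribute wins; otherwise the
# ##index-subfeatures value applies (default true).
sub should_index {
    my ($attr, $directive_default) = @_;
    $directive_default = 1 unless defined $directive_default;
    return $attr->{index}[0] ? 1 : 0 if exists $attr->{index};
    return $directive_default ? 1 : 0;
}

for my $col9 ("ID=exon1;Parent=mRNA1", "ID=exon2;Parent=mRNA1;index=0") {
    my $attr = parse_attributes($col9);
    printf "%s indexed=%d\n", $attr->{ID}[0], should_index($attr, 1);
}
```

With the directive left at its default, the first sample line is reported as indexed and the second, which carries index=0, is not.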
=cut if ($FAST) { -d $TMP && -w $TMP or die "Fast loading is requested, but I cannot write into the directory $TMP"; $DSN .= ";mysql_local_infile=1" if $ADAPTOR =~ /mysql/i && $DSN !~ /mysql_local_infile/; } my @options; @options = ($USER,$PASS) if $USER || $PASS; my $store = Bio::DB::SeqFeature::Store->new ( -dsn => $DSN, -namespace => $NAMESPACE, -adaptor => $ADAPTOR, -tmpdir => $TMP, -user => $USER, -pass => $PASS, -write => 1, -create => $CREATE, -compress => $COMPRESS, -fts => $FTS, ) or die "Couldn't create connection to the database"; $store->init_database('erase') if $CREATE; $SUMMARY_STATS++ if $CREATE; # this is a good thing my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new ( -store => $store, -sf_class => $SFCLASS, -verbose => $VERBOSE, -tmpdir => $TMP, -fast => $FAST, -ignore_seqregion => $IGNORE_SEQREGION, -index_subfeatures => $INDEX_SUB, -noalias_target => $NOALIAS_TARGET, -summary_stats => $NOSUMMARY_STATS ? 0 : $SUMMARY_STATS, ) or die "Couldn't create GFF3 loader"; # on signals, give objects a chance to call their DESTROY methods $SIG{TERM} = $SIG{INT} = sub { undef $loader; undef $store; die "Aborted..."; }; $loader->load(@ARGV); exit 0; bp_seqfeature_delete100644000766000024 1107113605523026 21576 0ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/bin#!/usr/bin/perl use strict; use warnings; use Getopt::Long; use File::Spec; use Bio::DB::SeqFeature::Store; my $DSN = 'dbi:mysql:test'; my $USER = ''; my $PASS = ''; my $ADAPTOR = 'DBI::mysql'; my $NAME = 0; my $TYPE = 0; my $ID = 0; my $VERBOSE = 1; my $TEST = 0; my $FAST = 0; GetOptions( 'dsn|d=s' => \$DSN, 'adaptor=s' => \$ADAPTOR, 'verbose!' 
=> \$VERBOSE, 'dryrun|dry-run' => \$TEST, 'name|n' => \$NAME, 'type|t' => \$TYPE, 'id' => \$ID, 'fast|f' => \$FAST, 'user=s' => \$USER, 'password=s' => \$PASS, ) || die < Options: -d --dsn The database name ($DSN) -a --adaptor The storage adaptor to use ($ADAPTOR) -n --name Delete features based on name or wildcard pattern (default) -t --type Delete features based on type -i --id Delete features based on primary id -v --verbose Turn on verbose progress reporting (default) --noverbose Turn off verbose progress reporting --dryrun Dry run; report features to be deleted without actually deleting them -u --user User to connect to the database as -p --password Password to use to connect to the database -f --fast Delete features one at a time rather than in a single atomic operation on the full set (mainly for deleting massive datasets linked to a type) Examples: Delete features named f08, f09 and f10 from the MySQL database volvox $0 -d volvox -n f08 f09 f10 Delete features whose names start with f $0 -d volvox -n 'f*' Delete all features of type remark, source example $0 -d volvox -t remark:example Delete all remark features, regardless of source $0 -d volvox -t 'remark:*' Delete the feature with ID 1234 $0 -d volvox -i 1234 Delete all features named f* from a berkeleydb database $0 -a berkeleydb -d /usr/local/share/db/volvox -n 'f*' Remember to protect wildcards against shell interpretation by putting single quotes around them! END ; if ($NAME+$TYPE+$ID > 1) { die "Please provide only one of the --name, --type or --id options.\nRun \"$0 --help\" for usage.\n"; } unless (@ARGV) { die "Please provide a list of feature names, types or ids.\nRun \"$0 --help\" for usage.\n"; } my $mode = $ID ? 'id' :$TYPE ? 'type' :$NAME ?
'name' :'name'; my @options; @options = ($USER,$PASS) if $USER || $PASS; my $store = Bio::DB::SeqFeature::Store->new( -dsn => $DSN, -adaptor => $ADAPTOR, -user => $USER, -pass => $PASS, -write => 1, ) or die "Couldn't create connection to the database"; my @features = retrieve_features($store,$mode,\@ARGV); if ($VERBOSE || $TEST) { print scalar (@features)," feature(s) match.\n\n"; my $heading; foreach (@features) { printf "%-20s %-20s %-12s\n%-20s %-20s %-12s\n", 'Name','Type','Primary ID', '----','----','----------' unless $heading++; printf "%-20s %-20s %-12d\n",$_->display_name,$_->type,$_->primary_id; } print "\n"; } if (@features && !$TEST) { if($FAST) { my $del = 0; foreach my $feat(@features) { my @tmp_feat = ($feat); my $deleted = $store->delete(@tmp_feat); $del++ if($deleted); if ($VERBOSE && $deleted) { print 'Feature ',$del," successfully deleted.\n"; } elsif (!$deleted) { die "An error occurred. Some or all of the indicated features could not be deleted."; } } } else { my $deleted = $store->delete(@features); if ($VERBOSE && $deleted) { print scalar(@features)," features successfully deleted.\n"; } elsif (!$deleted) { die "An error occurred. 
Some or all of the indicated features could not be deleted."; } } } exit 0; sub retrieve_features { my($db,$mode,$list) = @_; my @features; if ($mode eq 'name') { @features = map {$db->get_features_by_alias($_)} @$list; } elsif ($mode eq 'type') { my $regexp = glob2regexp(@$list); my @types = grep {/$regexp/} $db->types; @features = $db->get_features_by_type(@types) if @types; } elsif ($mode eq 'id') { @features = grep {defined $_} map {$db->get_feature_by_primary_id($_)} @$list; } return @features; } sub glob2regexp { # quote each glob on a copy (so the caller's @ARGV is not modified), then turn the escaped * and ? back into .* and . my @globs = map { my $re = quotemeta($_); $re =~ s/\\\*/.*/g; $re =~ s/\\\?/./g; $re } @_; return '^(?:'.join('|',@globs).')$'; } DB000755000766000024 013605523026 16320 5ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/lib/BioSeqFeature.pm100644000766000024 3052313605523026 21105 0ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/lib/Bio/DBpackage Bio::DB::SeqFeature; $Bio::DB::SeqFeature::VERSION = '1.7.4'; =head1 NAME Bio::DB::SeqFeature -- Normalized feature for use with Bio::DB::SeqFeature::Store =head1 SYNOPSIS use Bio::DB::SeqFeature::Store; # Open the sequence database my $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'DBI::mysql', -dsn => 'dbi:mysql:test'); my ($feature) = $db->get_features_by_name('ZK909'); my @subfeatures = $feature->get_SeqFeatures(); my @exons_only = $feature->get_SeqFeatures('exon'); # create a new object my $new = $db->new_feature(-primary_tag=>'gene', -seq_id => 'chr3', -start => 10000, -end => 11000); # add a new exon $feature->add_SeqFeature($db->new_feature(-primary_tag=>'exon', -seq_id => 'chr3', -start => 5000, -end => 5551)); =head1 DESCRIPTION The Bio::DB::SeqFeature object is the default SeqFeature class stored in Bio::DB::SeqFeature databases.
It implements both the Bio::DB::SeqFeature::NormalizedFeatureI and Bio::DB::SeqFeature::NormalizedTableFeatureI interfaces, which means that its subfeatures, if any, are stored in the database in a normalized fashion, and that the parent/child hierarchy of features and subfeatures is also stored in the database as a set of tuples. This makes both storage and retrieval more efficient. Typically you will not create Bio::DB::SeqFeature directly, but will ask the database to do so on your behalf, as described in L. =cut # just like Bio::DB::SeqFeature::NormalizedFeature except that the parent/child # relationships are stored in a table in the Bio::DB::SeqFeature::Store use strict; use Carp 'croak'; use Bio::DB::SeqFeature::Store; use base qw(Bio::DB::SeqFeature::NormalizedFeature Bio::DB::SeqFeature::NormalizedTableFeatureI); =head2 new Title : new Usage : $feature = Bio::DB::SeqFeature::NormalizedFeature->new(@args) Function: create a new feature Returns : the new seqfeature Args : see below Status : public This method creates and, if possible, stores into a database a new Bio::DB::SeqFeature::NormalizedFeature object using the specialized Bio::DB::SeqFeature class. The arguments are the same as those of Bio::SeqFeature::Generic->new() and Bio::Graphics::Feature->new(). The most important differences are the B<-store> option, which, if present, creates the object in a Bio::DB::SeqFeature::Store database, and the B<-index> option, which controls whether the feature will be indexed for retrieval (default is true). Ordinarily, you would leave indexing on when creating top-level features and turn it off only when storing subfeatures. The default is on.
Arguments are as follows: -seq_id the reference sequence -start the start position of the feature -end the stop position of the feature -display_name the feature name (returned by seqname) -primary_tag the feature type (returned by primary_tag) -source the source tag -score the feature score (for GFF compatibility) -desc a description of the feature -segments a list of subfeatures (see Bio::Graphics::Feature) -subtype the type to use when creating subfeatures -strand the strand of the feature (one of -1, 0 or +1) -phase the phase of the feature (0..2) -url a URL to link to when rendered with Bio::Graphics -attributes a hashref of tag value attributes, in which the key is the tag and the value is an array reference of values -store a previously-opened Bio::DB::SeqFeature::Store object -index index this feature if true Aliases: -id an alias for -display_name -seqname an alias for -display_name -display_id an alias for -display_name -name an alias for -display_name -stop an alias for end -type an alias for primary_tag =cut sub add_segment { my $self = shift; $self->_add_segment(0,@_); } =head2 Bio::SeqFeatureI methods The following Bio::SeqFeatureI methods are supported: seq_id(), start(), end(), strand(), get_SeqFeatures(), display_name(), primary_tag(), source_tag(), seq(), location(), primary_id(), overlaps(), contains(), equals(), intersection(), union(), has_tag(), remove_tag(), add_tag_value(), get_tag_values(), get_all_tags() Some methods that do not make sense in the context of a genome annotation database system, such as attach_seq(), are not supported. Please see L for more details. =cut =head2 add_SeqFeature Title : add_SeqFeature Usage : $flag = $feature->add_SeqFeature(@features) Function: Add subfeatures to the feature Returns : true if successful Args : list of Bio::SeqFeatureI objects Status : public Add one or more subfeatures to the feature. For best results, subfeatures should be of the same class as the parent feature (i.e. 
do not try mixing Bio::DB::SeqFeature::NormalizedFeature with other feature types). An alias for this method is add_segment(). =cut =head2 update Title : update Usage : $flag = $feature->update() Function: Update feature in the database Returns : true if successful Args : none Status : public After changing any fields in the feature, call update() to write it to the database. This is not needed for add_SeqFeature(), because update() is invoked automatically. =cut =head2 get_SeqFeatures Title : get_SeqFeatures Usage : @subfeatures = $feature->get_SeqFeatures([@types]) Function: return subfeatures of this feature Returns : list of subfeatures Args : list of subfeature primary_tags (optional) Status : public This method extends the Bio::SeqFeatureI get_SeqFeatures() slightly by allowing you to pass a list of primary_tags, in which case only subfeatures whose primary_tag is contained in the list will be returned. If no types are passed, all subfeatures are returned. =cut =head2 object_store Title : object_store Usage : $store = $feature->object_store([$new_store]) Function: get or set the database handle Returns : current database handle Args : new database handle (optional) Status : public This method will get or set the Bio::DB::SeqFeature::Store object that is associated with the feature. After changing the store, you should probably unset the primary_id() of the feature and call update() to ensure that the object is written into the database as a new feature. =cut =head2 overloaded_names Title : overloaded_names Usage : $overload = $feature->overloaded_names([$new_overload]) Function: get or set overloading of object strings Returns : current flag Args : new flag (optional) Status : public For convenience, when objects of this class are stringified, they are represented in the form "primary_tag(display_name)". To turn this feature off, call overloaded_names() with a false value.
You can invoke this on an individual feature object or on the class: Bio::DB::SeqFeature::NormalizedFeature->overloaded_names(0); =cut =head2 segment Title : segment Usage : $segment = $feature->segment Function: return a Segment object corresponding to the feature Returns : a Bio::DB::SeqFeature::Segment Args : none Status : public This turns the feature into a Bio::DB::SeqFeature::Segment object, which you can then use to query for overlapping features. See L. =cut =head2 AUTOLOADED methods @subfeatures = $feature->Exon; If you use an unknown method that begins with a capital letter, then the feature autogenerates a call to get_SeqFeatures() using the lower-cased method name as the primary_tag. In other words, $feature->Exon is equivalent to: @subfeatures = $feature->get_SeqFeatures('exon') =cut =head2 load_id Title : load_id Usage : $id = $feature->load_id Function: get the GFF3 load ID Returns : the GFF3 load ID (string) Args : none Status : public For features that were originally loaded by the GFF3 loader, this method returns the GFF3 load ID. This method may not be supported in future versions of the module. =cut =head2 primary_id Title : primary_id Usage : $id = $feature->primary_id([$new_id]) Function: get/set the database ID of the feature Returns : the current primary ID Args : new primary ID (optional) Status : public This method gets or sets the primary ID of the feature in the underlying Bio::DB::SeqFeature::Store database. If you change this field and then call update(), it will have the effect of making a copy of the feature in the database under a new ID. =cut =head2 target Title : target Usage : $segment = $feature->target Function: return the segment corresponding to the "Target" attribute Returns : a Bio::DB::SeqFeature::Segment object Args : none Status : public For features that are aligned with others via the GFF3 Target attribute, this returns a segment corresponding to the aligned region. The CIGAR gap string is not yet supported.
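Per the GFF3 specification, a Target attribute value has the form "target_id start end [strand]", with any space inside target_id escaped as %20. The following minimal sketch splits such a value into its fields. parse_target is a hypothetical helper, not part of this module, and, in line with the note above, it ignores any CIGAR-style Gap attribute.

```perl
use strict;
use warnings;

# Hypothetical helper: split a GFF3 Target value "target_id start end [strand]".
# Only the %20 escape is decoded; alignment gaps (Gap attribute) are ignored.
sub parse_target {
    my ($value) = @_;
    my ($id, $start, $end, $strand) = split /\s+/, $value;
    die "malformed Target: $value\n"
        unless defined $end && $start =~ /^\d+$/ && $end =~ /^\d+$/;
    $id =~ s/%20/ /g;
    return {
        target_id => $id,
        start     => $start,
        end       => $end,
        strand    => defined $strand ? $strand : '',   # strand is optional
    };
}

my $t = parse_target('EST23 1 21 +');
print "$t->{target_id}:$t->{start}..$t->{end} ($t->{strand})\n";
```

The target_id field extracted here is the value that bp_seqfeature_load turns into an Alias attribute unless --noalias-target is given.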
=cut =head2 Internal methods =over 4 =item $feature->as_string() Internal method used to implement overloaded stringification. =item $boolean = $feature->type_match(@list_of_types) Internal method that returns true if the feature's primary_tag and source_tag match any of the provided types (in primary_tag:source_tag format). =back =cut # This adds subfeatures. It has the property of converting the # provided features into an object like itself and storing them # into the database. If the feature already has a primary id and # an object_store() method, then it is not stored into the database, # but its primary id is reused. sub _add_segment { my $self = shift; my $normalized = shift; my $store = $self->object_store; my $store_parentage = eval{$store->can_store_parentage}; return $self->SUPER::_add_segment($normalized,@_) unless $normalized && $store_parentage; my @segments = $self->_create_subfeatures($normalized,@_); my $pos = "@{$self}{'start','stop','ref','strand'}"; # fix boundaries $self->_fix_boundaries(\@segments,1); # freakish fixing of our non-standard Target attribute $self->_fix_target(\@segments); # write our children out if ($normalized) { $store->add_SeqFeature($self,@segments); } else { push @{$self->{segments}},@segments; } # write us back to disk $self->update if $self->primary_id && $pos ne "@{$self}{'start','stop','ref','strand'}"; } # segments can be stored directly in the object (legacy behavior) # or stored in the database # an optional list of types can be used to specify which types to return sub get_SeqFeatures { my $self = shift; my @types = @_; my @inline_segs = exists $self->{segments} ?
@{$self->{segments}} : (); @inline_segs = grep {$_->type_match(@types)} @inline_segs if @types; my $store = $self->object_store; my @db_segs; if ($store && $store->can_store_parentage) { if (!@types || $store->subfeature_types_are_indexed) { @db_segs = $store->fetch_SeqFeatures($self,@types); } else { @db_segs = grep {$_->type_match(@types)} $store->fetch_SeqFeatures($self); } } my @segs = (@inline_segs,@db_segs); foreach (@segs) { eval {$_->object_store($store)}; } return @segs; } sub denormalized_segments { my $self = shift; return exists $self->{segments} ? @{$self->{segments}} : (); } sub denormalized_segment_count { my $self = shift; return 0 unless exists $self->{segments}; return scalar @{$self->{segments}}; } # for Bio::LocationI compatibility sub is_remote { return } # for Bio::LocationI compatibility sub location_type { return 'EXACT' } # for Bio::DB::GFF compatibility sub feature_id {shift->primary_id} 1; __END__ =head1 BUGS This is an early version, so there are certainly some bugs. Please use the BioPerl bug tracking system to report bugs. =head1 SEE ALSO L, L, L, L, L, L, L =head1 AUTHOR Lincoln Stein <lstein@cshl.org>. Copyright (c) 2006 Cold Spring Harbor Laboratory. This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
=cut SeqFeature000755000766000024 013605523026 20364 5ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/lib/Bio/DBStore.pm100644000766000024 23735613605523026 22236 0ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/lib/Bio/DB/SeqFeaturepackage Bio::DB::SeqFeature::Store; $Bio::DB::SeqFeature::Store::VERSION = '1.7.4'; =head1 NAME Bio::DB::SeqFeature::Store -- Storage and retrieval of sequence annotation data =head1 SYNOPSIS use Bio::DB::SeqFeature::Store; # Open the feature database my $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'DBI::mysql', -dsn => 'dbi:mysql:test', -create => 1 ); # Get a feature from somewhere my $feature = Bio::SeqFeature::Generic->new(...); # Store it $db->store($feature) or die "Couldn't store!"; # If absent, a primary ID is added to the feature when it is stored in the # database. Retrieve the primary ID my $id = $feature->primary_id; # Get the feature back out my $feature = $db->fetch($id); # .... which is identical to my $feature = $db->get_feature_by_primary_id($id); # Change the feature and update it $feature->start(100); $db->store($feature) or die "Couldn't update!"; # Get all features at once my @features = $db->features( ); # Retrieve multiple features by primary id my @features = $db->fetch_many(@list_of_ids); # ...by name @features = $db->get_features_by_name('ZK909'); # ...by alias @features = $db->get_features_by_alias('sma-3'); # ...by type @features = $db->get_features_by_type('gene'); # ...by location @features = $db->get_features_by_location(-seq_id=>'Chr1',-start=>4000,-end=>600000); # ...by attribute @features = $db->get_features_by_attribute({description => 'protein kinase'}); # ...by the GFF "Note" field @result_list = $db->search_notes('kinase'); # ...by arbitrary combinations of selectors @features = $db->features(-name => $name, -type => $types, -seq_id => $seqid, -start => $start, -end => $end, -attributes => $attributes); # Loop through the features using an iterator my $iterator = $db->get_seq_stream(-name
=> $name, -type => $types, -seq_id => $seqid, -start => $start, -end => $end, -attributes => $attributes); while (my $feature = $iterator->next_seq) { # do something with the feature } # ...limiting the search to a particular region my $segment = $db->segment('Chr1',5000=>6000); my @features = $segment->features(-type=>['mRNA','match']); # Getting coverage statistics across a region my $summary = $db->feature_summary('Chr1',10_000=>1_110_000); my ($bins) = $summary->get_tag_values('coverage'); my $first_bin = $bins->[0]; # Getting & storing sequence information # Warning: this returns a string, and not a PrimarySeq object $db->insert_sequence('Chr1','GATCCCCCGGGATTCCAAAA...'); my $sequence = $db->fetch_sequence('Chr1',5000=>6000); # What feature types are defined in the database? my @types = $db->types; # Create a new feature in the database my $feature = $db->new_feature(-primary_tag => 'mRNA', -seq_id => 'chr3', -start => 10000, -end => 11000); # Load an entire GFF3 file, using the GFF3 loader... my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store => $db, -verbose => 1, -fast => 1); $loader->load('./my_genome.gff3'); =head1 DESCRIPTION Bio::DB::SeqFeature::Store implements the Bio::SeqFeature::CollectionI interface to allow you to persistently store Bio::SeqFeatureI objects in a database and to later retrieve them by a variety of searches. This module is similar to the older Bio::DB::GFF module, with the following differences: =over 4 =item 1. No limitation on Bio::SeqFeatureI implementations Unlike Bio::DB::GFF, Bio::DB::SeqFeature::Store works with any Bio::SeqFeatureI object. =item 2. No limitation on nesting of features & subfeatures Bio::DB::GFF is limited to features that have at most one level of subfeature. Bio::DB::SeqFeature::Store can work with features that have unlimited levels of nesting. =item 3.
No aggregators The aggregator architecture, which was necessary to impose order on the GFF2 files that Bio::DB::GFF works with, does not apply to Bio::DB::SeqFeature::Store. It is intended to store features that obey well-defined ontologies, such as the Sequence Ontology (http://song.sourceforge.net). =item 4. No relative locations All locations defined by this module are relative to an absolute sequence ID, unlike Bio::DB::GFF, which allows you to define the location of one feature relative to another. =back We'll discuss major concepts in Bio::DB::SeqFeature::Store and then describe how to use the module. =head2 Adaptors Bio::DB::SeqFeature::Store is designed to work with a variety of storage back ends called "adaptors." Adaptors are subclasses of Bio::DB::SeqFeature::Store and provide the interface between the store() and fetch() methods and the physical database. Available adaptors include the following: =over 4 =item memory An implementation that stores all data in memory. This is useful for small data sets of no more than 10,000 features (more or less, depending on system memory). =item DBI::mysql A full-featured implementation on top of the MySQL relational database system. =item berkeleydb A full-featured implementation that runs on top of the BerkeleyDB database. See L. =back If you do not explicitly specify the adaptor, then DBI::mysql will be used by default. =head2 Serializers When Bio::DB::SeqFeature::Store stores a Bio::SeqFeatureI object into the database, it serializes it into binary or text form. When it later fetches the feature from the database, it unserializes it. Two serializers are available: =over 4 =item Storable This is a fast binary serializer. It is available in Perl versions 5.8.7 and higher and is used when available. =item Data::Dumper This is a slow text serializer that is available in Perl 5.8.0 and higher. It is used when Storable is unavailable.
=back If you do not specify the serializer, then Storable will be used if available; otherwise Data::Dumper. =head2 Loaders and Normalized Features The Bio::DB::SeqFeature::Store::GFF3Loader parses a GFF3-format file and loads the annotations and sequence data into the database of your choice. The script bp_seqfeature_load.pl (found in the bin/ subdirectory of this distribution) is a thin front end to the GFF3Loader. Other loaders may be written later. Although Bio::DB::SeqFeature::Store should work with any Bio::SeqFeatureI object, there are some disadvantages to using Bio::SeqFeature::Generic and other vanilla implementations. The major issue is that if two vanilla features share the same subfeature (e.g. two transcripts sharing an exon), the shared subfeature will be cloned when stored into the database. The special-purpose L class is able to normalize its subfeatures in the database, so that shared subfeatures are stored only once. This minimizes wasted storage space. In addition, when in-memory caching is turned on, each shared subfeature will usually occupy only a single memory location upon restoration. =cut use strict; use warnings; use base 'Bio::SeqFeature::CollectionI'; use Carp 'croak'; use Bio::DB::GFF::Util::Rearrange; use Bio::DB::SeqFeature::Segment; use Scalar::Util 'blessed'; # this probably shouldn't be here use Bio::DB::SeqFeature; *dna = *get_dna = *get_sequence = \&fetch_sequence; *get_SeqFeatures = \&fetch_SeqFeatures; # local version sub api_version { 1.2 } =head1 Methods for Connecting and Initializing a Database ## TODO: http://iowg.brcdevel.org/gff3.html#a_fasta is a dead link =head2 new Title : new Usage : $db = Bio::DB::SeqFeature::Store->new(@options) Function: connect to a database Returns : A descendant of Bio::DB::SeqFeature::Store Args : several - see below Status : public This class method creates a new database connection.
The following -name=E<gt>$value arguments are accepted:

 Name               Value
 ----               -----

 -adaptor           The name of the Adaptor class (default DBI::mysql)

 -serializer        The name of the serializer class (default Storable)

 -index_subfeatures Whether or not to make subfeatures searchable
                    (default false)

 -cache             Activate LRU caching feature -- size of cache

 -compress          Compresses features before storing them in database
                    using Compress::Zlib

 -create            (Re)initialize the database.

The B<-index_subfeatures> argument, if true, tells the module to
create indexes for a feature and all its subfeatures (and its
subfeatures' subfeatures). Indexing subfeatures means that you will be
able to search for the gene, its mRNA subfeatures and the exons inside
each mRNA. It also means that when you search the database for all
features contained within a particular location, you will get the
gene, the mRNAs and all the exons as individual objects as well as
subfeatures of each other. NOTE: this option is only honored when
working with a normalized feature class such as Bio::DB::SeqFeature.

The B<-cache> argument, if true, tells the module to try to create an
LRU (least-recently-used) object cache using the Tie::Cacher
module. Caching will cause two objects that share the same primary_id
to (often, but not always) share the same memory location, and may
improve performance modestly. The argument is taken as the desired
size for the cache. If you pass "1" as the cache value, a reasonable
default cache size will be chosen. Caching requires the Tie::Cacher
module to be installed. If the module is not installed, then caching
will silently be disabled.

The B<-compress> argument, if true, will cause the feature data to be
compressed before storing it. This will make the database somewhat
smaller at the cost of decreasing performance.

The B<-create> argument, if true, will either initialize or
reinitialize the database. It is needed the first time a database is
used.
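The -cache option delegates to Tie::Cacher. To illustrate what an LRU cache does, here is a minimal sketch in core Perl. The class name MiniLRU and its default size are hypothetical; it only mimics the exists(), fetch() and store() calls that this module's fetch() method makes on its cache object, and it is not how Tie::Cacher is implemented.

```perl
use strict;
use warnings;

# Hypothetical minimal LRU cache (NOT Tie::Cacher). It mimics only the
# exists/fetch/store calls that fetch() makes on its cache object.
package MiniLRU;

sub new {
    my ($class, $max) = @_;
    # A size of 1 means "choose a default"; 1000 is an arbitrary choice here.
    $max = 1000 if !$max || $max == 1;
    return bless { max => $max, data => {}, order => [] }, $class;
}

sub exists {
    my ($self, $key) = @_;
    return CORE::exists $self->{data}{$key};
}

# Move $key to the most-recently-used end of the recency list.
sub _touch {
    my ($self, $key) = @_;
    my @rest = grep { $_ ne $key } @{ $self->{order} };
    $self->{order} = [ @rest, $key ];
}

sub fetch {
    my ($self, $key) = @_;
    return undef unless CORE::exists $self->{data}{$key};
    $self->_touch($key);
    return $self->{data}{$key};
}

sub store {
    my ($self, $key, $value) = @_;
    $self->{data}{$key} = $value;
    $self->_touch($key);
    # Evict least-recently-used entries once the size limit is exceeded.
    while (@{ $self->{order} } > $self->{max}) {
        my $oldest = shift @{ $self->{order} };
        delete $self->{data}{$oldest};
    }
    return $value;
}

package main;

my $cache = MiniLRU->new(2);
$cache->store(1 => 'gene');
$cache->store(2 => 'mRNA');
$cache->fetch(1);             # key 1 becomes the most recently used
$cache->store(3 => 'exon');   # evicts key 2, the least recently used
print join(',', map { $cache->exists($_) ? 1 : 0 } 1 .. 3), "\n";   # prints 1,0,1
```

Keeping the recency order in an array makes every access O(n); a real cache would normally use a structure with O(1) updates.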
The new() method of individual adaptors recognizes additional
arguments. The default DBI::mysql adaptor recognizes the following
ones:

 Name               Value
 ----               -----

 -dsn               DBI data source (default dbi:mysql:test)

 -autoindex         A flag that controls whether or not to update
                    all search indexes whenever a feature is stored
                    or updated (default true).

 -namespace         A string that will be used to qualify each table,
                    thereby allowing you to store several independent
                    sequence feature databases in a single MySQL
                    database.

 -dumpdir           The path to a temporary directory that will be
                    used during "fast" loading. See L for a
                    description of this. Default is the current
                    directory.

 -write             Make the database writable (implied by -create)

 -fasta             Provide an alternative DNA accessor object or path.

By default the database will store DNA sequences internally. However,
you may override this behavior by passing either a path to a FASTA
file, or any Perl object that implements the seq($seqid,$start,$end)
method. In the former case, the FASTA path will be passed to
Bio::DB::Fasta, possibly causing an index to be constructed. Suitable
examples of the latter type of object include the Bio::DB::Sam and
Bio::DB::Sam::Fai classes.
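Because -fasta accepts any object that implements seq($seqid,$start,$end), a minimal accessor can be written without a FASTA file at all. This sketch assumes 1-based, inclusive coordinates (the convention fetch_sequence() uses) and assumes that a start greater than the end requests the reverse complement, as Bio::DB::Fasta does. The class HashDNA is hypothetical, and the sketch does not cover any other methods the store might call on an accessor.

```perl
use strict;
use warnings;

# Hypothetical DNA accessor: any object with a seq($seqid,$start,$end)
# method can be passed as -fasta. Coordinates are 1-based and inclusive.
package HashDNA;

sub new {
    my ($class, %seqs) = @_;    # seqid => sequence string
    return bless { seqs => {%seqs} }, $class;
}

sub seq {
    my ($self, $seqid, $start, $end) = @_;
    my $dna = $self->{seqs}{$seqid};
    return unless defined $dna;
    $start = 1           unless defined $start;
    $end   = length $dna unless defined $end;
    # Assumption: start > end asks for the reverse complement of end..start.
    if ($start > $end) {
        my $sub = substr($dna, $end - 1, $start - $end + 1);
        ($sub = reverse $sub) =~ tr/ACGTacgt/TGCAtgca/;
        return $sub;
    }
    return substr($dna, $start - 1, $end - $start + 1);
}

package main;

my $dna = HashDNA->new(chr1 => 'ACGTTGCA');
print $dna->seq('chr1', 2, 4), "\n";   # prints CGT
print $dna->seq('chr1', 4, 2), "\n";   # prints ACG
```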
=cut

###
# object constructor
#
sub new {
  my $self      = shift;
  my ($adaptor,$serializer,$index_subfeatures,$cache,$compress,$debug,$create,$fasta,$args);
  if (@_ == 1) {
    $args = {DSN => shift}
  }
  else {
    ($adaptor,$serializer,$index_subfeatures,$cache,$compress,$debug,$create,$fasta,$args) =
      rearrange(['ADAPTOR',
                 'SERIALIZER',
                 'INDEX_SUBFEATURES',
                 'CACHE',
                 'COMPRESS',
                 'DEBUG',
                 'CREATE',
                 'FASTA',
                ],@_);
  }
  $adaptor ||= 'DBI::mysql';
  $args->{WRITE}++  if $create;
  $args->{CREATE}++ if $create;

  my $class = "Bio::DB::SeqFeature::Store::$adaptor";
  eval "require $class " or croak $@;
  $cache &&= eval "require Tie::Cacher; 1";
  my $obj = $class->new_instance();
  $obj->debug($debug) if defined $debug;
  $obj->init($args);
  $obj->init_cache($cache) if $cache;
  $obj->do_compress($compress);
  $obj->serializer($serializer) if defined $serializer;
  $obj->index_subfeatures($index_subfeatures) if defined $index_subfeatures;
  $obj->seqfeature_class('Bio::DB::SeqFeature');
  $obj->set_dna_accessor($fasta) if defined $fasta;
  $obj->post_init($args);
  $obj;
}

=head2 init_database

 Title   : init_database
 Usage   : $db->init_database([$erase_flag])
 Function: initialize a database
 Returns : true
 Args    : (optional) flag to erase current data
 Status  : public

Call this after Bio::DB::SeqFeature::Store-E<gt>new() to initialize a
new database. In the case of a DBI database, this method installs the
schema but does B<not> create the database. You have to do this
offline using the appropriate command-line tool. In the case of the
"berkeleydb" adaptor, this creates an empty BTREE database.

If there is any data already in the database, init_database() called
with no arguments will have no effect. To permanently erase the data
already there and prepare to receive a fresh set of data, pass a true
argument.
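The new() constructor turns the -adaptor name into the class Bio::DB::SeqFeature::Store::$adaptor, loads it, and then calls new_instance() and init() on it. The self-contained sketch below reproduces that dispatch pattern with hypothetical package names (MyStore, MyStore::memory). Because the packages are defined inline rather than in separate files, it checks for the subclass with can() instead of the string-eval require that new() uses, and it defaults to the memory adaptor purely for convenience.

```perl
use strict;
use warnings;

# Hypothetical base class reproducing the adaptor dispatch done by new().
package MyStore;

sub new {
    my ($class, %args) = @_;
    my $adaptor  = $args{-adaptor} || 'memory';   # default for this sketch only
    my $subclass = "MyStore::$adaptor";
    # new() itself does: eval "require $class" or croak $@;
    # the subclass here lives in the same file, so can() suffices.
    $subclass->can('new_instance')
        or die "Unknown adaptor '$adaptor'\n";
    my $obj = $subclass->new_instance();
    $obj->init(\%args);
    return $obj;
}

sub init { }   # adaptors override this to open files, connections, etc.

# Hypothetical in-memory adaptor.
package MyStore::memory;
our @ISA = ('MyStore');

sub new_instance { my $class = shift; return bless { features => {} }, $class }

sub init {
    my ($self, $args) = @_;
    $self->{initialized} = 1;
}

package main;

my $db = MyStore->new(-adaptor => 'memory');
print ref($db), "\n";   # prints MyStore::memory
print eval { MyStore->new(-adaptor => 'nosuch'); 1 } ? "loaded\n" : "error: $@";
```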
=cut

###
# wipe database clean and reinstall schema
#
sub init_database {
  my $self = shift;
  $self->_init_database(@_);
}

=head2 post_init

This method is invoked after init_database for use by certain adaptors
(currently only the memory adaptor) to do automatic data loading after
initialization. It is passed a copy of the init_database() args.

=cut

sub post_init { }

=head2 add_features

 Title   : add_features
 Usage   : $success = $db->add_features(\@features)
 Function: store one or more features into the database
 Returns : true if successful
 Args    : array reference of Bio::SeqFeatureI objects
 Status  : public

=cut

sub add_features {
  my ($self, $feats) = @_;
  return $self->store_and_cache(1, @$feats);
}

=head2 store

 Title   : store
 Usage   : $success = $db->store(@features)
 Function: store one or more features into the database
 Returns : true if successful
 Args    : list of Bio::SeqFeatureI objects
 Status  : public

This method stores a list of features into the database. Each feature
is updated so that its primary_id becomes the primary ID of the
serialized feature stored in the database. If all features were
successfully stored, the method returns true. In the DBI
implementation, the store is performed as a single transaction and the
transaction is rolled back if one or more store operations failed.

In most cases, you should let the database assign the primary ID. If
the object you store already has a primary_id, then the ID must adhere
to the datatype expected by the adaptor: an integer in the case of the
various DB adaptors, and a string in the case of the memory and
berkeley adaptors.

You can find out what the primary ID of the feature has become by
calling the feature's primary_id() method:

  $db->store($my_feature) or die "Oh darn";
  my $id = $my_feature->primary_id;

If the feature contains subfeatures, they will all be stored
recursively.
In the case of Bio::DB::SeqFeature and
Bio::DB::SeqFeature::Store::NormalizedFeature, the subfeatures will be
stored in a normalized way so that each subfeature appears just once
in the database.

Subfeatures will be indexed for separate retrieval based on the
current value of index_subfeatures().

If you call store() with one or more features that already have valid
primary_ids, then any existing objects will be B<replaced>. Note that
when using normalized features such as Bio::DB::SeqFeature, the
subfeatures are not recursively updated when you update the parent
feature. You must manually update each subfeature that has changed.

=cut

###
# store one or more Bio::SeqFeatureI objects
# if they already have a primary_id will replace into the database
# otherwise will insert and primary_id will be added
#

# this version stores the object and flags it to be indexed
# for search via attributes, name, type or location
sub store {
  my ($self, @feats) = @_;
  for my $feat (@feats) {
    # blessed() guards against calling isa() on an unblessed reference
    unless (blessed($feat) && $feat->isa('Bio::SeqFeatureI')) {
      die "Cannot store non-Bio::SeqFeatureI object '$feat'\n";
    }
  }
  return $self->store_and_cache(1,@feats);
}

=head2 store_noindex

 Title   : store_noindex
 Usage   : $success = $db->store_noindex(@features)
 Function: store one or more features into the database without indexing
 Returns : true if successful
 Args    : list of Bio::SeqFeatureI objects
 Status  : public

This method stores a list of features into the database but does not
make them searchable. The only way to access the features is via their
primary IDs. This method is ordinarily only used internally to store
subfeatures that are not indexed.
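The difference between store() and store_noindex() can be illustrated with a toy in-memory store. This is not this module's implementation: ToyStore, its integer ID counter and its name index are hypothetical, and features are plain hashes instead of Bio::SeqFeatureI objects. Indexed features can be found by name, unindexed ones only through their primary IDs, and storing a feature that already has a primary_id replaces the stored object.

```perl
use strict;
use warnings;

# Hypothetical toy store: features are plain hashes with a 'name' key.
package ToyStore;

sub new {
    my $class = shift;
    return bless { next_id => 1, objects => {}, by_name => {} }, $class;
}

# $indexed plays the role of the flag that store() and store_noindex() pass on.
sub _store {
    my ($self, $indexed, @feats) = @_;
    for my $f (@feats) {
        # Assign a new primary ID unless the feature already has one,
        # in which case the stored copy is simply replaced.
        $f->{primary_id} = $self->{next_id}++ unless defined $f->{primary_id};
        my $id = $f->{primary_id};
        $self->{objects}{$id} = $f;
        # Only indexed features become searchable by name.
        $self->{by_name}{ $f->{name} }{$id} = 1 if $indexed;
    }
    return 1;
}

sub store         { my $self = shift; return $self->_store(1, @_) }
sub store_noindex { my $self = shift; return $self->_store(0, @_) }

sub fetch { my ($self, $id) = @_; return $self->{objects}{$id} }

sub get_features_by_name {
    my ($self, $name) = @_;
    my $ids = $self->{by_name}{$name} || {};
    return map { $self->{objects}{$_} } sort { $a <=> $b } keys %$ids;
}

package main;

my $db   = ToyStore->new;
my $gene = { name => 'abc-1' };
my $exon = { name => 'abc-1.e1' };
$db->store($gene);
$db->store_noindex($exon);

my @found  = $db->get_features_by_name('abc-1');      # one hit
my @hidden = $db->get_features_by_name('abc-1.e1');   # no hits: not indexed
print scalar(@found), ' ', scalar(@hidden), ' ',
      $db->fetch($exon->{primary_id})->{name}, "\n";  # prints 1 0 abc-1.e1
```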
=cut

# this version stores the object and flags it so that it is
# not searchable via attributes, name, type or location
# (typically used only for subfeatures)
sub store_noindex {
  my $self = shift;
  $self->store_and_cache(0,@_);
}

=head2 no_blobs

 Title   : no_blobs
 Usage   : $db->no_blobs(1);
 Function: decide if objects should be stored in the database as blobs.
 Returns : boolean (default false)
 Args    : boolean (true to no longer store objects; when the corresponding
           feature is retrieved it will instead be a minimal representation of
           the object that was stored, as some simple Bio::SeqFeatureI object)
 Status  : dubious (new)

This method saves lots of space in the database, which may in turn
lead to large performance increases in extreme cases (over 7 million
features in the db).

=cut

sub no_blobs {
  my $self = shift;
  if (@_) { $self->{no_blobs} = shift }
  return $self->{no_blobs} || 0;
}

=head2 new_feature

 Title   : new_feature
 Usage   : $feature = $db->new_feature(@args)
 Function: create a new Bio::DB::SeqFeature object in the database
 Returns : the new seqfeature
 Args    : see below
 Status  : public

This method creates and stores a new Bio::SeqFeatureI object using the
specialized Bio::DB::SeqFeature class. This class is able to store its
subfeatures in a normalized fashion, allowing subfeatures to be shared
among multiple parents (e.g. multiple exons shared among several
mRNAs).

The arguments are the same as for Bio::DB::SeqFeature-E<gt>new(),
which in turn are similar to Bio::SeqFeature::Generic-E<gt>new() and
Bio::Graphics::Feature-E<gt>new(). The most important difference is
the B<-index> option, which controls whether the feature will be
indexed for retrieval (default is true). Ordinarily, you would only
want to turn indexing off when creating subfeatures, because features
stored without indexes will only be reachable via their primary IDs or
their parents.
Arguments are as follows:

  -seq_id       the reference sequence
  -start        the start position of the feature
  -end          the stop position of the feature
  -display_name the feature name (returned by seqname)
  -primary_tag  the feature type (returned by primary_tag)
  -source       the source tag
  -score        the feature score (for GFF compatibility)
  -desc         a description of the feature
  -segments     a list of subfeatures (see Bio::Graphics::Feature)
  -subtype      the type to use when creating subfeatures
  -strand       the strand of the feature (one of -1, 0 or +1)
  -phase        the phase of the feature (0..2)
  -url          a URL to link to when rendered with Bio::Graphics
  -attributes   a hashref of tag value attributes, in which the key is the tag
                  and the value is an array reference of values
  -index        index this feature if true

Aliases:

  -id           an alias for -display_name
  -seqname      an alias for -display_name
  -display_id   an alias for -display_name
  -name         an alias for -display_name
  -stop         an alias for -end
  -type         an alias for -primary_tag

You can change the seqfeature implementation generated by
new_feature() by passing the name of the desired seqfeature class to
$db-E<gt>seqfeature_class().

=cut

sub new_feature {
  my $self = shift;
  return $self->seqfeature_class->new(-store=>$self,@_);
}

=head2 delete

 Title   : delete
 Usage   : $success = $db->delete(@features)
 Function: delete a list of features from the database
 Returns : true if successful
 Args    : list of features
 Status  : public

This method looks up the primary IDs from a list of features and
deletes them from the database, returning true if all deletions are
successful.

WARNING: The current DBI::mysql implementation has some issues that
need to be resolved, namely (1) normalized subfeatures are NOT
recursively deleted; and (2) the deletions are not performed in a
transaction.
=cut

sub delete {
  my $self = shift;
  my $success = 1;
  for my $object (@_) {
    my $id = $object->primary_id;
    if ( not defined $id ) {
      warn "Could not delete feature without primary_id: $object";
      $success = 0;
      next;
    }
    my $result = $self->_deleteid($id);
    warn "Could not delete feature with id=$id" unless $result;
    $success &&= $result;
  }
  $success;
}

=head2 fetch / get_feature_by_id / get_feature_by_primary_id

 Title   : fetch
           get_feature_by_id
           get_feature_by_primary_id
 Usage   : $feature = $db->fetch($primary_id)
 Function: fetch a feature from the database using its primary ID
 Returns : a feature
 Args    : primary ID of desired feature
 Status  : public

This method returns a previously-stored feature from the database
using its primary ID. If the primary ID is invalid, it returns
undef. Use fetch_many() to rapidly retrieve multiple features.

=cut

###
# Fetch a Bio::SeqFeatureI from database using its primary_id
#
sub fetch {
  my $self = shift;
  @_ or croak "usage: fetch(\$primary_id)";
  my $primary_id = shift;
  if (my $cache = $self->cache()) {
    return $cache->fetch($primary_id) if $cache->exists($primary_id);
    my $object = $self->_fetch($primary_id);
    $cache->store($primary_id,$object);
    return $object;
  }
  else {
    return $self->_fetch($primary_id);
  }
}

*get_feature_by_id = *get_feature_by_primary_id = \&fetch;

=head2 fetch_many

 Title   : fetch_many
 Usage   : @features = $db->fetch_many($primary_id,$primary_id,$primary_id...)
 Function: fetch many features from the database using their primary ID
 Returns : list of features
 Args    : a list of primary IDs or an array ref of primary IDs
 Status  : public

Same as fetch() except that you can pass a list of primary IDs or a
ref to an array of IDs.

=cut

###
# Efficiently fetch a series of IDs from the database
# Can pass an array or an array ref
#
sub fetch_many {
  my $self = shift;
  @_ or croak 'usage: fetch_many($id1,$id2,$id3...)';
  my @ids = map {ref($_) ? @$_ : $_} @_ or return;
  $self->_fetch_many(@ids);
}

=head2 get_seq_stream

 Title   : get_seq_stream
 Usage   : $iterator = $db->get_seq_stream(@args)
 Function: return an iterator across all features in the database
 Returns : a Bio::DB::SeqFeature::Store::Iterator object
 Args    : feature filters (optional)
 Status  : public

When called without any arguments this method will return an iterator
object that will traverse all indexed features in the database. Call
the iterator's next_seq() method to step through them (in no
particular order):

  my $iterator = $db->get_seq_stream;
  while (my $feature = $iterator->next_seq) {
    print $feature->primary_tag,' ',$feature->display_name,"\n";
  }

You can select a subset of features by passing a series of filter
arguments. The arguments are identical to those accepted by
$db-E<gt>features().

=cut

###
# Return an iterator across all features that are indexable
#
sub get_seq_stream {
  my $self = shift;
  $self->_features(-iterator=>1,@_);
}

=head2 get_features_by_name

 Title   : get_features_by_name
 Usage   : @features = $db->get_features_by_name($name)
 Function: looks up features by their display_name
 Returns : a list of matching features
 Args    : the desired name
 Status  : public

This method searches the display_name of all features for matches
against the provided name. GLOB-style wildcards ("*", "?") are
accepted, but may be slow.

The method returns the list of matches, which may contain zero, one,
or more features. Be prepared to receive more than one result, as
display names are not guaranteed to be unique.

For backward compatibility with gbrowse, this method is also known as
get_feature_by_name().
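GLOB-style name matching can be implemented by translating the pattern into an anchored regular expression, as in the sketch below. The helper name glob_to_regex is hypothetical, and case-insensitive matching is an assumption of this sketch; each adaptor implements wildcard lookup in its own way (a relational adaptor, for example, may push it into the database query instead).

```perl
use strict;
use warnings;

# Hypothetical helper: convert a GLOB pattern ("*" = any run of characters,
# "?" = exactly one character) into an anchored, case-insensitive regex.
# Every other character is escaped with quotemeta so it matches literally.
sub glob_to_regex {
    my ($glob) = @_;
    my $re = join '', map {
          $_ eq '*' ? '.*'
        : $_ eq '?' ? '.'
        :             quotemeta($_)
    } split //, $glob;
    return qr/^$re$/i;
}

my @names = qw(abc-1 abc-10 abc-2 xyz-1);
my $re    = glob_to_regex('abc-?');
print join(' ', grep { $_ =~ $re } @names), "\n";   # prints abc-1 abc-2
```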
=cut

###
# get_features_by_name() returns 0 or more features using a name lookup
# uses the Bio::DB::GFF API
#
sub get_features_by_name {
  my $self   = shift;
  my ($class,$name,$types,$allow_alias);

  if (@_ == 1) {  # get_features_by_name('name');
    $name = shift;
  } else {        # get_features_by_name('class'=>'name'), get_feature_by_name(-name=>'name')
    ($class,$name,$allow_alias,$types) = rearrange([qw(CLASS NAME ALIASES),[qw(TYPE TYPES)]],@_);
  }

  # hacky workaround for assumption in Bio::DB::GFF that unclassed reference points were of type "Sequence"
  undef $class if $class && $class eq 'Sequence';

  $self->_features(-name=>$name,-class=>$class,-aliases=>$allow_alias,-type=>$types);
}

=head2 get_feature_by_name

 Title   : get_feature_by_name
 Usage   : @features = $db->get_feature_by_name($name)
 Function: looks up features by their display_name
 Returns : a list of matching features
 Args    : the desired name
 Status  : Use get_features_by_name instead.

This method is provided for backward compatibility with gbrowse.

=cut

sub get_feature_by_name { shift->get_features_by_name(@_) }

=head2 get_features_by_alias

 Title   : get_features_by_alias
 Usage   : @features = $db->get_features_by_alias($name)
 Function: looks up features by their display_name or alias
 Returns : a list of matching features
 Args    : the desired name
 Status  : public

This method is similar to get_features_by_name() except that it will
also search through the feature aliases. Aliases can be created by
storing features that contain one or more Alias tags. Wildcards are
accepted.
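The only difference between get_features_by_name() and get_features_by_alias() is the -aliases flag. The sketch below shows the effect of that flag on a plain Perl list, assuming each feature is a hash with a display_name and an attributes hash whose Alias entry is an array reference (the shape a GFF3 Alias= tag would produce). The function find_by_name and the sample data are hypothetical, and wildcards are not handled.

```perl
use strict;
use warnings;

# Hypothetical lookup: match display_name, and also Alias values when
# $allow_alias is true (the role played by the -aliases flag).
sub find_by_name {
    my ($features, $name, $allow_alias) = @_;
    return grep {
        my $f = $_;
        $f->{display_name} eq $name
            || ($allow_alias
                && grep { $_ eq $name } @{ $f->{attributes}{Alias} || [] });
    } @$features;
}

my @features = (
    { display_name => 'geneA', attributes => { Alias => ['ORF123'] } },
    { display_name => 'geneB', attributes => {} },
);
my @by_name  = find_by_name(\@features, 'ORF123');       # no display_name match
my @by_alias = find_by_name(\@features, 'ORF123', 1);    # matches geneA's Alias
print scalar(@by_name), ' ', scalar(@by_alias), "\n";    # prints 0 1
```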
=cut

sub get_features_by_alias {
  my $self = shift;
  my @args = @_;
  if (@_ == 1) {
    @args = (-name=>shift);
  }
  push @args,(-aliases=>1);
  $self->get_features_by_name(@args);
}

=head2 get_features_by_type

 Title   : get_features_by_type
 Usage   : @features = $db->get_features_by_type(@types)
 Function: looks up features by their primary_tag
 Returns : a list of matching features
 Args    : list of primary tags
 Status  : public

This method will return a list of features that have any of the
primary tags given in the argument list. For compatibility with
gbrowse and Bio::DB::GFF, types can be qualified using a colon:

  primary_tag:source_tag

in which case only features that match both the primary_tag B<and> the
indicated source_tag will be returned. If the database was loaded from
a GFF3 file, this corresponds to the third and second columns of the
row, in that order.

For example, given the GFF3 lines:

  ctg123 geneFinder exon 1300 1500 . + . ID=exon001
  ctg123 fgenesH    exon 1300 1520 . + . ID=exon002

exon001 and exon002 will be returned by searching for type "exon", but
only exon002 will be returned by searching for type "exon:fgenesH".

=cut

sub get_features_by_type {
  my $self = shift;
  my @types = @_;
  $self->_features(-type=>\@types);
}

=head2 get_features_by_location

 Title   : get_features_by_location
 Usage   : @features = $db->get_features_by_location(@args)
 Function: looks up features by their location
 Returns : a list of matching features
 Args    : see below
 Status  : public

This method fetches features based on a location range lookup. You
call it using a positional list of arguments, or a list of
(-argument=E<gt>$value) pairs.

The positional form is as follows:

 $db->get_features_by_location($seqid [,$start [,$end]])

The $seqid is the name of the sequence on which the feature resides,
and start and end are optional endpoints for the match. If the
endpoints are missing then any feature on the indicated seqid is
returned.
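A location filter of this kind reduces to a few interval comparisons. The sketch below assumes features are plain hashes with seq_id, start and end keys in 1-based inclusive coordinates, and treats missing endpoints as open-ended. It interprets the three range types accepted by this module as follows: "overlaps" keeps features sharing at least one base with the range, "contains" keeps features lying entirely inside it, and "contained_in" keeps features that span the whole range. The function name location_match and the sample features are hypothetical; the real adaptors do this filtering inside the database.

```perl
use strict;
use warnings;

# Hypothetical location filter over features stored as hashrefs.
sub location_match {
    my ($f, $seqid, $start, $end, $range_type) = @_;
    return 0 unless $f->{seq_id} eq $seqid;
    $start = -9**9**9 unless defined $start;   # open-ended on the left (-inf)
    $end   =  9**9**9 unless defined $end;     # open-ended on the right (+inf)
    $range_type ||= 'overlaps';
    return $f->{end} >= $start && $f->{start} <= $end
        if $range_type eq 'overlaps';
    return $f->{start} >= $start && $f->{end} <= $end
        if $range_type eq 'contains';          # feature lies within the range
    return $f->{start} <= $start && $f->{end} >= $end
        if $range_type eq 'contained_in';      # feature spans the whole range
    die "unknown range_type '$range_type'\n";
}

my @features = (
    { name => 'A', seq_id => 'chr1', start => 4000, end => 5500 },
    { name => 'B', seq_id => 'chr1', start => 5200, end => 5800 },
    { name => 'C', seq_id => 'chr1', start => 4000, end => 9000 },
    { name => 'D', seq_id => 'chr2', start => 5200, end => 5800 },
);
for my $type (qw(overlaps contains contained_in)) {
    my @hits = grep { location_match($_, 'chr1', 5000, 6000, $type) } @features;
    print "$type: ", join(' ', map { $_->{name} } @hits), "\n";
}
# prints:
#   overlaps: A B C
#   contains: B
#   contained_in: C
```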
Examples:

 get_features_by_location('chr1');            # all features on chromosome 1
 get_features_by_location('chr1',5000);       # features between 5000 and the end
 get_features_by_location('chr1',5000,8000);  # features between 5000 and 8000

Location lookups use overlap semantics: a feature will be returned if
it partially or completely overlaps the indicated range.

The named argument form gives you more control:

  Argument       Value
  --------       -----

  -seq_id        The name of the sequence on which the feature resides
  -start         Start of the range
  -end           End of the range
  -strand        Strand of the feature
  -range_type    Type of range to search over

The B<-strand> argument, if present, can be one of "0" to find
features that are on both strands, "+1" to find only plus strand
features, and "-1" to find only minus strand features. Specifying a
strand of undef is the same as not specifying this argument at all,
and retrieves all features regardless of their strandedness.

The B<-range_type> argument, if present, can be one of "overlaps" (the
default), to find features whose positions overlap the indicated
range, "contains," to find features whose endpoints are completely
contained within the indicated range, and "contained_in" to find
features that completely contain the indicated range (their start is
at or before the start of the range and their end is at or after its
end).

=cut

sub get_features_by_location {
  my $self = shift;
  my ($seqid,$start,$end,$strand,$rangetype) =
    rearrange([['SEQ_ID','SEQID','REF'],'START',['STOP','END'],'STRAND','RANGE_TYPE'],@_);
  $self->_features(-seqid=>$seqid,
                   -start=>$start||undef,
                   -end=>$end||undef,
                   -strand=>$strand||undef,
                   -range_type=>$rangetype);
}

=head2 get_features_by_attribute

 Title   : get_features_by_attribute
 Usage   : @features = $db->get_features_by_attribute(@args)
 Function: looks up features by their attributes/tags
 Returns : a list of matching features
 Args    : see below
 Status  : public

This implements a simple tag filter. Pass a list of tag names and
their values. The module will return a list of features whose tag
names and values match.
Tag names are case insensitive. If multiple tag name/value pairs are
present, they will be ANDed together. To match any of a list of
values, use an array reference for the value.

Examples:

  # return all features whose "function" tag is "GO:0000123"
  @features = $db->get_features_by_attribute(function => 'GO:0000123');

  # return all features whose "function" tag is "GO:0000123" or "GO:0000555"
  @features = $db->get_features_by_attribute(function => ['GO:0000123','GO:0000555']);

  # return all features whose "function" tag is "GO:0000123" or "GO:0000555"
  # and whose "confirmed" tag is 1
  @features = $db->get_features_by_attribute(function  => ['GO:0000123','GO:0000555'],
                                             confirmed => 1);

=cut

sub get_features_by_attribute {
  my $self       = shift;
  my %attributes = ref($_[0]) ? %{$_[0]} : @_;
  %attributes or $self->throw("Usage: get_features_by_attribute(attribute_name=>\$attribute_value...)");
  $self->_features(-attributes=>\%attributes);
}

###
# features() call -- main query interface
#

=head2 features

 Title   : features
 Usage   : @features = $db->features(@args)
 Function: generalized query & retrieval interface
 Returns : list of features
 Args    : see below
 Status  : Public

This is the workhorse for feature query and retrieval. It takes a
series of -name=E<gt>$value filter arguments. Features that match all
the filters are returned.

  Argument       Value
  --------       -----

 Location filters:
  -seq_id        Chromosome, contig or other DNA segment
  -seqid         Synonym for -seq_id
  -ref           Synonym for -seqid
  -start         Start of range
  -end           End of range
  -stop          Synonym for -end
  -strand        Strand
  -range_type    Type of range match ('overlaps','contains','contained_in')

 Name filters:
  -name          Name of feature (may be a glob expression)
  -aliases       If true, match aliases as well as display names
  -class         Archaic argument for backward compatibility.
                  (-class=>'Clone',-name=>'ABC123') is equivalent
                  to (-name=>'Clone:ABC123')

 Type filters:
  -types         List of feature types (array reference) or one type (scalar)
  -type          Synonym for the above
  -primary_tag   Synonym for the above

  -attributes    Hashref of attribute=>value pairs as per
                    get_features_by_attribute(). Multiple alternative values
                    can be matched by providing an array reference.
  -attribute     synonym for -attributes

You may also provide features() with a list of scalar values (the
first element of which must B<not> begin with a dash), in which case
it will treat the list as a feature type filter.

Examples:

All features:

  @features = $db->features( );

All features on chromosome 1:

  @features = $db->features(-seqid=>'Chr1');

All features on chromosome 1 between 5000 and 6000:

  @features = $db->features(-seqid=>'Chr1',-start=>5000,-end=>6000);

All mRNAs on chromosome 1 between 5000 and 6000:

  @features = $db->features(-seqid=>'Chr1',-start=>5000,-end=>6000,-types=>'mRNA');

All confirmed mRNAs and repeats on chromosome 1 that overlap the range
5000..6000:

  @features = $db->features(-seqid     => 'Chr1',-start=>5000,-end=>6000,
                            -types     => ['mRNA','repeat'],
                            -attributes=> {confirmed=>1}
                           );

All confirmed mRNAs and repeats on chromosome 1 strictly contained
within the range 5000..6000:

  @features = $db->features(-seqid      => 'Chr1',-start=>5000,-end=>6000,
                            -types      => ['mRNA','repeat'],
                            -attributes => {confirmed=>1},
                            -range_type => 'contains',
                           );

All genes and repeats:

  @features = $db->features('gene','repeat_region');

=cut

# documentation of args
#  my ($seq_id,$start,$end,$strand,
#      $name,$class,$allow_aliases,
#      $types,
#      $attributes,
#      $range_type,
#      $iterator,
#     ) = rearrange([['SEQID','SEQ_ID','REF'],'START',['STOP','END'],'STRAND',
#                    'NAME','CLASS','ALIASES',
#                    ['TYPES','TYPE','PRIMARY_TAG'],
#                    ['ATTRIBUTES','ATTRIBUTE'],
#                    'RANGE_TYPE',
#                   ],@_);
#  $range_type ||= 'overlaps';
sub features {
  my $self = shift;
  my @args;
  if (@_ == 0) {
    @args = ();
  }
  elsif ($_[0] !~/^-/) {
    my @types = @_;
    @args = (-type=>\@types);
  }
  else {
    @args = @_;
  }
  $self->_features(@args);
}

=head2 get_all_features

 Title   : get_all_features
 Usage   : @features = $db->get_all_features()
 Function: get all features in the database
 Returns : list of features
 Args    : none
 Status  : Public

=cut

# for compatibility with Bio::SeqFeature::Collection
sub get_all_features {
  shift->features();
}

=head2 seq_ids

 Title   : seq_ids
 Usage   : @ids = $db->seq_ids()
 Function: Return all sequence IDs contained in database
 Returns : list of sequence Ids
 Args    : none
 Status  : public

=cut

sub seq_ids {
  my $self = shift;
  return $self->_seq_ids();
}

=head2 search_attributes

 Title   : search_attributes
 Usage   : @result_list = $db->search_attributes("text search string",[$tag1,$tag2...],$limit)
 Function: Search attributes for keywords occurring in a text string
 Returns : array of results
 Args    : full text search string, array ref of attribute names, and an optional feature limit
 Status  : public

Given a search string, this method performs a full-text search of the
specified attributes and returns an array of results. You may pass a
scalar attribute name to search the values of one attribute
(e.g. "Note") or you may pass an array reference to search inside
multiple attributes (['Note','Alias','Parent']). Each row of the
returned array is an arrayref containing the following fields:

  column 1     The display name of the feature
  column 2     The text of the note
  column 3     A relevance score.
  column 4     The feature type
  column 5     The unique ID of the feature

NOTE: This search will fail to find features that do not have a
display name!

You can use fetch() or fetch_many() with the returned IDs to get to
the features themselves.

=cut

sub search_attributes {
  my $self = shift;
  my ($search_string,$attribute_names,$limit) = @_;
  my $attribute_array = ref $attribute_names && ref $attribute_names eq 'ARRAY'
                          ? $attribute_names : [$attribute_names];
  return $self->_search_attributes($search_string,$attribute_array,$limit);
}

=head2 search_notes

 Title   : search_notes
 Usage   : @result_list = $db->search_notes("full text search string",$limit)
 Function: Search the notes for a text string
 Returns : array of results
 Args    : full text search string, and an optional feature limit
 Status  : public

Given a search string, this method performs a full-text search of the
"Note" attribute and returns an array of results. Each row of the
returned array is an arrayref containing the following fields:

  column 1     The display_name of the feature, suitable for passing to get_feature_by_name()
  column 2     The text of the note
  column 3     A relevance score.
  column 4     The type

NOTE: This is equivalent to $db-E<gt>search_attributes('full text
search string','Note',$limit). This search will fail to find features
that do not have a display name!

=cut

###
# search_notes()
#
sub search_notes {
  my $self = shift;
  my ($search_string,$limit) = @_;
  return $self->_search_attributes($search_string,['Note'],$limit);
}

=head2 types

 Title   : types
 Usage   : @type_list = $db->types
 Function: Get all the types in the database
 Returns : array of Bio::DB::GFF::Typename objects
 Args    : none
 Status  : public

=cut

sub types {
  shift->throw_not_implemented;
}

=head2 insert_sequence

 Title   : insert_sequence
 Usage   : $success = $db->insert_sequence($seqid,$sequence_string,$offset)
 Function: Inserts sequence data into the database at the indicated offset
 Returns : true if successful
 Args    : see below
 Status  : public

This method inserts the DNA or protein sequence fragment
$sequence_string, identified by the ID $seqid, into the database at
the indicated offset $offset. It is used internally by the GFF3Loader
to load sequence data from the files.
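insert_sequence() lets a loader add a long sequence in chunks, each at its own offset. The toy sketch below models that with one Perl string per sequence ID. ToySeqDB is hypothetical; offsets are assumed to be 0-based (consistent with the default offset of 0), retrieval uses the 1-based inclusive coordinates of fetch_sequence(), and padding gaps with N is an invented convenience, not documented behavior of any adaptor.

```perl
use strict;
use warnings;

# Hypothetical in-memory sequence store.
package ToySeqDB;

sub new { my $class = shift; return bless { seqs => {} }, $class }

sub insert_sequence {
    my ($self, $seqid, $seq, $offset) = @_;
    $offset ||= 0;                                   # assumed 0-based offset
    my $ref = \$self->{seqs}{$seqid};
    $$ref = '' unless defined $$ref;
    # Invented convenience: pad with N if a chunk lands beyond the current end.
    $$ref .= 'N' x ($offset - length($$ref)) if length($$ref) < $offset;
    substr($$ref, $offset, length($seq)) = $seq;     # overwrite or extend
    return 1;
}

sub seq {
    my ($self, $seqid, $start, $end) = @_;           # 1-based, inclusive
    my $dna = $self->{seqs}{$seqid};
    return unless defined $dna;
    $start = 1           unless defined $start;
    $end   = length $dna unless defined $end;
    return substr($dna, $start - 1, $end - $start + 1);
}

package main;

my $db = ToySeqDB->new;
$db->insert_sequence('ctg1', 'ACGTACGT', 0);   # first chunk
$db->insert_sequence('ctg1', 'GGGGCCCC', 8);   # second chunk, right after it
print $db->seq('ctg1', 7, 10), "\n";           # prints GTGG
```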
=cut

###
# insert_sequence()
#
# insert a bit of primary sequence into the database
#
sub insert_sequence {
  my $self = shift;
  my ($seqid,$seq,$offset) = @_;
  $offset ||= 0;
  $self->_insert_sequence($seqid,$seq,$offset);
}

=head2 fetch_sequence

 Title   : fetch_sequence
 Usage   : $sequence = $db->fetch_sequence(-seq_id=>$seqid,-start=>$start,-end=>$end)
 Function: Fetch the indicated subsequence from the database
 Returns : The sequence string (a Bio::Seq object only if -bioseq is true)
 Args    : see below
 Status  : public

This method retrieves a portion of the indicated sequence. The
arguments are:

  Argument       Value
  --------       -----
  -seq_id        Chromosome, contig or other DNA segment
  -seqid         Synonym for -seq_id
  -name          Synonym for -seq_id
  -start         Start of range
  -end           End of range
  -class         Obsolete argument used for Bio::DB::GFF compatibility. If
                  specified will qualify the seq_id as "$class:$seq_id".
  -bioseq        Boolean flag; if true, returns a Bio::Seq object instead
                  of a sequence string.

You can call fetch_sequence using the following shortcuts:

  $seq = $db->fetch_sequence('chr3');             # entire chromosome
  $seq = $db->fetch_sequence('chr3',1000);        # position 1000 to end of chromosome
  $seq = $db->fetch_sequence('chr3',undef,5000);  # position 1 to 5000
  $seq = $db->fetch_sequence('chr3',1000,5000);   # positions 1000 to 5000

=cut

###
# fetch_sequence()
#
# equivalent to old Bio::DB::GFF->dna() method
#
sub fetch_sequence {
  my $self = shift;
  my ($seqid,$start,$end,$class,$bioseq) = rearrange([['NAME','SEQID','SEQ_ID'],
                                                      'START',['END','STOP'],'CLASS','BIOSEQ'],@_);
  $seqid = "$seqid:$class" if defined $class;
  my $seq = $self->seq($seqid,$start,$end);
  return $seq unless $bioseq;

  require Bio::Seq unless Bio::Seq->can('new');
  my $display_id = defined $start ? "$seqid:$start..$end" : $seqid;
  return Bio::Seq->new(-display_id=>$display_id,-seq=>$seq);
}

=head2 segment

 Title   : segment
 Usage   : $segment = $db->segment($seq_id [,$start] [,$end] [,$absolute])
 Function: restrict the database to a sequence range
 Returns : a Bio::DB::SeqFeature::Segment object
 Args    : sequence id, start and end ranges (optional)
 Status  : public

This is a convenience method that can be used when you are interested
in the contents of a particular sequence landmark, such as a
contig. Specify the ID of a sequence or other landmark in the database
and optionally a start and endpoint relative to that landmark. The
method will look up the region and return a
Bio::DB::SeqFeature::Segment object that spans it. You can then use
this segment object to make location-restricted queries on the
database.

Example:

 $segment  = $db->segment('contig23',1,1000);  # first 1000 bp of contig23
 my @mRNAs = $segment->features('mRNA');       # all mRNAs that overlap segment

Although you will usually want to fetch segments that correspond to
physical sequences in the database, you can actually use any feature
in the database as the sequence ID. The segment() method will perform
a get_features_by_name() internally and then transform the feature
into the appropriate coordinates.

The named feature should exist once and only once in the database. If
it exists multiple times in the database and you attempt to call
segment() in a scalar context, you will get an exception. A workaround
is to call the method in a list context, as in:

  my ($segment) = $db->segment('contig23',1,1000);

or

  my @segments  = $db->segment('contig23',1,1000);

However, having multiple same-named features in the database is often
an indication of underlying data problems.

If the optional $absolute argument is a true value, then the specified
coordinates are relative to the reference (absolute) coordinates.
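The coordinate arithmetic that segment() applies to a named landmark can be isolated in a small function. This sketch mirrors the relative (non-$absolute) branch of segment(): on the plus strand, positions count from the landmark's start; on the minus strand, they count back from its end. The function name relative_to_absolute, the hash-based feature, and the default of 1 for a missing start are assumptions of the sketch.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring segment()'s relative-coordinate arithmetic.
# $f is a hashref with start, end and strand; $rel_start/$rel_end are
# 1-based positions relative to the landmark. A missing $rel_start is
# assumed to be 1 and a missing $rel_end the landmark's full length.
sub relative_to_absolute {
    my ($f, $rel_start, $rel_end) = @_;
    $rel_start = 1 unless defined $rel_start;
    my $re = defined $rel_end ? $rel_end : $f->{end} - $f->{start} + 1;
    if (($f->{strand} || 0) >= 0) {
        # plus strand (or unstranded): offsets count from the landmark start
        return ($f->{start} + $rel_start - 1, $f->{start} + $re - 1);
    }
    # minus strand: offsets count back from the landmark end
    return ($f->{end} - $re + 1, $f->{end} - $rel_start + 1);
}

my $plus  = { start => 1001, end => 2000, strand =>  1 };
my $minus = { start => 1001, end => 2000, strand => -1 };
print join('..', relative_to_absolute($plus,  1, 100)), "\n";   # prints 1001..1100
print join('..', relative_to_absolute($minus, 1, 100)), "\n";   # prints 1901..2000
```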
=cut ### # Replacement for Bio::DB::GFF->segment() method # sub segment { my $self = shift; my (@features,@args); if (@_ == 1 && blessed($_[0])) { @features = @_; @args = (); } else { @args = $self->setup_segment_args(@_); @features = $self->get_features_by_name(@args); } if (!wantarray && @features > 1) { $self->throw("segment() called in a scalar context but multiple features match; call it in a list context or narrow the search"); } my ($rel_start,$rel_end,$abs) = rearrange(['START',['STOP','END'],'ABSOLUTE'],@args); $rel_start = 1 unless defined $rel_start; my @segments; for my $f (@features) { my $seqid = $f->seq_id; my $strand = $f->strand; my ($start,$end); if ($abs) { $start = $rel_start; $end = defined $rel_end ? $rel_end : $start + $f->length - 1; } else { my $re = defined $rel_end ? $rel_end : $f->end - $f->start + 1; if ($strand >= 0) { $start = $f->start + $rel_start - 1; $end = $f->start + $re - 1; } else { $start = $f->end - $re + 1; $end = $f->end - $rel_start + 1; } } my $id = eval{$f->primary_id}; push @segments,Bio::DB::SeqFeature::Segment->new($self,$seqid,$start,$end,$strand,$id); } return wantarray ? @segments : $segments[0]; } =head2 seqfeature_class Title : seqfeature_class Usage : $classname = $db->seqfeature_class([$new_classname]) Function: get or set the name of the Bio::SeqFeatureI class generated by new_feature() Returns : name of class Args : new classname (optional) Status : public =cut sub seqfeature_class { my $self = shift; my $d = $self->{seqfeatureclass}; if (@_) { my $class = shift; eval "require $class"; $self->throw("$class does not implement the Bio::SeqFeatureI interface") unless $class->isa('Bio::SeqFeatureI'); $self->{seqfeatureclass} = $class; } $d; } =head2 reindex Title : reindex Usage : $db->reindex Function: reindex the database Returns : nothing Args : nothing Status : public This method forces the secondary indexes (name, location, attributes, feature types) to be recalculated. This may be useful for repairing a corrupted database.
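The reindex() loop walks the full feature stream, re-indexes each feature, and reports progress every 1000 features. A minimal sketch of that control flow, under stated assumptions: ListStream is a hypothetical array-backed stand-in for the stream returned by get_seq_stream(), and the index update is a callback instead of _update_indexes(). Progress points are collected in a list rather than printed to STDERR.

```perl
use strict;
use warnings;

package ListStream;    # hypothetical stand-in for the stream from get_seq_stream()
sub new      { my ($class, @items) = @_; return bless [@items], $class }
sub next_seq { my $self = shift; return shift @$self }

package main;

# Walk every feature, re-index it through the callback, and record a
# progress point every 1000 features (reindex() prints these instead).
sub reindex_all {
    my ($stream, $update_index) = @_;
    my $count = 0;
    my @checkpoints;
    while (my $f = $stream->next_seq) {
        push @checkpoints, $count if ++$count % 1000 == 0;   # progress point
        $update_index->($f);                                  # re-index one feature
    }
    return ($count, \@checkpoints);
}

my $indexed = 0;
my ($total, $marks) = reindex_all(ListStream->new(map { +{ id => $_ } } 1 .. 2500),
                                  sub { $indexed++ });
print "$total features, progress reported at @$marks\n";
```

Because the counter is incremented before the modulus test, the first report comes after exactly 1000 features, matching the original loop.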
=cut ### # force reindexing # sub reindex { my $self = shift; my $count = 0; my $now; my $last_time = time(); $self->_start_reindexing; my $iterator = $self->get_seq_stream; while (my $f = $iterator->next_seq) { if (++$count %1000 == 0) { $now = time(); my $elapsed = sprintf(" in %5.2fs",$now - $last_time); $last_time = $now; print STDERR "$count features indexed$elapsed...",' 'x60; print STDERR -t STDOUT && !$ENV{EMACS} ? "\r" : "\n"; } $self->_update_indexes($f); } $self->_end_reindexing; } =head2 attributes Title : attributes Usage : @a = $db->attributes Function: return the list of all known attributes Returns : list of all known attributes Args : nothing Status : public =cut sub attributes { my $self = shift; $self->throw_not_implemented; } =head2 start_bulk_update,finish_bulk_update Title : start_bulk_update,finish_bulk_update Usage : $db->start_bulk_update $db->finish_bulk_update Function: Activate optimizations for a large number of insertions/updates Returns : nothing Args : nothing Status : public With some adaptors (currently only the DBI::mysql adaptor), these methods signal the adaptor that a large number of insertions or updates are to be performed, and activate certain optimizations. These methods are called automatically by the Bio::DB::SeqFeature::Store::GFF3Loader module.
Example: $db->start_bulk_update; for my $f (@features) { $db->store($f); } $db->finish_bulk_update; =cut sub start_bulk_update { shift->_start_bulk_update(@_) } sub finish_bulk_update { shift->_finish_bulk_update(@_) } =head2 add_SeqFeature Title : add_SeqFeature Usage : $count = $db->add_SeqFeature($parent,@children) Function: store a parent/child relationship between $parent and the @children features, all of which are already stored in the database Returns : number of children successfully stored Args : parent feature or primary ID and children features or primary IDs Status : OPTIONAL; MAY BE IMPLEMENTED BY ADAPTORS If can_store_parentage() returns true, then some store-aware features (e.g. Bio::DB::SeqFeature) will invoke this method to store feature/subfeature relationships in a normalized table. =cut # these two are called only if can_store_parentage() returns true # _add_SeqFeature ($parent,@children) sub add_SeqFeature { shift->_add_SeqFeature(@_) } =head2 fetch_SeqFeatures Title : fetch_SeqFeatures Usage : @children = $db->fetch_SeqFeatures($parent_feature) Function: return the immediate subfeatures of the indicated feature Returns : list of subfeatures Args : the parent feature and an optional list of child types Status : OPTIONAL; MAY BE IMPLEMENTED BY ADAPTORS If can_store_parentage() returns true, then some store-aware features (e.g. Bio::DB::SeqFeature) will invoke this method to retrieve feature/subfeature relationships from the database. =cut # _fetch_SeqFeatures($parent,@child_types) sub fetch_SeqFeatures { my ($self, $parent, @child_types) = @_; return unless defined $parent->primary_id; $self->_fetch_SeqFeatures($parent,@child_types); } =head1 Changing the Behavior of the Database These methods allow you to modify the behavior of the database.
=head2 debug Title : debug Usage : $debug_flag = $db->debug([$new_flag]) Function: get/set the debug flag Returns : the debug flag as it was before any update Args : (optional) new debug flag Status : public This method gets/sets a flag that turns on verbose progress messages. Currently this flag has little effect. =cut sub debug { my $self = shift; my $d = $self->{debug}; $self->{debug} = shift if @_; $d; } =head2 serializer Title : serializer Usage : $serializer = $db->serializer([$new_serializer]) Function: get/set the name of the serializer Returns : the name of the serializer class as it was before any update Args : (optional) the name of a new serializer Status : public You can use this method to set the serializer, but do not attempt to change the serializer once the database is initialized and populated. =cut ### # serializer # sub serializer { my $self = shift; my $d = $self->setting('serializer'); if (@_) { my $serializer = shift; eval "require $serializer; 1" or croak $@; $self->setting(serializer=>$serializer); } $d; } =head2 dna_accessor Title : dna_accessor Usage : $dna_accessor = $db->dna_accessor([$new_dna_accessor]) Function: get/set the DNA accessor object Returns : the DNA accessor object as it was before any update, if any Args : (optional) a new DNA accessor object Status : public You can use this method to request or set the DNA accessor: an object providing a seq() or fetch_sequence() method that, when set, is used in place of the adaptor's own sequence storage.
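A DNA accessor is accepted by duck typing: can_do_seq() only checks for a seq() or fetch_sequence() method, and seq() dispatches to whichever one exists. The sketch below shows that dispatch with a hypothetical in-memory class, ToyDNA, which is not part of BioPerl; it assumes 1-based inclusive coordinates, as in fetch_sequence().

```perl
use strict;
use warnings;

# Hypothetical in-memory DNA source exposing seq($id,$start,$end),
# the duck-typed interface that can_do_seq() looks for.
package ToyDNA;
sub new { my ($class, %seqs) = @_; return bless {%seqs}, $class }
sub seq {
    my ($self, $id, $start, $end) = @_;
    my $dna = $self->{$id};
    return unless defined $dna;
    $start //= 1;
    $end   //= length $dna;
    return substr($dna, $start - 1, $end - $start + 1);   # 1-based, inclusive
}

package main;

# Same test as can_do_seq(): any object with seq() or fetch_sequence() qualifies.
sub can_do_seq {
    my $obj = shift;
    return UNIVERSAL::can($obj, 'seq') || UNIVERSAL::can($obj, 'fetch_sequence');
}

# Dispatch as in the store's seq() method when an accessor is set.
sub fetch_dna {
    my ($accessor, @range) = @_;
    return $accessor->can('seq')            ? $accessor->seq(@range)
         : $accessor->can('fetch_sequence') ? $accessor->fetch_sequence(@range)
         :                                    undef;
}

my $source = ToyDNA->new(chr1 => 'ACGTACGTAC');
if (can_do_seq($source)) {
    my $sub = fetch_dna($source, 'chr1', 3, 6);
    print "$sub\n";    # GTAC
}
```

The same check is why set_dna_accessor() accepts any already-built object with one of these methods, in addition to a FASTA file path.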
=cut ### # dna_accessor # sub dna_accessor { my $self = shift; my $d = $self->{dna_accessor}; $self->{dna_accessor} = shift if @_; $d; } sub can_do_seq { my $self = shift; my $obj = shift; return UNIVERSAL::can($obj,'seq') || UNIVERSAL::can($obj,'fetch_sequence'); } sub set_dna_accessor { my $self = shift; my $accessor = shift; if (-e $accessor) { # a file, assume it is a fasta file eval "require Bio::DB::Fasta; 1" or croak $@ unless Bio::DB::Fasta->can('new'); my $a = Bio::DB::Fasta->new($accessor) or croak "Can't open FASTA file $accessor: $!"; $self->dna_accessor($a); } if (ref $accessor && $self->can_do_seq($accessor)) { $self->dna_accessor($accessor); # already built } return; } sub do_compress { my $self = shift; if (@_) { my $do_compress = shift; $self->setting(compress => $do_compress); } my $d = $self->setting('compress'); if ($d) { eval "use Compress::Zlib; 1" or croak $@ unless Compress::Zlib->can('compress'); } $d; } =head2 index_subfeatures Title : index_subfeatures Usage : $flag = $db->index_subfeatures([$new_value]) Function: flag whether to index subfeatures Returns : current value of the flag Args : (optional) new value of the flag Status : public If true, the store() method will add a searchable index to both the top-level feature and all its subfeatures, allowing the search functions to return features at any level of the containment hierarchy. If false, only the top-level feature will be indexed, meaning that you will only be able to get at subfeatures by fetching the top-level feature and then traversing downward using get_SeqFeatures(). You are free to change this setting at any point during the creation and population of a database. One database can contain both indexed and unindexed subfeatures.
=cut ### # whether to index subfeatures by default # sub index_subfeatures { my $self = shift; my $d = $self->setting('index_subfeatures'); $self->setting('index_subfeatures'=>shift) if @_; $d; } =head2 clone The clone() method should be used when you want to pass the Bio::DB::SeqFeature::Store object to a child process across a fork(). The child must call clone() before making any queries. The default behavior is to do nothing, but adaptors that use the DBI interface may need to implement this in order to avoid database handle errors. See the dbi adaptor for an example. =cut sub clone { } ################################# TIE interface #################### =head1 TIE Interface This module implements a full TIEHASH interface. The keys are the primary IDs of the features in the database. Example: tie %h,'Bio::DB::SeqFeature::Store',-adaptor=>'DBI::mysql',-dsn=>'dbi:mysql:elegans'; $h{123} = $feature1; $h{124} = $feature2; print $h{123}->display_name; =cut sub TIEHASH { my $class = shift; return $class->new(@_); } sub STORE { my $self = shift; my ($key,$feature) = @_; $key =~ /^\d+$/ && $key > 0 or croak "keys must be positive integers"; $self->load_class($feature); $feature->primary_id($key); $self->store($feature); } sub FETCH { my $self = shift; $self->fetch(@_); } sub FIRSTKEY { my $self = shift; $self->_firstid; } sub NEXTKEY { my $self = shift; my $lastkey = shift; $self->_nextid($lastkey); } sub EXISTS { my $self = shift; my $key = shift; $self->_existsid($key); } sub DELETE { my $self = shift; my $key = shift; $self->_deleteid($key); } sub CLEAR { my $self = shift; $self->_clearall; } sub SCALAR { my $self = shift; $self->_featurecount; } ###################### TO BE IMPLEMENTED BY ADAPTOR ########## =head2 _init_database Title : _init_database Usage : $success = $db->_init_database([$erase]) Function: initialize an empty database Returns : true on success Args : optional boolean flag to erase contents of an existing database Status : ABSTRACT METHOD; MUST BE
IMPLEMENTED BY AN ADAPTOR This method is the back end for init_database(). It must be implemented by an adaptor that inherits from Bio::DB::SeqFeature::Store. It returns true on success. =cut sub _init_database { shift->throw_not_implemented } =head2 _store Title : _store Usage : $success = $db->_store($indexed,@objects) Function: store seqfeature objects into database Returns : true on success Args : a boolean flag indicating whether objects are to be indexed, and one or more objects Status : ABSTRACT METHOD; MUST BE IMPLEMENTED BY AN ADAPTOR This method is the back end for store() and store_noindex(). It should write the seqfeature objects into the database. If indexing is requested, the features should be indexed for query and retrieval. Otherwise the features should be stored without indexing (adaptors are not required to honor this flag). If the object has no primary_id (undef), then the object is written into the database and assigned a new primary_id. If the object already has a primary_id, then the system will perform an update, replacing whatever was there before. In practice, the implementation will serialize each object using the freeze() method and then store it in the database under the corresponding primary_id. The object is then updated with the primary_id. =cut # _store($indexed,@objs) sub _store { my $self = shift; my $indexed = shift; my @objs = @_; $self->throw_not_implemented; } =head2 _fetch Title : _fetch Usage : $feature = $db->_fetch($primary_id) Function: fetch feature from database Returns : feature Args : primary id Status : ABSTRACT METHOD; MUST BE IMPLEMENTED BY AN ADAPTOR This method is the back end for fetch(). It accepts a primary_id and returns a feature object. It must be implemented by the adaptor. In practice, the implementation will retrieve the serialized Bio::SeqFeatureI object from the database and pass it to the thaw() method to unserialize it and synchronize the primary_id.
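The _store()/_fetch() contract (insert when primary_id is undef, update otherwise; the ID is kept out of the serialized blob and resynchronized on fetch) can be sketched with a Perl hash standing in for the database. ToyFeature and ToyStore are hypothetical classes written for this illustration only; real adaptors use freeze() and thaw() and a persistent back end, whereas this sketch calls Storable directly.

```perl
use strict;
use warnings;
use Storable ();

package ToyFeature;    # hypothetical feature class
sub new        { my ($class, %args) = @_; return bless {%args}, $class }
sub primary_id { my $self = shift; $self->{pid} = shift if @_; return $self->{pid} }

package ToyStore;      # hypothetical hash-backed adaptor
sub new { return bless { rows => {}, next_id => 1 }, shift }

sub _store {
    my ($self, $indexed, @objs) = @_;     # indexing flag ignored in this sketch
    for my $obj (@objs) {
        my $id = $obj->primary_id;
        $id = $self->{next_id}++ unless defined $id;   # no ID: insert a new row
        $obj->primary_id(undef);                       # keep the key out of the blob
        $self->{rows}{$id} = Storable::nfreeze($obj);  # existing ID: replace the row
        $obj->primary_id($id);                         # caller's object gets the ID
    }
    return 1;
}

sub _fetch {
    my ($self, $id) = @_;
    my $blob = $self->{rows}{$id};
    return unless defined $blob;
    my $obj = Storable::thaw($blob);
    $obj->primary_id($id);                             # resynchronize the ID
    return $obj;
}

package main;

my $db = ToyStore->new;
my $f  = ToyFeature->new(name => 'geneA');
$db->_store(1, $f);                   # insert: $f is assigned primary_id 1
$f->{name} = 'geneA.v2';
$db->_store(1, $f);                   # update: row 1 is replaced
my $copy = $db->_fetch(1);
printf "%s %s\n", $copy->primary_id, $copy->{name};   # 1 geneA.v2
```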
=cut # _fetch($id) sub _fetch { shift->throw_not_implemented } =head2 _fetch_many Title : _fetch_many Usage : @features = $db->_fetch_many(@primary_ids) Function: fetch many features from database Returns : list of features Args : list of primary IDs Status : private -- does not need to be implemented This method fetches many features specified by a list of IDs. The default implementation simply calls _fetch() once for each primary_id. Implementors can override it if needed for efficiency. =cut # _fetch_many(@ids) # this one will fall back to many calls on _fetch() if you don't # override it sub _fetch_many { my $self = shift; return map {$self->_fetch($_)} @_; } =head2 _update_indexes Title : _update_indexes Usage : $success = $db->_update_indexes($feature) Function: update the indexes for a feature Returns : true on success Args : A seqfeature object Status : ABSTRACT METHOD; MUST BE IMPLEMENTED BY AN ADAPTOR This method is called by reindex() to update the searchable indexes for a feature object that has changed. =cut # this is called to index a feature sub _update_indexes { shift->throw_not_implemented } =head2 _start_reindexing, _end_reindexing Title : _start_reindexing, _end_reindexing Usage : $db->_start_reindexing() $db->_end_reindexing Function: flag that a series of reindexing operations is beginning/ending Returns : true on success Args : none Status : MAY BE IMPLEMENTED BY AN ADAPTOR (optional) These methods are called by reindex() immediately before and after a series of reindexing operations. The default behavior is to do nothing, but these methods can be overridden by an adaptor in order to perform optimizations, turn off autocommits, etc.
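These two hooks follow a template-method pattern: the base class calls no-op hooks around the reindexing loop, and an adaptor overrides only the hooks it needs. A minimal sketch, assuming hypothetical classes BaseStore and BatchingStore that only log what they would do (no real database or autocommit handling is involved):

```perl
use strict;
use warnings;

# Base class with no-op hooks, mirroring the default
# _start_reindexing()/_end_reindexing() in the store.
package BaseStore;
sub new               { my $class = shift; return bless { log => [] }, $class }
sub _start_reindexing { }
sub _end_reindexing   { }
sub _update_indexes   { my ($self, $f) = @_; push @{ $self->{log} }, "index $f" }
sub reindex {
    my ($self, @features) = @_;
    $self->_start_reindexing;
    $self->_update_indexes($_) for @features;
    $self->_end_reindexing;
}

# Hypothetical adaptor that would turn off autocommit before the run
# and commit once at the end; here it only records the steps.
package BatchingStore;
our @ISA = ('BaseStore');
sub _start_reindexing { push @{ shift->{log} }, 'autocommit off' }
sub _end_reindexing   { push @{ shift->{log} }, 'commit' }

package main;

my $db = BatchingStore->new;
$db->reindex(qw(f1 f2));
print join(', ', @{ $db->{log} }), "\n";   # autocommit off, index f1, index f2, commit
```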
=cut # these do not necessarily have to be overridden # they are called at beginning and end of reindexing process sub _start_reindexing {} sub _end_reindexing {} =head2 _features Title : _features Usage : @features = $db->_features(@args) Function: back end for all get_features_by_*() queries Returns : list of features Args : see below Status : ABSTRACT METHOD; MUST BE IMPLEMENTED BY ADAPTOR This is the back end for features(), get_features_by_name(), get_features_by_location(), etc. Arguments are as described for the features() method, except that only the named-argument form is recognized. =cut # bottleneck query generator sub _features { shift->throw_not_implemented } =head2 _search_attributes Title : _search_attributes Usage : @result_list = $db->_search_attributes("text search string",[$tag1,$tag2...],$limit) Function: back end for the search_attributes() method Returns : results list Args : as per search_attributes() Status : ABSTRACT METHOD; MUST BE IMPLEMENTED BY ADAPTOR See search_attributes() for the format of the results list. The only difference between this and the public method is that the tag list is guaranteed to be an array reference. =cut sub _search_attributes { shift->throw_not_implemented } =head2 can_store_parentage Title : can_store_parentage Usage : $flag = $db->can_store_parentage Function: return true if this adaptor can store parent/child relationships Returns : boolean Args : none Status : OPTIONAL; MAY BE IMPLEMENTED BY ADAPTORS Override this method and return true if this adaptor supports the _add_SeqFeature() and _fetch_SeqFeatures() methods, which are used for storing feature parent/child relationships in a normalized fashion. Default is false (parent/child relationships are stored in denormalized form in each feature).
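An adaptor that returns true from can_store_parentage() keeps parent/child links in a normalized table and answers queries for immediate children, optionally filtered by type. The sketch below is hypothetical: ParentageStore uses a hash of parent primary ID to [child ID, type] pairs as its "table", features are plain hashes with id and type keys, and _fetch_SeqFeatures() returns child IDs rather than feature objects to keep it short.

```perl
use strict;
use warnings;

# Hypothetical adaptor storing parent/child links in a normalized "table":
# parent primary ID => list of [child ID, child type].
package ParentageStore;
sub new                 { return bless { links => {} }, shift }
sub can_store_parentage { return 1 }

sub _add_SeqFeature {
    my ($self, $parent, @children) = @_;
    my $count = 0;
    for my $child (@children) {
        push @{ $self->{links}{ $parent->{id} } }, [ $child->{id}, $child->{type} ];
        $count++;
    }
    return $count;                       # number of children stored
}

sub _fetch_SeqFeatures {
    my ($self, $parent, @types) = @_;
    my %want = map { ($_ => 1) } @types;
    # immediate children only, optionally filtered by type
    return map  { $_->[0] }
           grep { !@types || $want{ $_->[1] } }
           @{ $self->{links}{ $parent->{id} } || [] };
}

package main;

my $db   = ParentageStore->new;
my $gene = { id => 1, type => 'gene' };
$db->_add_SeqFeature($gene, { id => 2, type => 'mRNA' }, { id => 3, type => 'ncRNA' });
print join(',', $db->_fetch_SeqFeatures($gene, 'mRNA')), "\n";   # 2
```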
=cut # return true here if the storage engine is prepared to store parent/child # relationships using _add_SeqFeature and return them using _fetch_SeqFeatures sub can_store_parentage { return; } =head2 _add_SeqFeature Title : _add_SeqFeature Usage : $count = $db->_add_SeqFeature($parent,@children) Function: store a parent/child relationship between $parent and @children Returns : number of children successfully stored Args : parent feature and one or more children Status : OPTIONAL; MAY BE IMPLEMENTED BY ADAPTORS If can_store_parentage() returns true, then some store-aware features (e.g. Bio::DB::SeqFeature) will invoke this method to store feature/subfeature relationships in a normalized table. =cut sub _add_SeqFeature { shift->throw_not_implemented } =head2 _fetch_SeqFeatures Title : _fetch_SeqFeatures Usage : @children = $db->_fetch_SeqFeatures($parent_feature) Function: return the immediate subfeatures of the indicated feature Returns : list of subfeatures Args : the parent feature and an optional list of child types Status : OPTIONAL; MAY BE IMPLEMENTED BY ADAPTORS If can_store_parentage() returns true, then some store-aware features (e.g. Bio::DB::SeqFeature) will invoke this method to retrieve feature/subfeature relationships from the database. =cut # _fetch_SeqFeatures($parent,@list_of_child_types) sub _fetch_SeqFeatures {shift->throw_not_implemented } =head2 _insert_sequence Title : _insert_sequence Usage : $success = $db->_insert_sequence($seqid,$sequence_string,$offset) Function: Inserts sequence data into the database at the indicated offset Returns : true if successful Args : the sequence ID, the sequence string, and the offset at which to insert it Status : ABSTRACT METHOD; MUST BE IMPLEMENTED BY ADAPTOR This is the back end for insert_sequence(). Adaptors must implement this method in order to store nucleotide or protein sequence.
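A minimal in-memory back end for _insert_sequence() and _fetch_sequence() makes the division of labor concrete. This is a sketch under assumptions not stated in the documentation: SeqMemory is a hypothetical class, $offset is treated as a 0-based position in the growing sequence string, gaps are padded with N (an arbitrary choice), and fetch coordinates are 1-based and inclusive as in fetch_sequence().

```perl
use strict;
use warnings;

# Hypothetical in-memory back end. Assumptions: $offset is 0-based,
# gaps are padded with 'N', and fetch coordinates are 1-based, inclusive.
package SeqMemory;
sub new { return bless { seq => {} }, shift }

sub _insert_sequence {
    my ($self, $seqid, $chunk, $offset) = @_;
    $offset ||= 0;
    my $ref = \$self->{seq}{$seqid};
    $$ref = '' unless defined $$ref;
    $$ref .= 'N' x ($offset - length($$ref)) if $offset > length($$ref);  # pad gaps
    substr($$ref, $offset, length $chunk) = $chunk;   # write the chunk in place
    return 1;
}

sub _fetch_sequence {
    my ($self, $seqid, $start, $end) = @_;
    my $seq = $self->{seq}{$seqid};
    return unless defined $seq;
    $start = 1 unless defined $start;
    $end   = length $seq unless defined $end;
    return substr($seq, $start - 1, $end - $start + 1);
}

package main;

my $db = SeqMemory->new;
$db->_insert_sequence('ctg1', 'ACGTAC', 0);
$db->_insert_sequence('ctg1', 'GGTT',   6);   # append the next chunk
my $piece = $db->_fetch_sequence('ctg1', 5, 8);
print "$piece\n";                              # ACGG
```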
=cut sub _insert_sequence { shift->throw_not_implemented } # _fetch_sequence() is similar to old dna() method =head2 _fetch_sequence Title : _fetch_sequence Usage : $sequence = $db->_fetch_sequence($seqid,$start,$end) Function: Fetch the indicated subsequence from the database Returns : The sequence string (not a Bio::PrimarySeq object!) Args : the sequence ID, and the start and end of the range Status : ABSTRACT METHOD; MUST BE IMPLEMENTED BY ADAPTOR This is the back end for fetch_sequence(). Adaptors must implement this method in order to retrieve nucleotide or protein sequence. =cut sub _fetch_sequence { shift->throw_not_implemented } sub seq { my $self = shift; my ($seq_id,$start,$end) = @_; if (my $a = $self->dna_accessor) { return $a->can('seq') ? $a->seq($seq_id,$start,$end) :$a->can('fetch_sequence')? $a->fetch_sequence($seq_id,$start,$end) : undef; } else { return $self->_fetch_sequence($seq_id,$start,$end); } } =head2 _seq_ids Title : _seq_ids Usage : @ids = $db->_seq_ids() Function: Return all sequence IDs contained in database Returns : list of sequence IDs Args : none Status : TO BE IMPLEMENTED BY ADAPTOR This method is invoked by seq_ids() to return all sequence IDs (coordinate systems) known to the database. =cut sub _seq_ids { shift->throw_not_implemented } =head2 _start_bulk_update,_finish_bulk_update Title : _start_bulk_update, _finish_bulk_update Usage : $db->_start_bulk_update $db->_finish_bulk_update Function: Activate optimizations for a large number of insertions/updates Returns : nothing Args : nothing Status : OPTIONAL; MAY BE IMPLEMENTED BY ADAPTOR These are the back ends for start_bulk_update() and finish_bulk_update(). The default behavior of both methods is to do nothing. =cut # Optional flags to change behavior to optimize bulk updating.
sub _start_bulk_update { } sub _finish_bulk_update { } # for full TIE() interface - not necessary to implement in most cases =head2 Optional methods needed to implement the full TIEHASH interface The core TIEHASH interface will work if just the _store() and _fetch() methods are implemented. To support the full TIEHASH interface, including support for keys(), each(), and exists(), the following methods should be implemented: =over 4 =item $id = $db-E<gt>_firstid() Return the first primary ID in the database. Needed for keys() and each(). =item $next_id = $db-E<gt>_nextid($id) Given a primary ID, return the next primary ID in the series. Needed for keys() and each(). =item $boolean = $db-E<gt>_existsid($id) Returns true if the indicated primary ID is in the database. Needed for the exists() function. =item $db-E<gt>_deleteid($id) Delete the feature corresponding to the given primary ID. Needed for delete(). =item $db-E<gt>_clearall() Empty the database. Needed for %tied_hash = (). =item $count = $db-E<gt>_featurecount() Return the number of features in the database. Needed for scalar %tied_hash. =back =cut sub _firstid { shift->throw_not_implemented } sub _nextid { shift->throw_not_implemented } sub _existsid { shift->throw_not_implemented } sub _deleteid { shift->throw_not_implemented } sub _clearall { shift->throw_not_implemented } sub _featurecount { shift->throw_not_implemented } =head1 Internal Methods These methods are internal to Bio::DB::SeqFeature::Store and adaptors. =head2 new_instance Title : new_instance Usage : $db = $db->new_instance() Function: class constructor Returns : A descendant of Bio::DB::SeqFeature::Store Args : none Status : internal This method is called internally by new() to create a new uninitialized instance of Bio::DB::SeqFeature::Store. It is used internally and should not be called by application software.
=cut sub new_instance { my $class = shift; return bless {},ref($class) || $class; } =head2 init Title : init Usage : $db->init(@args) Function: initialize object Returns : none Args : Arguments passed to new() Status : private This method is called internally by new() to initialize a newly-created object using the arguments passed to new(). It is meant to be overridden by Bio::DB::SeqFeature::Store adaptors. =cut sub init { my $self = shift; $self->default_settings(); } =head2 default_settings Title : default_settings Usage : $db->default_settings() Function: set up default settings for the adaptor Returns : none Args : none Status : private This method may be overridden by adaptors. It is responsible for setting up object default settings. =cut ### # default settings -- set up whatever are the proper default settings # sub default_settings { my $self = shift; $self->serializer($self->default_serializer); $self->index_subfeatures(1); } =head2 default_serializer Title : default_serializer Usage : $serializer = $db->default_serializer Function: finds an available serializer Returns : the name of an available serializer Args : none Status : private This method returns the name of an available serializer module: Storable if it can be loaded, otherwise Data::Dumper. =cut ### # choose a serializer # sub default_serializer { my $self = shift; # try Storable eval "require Storable; 1" and return 'Storable'; eval "require Data::Dumper; 1" and return 'Data::Dumper'; croak "Unable to load either Storable or Data::Dumper. Please provide a serializer using -serializer"; } =head2 setting Title : setting Usage : $value = $db->setting('setting_name' [=> $new_value]) Function: get/set the value of a setting Returns : the value of the setting as it was before any update Args : the name of the setting and optionally a new value for the setting Status : private This is a low-level procedure for persistently storing database settings. It can be overridden by adaptors.
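Like the other accessors in this module, setting() reads the current value before applying an update, so a call that changes a setting returns the value it replaced. The sketch below reproduces the object-backed default of setting() and an index_subfeatures()-style flag built on it; SettingsDemo is a hypothetical class name used only for this illustration.

```perl
use strict;
use warnings;

# Sketch of the object-backed setting() store and a flag accessor built on
# it, following the pattern of setting() and index_subfeatures().
package SettingsDemo;
sub new { return bless { setting => {} }, shift }

sub setting {
    my $self = shift;
    my $name = shift;
    my $old  = $self->{setting}{$name};       # read the current value first
    $self->{setting}{$name} = shift if @_;    # then apply an update, if any
    return $old;                              # the PREVIOUS value is returned
}

sub index_subfeatures {
    my $self = shift;
    return $self->setting(index_subfeatures => @_);
}

package main;

my $db = SettingsDemo->new;
$db->index_subfeatures(1);
my $previous = $db->index_subfeatures(0);    # value before the change
my $current  = $db->index_subfeatures;       # value after the change
print "previous=$previous current=$current\n";   # previous=1 current=0
```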
=cut # persistent settings # by default we store in the object sub setting { my $self = shift; my $variable_name = shift; my $d = $self->{setting}{$variable_name}; $self->{setting}{$variable_name} = shift if @_; $d; } =head2 subfeatures_are_indexed Title : subfeatures_are_indexed Usage : $flag = $db->subfeatures_are_indexed([$new_value]) Function: flag whether subfeatures are indexed Returns : a flag indicating that all subfeatures are indexed Args : (optional) new value of the flag Status : private This method is used internally by the Bio::DB::SeqFeature class to optimize some of its operations. It returns true if all of the subfeatures in the database are indexed; it returns false if at least one of the subfeatures is not indexed. Do not attempt to change the value of this setting unless you are writing an adaptor. =cut ### # whether subfeatures are all indexed # sub subfeatures_are_indexed { my $self = shift; my $d = $self->setting('subfeatures_are_indexed'); $self->setting(subfeatures_are_indexed => shift) if @_; $d; } =head2 subfeature_types_are_indexed Title : subfeature_types_are_indexed Usage : $flag = $db->subfeature_types_are_indexed Function: whether subfeatures are indexed by type Returns : a flag indicating that all subfeatures are indexed by type Args : none Status : private This method returns true if subfeature types are indexed. Default is to return the value of subfeatures_are_indexed(). =cut sub subfeature_types_are_indexed { my $self = shift; return $self->subfeatures_are_indexed; } =head2 subfeature_locations_are_indexed Title : subfeature_locations_are_indexed Usage : $flag = $db->subfeature_locations_are_indexed Function: whether subfeatures are indexed by location Returns : a flag indicating that all subfeatures are indexed by location Args : none Status : private This method returns true if subfeature locations are indexed. Default is to return the value of subfeatures_are_indexed().
=cut sub subfeature_locations_are_indexed { my $self = shift; return $self->subfeatures_are_indexed; } =head2 setup_segment_args Title : setup_segment_args Usage : @args = $db->setup_segment_args(@args) Function: munge the arguments to the segment() call Returns : munged arguments Args : the arguments passed to segment() Status : private This method is used internally by segment() to translate positional arguments into named argument=E<gt>value pairs. Arguments that already begin with a hyphen are passed through unchanged; otherwise three arguments are treated as (name, start, end), two as (class, name), and one as (name). =cut sub setup_segment_args { my $self = shift; return @_ if defined $_[0] && $_[0] =~ /^-/; return (-name=>$_[0],-start=>$_[1],-end=>$_[2]) if @_ == 3; return (-class=>$_[0],-name=>$_[1]) if @_ == 2; return (-name=>$_[0]) if @_ == 1; return; } =head2 store_and_cache Title : store_and_cache Usage : $success = $db->store_and_cache($indexit,@features) Function: store features into database and update cache Returns : the return value of _store() (true on success) Args : index the features? (0 or 1) and list of features Status : private This private method stores the list of Bio::SeqFeatureI objects into the database and caches them in memory for retrieval. =cut sub store_and_cache { my $self = shift; my $indexit = shift; my $result = $self->_store($indexit,@_); if (my $cache = $self->cache) { for my $obj (@_) { defined (my $id = eval {$obj->primary_id}) or next; $cache->store($id,$obj); } } $result; } =head2 init_cache Title : init_cache Usage : $db->init_cache($size) Function: initialize the in-memory feature cache Returns : the Tie::Cacher object Args : desired size of the cache Status : private This method is used internally by new() to create the Tie::Cacher instance used for the in-memory feature cache.
=cut sub init_cache { my $self = shift; my $cache_size = shift; $cache_size = 5000 if $cache_size == 1; # in case somebody treats it as a flag $self->{cache} = Tie::Cacher->new($cache_size) or $self->throw("Couldn't tie cache: $!"); } =head2 cache Title : cache Usage : $cache = $db->cache Function: return the cache object Returns : the Tie::Cacher object Args : none Status : private This method returns the Tie::Cacher object used for the in-memory feature cache. =cut sub cache { shift->{cache} } =head2 load_class Title : load_class Usage : $db->load_class($blessed_object) Function: loads the module corresponding to a blessed object Returns : empty Args : a blessed object Status : private This method is used by thaw_object() (and by the TIEHASH STORE method) to load the code for a blessed object. This ensures that all the object's methods are available. =cut sub load_class { my $self = shift; my $obj = shift; return unless defined $obj; return if $self->{class_loaded}{ref $obj}++; unless ($obj && $obj->can('primary_id')) { my $class = ref $obj; eval "require $class"; } } #################################### Internal methods #################### =head2 freeze Title : freeze Usage : $serialized_object = $db->freeze($feature) Function: serialize a feature object into a string Returns : serialized feature object Args : a seqfeature object Status : private This method converts a Bio::SeqFeatureI object into a serialized form suitable for storage into a database. The feature's primary ID (and, where supported, its object_store() reference) is set to undef before it is serialized and restored afterwards. This avoids any potential mismatch between the primary ID used as the database key and the primary ID stored in the serialized object.
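The effect of clearing the primary ID before serialization can be seen in a round trip. A minimal sketch, assuming a hypothetical DemoFeature class and using Storable's nfreeze() directly (freeze() also handles Data::Dumper, object_store() and optional compression, which are omitted here): the live object keeps its ID, while the serialized copy carries none.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);

# Hypothetical feature class: a blessed hash with a primary_id() accessor.
package DemoFeature;
sub new        { my ($class, %a) = @_; return bless {%a}, $class }
sub primary_id { my $s = shift; $s->{pid} = shift if @_; return $s->{pid} }

package main;

# Sketch of freeze(): clear the primary ID, serialize with Storable's
# portable nfreeze(), then restore the ID so the caller's object is unchanged.
sub freeze_feature {
    my $obj  = shift;
    my $id   = $obj->primary_id;
    $obj->primary_id(undef);          # keep the database key out of the blob
    my $data = nfreeze($obj);
    $obj->primary_id($id);            # restore original state
    return $data;
}

my $f    = DemoFeature->new(name => 'exon1', pid => 42);
my $blob = freeze_feature($f);
my $copy = thaw($blob);
printf "live id=%s, stored id=%s\n", $f->primary_id, $copy->primary_id() // 'undef';
# live id=42, stored id=undef
```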
=cut sub freeze { my $self = shift; my $obj = shift; # Bio::SeqFeature::Generic contains cleanup methods, so we need to # localize the methods to undef temporarily so that we can serialize local $obj->{'_root_cleanup_methods'} if exists $obj->{'_root_cleanup_methods'}; my ($id,$store); $id = $obj->primary_id(); $obj->primary_id(undef); # don't want primary ID to be stored in object eval { $store = $obj->object_store; $obj->object_store(undef); # don't want a copy of the store in the object }; my $serializer = $self->serializer; my $data; if ($serializer eq 'Data::Dumper') { my $d = Data::Dumper->new([$obj]); $d->Terse(1); $d->Deepcopy(1); $d->Deparse(1); $data = $d->Dump; } elsif ($serializer eq 'Storable') { local $Storable::forgive_me = 1; local $Storable::Deparse = 1; $data = Storable::nfreeze($obj); } $obj->primary_id($id); # restore to original state eval { $obj->object_store($store); }; $data = compress($data) if $self->do_compress; return $data; } =head2 thaw Title : thaw Usage : $feature = $db->thaw($serialized_object,$primary_id) Function: unserialize a string into a feature object Returns : Bio::SeqFeatureI object Args : serialized form of object from freeze() and primary_id of object Status : private This method is the reverse of freeze(). The supplied primary_id becomes the primary_id() of the returned Bio::SeqFeatureI object. This implementation checks for a deserialized object in the cache before it calls thaw_object() to do the actual deserialization.
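The cache-first lookup in thaw() means that repeated fetches of the same primary ID return the same in-memory object and deserialize only once. A minimal sketch, assuming a plain, unbounded hash in place of the Tie::Cacher object and unblessed hashes in place of feature objects; thaw_feature() is a hypothetical name.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);

# Sketch of thaw(): consult a cache keyed by primary ID before
# deserializing. A plain hash stands in for the Tie::Cacher object.
my %cache;
my $deserializations = 0;

sub thaw_feature {
    my ($blob, $primary_id) = @_;
    return $cache{$primary_id} if exists $cache{$primary_id};   # cache hit
    my $obj = thaw($blob) or return;                             # cache miss
    $deserializations++;
    $obj->{primary_id} = $primary_id;    # synchronize the ID, as thaw_object() does
    $cache{$primary_id} = $obj;
    return $obj;
}

my $blob   = nfreeze({ name => 'mRNA1' });
my $first  = thaw_feature($blob, 5);
my $second = thaw_feature($blob, 5);
my $verdict = $first == $second ? 'same object' : 'different objects';
print "$verdict\n";                                  # same object
print "deserialized $deserializations time(s)\n";    # deserialized 1 time(s)
```

One consequence of this design is that callers who modify a fetched feature without storing it will see the modified copy on the next fetch, since the cache hands back the same reference.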
=cut sub thaw { my $self = shift; my ($obj,$primary_id) = @_; if (my $cache = $self->cache) { return $cache->fetch($primary_id) if $cache->exists($primary_id); my $object = $self->thaw_object($obj,$primary_id) or return; $cache->store($primary_id,$object); return $object; } else { return $self->thaw_object($obj,$primary_id); } } =head2 thaw_object Title : thaw_object Usage : $feature = $db->thaw_object($serialized_object,$primary_id) Function: unserialize a string into a feature object Returns : Bio::SeqFeatureI object Args : serialized form of object from freeze() and primary_id of object Status : private After thaw() checks the cache and comes up empty, this method is invoked to thaw the object. =cut sub thaw_object { my $self = shift; my ($obj,$primary_id) = @_; my $serializer = $self->serializer; my $object; $obj = uncompress($obj) if $self->do_compress; if ($serializer eq 'Data::Dumper') { $object = eval $obj; } elsif ($serializer eq 'Storable') { local $Storable::forgive_me = 1; local $Storable::Eval = 1; $object = Storable::thaw($obj); } # remember the primary ID of this object as well as the # identity of the store, so that we can do lazy loading; # both of these are wrapped in an eval because not all # bioseqfeatures support them (or want to) $self->load_class($object); eval { $object->primary_id($primary_id); $object->object_store($self); }; $object; } =head2 feature_names Title : feature_names Usage : ($names,$aliases) = $db->feature_names($feature) Function: get names and aliases for a feature Returns : an array of names and an array of aliases Args : a Bio::SeqFeatureI object Status : private This is an internal utility function which, given a Bio::SeqFeatureI object, returns two array refs. The first is a list of official names for the feature, and the second is a list of aliases. This is slightly skewed towards GFF3 usage, so the official names are the display_name(), plus all tag values named 'Name', plus all tag values named 'ID'. 
The aliases are all tag values named 'Alias'. =cut sub feature_names { my $self = shift; my $obj = shift; my $primary_id = $obj->primary_id; my @names; push @names,$obj->display_name if defined $obj->display_name; push @names,$obj->get_tag_values('Name') if $obj->has_tag('Name'); push @names,$obj->get_tag_values('ID') if $obj->has_tag('ID'); # don't think this is desired behavior # @names = grep {defined $_ && $_ ne $primary_id} @names; my @aliases = $obj->has_tag('Alias') ? grep {defined} $obj->get_tag_values('Alias') : (); return (\@names,\@aliases); } =head2 feature_summary Title : feature_summary Usage : $summary = $db->feature_summary(@args) Function: returns a coverage summary across indicated region/type Returns : a Bio::SeqFeatureI object containing the "coverage" tag Args : see below Status : public This method is used to get coverage density information across a region of interest. You provide it with a region of interest, optionally a list of feature types, and a count of the number of bins over which you want to calculate the coverage density. An object is returned corresponding to the requested region. It contains a tag called "coverage" that will return an array ref of "bins" length. Each element of the array describes the number of features that overlap the bin at this position. The returned feature's score is the mean of these per-bin counts. Arguments: Argument Description -------- ----------- -seq_id Sequence ID for the region -start Start of region -end End of region -type/-types Feature type of interest or array ref of types -bins Number of bins across region. Defaults to 1000. -iterator Return an iterator instead of a feature Note that this method uses an approximate algorithm that is only accurate to 500 bp, so when dealing with bins that are smaller than 1000 bp, you may see some shifting of counts between adjacent bins. Although an -iterator option is provided, the method only ever returns a single feature, so this is fairly useless.
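Two small mechanisms sit behind feature_summary(): the score is the arithmetic mean of the per-bin coverage counts, and -iterator wraps the single summary feature in an array-backed iterator that yields it once. A minimal sketch of both, assuming a hypothetical OneShotIterator class modeled on Bio::DB::SeqFeature::Store::FeatureIterator and a plain hash in place of the Bio::SeqFeature::Lite summary feature. Unlike the original, mean_coverage() guards against an empty coverage array.

```perl
use strict;
use warnings;

# Array-backed iterator in the style of
# Bio::DB::SeqFeature::Store::FeatureIterator (renamed for this sketch):
# next_seq() hands out each stored item once, then returns nothing.
package OneShotIterator;
sub new      { my ($class, @items) = @_; return bless [@items], ref $class || $class }
sub next_seq { my $self = shift; return unless @$self; return shift @$self }

package main;

# feature_summary() sets the score to the mean of the per-bin counts.
sub mean_coverage {
    my $coverage = shift;
    return 0 unless @$coverage;      # guard added here; the original divides directly
    my $score = 0;
    $score += $_ for @$coverage;
    return $score / @$coverage;
}

my $summary = { coverage => [0, 2, 4, 2], score => mean_coverage([0, 2, 4, 2]) };
my $it = OneShotIterator->new($summary);
while (my $f = $it->next_seq) {
    print "score=$f->{score}\n";     # score=2, printed exactly once
}
```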
=cut sub feature_summary { my $self = shift; my ($seq_name,$start,$end,$types,$bins,$iterator) = rearrange([['SEQID','SEQ_ID','REF'],'START',['STOP','END'], ['TYPES','TYPE','PRIMARY_TAG'], 'BINS', 'ITERATOR', ],@_); my ($coverage,$tag) = $self->coverage_array(-seqid=> $seq_name, -start=> $start, -end => $end, -type => $types, -bins => $bins) or return; my $score = 0; for (@$coverage) { $score += $_ } $score /= @$coverage; my $feature = Bio::SeqFeature::Lite->new(-seq_id => $seq_name, -start => $start, -end => $end, -type => $tag, -score => $score, -attributes => { coverage => [$coverage] }); return $iterator ? Bio::DB::SeqFeature::Store::FeatureIterator->new($feature) : $feature; } =head2 coverage_array Title : coverage_array Usage : $arrayref = $db->coverage_array(@args) Function: returns a coverage summary across indicated region/type Returns : an array reference Args : see below Status : public This method is used to get coverage density information across a region of interest. The arguments are identical to feature_summary, except that instead of returning a Bio::SeqFeatureI object, it returns an array reference of the desired number of bins. The value of each element corresponds to the number of features in the bin. Arguments: Argument Description -------- ----------- -seq_id Sequence ID for the region -start Start of region -end End of region -type/-types Feature type of interest or array ref of types -bins Number of bins across region. Defaults to 1000. Note that this method uses an approximate algorithm that is only accurate to 500 bp, so when dealing with bins that are smaller than 1000 bp, you may see some shifting of counts between adjacent bins. 
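In this base class, coverage_array() only throws "not implemented". Each adaptor supplies its own version, and the text above notes that the approach is approximate. For comparison, here is a minimal sketch of an exact (and slower) binning. For each of the equal-width bins spanning the region, it counts the features that overlap that bin. The function name naive_coverage() and the [start, end] input format (1-based, inclusive) are assumptions for illustration. This is not how any adaptor actually computes coverage.

```perl
use strict;
use warnings;

# Hypothetical exact binning: count the features overlapping each of
# $bins equal-width bins across $start..$end (1-based, inclusive).
sub naive_coverage {
    my ($start, $end, $bins, @features) = @_;
    $bins ||= 1000;                              # documented default
    my $width    = ($end - $start + 1) / $bins;  # may be fractional
    my @coverage = (0) x $bins;
    for my $f (@features) {
        my ($fs, $fe) = @$f;
        next if $fe < $start || $fs > $end;      # outside the region
        $fs = $start if $fs < $start;            # clip to the region
        $fe = $end   if $fe > $end;
        my $first = int(($fs - $start) / $width);
        my $last  = int(($fe - $start) / $width);
        $last = $bins - 1 if $last > $bins - 1;  # guard against rounding
        $coverage[$_]++ for $first .. $last;
    }
    return \@coverage;
}

my $cov = naive_coverage(1, 1000, 4, [1, 300], [200, 260], [900, 2000]);
print "@$cov\n";   # prints: 2 2 0 1
```

A feature that straddles a bin boundary is counted in every bin it touches. As a result, the per-bin counts can sum to more than the number of features.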
=cut sub coverage_array { shift->throw_not_implemented; } package Bio::DB::SeqFeature::Store::FeatureIterator; $Bio::DB::SeqFeature::Store::FeatureIterator::VERSION = '1.7.4'; sub new { my $self = shift; my @features = @_; return bless \@features,ref $self || $self; } sub next_seq { my $self = shift; return unless @$self; return shift @$self; } sub begin_work { }# noop sub commit { }# noop sub rollback { }# noop 1; __END__ =head1 BUGS This is an early version, so there are certainly some bugs. Please use the BioPerl bug tracking system to report bugs. =head1 SEE ALSO L, L, L, L, L L =head1 AUTHOR Lincoln Stein <lstein@cshl.org>. Copyright (c) 2006 Cold Spring Harbor Laboratory. This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. =cut mixed_alphabet.fasta100644000766000024 415213605523026 22754 0ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/t/data/dbfa>gi|352962132|ref|NG_030353.1| Homo sapiens sal-like 3 (Drosophila) (SALL3), RefSeqGene on chromosome 18 TAATAATCGTTTCGGCCTCCCTATAGGCAAGGAGTCAAAGTTTTAACTTGCTAGCATTATTTATGTAATC ATACATGCTGAAATGTCCCTCCTGGTCTACATGCAGCCCCGAGCCACAGTTCAGCCATCAGGAGAGAAGT ACTTCACCATCGTTTGCATCCCTCAGTGCGAAGACGACTGTGAGCTGATGTTTCTGTGTATGCCATAAAA AGCCACGGAATGTTTGCCTCTGATGGCTACGGTGAAGCTACACAGCGTCCTGGAATAAACACACAGGAAG >gi|352962148|ref|NM_001251825.1| Homo sapiens Sp1 UranscripUion facUor (SP1), UranscripU varianU 3, mRNA GUCCGGGUUCGCUUGCCUCGUCAGCGUCCGCGUUUUUCCCGGCCCCCCCCAACCCCCCCGGACAGGACCC CCUUGAGCUUGUCCCUCAGCUGCCACCAUGAGCGACCAAGAUCACUCCAUGGAUGAAAUGACAGCUGUGG UGAAAAUUGAAAAAGGAGUUGGUGGCAAUAAUGGGGGCAAUGGUAAUGGUGGUGGUGCCUUUUCACAGGC UCGAAGUAGCAGCACAGGCAGUAGCAGCAGCACUGGAGGAGGAGGGCAGGGUGCCAAUGGCUGGCAGAUC >gi|194473622|ref|NP_001123975.1| adenylosuccinate lyase [Rattus norvegicus] MAASGDPACAESYRSPLAARYASHEMCFLFSDRYKFQTWRQLWLWLAEAEQTLGLPITDEQIQEMRSNLS 
NIDFQMAAEEEKRLRHDVMAHVHTFGHCCPKAAGIIHLGATSCYVGDNTDLIILRNAFDLLLPKLARVIS RLADFAKERADLPTLGFTHFQPAQLTTVGKRCCLWIQDLCMDLQNLKRVRDELRFRGVKGTTGTQASFLQ
LFEGDHQKVEQLDKMVTEKAGFKRAYIITGQTYTRKVDIEVLSVLASLGASVHKICTDIRLLANLKEMEE >gi|61679760|pdb|1Y4P|B Chain B, T-To-T(High) Quaternary Transitions In Human Hemoglobin: Betaw37e Deoxy Low-Salt (10 Test Sets) MHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPETQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGA FSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANA LAHKYHKERADLPTLGFTHFQPAQLTTVGKRCCLWIQDLCMDLQNLKRVRDELRFRGVKGTTGTQASFLQ LFEGDHQKVEQLDKMVTEKAGFKRAYIITGQTYTRKVDIEVLSVLASLGASVHKICTDIRLLANLKEMEE >0 MHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPETQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGA FSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANA LAHKYHKERADLPTLGFTHFQPAQLTTVGKRCCLWIQDLCMDLQNLKRVRDELRFRGVKGTTGTQASFLQ LFEGDHQKVEQLDKMVTEKAGFKRAYIITGQTYTRKVDIEVLSVLASLGASVHKICTDIRLLANLKEMEE >1 MHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPETQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGA FSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANA LAHKYHKERADLPTLGFTHFQPAQLTTVGKRCCLWIQDLCMDLQNLKRVRDELRFRGVKGTTGTQASFLQ LFEGDHQKVEQLDKMVTEKAGFKRAYIITGQTYTRKVDIEVLSVLASLGASVHKICTDIRLLANLKEMEE >123 empty sequence Segment.pm100644000766000024 3404113605523026 22506 0ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/lib/Bio/DB/SeqFeaturepackage Bio::DB::SeqFeature::Segment; $Bio::DB::SeqFeature::Segment::VERSION = '1.7.4'; =head1 NAME Bio::DB::SeqFeature::Segment -- Location-based access to genome annotation data =head1 SYNOPSIS use Bio::DB::SeqFeature::Store; # Open the sequence database my $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'DBI::mysql', -dsn => 'dbi:mysql:test'); my $segment = $db->segment('Chr1',5000=>6000); my @features = $segment->features('mRNA','match'); =head1 DESCRIPTION The segment object simplifies access to Bio::DB::SeqFeature store by acting as a placeholder for a region of the genome. 
You can replace this statement: @features = $db->features(-seq_id=>'Chr1', -start=>5000, -end=>6000, -types=>['mRNA','match','repeat_region']); with these statements: $segment = $db->segment('Chr1',5000=>6000); @features = $segment->features('mRNA','match','repeat_region'); You can also initialize a segment from an existing SeqFeature object. The range will be picked up from the SeqFeature boundaries: $segment = Bio::DB::SeqFeature::Segment->new($feature); # for Bio::DB::SeqFeature $segment = Bio::DB::SeqFeature::Segment->new($feature,$store); # for other Bio::SeqFeatureI objects The segment object implements the full Bio::SeqFeature::CollectionI interface, thereby allowing you to iterate over all features in the range. =cut use strict; use base 'Bio::SeqFeature::CollectionI','Bio::RangeI'; use Bio::DB::GFF::Util::Rearrange; use overload '""' => \&as_string, fallback => 1; =head1 PUBLIC METHODS The following are public methods intended for external use. =head2 new Title : new Usage : $segment = Bio::DB::SeqFeature::Segment->new(@options) Function: create a new Segment object Returns : A Bio::DB::SeqFeature::Segment object Args : several - see below Status : public This class method creates a Bio::DB::SeqFeature::Segment object. You must provide a Bio::DB::SeqFeature::Store as well as the coordinates of the segment. These arguments can be provided explicitly or indirectly. First form: $segment = Bio::DB::SeqFeature::Segment->new($store,$seqid,$start,$end,$strand) In this form a segment is defined by a Bio::DB::SeqFeature::Store, the sequence ID, the start, end and strand. This is the form that is invoked internally by Bio::DB::SeqFeature::Store when you call its segment() method. Second form: $segment = Bio::DB::SeqFeature::Segment->new($seqfeature [,$store]); In this form, you pass new() a Bio::SeqFeatureI object, and the segment's seq_id and coordinates are taken from that object. If you pass a store-aware seqfeature object (e.g.
Bio::DB::SeqFeature) then the store database is also derived from the feature. Otherwise you will have to pass the store as a second argument. =cut ### # new() # # Call as Bio::DB::SeqFeature::Segment->new($seqfeature,$store) # # or # Bio::DB::SeqFeature::Segment->new($store,$seqid,$start,$end,$strand) # sub new { my $class = shift; my ($store,$seqid,$start,$end,$strand,$id); if (ref $_[0] && UNIVERSAL::isa($_[0],'Bio::SeqFeatureI')) { my $seqfeature = shift; $store = shift; $store ||= eval {$seqfeature->object_store}; $class->throw("I could not derive the Bio::DB::SeqFeature::Store object from the arguments passed to Bio::DB::SeqFeature::Segment->new(). Please pass the Store object as the second argument") unless $store; $seqid = $seqfeature->seq_id; $start = $seqfeature->start; $end = $seqfeature->end; $strand= $seqfeature->strand; $id = eval{$seqfeature->primary_id}; } else { ($store,$seqid,$start,$end,$strand,$id) = @_; } return bless { store => $store, seqid => $seqid, start => $start, end => $end, strand => $strand, primary_id => $id, },ref($class) || $class; } =head2 features Title : features Usage : @features = $segment->features(@args) Function: fetch seqfeatures that overlap the segment Returns : list of features Args : see below Status : Public This is the workhorse for feature query and retrieval. It takes a series of -name=>$value filter arguments. Features that match all the filters are returned. Argument Value -------- ----- Location filters: -strand Strand -range_type Type of range match ('overlaps','contains','contained_in') Name filters: -name Name of feature (may be a glob expression) -aliases If true, match aliases as well as display names -class Archaic argument for backward compatibility.
(-class=>'Clone',-name=>'ABC123') is equivalent to (-name=>'Clone:ABC123') Type filters: -types List of feature types (array reference) or one type (scalar) -type Synonym for the above -primary_tag Synonym for the above -attributes Hashref of attribute=>value pairs as per get_features_by_attribute(). Multiple alternative values can be matched by providing an array reference. -attribute Synonym for -attributes This is identical to the Bio::DB::SeqFeature::Store->features() method, except that the -seq_id, -start, and -end arguments are provided by the segment object. If a simple list of arguments is provided, then the list is taken to be the set of feature types (primary tags) to filter on. Examples: All features that overlap the current segment: @features = $segment->features; All features of type mRNA that overlap the current segment: @features = $segment->features('mRNA'); All features that are completely contained within the current segment: @features = $segment->features(-range_type=>'contains'); All "confirmed" mRNAs that overlap the current segment: @features = $segment->features(-attributes=>{confirmed=>1},-type=>'mRNA'); =cut sub features { my $self = shift; my @args; if (@_ == 0) { @args = (); } elsif ($_[0] !~/^-/) { my @types = @_; @args = (-type=>\@types); } else { @args = @_; } $self->{store}->features(@args,-seqid=>$self->{seqid},-start=>$self->{start},-end=>$self->{end}); } sub types { my $self = shift; my %types; my $iterator = $self->get_seq_stream(@_); while (my $f = $iterator->next_seq) { $types{$f->type}++; } return %types; } =head2 get_seq_stream Title : get_seq_stream Usage : $iterator = $segment->get_seq_stream(@args) Function: return an iterator across all features in the database that overlap the segment Returns : a Bio::DB::SeqFeature::Store::Iterator object Args : (optional) the same filter arguments as the features() method Status : public This is identical to Bio::DB::SeqFeature::Store->get_seq_stream() except that the location filter is 
always automatically applied so that the iterator you
receive returns features that overlap the segment's region. When called without any arguments this method will return an iterator object that will traverse all indexed features in the database that overlap the segment's region. Call the iterator's next_seq() method to step through them (in no particular order): my $iterator = $segment->get_seq_stream; while (my $feature = $iterator->next_seq) { print $feature->primary_tag,' ',$feature->display_name,"\n"; } You can select a subset of features by passing a series of filter arguments. The arguments are identical to those accepted by $segment->features(). get_feature_stream() can be used as a synonym for this method. =cut #' sub get_seq_stream { my $self = shift; $self->{store}->get_seq_stream(@_,-seqid=>$self->{seqid},-start=>$self->{start},-end=>$self->{end}); } sub get_feature_stream { shift->get_seq_stream(@_) } =head2 store Title : store Usage : $store = $segment->store Function: return the Bio::DB::SeqFeature::Store object associated with the segment Returns : a Bio::DB::SeqFeature::Store object Args : none Status : public =cut sub factory { shift->{store} } sub store { shift->{store} } =head2 primary_tag, type Title : primary_tag,type Usage : $primary_tag = $segment->primary_tag Function: returns the string "region" Returns : "region" Args : none Status : public The primary_tag method returns the constant tag "region". type() is a synonym for this method. =cut sub type { shift->primary_tag } =head2 as_string Title : as_string Usage : $name = $segment->as_string Function: expands the object into a human-readable string Returns : "seq_id:start..end" Args : none Status : public The as_string() method is overloaded into the "" operator so that the object is represented as a human-readable string in the form "seq_id:start..end" when used in a string context.
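The following sketch shows the overloading pattern described above in isolation. Demo::Region is a hypothetical class, not part of BioPerl. It copies the "seq_id:start..end" formatting from Segment's as_string() and binds it to "" using the same `use overload` line. Like the original, it treats any false start or end as empty, so a coordinate of 0 is printed as an empty string as well as undef.

```perl
use strict;
use warnings;

# Hypothetical demo class (not part of BioPerl) reusing the formatting
# and the overload declaration of Bio::DB::SeqFeature::Segment.
package Demo::Region;
use overload '""' => \&as_string, fallback => 1;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

sub as_string {
    my $self  = shift;
    # As in the original: a false start/end (undef or 0) becomes ''.
    my $start = $self->{start} || '';
    my $end   = $self->{end}   || '';
    return "$self->{seq_id}:$start..$end";
}

package main;

my $region = Demo::Region->new(seq_id => 'Chr1', start => 5000, end => 6000);
print "Region: $region\n";           # prints: Region: Chr1:5000..6000

my $open = Demo::Region->new(seq_id => 'Chr1', end => 6000);
print "Open start: $open\n";         # prints: Open start: Chr1:..6000
```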
=cut sub as_string { my $self = shift; my $label = $self->seq_id; my $start = $self->start || ''; my $end = $self->end || ''; return "$label:$start..$end"; } =head2 rel2abs Title : rel2abs Usage : @coords = $s->rel2abs(@coords) Function: convert relative coordinates into absolute coordinates Returns : a list of absolute coordinates Args : a list of relative coordinates Status : Public This function takes a list of positions expressed relative to the segment and converts them into absolute coordinates. =cut sub rel2abs { my $self = shift; my @result; my ($start,$strand) = ($self->start,$self->strand); @result = $strand < 0 ? map { $start - $_ + 1 } @_ : map { $_ + $start - 1 } @_; # if called with a single argument, caller will expect a single scalar reply # not the size of the returned array! return $result[0] if @result == 1 and !wantarray; @result; } =head2 abs2rel Title : abs2rel Usage : @rel_coords = $s->abs2rel(@abs_coords) Function: convert absolute coordinates into relative coordinates Returns : a list of relative coordinates Args : a list of absolute coordinates Status : Public This function takes a list of positions in absolute coordinates and returns a list expressed in relative coordinates. =cut sub abs2rel { my $self = shift; my @result; my ($start,$strand) = ($self->start,$self->abs_strand); @result = $strand < 0 ? map { $start - $_ + 1 } @_ : map { $_ - $start + 1 } @_; # if called with a single argument, caller will expect a single scalar reply # not the size of the returned array! return $result[0] if @result == 1 and !wantarray; @result; } =head2 Bio::SeqFeatureI compatibility methods For convenience, segments are interchangeable with Bio::SeqFeature objects in many cases. This means that segments can be passed to BioPerl modules that expect Bio::SeqFeature objects and they should work as expected. The primary tag of segment objects is "region" (SO:0000001 "Continuous sequence >=1 base pair").
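The coordinate arithmetic in rel2abs() and abs2rel() can be checked in isolation. The sketch below reproduces those formulas exactly as written, including the rule that a single coordinate in scalar context is returned as that coordinate rather than as the list length. The function names rel2abs_coords() and abs2rel_coords() are hypothetical. Their $start and $strand parameters stand in for $self->start and $self->strand.

```perl
use strict;
use warnings;

# Hypothetical stand-alone copies of the Segment coordinate formulas;
# $start and $strand play the roles of $self->start and $self->strand.
sub rel2abs_coords {
    my ($start, $strand, @rel) = @_;
    my @out = $strand < 0 ? map { $start - $_ + 1 } @rel
                          : map { $_ + $start - 1 } @rel;
    # One coordinate in scalar context: return it, not the list length.
    return $out[0] if @out == 1 and !wantarray;
    return @out;
}

sub abs2rel_coords {
    my ($start, $strand, @abs_coords) = @_;
    my @out = $strand < 0 ? map { $start - $_ + 1 } @abs_coords
                          : map { $_ - $start + 1 } @abs_coords;
    return $out[0] if @out == 1 and !wantarray;
    return @out;
}

my $one  = rel2abs_coords(5000, +1, 1);          # scalar context
my @pair = rel2abs_coords(5000, +1, 1, 101);     # list context
my @back = abs2rel_coords(5000, +1, @pair);      # round trip
print "$one | @pair | @back\n";   # prints: 5000 | 5000 5100 | 1 101
```

With these formulas, relative position 1 maps to the segment's start on either strand. On the minus strand, increasing relative positions count downward from that start.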
All these methods are read-only except for the primary_id, which can be both read and set. The following Bio::SeqFeatureI methods are supported: =over 4 =item start =item end =item seq_id =item strand =item length =item display_name =item primary_id =item primary_tag (always returns "region") =item source_tag (always returns "Bio::DB::SeqFeature::Segment") =item get_SeqFeatures (always returns an empty list) =item seq =item entire_seq =item location =item All Bio::RangeI methods =back =cut sub start { shift->{start} } sub end { shift->{end} } sub seq_id { shift->{seqid} } sub strand { shift->{strand} } sub ref { shift->seq_id } *refseq = \&ref; sub length { my $self = shift; return abs($self->end - $self->start) +1; } sub primary_tag { 'region' } sub source_tag { __PACKAGE__ } sub display_name { shift->as_string } sub name { shift->display_name } sub class { 'region' } sub abs_ref { shift->ref} sub abs_start { shift->start} sub abs_end { shift->end} sub abs_strand { shift->strand} sub get_SeqFeatures { } sub get_all_tags { } sub get_tag_values { } sub add_tag_value { } sub remove_tag { } sub has_tag { } sub seq { my $self = shift; require Bio::PrimarySeq unless Bio::PrimarySeq->can('new'); my ($start,$end) = ($self->start,$self->end); if ($self->strand < 0) { ($start,$end) = ($end,$start); } return Bio::PrimarySeq->new( -seq => $self->store->fetch_sequence($self->seq_id,$start,$end), -id => $self->display_name); } sub subseq { my $self = shift; my ($newstart,$newstop) = @_; my $store = $self->store or return; my $seq = $store->fetch_sequence($self->seq_id,$self->start+$newstart-1,$self->start+$newstop-1); require Bio::PrimarySeq unless Bio::PrimarySeq->can('new'); return Bio::PrimarySeq->new(-seq=>$seq); } sub dna { my $seq = shift->seq; $seq = $seq->seq if CORE::ref($seq); return $seq; } sub entire_seq { my $self = shift; require Bio::PrimarySeq unless Bio::PrimarySeq->can('new'); return Bio::PrimarySeq->new( -seq => 
$self->store->fetch_sequence($self->seq_id), -id => $self->seq_id); } sub location { my $self = shift; require
Bio::Location::Simple unless Bio::Location::Simple->can('new'); my $loc = Bio::Location::Simple->new(-start => $self->start, -end => $self->end, -strand => $self->strand); $loc->strand($self->strand); return $loc; } sub primary_id { my $self = shift; my $d = $self->{primary_id}; $self->{primary_id} = shift if @_; $d; } sub target { return } sub score { return } sub stop { shift->end } sub absolute { return 1 } sub desc { shift->as_string } sub display_id { shift->display_name } sub primary_seq { shift->seq } sub accession_number { return undef } # intended return undef sub alphabet { return undef } # intended return undef 1; __END__ =head1 BUGS This is an early version, so there are certainly some bugs. Please use the BioPerl bug tracking system to report bugs. =head1 SEE ALSO L, L, L, L, L =head1 AUTHOR Lincoln Stein <lstein@cshl.org>. Copyright (c) 2006 Cold Spring Harbor Laboratory. This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. =cut Store000755000766000024 013605523026 21460 5ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/lib/Bio/DB/SeqFeaturebdb.pm100644000766000024 472713605523026 22717 0ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/lib/Bio/DB/SeqFeature/Storepackage Bio::DB::SeqFeature::Store::bdb; $Bio::DB::SeqFeature::Store::bdb::VERSION = '1.7.4'; =head1 NAME Bio::DB::SeqFeature::Store::bdb - fetch and store objects from a BerkeleyDB =head1 DESCRIPTION This is a partial implementation -- just enough has been implemented so that we can fetch and store objects.
It is used as a temporary failsafe store by the GFF3Loader module. =cut use strict; use base 'Bio::DB::SeqFeature::Store'; use Bio::DB::GFF::Util::Rearrange 'rearrange'; use DB_File; use Fcntl qw(O_RDWR O_CREAT); use File::Spec; use File::Temp 'tempdir'; use File::Path 'rmtree'; ### # object initialization # sub init { my $self = shift; my ($directory, $is_temporary) = rearrange([['DSN','DIR','DIRECTORY'], ['TMP','TEMP','TEMPORARY'] ],@_); $directory ||= $is_temporary ? File::Spec->tmpdir : '.'; $directory = tempdir(__PACKAGE__.'_XXXXXX',TMPDIR=>1,CLEANUP=>1,DIR=>$directory) if $is_temporary; -d $directory && -w _ or $self->throw("Can't write into the directory $directory"); $self->default_settings; $self->directory($directory); $self->temporary($is_temporary); my %h; tie (%h,'DB_File',$self->path,O_RDWR|O_CREAT,0666,$DB_HASH) or $self->throw("Couldn't tie: $!"); $self->db(\%h); $h{'.next_id'} ||= 1; } sub _store { my $self = shift; my $indexed = shift; my $db = $self->db; my $count = 0; for my $obj (@_) { my $primary_id = $obj->primary_id; $primary_id = $db->{'.next_id'}++ unless defined $primary_id; $db->{$primary_id} = $self->freeze($obj); $obj->primary_id($primary_id); $count++; } $count; } sub _update { my $self = shift; my ($object,$primary_id) = @_; my $db = $self->db; $self->throw("$object is not in database") unless exists $db->{$primary_id}; $db->{$primary_id} = $self->freeze($object); } sub _fetch { my $self = shift; my $id = shift; my $db = $self->db; my $obj = $self->thaw($db->{$id},$id); $obj; } sub db { my $self = shift; my $d = $self->setting('db'); $self->setting(db=>shift) if @_; $d; } sub directory { my $self = shift; my $d = $self->setting('directory'); $self->setting(directory=>shift) if @_; $d; } sub temporary { my $self = shift; my $d = $self->setting('temporary'); $self->setting(temporary=>shift) if @_; $d; } sub path { my $self = shift; return $self->directory .'/' .
'feature.bdb'; } sub DESTROY { my $self = shift; my $db = $self->db; untie %$db; rmtree($self->directory,0,1) if $self->temporary; } 1; memory.pm100644000766000024 5450713605523026 23521 0ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/lib/Bio/DB/SeqFeature/Storepackage Bio::DB::SeqFeature::Store::memory; $Bio::DB::SeqFeature::Store::memory::VERSION = '1.7.4'; =head1 NAME Bio::DB::SeqFeature::Store::memory -- In-memory implementation of Bio::DB::SeqFeature::Store =head1 SYNOPSIS use Bio::DB::SeqFeature::Store; # Open the sequence database my $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'memory', -dsn => '/var/databases/test'); # search... by id my @features = $db->fetch_many(@list_of_ids); # ...by name @features = $db->get_features_by_name('ZK909'); # ...by alias @features = $db->get_features_by_alias('sma-3'); # ...by type @features = $db->get_features_by_type('gene'); # ...by location @features = $db->get_features_by_location(-seq_id=>'Chr1',-start=>4000,-end=>600000); # ...by attribute @features = $db->get_features_by_attribute({description => 'protein kinase'}); # ...by the GFF "Note" field @result_list = $db->search_notes('kinase'); # ...by arbitrary combinations of selectors @features = $db->features(-name => $name, -type => $types, -seq_id => $seqid, -start => $start, -end => $end, -attributes => $attributes); # ...using an iterator my $iterator = $db->get_seq_stream(-name => $name, -type => $types, -seq_id => $seqid, -start => $start, -end => $end, -attributes => $attributes); while (my $feature = $iterator->next_seq) { # do something with the feature } # ...limiting the search to a particular region my $segment = $db->segment('Chr1',5000=>6000); @features = $segment->features(-type=>['mRNA','match']); # getting & storing sequence information # Warning: this returns a string, and not a PrimarySeq object $db->insert_sequence('Chr1','GATCCCCCGGGATTCCAAAA...'); my $sequence = $db->fetch_sequence('Chr1',5000=>6000); # what feature types are
defined in the database? my @types = $db->types; # create a new feature in the database my $feature = $db->new_feature(-primary_tag => 'mRNA', -seq_id => 'chr3', -start => 10000, -end => 11000); =head1 DESCRIPTION Bio::DB::SeqFeature::Store::memory is the in-memory adaptor for Bio::DB::SeqFeature::Store. You will not create it directly, but instead use Bio::DB::SeqFeature::Store->new() to do so. See L for complete usage instructions. =head2 Using the memory adaptor Before using the memory adaptor, populate a readable directory on the file system with annotation and/or sequence files. The annotation files must be in GFF3 format, and should end in the extension .gff or .gff3. They may be compressed with "compress", "gzip" or "bzip2" (in which case the appropriate compression extension must be present as well). You may include sequence data inline in the GFF3 files, or put the sequence data in one or more separate FASTA-format files. These files must end with .fa or .fasta and may be compressed. Because of the way the adaptor works, you will get much better performance if you keep the sequence data in separate FASTA files. Initialize the database using the -dsn option. This should point to the directory containing the annotation and sequence files, or to a single GFF3 file. Examples: # load all GFF3 and FASTA files located in /var/databases/test directory $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'memory', -dsn => '/var/databases/test'); # load the data in a single compressed GFF3 file located at # /usr/annotations/worm.gff3.gz $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'memory', -dsn => '/usr/annotations/worm.gff3.gz'); For compatibility with the Bio::DB::GFF memory adaptor, -gff is recognized as an alias for -dsn. See L for all the access methods supported by this adaptor.
The various methods for storing and updating features and sequences into the database are supported, including GFF3 loading support, but since this is an in-memory adaptor all changes you make will be lost when the script exits. =cut use strict; use base 'Bio::DB::SeqFeature::Store'; use Bio::DB::SeqFeature::Store::GFF3Loader; use Bio::DB::GFF::Typename; use Bio::DB::GFF::Util::Rearrange 'rearrange'; use File::Temp 'tempdir'; use IO::File; use Bio::DB::Fasta; use File::Glob ':glob'; use constant BINSIZE => 10_000; ### # object initialization # sub init { my ($self, $args) = @_; $self->SUPER::init($args); $self->{_data} = {}; $self->{_children} = {}; $self->{_index} = {}; $self; } sub post_init { my $self = shift; my ($file_or_dir) = rearrange([['DIR','DSN','FILE','GFF']],@_); return unless $file_or_dir; my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store => $self, -sf_class => $self->seqfeature_class, -no_close_fasta => 1 ) or $self->throw("Couldn't create GFF3Loader"); my @argv; if (-d $file_or_dir) { @argv = ( bsd_glob("$file_or_dir/*.size*"), bsd_glob("$file_or_dir/*.gff"), bsd_glob("$file_or_dir/*.gff3"), bsd_glob("$file_or_dir/*.gff.{gz,Z,bz2}"), bsd_glob("$file_or_dir/*.gff3.{gz,Z,bz2}") ); } else { @argv = $file_or_dir; } local $self->{file_or_dir} = $file_or_dir; $loader->load(@argv); warn $@ if $@; } sub commit { # reindex fasta files my $self = shift; my $db; if (my $fh = $self->{fasta_fh}) { $fh->close; $db = Bio::DB::Fasta->new($self->{fasta_file}); } elsif (exists $self->{file_or_dir} && -d $self->{file_or_dir}) { $db = Bio::DB::Fasta->new($self->{file_or_dir}); } $self->{fasta_db} = $db if $db; } sub can_store_parentage { 1 } # return a hash ref in which each key is primary id sub data { shift->{_data}; } sub _init_database { shift->init } sub _store { my $self = shift; my $indexed = shift; my @objs = @_; my $data = $self->data; my $count = 0; for my $obj (@objs) { # Add unique ID to feature if needed my $primary_id = $self->_autoid($obj); 
# Store feature (overwriting any existing feature with the same primary ID # as required by Bio::DB::SF::Store) $data->{$primary_id} = $obj; if ($indexed) { $self->{_index}{ids}{$primary_id} = undef; $self->_update_indexes($obj); } $count++; } return $count; } sub _autoid { # If a feature has no ID, assign it a unique ID my ($self, $obj) = @_; my $data = $self->data; my $primary_id = $obj->primary_id; if (not defined $primary_id) { # Create a unique ID $primary_id = 1 + scalar keys %{$data}; while (exists $data->{$primary_id}) { $primary_id++; } $obj->primary_id($primary_id); } return $primary_id; } sub _deleteid { my ($self, $id) = @_; if (exists $self->{_index}{ids}{$id}) { # $indexed was true $self->_update_indexes( $self->fetch($id), 1 ); delete $self->{_index}{ids}{$id}; } delete $self->data->{$id}; return 1; } sub _fetch { my ($self, $id) = @_; return $self->data->{$id}; } sub _add_SeqFeature { my ($self, $parent, @children) = @_; my $count = 0; my $parent_id = ref $parent ? $parent->primary_id : $parent; defined $parent_id or $self->throw("Parent $parent should have a primary ID"); for my $child (@children) { my $child_id = ref $child ? 
$child->primary_id : $child; defined $child_id or $self->throw("Child $child should have a primary ID"); $self->{_children}{$parent_id}{$child_id}++; $count++; } return $count; } sub _fetch_SeqFeatures { my ($self, $parent, @types) = @_; my $parent_id = $parent->primary_id; defined $parent_id or $self->throw("Parent $parent should have a primary ID"); my @children_ids = keys %{$self->{_children}{$parent_id}}; my @children = map {$self->fetch($_)} @children_ids; if (@types) { my $data; for my $c (@children) { push @{$$data{$c->primary_tag}{$c->source_tag||''}}, $c; } @children = (); for my $type (@types) { $type .= ':' if (not $type =~ m/:/); my ($primary_tag, undef, $source_tag) = ($type =~ m/^(.*?)(:(.*?))$/); $source_tag ||= ''; if ($source_tag eq '') { for my $source (keys %{$$data{$primary_tag}}) { if (exists $$data{$primary_tag}{$source}) { push @children, @{$$data{$primary_tag}{$source}}; } } } else { if (exists $$data{$primary_tag}{$source_tag}) { push @children, @{$$data{$primary_tag}{$source_tag}}; } } } } return @children; } sub _update_indexes { my ($self, $obj, $del) = @_; defined (my $id = $obj->primary_id) or return; $del ||= 0; $self->_update_name_index($obj,$id, $del); $self->_update_type_index($obj,$id, $del); $self->_update_location_index($obj, $id, $del); $self->_update_attribute_index($obj,$id, $del); } sub _update_name_index { my ($self, $obj, $id, $del) = @_; my ($names, $aliases) = $self->feature_names($obj); foreach (@$names) { if (not $del) { $self->{_index}{name}{lc $_}{$id} = 1; } else { delete $self->{_index}{name}{lc $_}{$id}; if (scalar keys %{ $self->{_index}{name}{lc $_} } == 0) { delete $self->{_index}{name}{lc $_}; } }; } foreach (@$aliases) { if (not $del) { $self->{_index}{name}{lc $_}{$id} ||= 2; } else { delete $self->{_index}{name}{lc $_}{$id}; if (scalar keys %{ $self->{_index}{name}{lc $_} } == 0) { delete $self->{_index}{name}{lc $_}; } } } } sub _update_type_index { my ($self, $obj, $id, $del) = @_; my $primary_tag
= lc($obj->primary_tag) || return; my $source_tag = lc($obj->source_tag || ''); if (not $del) { $self->{_index}{type}{$primary_tag}{$source_tag}{$id} = undef; } else { delete $self->{_index}{type}{$primary_tag}{$source_tag}{$id}; if ( scalar keys %{$self->{_index}{type}{$primary_tag}{$source_tag}} == 0 ) { delete $self->{_index}{type}{$primary_tag}{$source_tag}; if (scalar keys %{$self->{_index}{type}{$primary_tag}} == 0 ) { delete $self->{_index}{type}{$primary_tag}; } } } } sub _update_location_index { my ($self, $obj, $id, $del) = @_; my $seq_id = $obj->seq_id || ''; my $start = $obj->start || 0; my $end = $obj->end || 0; my $strand = $obj->strand; my $bin_min = int $start/BINSIZE; my $bin_max = int $end/BINSIZE; for (my $bin = $bin_min; $bin <= $bin_max; $bin++ ) { if (not $del) { $self->{_index}{location}{lc $seq_id}{$bin}{$id} = undef; } else { delete $self->{_index}{location}{lc $seq_id}{$bin}{$id}; if (scalar keys %{$self->{_index}{location}{lc $seq_id}{$bin}{$id}} == 0) { delete $self->{_index}{location}{lc $seq_id}{$bin}{$id}; } if (scalar keys %{$self->{_index}{location}{lc $seq_id}{$bin}} == 0) { delete $self->{_index}{location}{lc $seq_id}{$bin}; } if (scalar keys %{$self->{_index}{location}{lc $seq_id}} == 0) { delete $self->{_index}{location}{lc $seq_id}; } } } } sub _update_attribute_index { my ($self, $obj, $id, $del) = @_; for my $tag ($obj->get_all_tags) { for my $value ($obj->get_tag_values($tag)) { if (not $del) { $self->{_index}{attribute}{lc $tag}{lc $value}{$id} = undef; } else { delete $self->{_index}{attribute}{lc $tag}{lc $value}{$id}; if ( scalar keys %{$self->{_index}{attribute}{lc $tag}{lc $value}} == 0) { delete $self->{_index}{attribute}{lc $tag}{lc $value}; } if ( scalar keys %{$self->{_index}{attribute}{lc $tag}} == 0) { delete $self->{_index}{attribute}{lc $tag}; } if ( scalar keys %{$self->{_index}{attribute}} == 0) { delete $self->{_index}{attribute}; } } } } } sub _features { my $self = shift; my ($seq_id,$start,$end,$strand, 
$name,$class,$allow_aliases, $types, $attributes, $range_type, $iterator ) = rearrange([['SEQID','SEQ_ID','REF'],'START',['STOP','END'],'STRAND', 'NAME','CLASS','ALIASES', ['TYPES','TYPE','PRIMARY_TAG'], ['ATTRIBUTES','ATTRIBUTE'], 'RANGE_TYPE', 'ITERATOR', ],@_); my (@from,@where,@args,@group); $range_type ||= 'overlaps'; my @result; unless (defined $name or defined $seq_id or defined $types or defined $attributes) { @result = keys %{$self->{_index}{ids}}; } my %found = (); my $result = 1; if (defined($name)) { # hacky backward compatibility workaround undef $class if $class && $class eq 'Sequence'; $name = "$class:$name" if defined $class && length $class > 0; $result &&= $self->filter_by_name($name,$allow_aliases,\%found); } if (defined $seq_id) { $result &&= $self->filter_by_location($seq_id,$start,$end,$strand,$range_type,\%found); } if (defined $types) { $result &&= $self->filter_by_type($types,\%found); } if (defined $attributes) { $result &&= $self->filter_by_attribute($attributes,\%found); } push @result,keys %found if $result; return $iterator ? Bio::DB::SeqFeature::Store::memory::Iterator->new($self,\@result) : map {$self->fetch($_)} @result; } sub filter_by_type { my ($self, $types_req, $filter) = @_; my @types_req = ref $types_req eq 'ARRAY' ? 
@$types_req : $types_req; my $types = $self->{_index}{type}; my @types_found = $self->find_types(\@types_req); my @results; for my $type_found (@types_found) { my ($primary_tag, undef, $source_tag) = ($type_found =~ m/^(.*?)(:(.*?))$/); next unless exists $types->{$primary_tag}{$source_tag}; push @results, keys %{$types->{$primary_tag}{$source_tag}}; } $self->update_filter($filter,\@results); } sub find_types { my ($self, $types_req) = @_; my @types_found; my $types = $self->{_index}{type}; for my $type_req (@$types_req) { # Type is the primary tag and an optional source tag my ($primary_tag, $source_tag); if (ref $type_req && $type_req->isa('Bio::DB::GFF::Typename')) { $primary_tag = $type_req->method; $source_tag = $type_req->source; } else { ($primary_tag, undef, $source_tag) = ($type_req =~ m/^(.*?)(:(.*))?$/); } ($primary_tag, $source_tag) = (lc $primary_tag, lc($source_tag || '')); next if not exists $$types{$primary_tag}; if ($source_tag eq '') { # Match all sources for this primary_tag push @types_found, map {"$primary_tag:$_"} (keys %{ $$types{$primary_tag} }); } else { # Match only the requested source push @types_found, "$primary_tag:$source_tag"; } } return @types_found; } sub attributes { my $self = shift; return keys %{$self->{_index}{attribute}}; } sub filter_by_attribute { my ($self, $attributes, $filter) = @_; my $index = $self->{_index}{attribute}; my $result; for my $att_name (keys %$attributes) { my @result; my @matching_values; my @search_terms = ref($attributes->{$att_name}) && ref($attributes->{$att_name}) eq 'ARRAY' ? 
@{$attributes->{$att_name}} : $attributes->{$att_name}; my @regexp_terms; my @terms; for my $v (@search_terms) { if (my $regexp = $self->glob_match($v)) { @regexp_terms = keys %{$index->{lc $att_name}} unless @regexp_terms; push @terms,grep {/^$regexp$/i} @regexp_terms; } else { push @terms,lc $v; } } for my $v (@terms) { push @result,keys %{$index->{lc $att_name}{$v}}; } $result = $self->update_filter($filter,\@result) or last; } $result; } sub filter_by_location { my ($self, $seq_id, $start, $end, $strand, $range_type, $filter) = @_; $strand ||= 0; my $index = $self->{_index}{location}{lc $seq_id}; my @bins; if (!defined $start or !defined $end or $range_type eq 'contained_in') { @bins = sort {$a<=>$b} keys %{$index}; $start = $bins[0] * BINSIZE unless defined $start; $end = (($bins[-1] + 1) * BINSIZE) - 1 unless defined $end; } my %seenit; my $bin_min = int $start/BINSIZE; my $bin_max = int $end/BINSIZE; my @bins_in_range = $range_type eq 'contained_in' ? ($bins[0]..$bin_min,$bin_max..$bins[-1]) : ($bin_min..$bin_max); my @results; for my $bin (@bins_in_range) { next unless exists $index->{$bin}; my @found = keys %{$index->{$bin}}; for my $f (@found) { next if $seenit{$f}++; my $feature = $self->_fetch($f) or next; next if $strand && $feature->strand != $strand; if ($range_type eq 'overlaps') { next unless $feature->end >= $start && $feature->start <= $end; } elsif ($range_type eq 'contains') { next unless $feature->start >= $start && $feature->end <= $end; } elsif ($range_type eq 'contained_in') { next unless $feature->start <= $start && $feature->end >= $end; } push @results,$f; } } $self->update_filter($filter,\@results); } sub filter_by_name { my ($self, $name, $allow_aliases, $filter) = @_; my $index = $self->{_index}{name}; my @names_to_fetch; if (my $regexp = $self->glob_match($name)) { @names_to_fetch = grep {/^$regexp$/i} keys %{$index}; } else { @names_to_fetch = lc $name; } my @results; for my $n (@names_to_fetch) { if ($allow_aliases) { push @results,keys
%{$index->{$n}}; } else { push @results,grep {$index->{$n}{$_} == 1} keys %{$index->{$n}}; } } $self->update_filter($filter,\@results); } sub glob_match { my ($self, $term) = @_; return unless $term =~ /(?:^|[^\\])[*?]/; $term =~ s/(^|[^\\])([+\[\]^{}\$|\(\).])/$1\\$2/g; $term =~ s/(^|[^\\])\*/$1.*/g; $term =~ s/(^|[^\\])\?/$1./g; return $term; } sub update_filter { my ($self, $filter, $results) = @_; return unless @$results; if (%$filter) { my @filtered = grep {$filter->{$_}} @$results; %$filter = map {$_=>1} @filtered; } else { %$filter = map {$_=>1} @$results; } } sub _search_attributes { my ($self, $search_string, $attribute_array, $limit) = @_; $search_string =~ tr/*?//d; my @words = map {quotemeta($_)} $search_string =~ /(\w+)/g; my $search = join '|',@words; my (%results,%notes); my $index = $self->{_index}{attribute}; for my $tag (@$attribute_array) { my $attributes = $index->{lc $tag}; for my $value (keys %{$attributes}) { next unless $value =~ /$search/i; my @ids = keys %{$attributes->{$value}}; for my $w (@words) { my @hits = $value =~ /($w)/ig or next; $results{$_} += @hits foreach @ids; } $notes{$_} .= "$value " foreach @ids; } } my @results; for my $id (keys %results) { my $hits = $results{$id}; my $note = $notes{$id}; $note =~ s/\s+$//; my $relevance = 10 * $hits; my $feature = $self->fetch($id); my $name = $feature->display_name or next; my $type = $feature->type; push @results,[$name,$note,$relevance,$type,$id]; } return @results; } =head2 types Title : types Usage : @type_list = $db->types Function: Get all the types in the database Returns : list of Bio::DB::GFF::Typename objects (the number of types in scalar context) Args : none Status : public =cut sub types { my $self = shift; my @types; for my $primary_tag ( keys %{$$self{_index}{type}} ) { for my $source_tag ( keys %{$$self{_index}{type}{$primary_tag}} ) { push @types, Bio::DB::GFF::Typename->new($primary_tag,$source_tag); } } return @types; } # this is ugly sub _insert_sequence { my ($self, $seqid,
$seq, $offset) = @_; my $dna_fh = $self->private_fasta_file or return; if ($offset == 0) { # start of the sequence print $dna_fh ">$seqid\n"; } print $dna_fh $seq,"\n"; } sub _fetch_sequence { my ($self, $seqid, $start, $end) = @_; my $db = $self->{fasta_db} or return; return $db->seq($seqid,$start,$end); } sub private_fasta_file { my $self = shift; return $self->{fasta_fh} if exists $self->{fasta_fh}; my $dir = tempdir (CLEANUP => 1); $self->{fasta_file} = "$dir/sequence.$$.fasta"; return $self->{fasta_fh} = IO::File->new($self->{fasta_file},">"); } # summary support sub coverage_array { my $self = shift; my ($seq_name,$start,$end,$types,$bins) = rearrange([['SEQID','SEQ_ID','REF'],'START',['STOP','END'], ['TYPES','TYPE','PRIMARY_TAG'],'BINS'],@_); my @features = $self->_features(-seq_id=> $seq_name, -start => $start, -end => $end, -types => $types); my $binsize = ($end-$start+1)/$bins; my $report_tag; my @coverage_array = (0) x $bins; for my $f (@features) { $report_tag ||= $f->primary_tag; my $fs = $f->start; my $fe = $f->end; my $start_bin = int(($fs-$start)/$binsize); my $end_bin = int(($fe-$start)/$binsize); $start_bin = 0 if $start_bin < 0; $end_bin = $bins-1 if $end_bin >= $bins; $coverage_array[$_]++ for ($start_bin..$end_bin); } return wantarray ? (\@coverage_array,$report_tag) : \@coverage_array; } sub _seq_ids { my $self = shift; if (my $fa = $self->{fasta_db}) { if (my @s = eval {$fa->ids}) { return @s; } } my $l = $self->{_index}{location} or return; return keys %$l; } package Bio::DB::SeqFeature::Store::memory::Iterator; $Bio::DB::SeqFeature::Store::memory::Iterator::VERSION = '1.7.4'; sub new { my ($class, $store, $ids) = @_; return bless {store => $store, ids => $ids},ref($class) || $class; } sub next_seq { my $self = shift; my $store = $self->{store} or return; my $id = shift @{$self->{ids}}; defined $id or return; return $store->fetch($id); } 1; __END__ =head1 BUGS This is an early version, so there are certainly some bugs. 
Please use the BioPerl bug tracking system to report bugs. =head1 SEE ALSO L, L, L, L, L, L, L =head1 AUTHOR Lincoln Stein E<lt>lstein@cshl.orgE<gt>. Copyright (c) 2006 Cold Spring Harbor Laboratory. This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. =cut Loader.pm100644000766000024 5100413605523026 23404 0ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/lib/Bio/DB/SeqFeature/Store package Bio::DB::SeqFeature::Store::Loader; $Bio::DB::SeqFeature::Store::Loader::VERSION = '1.7.4'; =head1 NAME Bio::DB::SeqFeature::Store::Loader -- Loader =head1 SYNOPSIS # non-instantiable base class =head1 DESCRIPTION This is the base class for Bio::DB::SeqFeature::Store::GFF3Loader, Bio::DB::SeqFeature::Store::GFF2Loader, and Bio::DB::SeqFeature::Store::FeatureFileLoader. Please see the manual pages for these modules. =cut # load utility - incrementally load the store based on GFF3 file # # two modes: # slow mode -- features can occur in any order in the GFF3 file # fast mode -- all features with same ID must be contiguous in GFF3 file use strict; use Carp 'croak'; use IO::File; use Bio::DB::GFF::Util::Rearrange; use Bio::DB::SeqFeature::Store; use File::Spec; use File::Temp 'tempdir'; use base 'Bio::Root::Root'; use constant DEFAULT_SEQ_CHUNK_SIZE => 2000; =head2 new Title : new Usage : $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(@options) Function: create a new parser Returns : a Bio::DB::SeqFeature::Store::GFF3Loader gff3 parser and loader Args : several - see below Status : public This method creates a new GFF3 loader and establishes its connection with a Bio::DB::SeqFeature::Store database. Arguments are -name=E<gt>$value pairs as described in this table: Name Value ---- ----- -store A writable Bio::DB::SeqFeature::Store database handle.
-seqfeature_class The name of the type of Bio::SeqFeatureI object to create and store in the database (Bio::DB::SeqFeature by default) -sf_class A shorter alias for -seqfeature_class -verbose Send progress information to standard error. -fast If true, activate fast loading (see below) -chunk_size Set the storage chunk size for nucleotide/protein sequences (default 2000 bytes) -tmp Indicate a temporary directory to use when loading non-normalized features. -map_coords A code ref that will transform a list of ($ref,[$start1,$end1]...) coordinates into a list of ($newref,[$newstart1,$newend1]...) -index_subfeatures Indicate true if subfeatures should be indexed. Default is true. -summary_stats Rebuild summary stats at the end of loading (not incremental, so takes a long time) When you call new(), a connection to a Bio::DB::SeqFeature::Store database should already have been established and the database initialized (if appropriate). Some combinations of Bio::SeqFeatures and Bio::DB::SeqFeature::Store databases support a fast loading mode. Currently the only reliable implementation of fast loading is the combination of DBI::mysql with Bio::DB::SeqFeature. The other important restriction on fast loading is the requirement that a feature that contains subfeatures must occur in the GFF3 file before any of its subfeatures. Otherwise the subfeatures that occurred before the parent feature will not be attached to the parent correctly. This restriction does not apply to normal (slow) loading. If you use an unnormalized feature class, such as Bio::SeqFeature::Generic, then the loader needs to create a temporary database in which to cache features until all their parts and subparts have been seen. This temporary database uses the "berkeleydb" adaptor. The -tmp option specifies the directory in which that database will be created. If not present, it defaults to the system default tmp directory specified by File::Spec-E<gt>tmpdir().
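The -map_coords option in the table above takes a code reference. The following is a minimal sketch of such a mapper, assuming it is called with, and must return, exactly the ($ref,[$start,$end]...) shape that the table describes. The contig names, chromosome names and offsets in %contig_offset are hypothetical. The mapper lifts contig coordinates onto a chromosome by adding a fixed offset and passes unknown references through unchanged.

```perl
use strict;
use warnings;

# Hypothetical layout: where each contig starts on its chromosome.
my %contig_offset = (
    ctg001 => ['Chr1', 0],
    ctg002 => ['Chr1', 150_000],
);

# Receives ($ref, [$start,$end], [$start,$end], ...) and returns
# ($newref, [$newstart,$newend], ...), the shape described for -map_coords.
my $mapper = sub {
    my ($ref, @ranges) = @_;
    my $map = $contig_offset{$ref}
        or return ($ref, @ranges);    # unknown reference: leave coordinates alone
    my ($newref, $offset) = @$map;
    return ($newref, map { [ $_->[0] + $offset, $_->[1] + $offset ] } @ranges);
};

my ($ref, @mapped) = $mapper->('ctg002', [100, 250], [400, 900]);
print "$ref: ", join(', ', map { "$_->[0]..$_->[1]" } @mapped), "\n";
# prints "Chr1: 150100..150250, 150400..150900"
```

The resulting code reference is what would be supplied as the value of -map_coords when calling new().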
The -chunk_size option allows you to tune the representation of DNA/Protein sequence in the Store database. By default, sequences are split into 2000 base/residue chunks and then reassembled as needed. This avoids the problem of pulling a whole chromosome into memory in order to fetch a short subsequence from somewhere in the middle. Depending on your usage patterns, you may wish to tune this parameter using a chunk size that is larger or smaller than the default. =cut sub new { my $self = shift; my ($store,$seqfeature_class,$tmpdir,$verbose,$fast, $seq_chunk_size,$coordinate_mapper,$index_subfeatures,$summary_stats,$no_close_fasta) = rearrange(['STORE', ['SF_CLASS','SEQFEATURE_CLASS'], ['TMP','TMPDIR'], 'VERBOSE', 'FAST', 'CHUNK_SIZE', 'MAP_COORDS', 'INDEX_SUBFEATURES', 'SUMMARY_STATS', 'NO_CLOSE_FASTA', ],@_); $seqfeature_class ||= $self->default_seqfeature_class; eval "require $seqfeature_class" unless $seqfeature_class->can('new'); $self->throw($@) if $@; my $normalized = $seqfeature_class->can('subfeatures_are_normalized') && $seqfeature_class->subfeatures_are_normalized; my $in_table = $seqfeature_class->can('subfeatures_are_stored_in_a_table') && $seqfeature_class->subfeatures_are_stored_in_a_table; if ($fast) { my $canfast = $normalized && $in_table; warn <tmpdir(); my ($tmp_store,$temp_load); unless ($normalized) { # remember the temporary directory in order to delete it on exit $temp_load = tempdir( 'BioDBSeqFeature_XXXXXXX', DIR=>$tmpdir, CLEANUP=>1 ); $tmp_store = Bio::DB::SeqFeature::Store->new(-adaptor => 'berkeleydb', -temporary=> 1, -dsn => $temp_load, -cache => 1, -write => 1) unless $normalized; } $index_subfeatures = 1 unless defined $index_subfeatures; return bless { store => $store, tmp_store => $tmp_store, seqfeature_class => $seqfeature_class, fast => $fast, seq_chunk_size => $seq_chunk_size || DEFAULT_SEQ_CHUNK_SIZE, verbose => $verbose, load_data => {}, tmpdir => $tmpdir, temp_load => $temp_load, subfeatures_normalized => $normalized, 
subfeatures_in_table => $in_table, coordinate_mapper => $coordinate_mapper, index_subfeatures => $index_subfeatures, summary_stats => $summary_stats, no_close_fasta => $no_close_fasta, },ref($self) || $self; } sub coordinate_mapper { my $self = shift; my $d = $self->{coordinate_mapper}; $self->{coordinate_mapper} = shift if @_; $d; } sub index_subfeatures { my $self = shift; my $d = $self->{index_subfeatures}; $self->{index_subfeatures} = shift if @_; $d; } sub summary_stats { my $self = shift; my $d = $self->{summary_stats}; $self->{summary_stats} = shift if @_; $d; } =head2 load Title : load Usage : $count = $loader->load(@ARGV) Function: load the indicated files or filehandles Returns : number of feature lines loaded Args : list of files or filehandles Status : public Once the loader is created, invoke its load() method with a list of GFF3 or FASTA file paths or previously-opened filehandles in order to load them into the database. Compressed files ending with .gz, .Z and .bz2 are automatically recognized and uncompressed on the fly. Paths beginning with http: or ftp: are treated as URLs and opened using the LWP GET program (which must be on your path). FASTA files are recognized by their initial "E<gt>" character. Do not feed the loader a file that is neither GFF3 nor FASTA; I don't know what will happen, but it will probably not be what you expect.
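The suffix and URL rules described above can be illustrated with a small sketch. open_spec() and sniff_format() are hypothetical helpers, not part of the Loader API: the first returns the open() argument that open_fh() would build for a given path (it only builds the string and opens nothing), and the second applies the leading ">" test for FASTA input. Treating every non-FASTA first line as GFF3 is a simplifying assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: mirror the suffix/URL checks in open_fh(),
# returning the argument that would be passed to a two-argument open.
sub open_spec {
    my $path = shift;
    return "gunzip -c $path |"     if $path =~ /\.gz$/;
    return "uncompress -c $path |" if $path =~ /\.Z$/;
    return "bunzip2 -c $path |"    if $path =~ /\.bz2$/;
    return "GET $path |"           if $path =~ /^(?:http|ftp):/;
    return $path;                  # plain file, opened for reading
}

# Hypothetical sniffer: FASTA input begins with '>'; anything else is
# assumed to be GFF3 in this sketch.
sub sniff_format {
    my $first_line = shift;
    return $first_line =~ /^>/ ? 'fasta' : 'gff3';
}

print open_spec('volvox.gff3.gz'), "\n";   # gunzip -c volvox.gff3.gz |
print sniff_format(">Chr1\n"), "\n";        # fasta
```

Because these commands are handed to a two-argument pipe open, a path containing shell metacharacters would be interpreted by the shell. The list form of open, for example open(my $fh, '-|', 'gunzip', '-c', $path), avoids that at the cost of the simpler string interface.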
=cut sub load { my $self = shift; my $start = $self->time(); my $count = 0; for my $file_or_fh (@_) { $self->msg("loading $file_or_fh...\n"); my $fh = $self->open_fh($file_or_fh) or $self->throw("Couldn't open $file_or_fh: $!"); $count += $self->load_fh($fh); $self->msg(sprintf "load time: %5.2fs\n",$self->time()-$start); } if ($self->summary_stats) { $self->msg("Building summary statistics for coverage graphs..."); my $start = $self->time(); $self->build_summary; $self->msg(sprintf "coverage graph build time: %5.2fs\n",$self->time()-$start); } $self->msg(sprintf "total load time: %5.2fs\n",$self->time()-$start); $count; } =head2 accessors The following read-only accessors return values passed or created during new(): store() the long-term Bio::DB::SeqFeature::Store object tmp_store() the temporary Bio::DB::SeqFeature::Store object used during loading sfclass() the Bio::SeqFeatureI class fast() whether fast loading is active seq_chunk_size() the sequence chunk size verbose() verbose progress messages =cut sub store { shift->{store} } sub tmp_store { shift->{tmp_store} } sub sfclass { shift->{seqfeature_class} } sub fast { shift->{fast} } sub seq_chunk_size { shift->{seq_chunk_size} } sub verbose { shift->{verbose} } =head2 Internal Methods The following methods are used internally and may be overridden by subclasses. =over 4 =item default_seqfeature_class $class = $loader->default_seqfeature_class Return the default SeqFeatureI class (Bio::DB::SeqFeature). =cut sub default_seqfeature_class { my $self = shift; return 'Bio::DB::SeqFeature'; } =item subfeatures_normalized $flag = $loader->subfeatures_normalized([$new_flag]) Get or set a flag that indicates that the subfeatures are normalized. This is deduced from the SeqFeature class information. 
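As the code in new() shows, the normalization flag (and the related in-table flag) comes from asking the feature class whether it implements, and returns true from, subfeatures_are_normalized() and subfeatures_are_stored_in_a_table(); fast loading is only considered when both are true. The sketch below reproduces that test against two hypothetical toy classes, Toy::Normalized and Toy::Generic, which stand in for real Bio::SeqFeatureI implementations.

```perl
use strict;
use warnings;

# Hypothetical class that advertises both capabilities.
package Toy::Normalized;
sub new { return bless {}, shift }
sub subfeatures_are_normalized        { 1 }
sub subfeatures_are_stored_in_a_table { 1 }

# Hypothetical class that implements neither capability method.
package Toy::Generic;
sub new { return bless {}, shift }

package main;

# Same test as new(): the class must implement the method AND return true.
sub class_flags {
    my $class = shift;
    my $normalized = $class->can('subfeatures_are_normalized')
                  && $class->subfeatures_are_normalized;
    my $in_table   = $class->can('subfeatures_are_stored_in_a_table')
                  && $class->subfeatures_are_stored_in_a_table;
    return ($normalized ? 1 : 0, $in_table ? 1 : 0);
}

for my $class (qw(Toy::Normalized Toy::Generic)) {
    my ($norm, $table) = class_flags($class);
    my $can_fast = $norm && $table ? 'yes' : 'no';
    print "$class normalized=$norm in_table=$table fast=$can_fast\n";
}
# Toy::Normalized normalized=1 in_table=1 fast=yes
# Toy::Generic normalized=0 in_table=0 fast=no
```

A class in the Toy::Generic situation is the case in which the loader falls back to the temporary "berkeleydb" store described under new().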
=cut sub subfeatures_normalized { my $self = shift; my $d = $self->{subfeatures_normalized}; $self->{subfeatures_normalized} = shift if @_; $d; } =item subfeatures_in_table $flag = $loader->subfeatures_in_table([$new_flag]) Get or set a flag that indicates that feature/subfeature relationships are stored in a table. This is deduced from the SeqFeature class and Store information. =cut sub subfeatures_in_table { my $self = shift; my $d = $self->{subfeatures_in_table}; $self->{subfeatures_in_table} = shift if @_; $d; } =item load_fh $count = $loader->load_fh($filehandle) Load the GFF3 data at the other end of the filehandle and return the number of feature lines loaded. Internally, load_fh() invokes: start_load(); do_load($filehandle); finish_load(); =cut sub load_fh { my $self = shift; my $fh = shift; $self->start_load(); my $count = $self->do_load($fh); $self->finish_load(); $count; } =item start_load, finish_load These methods are called at the start and end of a filehandle load. =cut sub start_load { my $self = shift; $self->create_load_data; $self->store->start_bulk_update() if $self->fast; } sub create_load_data { my $self = shift; $self->{load_data}{CurrentFeature} = undef; $self->{load_data}{CurrentID} = undef; $self->{load_data}{IndexIt} = {}; $self->{load_data}{Local2GlobalID} = {}; $self->{load_data}{count} = 0; $self->{load_data}{mode} = undef; $self->{load_data}{start_time} = 0; } sub delete_load_data { my $self = shift; delete $self->{load_data}; } sub finish_load { my $self = shift; $self->store_current_feature(); # during fast loading, we will have a feature left at the very end $self->start_or_finish_sequence(); # finish any half-loaded sequences if ($self->fast) { $self->{load_data}{start_time} = $self->time(); $self->store->finish_bulk_update; } $self->msg(sprintf "%5.2fs\n",$self->time()-$self->{load_data}{start_time}); eval {$self->store->commit}; # don't delete load data so that caller can ask for the loaded IDs # $self->delete_load_data; } =item build_summary
$loader->build_summary Call this to rebuild the summary coverage statistics. This is done automatically if new() was passed a true value for -summary_stats at create time. =cut sub build_summary { my $self = shift; $self->store->build_summary_statistics; } =item do_load $count = $loader->do_load($fh) This is called by load_fh() to load the GFF3 file's filehandle and return the number of lines loaded. =cut sub do_load { my $self = shift; my $fh = shift; $self->{load_data}{start_time} = $self->time(); $self->{load_data}->{millenium_time} = $self->{load_data}{start_time}; $self->load_line($_) while <$fh>; $self->msg(sprintf "%d features loaded in %5.2fs%s\r", $self->{load_data}->{count}, $self->time()-$self->{load_data}{start_time}, ' 'x80 ); $self->{load_data}{count}; } =item load_line $loader->load_line($data); Load a line of a GFF3 file. You must bracket this with calls to start_load() and finish_load()! $loader->start_load(); $loader->load_line($_) while ; $loader->finish_load(); =cut sub load_line { my $self = shift; my $line = shift; # don't do anything } =item handle_feature $loader->handle_feature($data_line) This method is called to process a single data line. It manipulates information stored in a data structure called $self-E<gt>{load_data}. =cut sub handle_feature { my $self = shift; my $line = shift; # do nothing } =item handle_meta $loader->handle_meta($data_line) This method is called to process a single meta-data (directive) line. It manipulates information stored in a data structure called $self-E<gt>{load_data}.
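load_line() is a no-op in this base class, so the split between handle_meta() and handle_feature() is left to subclasses. The sketch below shows one plausible way to classify incoming lines using common GFF3 conventions (## for directives, # for comments, > for the start of a FASTA record). classify_line() and its labels are hypothetical and do not reproduce GFF3Loader's actual logic.

```perl
use strict;
use warnings;

# Hypothetical classifier a subclass's load_line() might use to decide
# whether a line goes to handle_meta() or handle_feature().
sub classify_line {
    my $line = shift;
    return 'blank'   if $line =~ /^\s*$/;
    return 'meta'    if $line =~ /^##/;    # directive, e.g. ##gff-version 3
    return 'comment' if $line =~ /^#/;
    return 'fasta'   if $line =~ /^>/;     # start of a sequence record
    return 'feature';                      # tab-delimited feature line
}

my @lines = (
    "##gff-version 3\n",
    "# a comment\n",
    "Chr1\texample\tgene\t1000\t2000\t.\t+\t.\tID=gene1\n",
    ">Chr1\n",
    "\n",
);
print join(' ', map { classify_line($_) } @lines), "\n";
# meta comment feature fasta blank
```

A real loader must also remember when it is inside a FASTA section, because bare sequence lines would otherwise be read as features; this stateless sketch does not track that.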
=cut sub handle_meta { my $self = shift; my $line = shift; # do nothing } sub _indexit { my $self = shift; my $id = shift; $id ||= ''; # avoid uninit warnings my $indexhash = $self->{load_data}{IndexIt}; $indexhash->{$id} = shift if @_; return $indexhash->{$id}; } sub _local2global { my $self = shift; my $id = shift; $id ||= ''; # avoid uninit warnings my $indexhash = $self->{load_data}{Local2GlobalID}; $indexhash->{$id} = shift if @_; return $indexhash->{$id}; } =item store_current_feature $loader->store_current_feature() This method is called to store the currently active feature in the database. It uses a data structure stored in $self-E<gt>{load_data}. =cut sub store_current_feature { my $self = shift; my $ld = $self->{load_data}; defined $ld->{CurrentFeature} or return; my $f = $ld->{CurrentFeature}; my $normalized = $self->subfeatures_normalized; my $indexed = $self->_indexit($ld->{CurrentID}); # logic is as follows: # 1. If the feature is an indexed feature, then we store it into the main database # so that it can be searched. It doesn't matter whether it is a top-level feature # or a subfeature. # 2. If the feature class is normalized, but not indexed, then we store it into the # main database using the "no_index" method. This will make it accessible to # queries on the top level parent, but it won't come up by itself in range or # attribute searches. # 3. Otherwise, this is an unindexed subfeature; we store it in the temporary database # until the object build step, at which point it gets integrated into its object tree # and copied into the main database.
if ($indexed) { $self->store->store($f); } elsif ($normalized) { $self->store->store_noindex($f) } else { $self->tmp_store->store_noindex($f) } my $id = $f->primary_id; # assigned by store() $self->_local2global($ld->{CurrentID} => $id); $self->_indexit($ld->{CurrentID} => 0)if $normalized; # no need to remember this undef $ld->{CurrentID}; undef $ld->{CurrentFeature}; } =item parse_attributes ($reserved,$unreserved) = $loader->parse_attributes($attribute_line) This method parses the information contained in the $attribute_line into two hashrefs, one containing the values of reserved attribute tags (e.g. ID) and the other containing the values of unreserved ones. =cut sub parse_attributes { my $self = shift; my $att = shift; # do nothing } =item start_or_finish_sequence $loader->start_or_finish_sequence('Chr9') This method is called at the beginning and end of a fasta section. =cut # this gets called at the beginning and end of a fasta section sub start_or_finish_sequence { my $self = shift; my $seqid = shift; if (my $sl = $self->{fasta_load}) { if (defined $sl->{seqid}) { $self->store->insert_sequence($sl->{seqid},$sl->{sequence},$sl->{offset}); delete $self->{fasta_load}; } } if (defined $seqid) { $self->{fasta_load} = {seqid => $seqid, offset => 0, sequence => ''}; } } =item load_sequence $loader->load_sequence('gatttcccaaa') This method is called to load some amount of sequence after start_or_finish_sequence() is first called. 
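The chunking done by load_sequence() (controlled by -chunk_size) can be shown in isolation. In this sketch make_chunker() is a hypothetical helper: it buffers incoming sequence, emits fixed-size chunks together with their running offsets, and flushes whatever is left when the caller finishes, much as start_or_finish_sequence() does. The $emit callback stands in for the store's insert_sequence() call.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an "add" closure and a "finish" closure that
# share a buffer and a running offset, like $self->{fasta_load}.
sub make_chunker {
    my ($chunk_size, $emit) = @_;
    my $buffer = '';
    my $offset = 0;
    my $add = sub {
        $buffer .= shift;
        while (length $buffer >= $chunk_size) {
            # 4-arg substr removes the head of the buffer and returns it
            my $chunk = substr($buffer, 0, $chunk_size, '');
            $emit->($chunk, $offset);
            $offset += length $chunk;
        }
    };
    my $finish = sub {    # flush any remainder shorter than one chunk
        $emit->($buffer, $offset) if length $buffer;
        $buffer = '';
    };
    return ($add, $finish);
}

my @stored;
my ($add, $finish) = make_chunker(4, sub { push @stored, [@_] });
$add->($_) for qw(GATTC CCAAA TT);
$finish->();
print join(' ', map { "$_->[1]:$_->[0]" } @stored), "\n";
# 0:GATT 4:CCCA 8:AATT
```

With a chunk size of 4, the three input pieces GATTC, CCAAA and TT are stored as GATT, CCCA and AATT at offsets 0, 4 and 8, independent of how the input was split into lines.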
=cut sub load_sequence { my $self = shift; my $seq = shift; my $sl = $self->{fasta_load} or return; my $cs = $self->seq_chunk_size; $sl->{sequence} .= $seq; while (length $sl->{sequence} >= $cs) { my $chunk = substr($sl->{sequence},0,$cs); $self->store->insert_sequence($sl->{seqid},$chunk,$sl->{offset}); $sl->{offset} += length $chunk; substr($sl->{sequence},0,$cs) = ''; } } =item open_fh my $io_file = $loader->open_fh($filehandle_or_path) This method opens the indicated file or pipe, recognizing compressed files and URLs and opening them through the appropriate decompression or download command. =cut sub open_fh { my $self = shift; my $thing = shift; no strict 'refs'; return $thing if defined fileno($thing); return IO::File->new("gunzip -c $thing |") if $thing =~ /\.gz$/; return IO::File->new("uncompress -c $thing |") if $thing =~ /\.Z$/; return IO::File->new("bunzip2 -c $thing |") if $thing =~ /\.bz2$/; return IO::File->new("GET $thing |") if $thing =~ /^(http|ftp):/; return $thing if ref $thing && $thing->isa('IO::String'); return IO::File->new($thing); } sub msg { my $self = shift; my @msg = @_; return unless $self->verbose; print STDERR @msg; } =item loaded_ids my $ids = $loader->loaded_ids; my $id_cnt = @$ids; After performing a load, this returns an array ref containing all the feature primary ids that were created during the load. =cut sub loaded_ids { my $self = shift; my @ids = $self->{load_data} ? values %{$self->{load_data}{Local2GlobalID}} : (); return \@ids; } =item local_ids my $ids = $loader->local_ids; my $id_cnt = @$ids; After performing a load, this returns an array ref containing all the load file IDs that were contained within the file just loaded. =cut sub local_ids { my $self = shift; my @ids = $self->{load_data} ? keys %{$self->{load_data}{Local2GlobalID}} : (); return \@ids; } =item time my $time = $loader->time This method returns the current time in seconds, using Time::HiRes if available.
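The timing helper reports wall-clock seconds. A minimal standalone sketch of the same fallback idea follows. It explicitly tries to load Time::HiRes, whereas time() in this module only checks Time::HiRes->can('time') and, in the code shown, never loads Time::HiRes itself, so high-resolution timing there depends on some other module having loaded it. The helper name now() is hypothetical.

```perl
use strict;
use warnings;

# Try to load Time::HiRes once; fall back to whole-second time() otherwise.
my $have_hires = eval { require Time::HiRes; 1 };

# Hypothetical helper: current time in (possibly fractional) seconds.
sub now { return $have_hires ? Time::HiRes::time() : time() }

my $start = now();
my $sum = 0;
$sum += $_ for 1 .. 100_000;    # some work to time
printf "elapsed: %5.2fs\n", now() - $start;
```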
=cut sub time { return Time::HiRes::time() if Time::HiRes->can('time'); return time(); } =item unescape my $unescaped = $loader->unescape($escaped) This is an internal utility. It is the same as CGI::Util::unescape, but doesn't change pluses into spaces and ignores unicode escapes. =cut sub unescape { my $self = shift; my $todecode = shift; $todecode =~ s/%([0-9a-fA-F]{2})/chr hex($1)/ge; return $todecode; } sub DESTROY { my $self = shift; # Close filehandles, so temporary files can be properly deleted my $store = $self->store; if ( $store->isa('Bio::DB::SeqFeature::Store::memory') or $store->isa('Bio::DB::SeqFeature::Store::berkeleydb3') ) { $store->private_fasta_file->close; if ($store->{fasta_db} && !$self->{no_close_fasta}) { while (my ($file, $fh) = each %{ $store->{fasta_db}->{fhcache} }) { $fh->close; } $store->{fasta_db}->_close_index($store->{fasta_db}->{offsets}); } } elsif ($store->isa('Bio::DB::SeqFeature::Store::DBI::SQLite')) { if (%DBI::installed_drh) { DBI->disconnect_all; %DBI::installed_drh = (); } undef $store->{dbh}; } if (my $ld = $self->{temp_load}) { unlink $ld; } } 1; __END__ =back =head1 BUGS This is an early version, so there are certainly some bugs. Please use the BioPerl bug tracking system to report bugs. =head1 SEE ALSO L, L, L, L, L, L, L =head1 AUTHOR Lincoln Stein E<lt>lstein@cshl.orgE<gt>. Copyright (c) 2006 Cold Spring Harbor Laboratory. This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
=cut DBI000755000766000024 013605523026 22056 5ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/lib/Bio/DB/SeqFeature/Store Pg.pm100644000766000024 7142413605523026 23152 0ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/lib/Bio/DB/SeqFeature/Store/DBI =head1 NAME Bio::DB::SeqFeature::Store::DBI::Pg -- PostgreSQL implementation of Bio::DB::SeqFeature::Store =head1 SYNOPSIS use Bio::DB::SeqFeature::Store; # Open the sequence database my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::Pg', -dsn => 'dbi:Pg:test'); # get a feature from somewhere my $feature = Bio::SeqFeature::Generic->new(...); # store it $db->store($feature) or die "Couldn't store!"; # primary ID of the feature is changed to indicate its primary ID # in the database... my $id = $feature->primary_id; # get the feature back out my $f = $db->fetch($id); # change the feature and update it $f->start(100); $db->update($f) or die "Couldn't update!"; # searching... # ...by id my @features = $db->fetch_many(@list_of_ids); # ...by name @features = $db->get_features_by_name('ZK909'); # ...by alias @features = $db->get_features_by_alias('sma-3'); # ...by type @features = $db->get_features_by_type('gene'); # ...by location @features = $db->get_features_by_location(-seq_id=>'Chr1',-start=>4000,-end=>600000); # ...by attribute @features = $db->get_features_by_attribute({description => 'protein kinase'}); # ...by the GFF "Note" field @result_list = $db->search_notes('kinase'); # ...by arbitrary combinations of selectors @features = $db->features(-name => $name, -type => $types, -seq_id => $seqid, -start => $start, -end => $end, -attributes => $attributes); # ...using an iterator my $iterator = $db->get_seq_stream(-name => $name, -type => $types, -seq_id => $seqid, -start => $start, -end => $end, -attributes => $attributes); while (my $feature = $iterator->next_seq) { # do something with the feature } # ...limiting the search to a particular region my $segment = $db->segment('Chr1',5000=>6000); my
@features = $segment->features(-type=>['mRNA','match']); # getting & storing sequence information # Warning: this returns a string, and not a PrimarySeq object $db->insert_sequence('Chr1','GATCCCCCGGGATTCCAAAA...'); my $sequence = $db->fetch_sequence('Chr1',5000=>6000); # what feature types are defined in the database? my @types = $db->types; # create a new feature in the database my $feature = $db->new_feature(-primary_tag => 'mRNA', -seq_id => 'chr3', -start => 10000, -end => 11000); =head1 DESCRIPTION Bio::DB::SeqFeature::Store::DBI::Pg is the Pg adaptor for Bio::DB::SeqFeature::Store. You will not create it directly, but instead use Bio::DB::SeqFeature::Store-E<gt>new() to do so. See L for complete usage instructions. =head2 Using the Pg adaptor Before you can use the adaptor, you must use the pgAdmin tool to create a database and establish a user account with write permission. In order to use "fast" loading, the user account must have "file" privileges. To establish a connection to the database, call Bio::DB::SeqFeature::Store-E<gt>new(-adaptor=E<gt>'DBI::Pg',@more_args). The additional arguments are as follows: Argument name Description ------------- ----------- -dsn The database name. You can abbreviate "dbi:Pg:foo" as "foo" if you wish. -user Username for authentication. -pass Password for authentication. -namespace Creates a SCHEMA for the tables. This allows you to have several virtual databases in the same physical database. -temp Boolean flag. If true, a temporary database will be created and destroyed as soon as the Store object goes out of scope. (synonym -temporary) -autoindex Boolean flag. If true, features in the database will be reindexed every time they change. This is the default. -tmpdir Directory in which to place temporary files during "fast" loading. Defaults to File::Spec->tmpdir(). (synonyms -dump_dir, -dumpdir, -tmp) -dbi_options A hashref to pass to DBI->connect's 4th argument, the "attributes."
(synonyms -options, -dbi_attr) -write Pass true to open database for writing or updating. If successful, a new instance of Bio::DB::SeqFeature::Store::DBI::Pg will be returned. In addition to the standard methods supported by all well-behaved Bio::DB::SeqFeature::Store databases, several adaptor-specific methods are provided. These are described in the following sections. =cut package Bio::DB::SeqFeature::Store::DBI::Pg; $Bio::DB::SeqFeature::Store::DBI::Pg::VERSION = '1.7.4'; use strict; use base 'Bio::DB::SeqFeature::Store::DBI::mysql'; use MIME::Base64; use Bio::DB::SeqFeature::Store::DBI::Iterator; use DBI; use DBD::Pg qw(:pg_types); use Memoize; use Cwd 'abs_path'; use Bio::DB::GFF::Util::Rearrange 'rearrange'; use File::Copy; use File::Spec; use constant DEBUG=>0; use constant MAX_INT => 2_147_483_647; use constant MIN_INT => -2_147_483_648; use constant MAX_BIN => 1_000_000_000; # size of largest feature = 1 Gb use constant MIN_BIN => 1000; # smallest bin we'll make - on a 100 Mb chromosome, there'll be 100,000 of these ### # object initialization # # NOTE: most of this code can be refactored and inherited from DBI or DBI::mysql adapter # sub init { my $self = shift; my ($dsn, $is_temporary, $autoindex, $namespace, $dump_dir, $user, $pass, $dbi_options, $writeable, $create, $schema, ) = rearrange(['DSN', ['TEMP','TEMPORARY'], 'AUTOINDEX', 'NAMESPACE', ['DUMP_DIR','DUMPDIR','TMP','TMPDIR'], 'USER', ['PASS','PASSWD','PASSWORD'], ['OPTIONS','DBI_OPTIONS','DBI_ATTR'], ['WRITE','WRITEABLE'], 'CREATE', 'SCHEMA' ],@_); $dbi_options ||= {pg_server_prepare => 0}; $writeable = 1 if $is_temporary or $dump_dir; $dsn or $self->throw("Usage: ".__PACKAGE__."->init(-dsn => \$dbh || \$dsn)"); my $dbh; if (ref $dsn) { $dbh = $dsn; } else { $dsn = "dbi:Pg:$dsn" unless $dsn =~ /^dbi:/; $dbh = DBI->connect($dsn,$user,$pass,$dbi_options) or $self->throw($DBI::errstr); } $dbh->do('set client_min_messages=warning') if $dbh; $self->{'original_arguments'} = { 'dsn' => $dsn,
    'user'        => $user,
    'pass'        => $pass,
    'dbh_options' => $dbi_options,
  };
  $self->{dbh}       = $dbh;
  $self->{dbh}->{InactiveDestroy} = 1;
  $self->{is_temp}   = $is_temporary;
  $self->{writeable} = $writeable;
  $self->{namespace} = $namespace || $schema || 'public';
  $self->schema($self->{namespace});

  $self->default_settings;
  $self->autoindex($autoindex) if defined $autoindex;
  $self->dumpdir($dump_dir)    if $dump_dir;
  if ($self->is_temp) {
    # warn "creating a temp database isn't supported";
    #$self->init_tmp_database();
    $self->init_database('erase');
  } elsif ($create) {
    $self->init_database('erase');
  }
}

sub table_definitions {
  my $self = shift;
  return {
    feature => < < < < < < < < < <{'schema'} = $schema if defined($schema);
  if ($schema) {
    $self->_check_for_namespace();
    $self->dbh->do("SET search_path TO " . $self->{'schema'} );
  } else {
    $self->dbh->do("SET search_path TO public");
  }
  return $self->{'schema'};
}

###
# wipe database clean and reinstall schema
#
sub _init_database {
  my $self  = shift;
  my $erase = shift;

  my $dbh       = $self->dbh;
  my $namespace = $self->namespace;
  my $tables    = $self->table_definitions;
  my $temporary = $self->is_temp ? 'TEMPORARY' : '';
  foreach (keys %$tables) {
    next if $_ eq 'meta';      # don't get rid of meta data!
    my $table = $self->_qualify($_);
    $dbh->do("DROP TABLE IF EXISTS $table") if $erase;
    # look the table up in the current schema; the names are bound as
    # placeholders because a method call is not interpolated in a string
    my @table_exists = $dbh->selectrow_array(
      "SELECT * FROM pg_tables WHERE tablename = ? AND schemaname = ?",
      undef, $table, $namespace);
    if (!scalar(@table_exists)) {
      my $query = "CREATE $temporary TABLE $table $tables->{$_}";
      $dbh->do($query) or $self->throw($dbh->errstr);
    }
  }
  $self->subfeatures_are_indexed(1) if $erase;
  1;
}

sub maybe_create_meta {
  my $self = shift;
  return unless $self->writeable;
  my $namespace = $self->namespace;
  my $table     = $self->_qualify('meta');
  my $tables    = $self->table_definitions;
  my $temporary = $self->is_temp ?
    'TEMPORARY' : '';
  my @table_exists = $self->dbh->selectrow_array("SELECT * FROM pg_tables WHERE tablename = 'meta' AND schemaname = '$namespace'");
  $self->dbh->do("CREATE $temporary TABLE $table $tables->{meta}") unless @table_exists;
}

###
# check if the namespace/schema exists, if not create it
#
sub _check_for_namespace {
  my $self      = shift;
  my $namespace = $self->namespace;
  return if $namespace eq 'public';
  my $dbh = $self->dbh;
  my @schema_exists = $dbh->selectrow_array("SELECT * FROM pg_namespace WHERE nspname = '$namespace'");
  if (!scalar(@schema_exists)) {
    my $query = "CREATE SCHEMA $namespace";
    $dbh->do($query) or $self->throw($dbh->errstr);

    # if temp parameter is set and schema created for this process then enable removal in remove_namespace()
    if ($self->is_temp) {
      $self->{delete_schema} = 1;
    }
  }
}

###
# Overriding inherited mysql _qualify (we do not need to qualify for PostgreSQL because the search_path is set above)
#
sub _qualify {
  my $self       = shift;
  my $table_name = shift;
  return $table_name;
}

###
# when is_temp is set and the schema did not exist beforehand then we are able to remove it
#
sub remove_namespace {
  my $self = shift;
  if ($self->{delete_schema}) {
    my $namespace = $self->namespace;
    $self->dbh->do("DROP SCHEMA $namespace") or $self->throw($self->dbh->errstr);
  }
}

### Overriding the inherited mysql function _prepare
sub _prepare {
  my $self   = shift;
  my $query  = shift;
  my $dbh    = $self->dbh;
  my $schema = $self->{namespace};

  if ($schema) {
    $self->_check_for_namespace();
    $dbh->do("SET search_path TO " .
             $self->{'schema'} );
  } else {
    $dbh->do("SET search_path TO public");
  }
  my $sth = $dbh->prepare_cached($query, {}, 3) or $self->throw($dbh->errstr);
  $sth;
}

sub _finish_bulk_update {
  my $self = shift;
  my $dbh  = $self->dbh;
  my $dir  = $self->{dumpdir} || '.';
  for my $table ('feature',$self->index_tables) {
    my $fh   = $self->dump_filehandle($table);
    my $path = $self->dump_path($table);
    $fh->close;
    my $qualified_table = $self->_qualify($table);
    # keep a backup copy of the dump file
    copy($path, "$path.bak");
    # Get stuff from file into STDIN so we don't have to be superuser
    open my $FH, '<', $path or $self->throw("Could not read file '$path': $!");
    print STDERR "Loading file $path\n";
    $dbh->do("COPY $qualified_table FROM STDIN CSV QUOTE '''' DELIMITER '\t'") or $self->throw($dbh->errstr);
    while (my $line = <$FH>) {
      $dbh->pg_putline($line);
    }
    $dbh->pg_endcopy() or $self->throw($dbh->errstr);
    close $FH;
    #unlink $path;
  }
  delete $self->{bulk_update_in_progress};
  delete $self->{filehandles};
}

###
# Add subparts to a feature. Both the feature and all subparts must already be in the database.
#
sub _add_SeqFeature {
  my $self = shift;

  # special purpose method for case when we are doing a bulk update
  return $self->_dump_add_SeqFeature(@_) if $self->{bulk_update_in_progress};

  my $parent   = shift;
  my @children = @_;

  my $dbh = $self->dbh;
  local $dbh->{RaiseError} = 1;

  my $child_table = $self->_parent2child_table();
  my $count = 0;

  my $querydel = "DELETE FROM $child_table WHERE id = ? AND child = ?";
  my $query    = "INSERT INTO $child_table (id,child) VALUES (?,?)";
  my $sthdel   = $self->_prepare($querydel);
  my $sth      = $self->_prepare($query);

  my $parent_id = (ref $parent ? $parent->primary_id : $parent)
    or $self->throw("$parent should have a primary_id");

  $self->begin_work or $self->throw($dbh->errstr);
  eval {
    for my $child (@children) {
      my $child_id = ref $child ?
$child->primary_id : $child; defined $child_id or die "no primary ID known for $child"; $sthdel->execute($parent_id, $child_id); $sth->execute($parent_id,$child_id); $count++; } }; if ($@) { warn "Transaction aborted because $@"; $self->rollback; } else { $self->commit; } $sth->finish; $count; } # because this is a reserved word in postgresql ### # get primary sequence between start and end # sub _fetch_sequence { my $self = shift; my ($seqid,$start,$end) = @_; # backward compatibility to the old days when I liked reverse complementing # dna by specifying $start > $end my $reversed; if (defined $start && defined $end && $start > $end) { $reversed++; ($start,$end) = ($end,$start); } $start-- if defined $start; $end-- if defined $end; my $offset1 = $self->_offset_boundary($seqid,$start || 'left'); my $offset2 = $self->_offset_boundary($seqid,$end || 'right'); my $sequence_table = $self->_sequence_table; my $locationlist_table = $self->_locationlist_table; my $sth = $self->_prepare(<= ? AND "offset" <= ? ORDER BY "offset" END my $seq = ''; $sth->execute($seqid,$offset1,$offset2) or $self->throw($sth->errstr); while (my($frag,$offset) = $sth->fetchrow_array) { substr($frag,0,$start-$offset) = '' if defined $start && $start > $offset; $seq .= $frag; } substr($seq,$end-$start+1) = '' if defined $end && $end-$start+1 < length($seq); if ($reversed) { $seq = reverse $seq; $seq =~ tr/gatcGATC/ctagCTAG/; } $sth->finish; $seq; } sub _offset_boundary { my $self = shift; my ($seqid,$position) = @_; my $sequence_table = $self->_sequence_table; my $locationlist_table = $self->_locationlist_table; my $sql; $sql = $position eq 'left' ? "SELECT min(\"offset\") FROM $sequence_table as s,$locationlist_table as ll WHERE s.id=ll.id AND ll.seqname=?" :$position eq 'right' ? "SELECT max(\"offset\") FROM $sequence_table as s,$locationlist_table as ll WHERE s.id=ll.id AND ll.seqname=?" 
:"SELECT max(\"offset\") FROM $sequence_table as s,$locationlist_table as ll WHERE s.id=ll.id AND ll.seqname=? AND \"offset\"<=?"; my $sth = $self->_prepare($sql); my @args = $position =~ /^-?\d+$/ ? ($seqid,$position) : ($seqid); $sth->execute(@args) or $self->throw($sth->errstr); my $boundary = $sth->fetchall_arrayref->[0][0]; $sth->finish; return $boundary; } sub _name_sql { my $self = shift; my ($name,$allow_aliases,$join) = @_; my $name_table = $self->_name_table; my $from = "$name_table as n"; my ($match,$string) = $self->_match_sql($name); my $where = "n.id=$join AND lower(n.name) $match"; $where .= " AND n.display_name>0" unless $allow_aliases; return ($from,$where,'',$string); } sub _search_attributes { my $self = shift; my ($search_string,$attribute_names,$limit) = @_; my @words = map {quotemeta($_)} split /\s+/,$search_string; my $name_table = $self->_name_table; my $attribute_table = $self->_attribute_table; my $attributelist_table = $self->_attributelist_table; my $type_table = $self->_type_table; my $typelist_table = $self->_typelist_table; my @tags = @$attribute_names; my $tag_sql = join ' OR ',("al.tag=?") x @tags; my $perl_regexp = join '|',@words; my @wild_card_words = map { "%$_%" } @words; my $sql_regexp = join ' OR ',("a.attribute_value SIMILAR TO ?") x @words; my $sql = <_print_query($sql,@tags,@wild_card_words) if DEBUG || $self->debug; my $sth = $self->_prepare($sql); $sth->execute(@tags,@wild_card_words) or $self->throw($sth->errstr); my @results; while (my($name,$value,$type,$id) = $sth->fetchrow_array) { my (@hits) = $value =~ /$perl_regexp/ig; my @words_in_row = split /\b/,$value; my $score = int(@hits*100/@words/@words_in_row); push @results,[$name,$value,$score,$type,$id]; } $sth->finish; @results = sort {$b->[2]<=>$a->[2]} @results; return @results; } # overridden here because the mysql adapter uses # a non-standard query hint sub _attributes_sql { my $self = shift; my ($attributes,$join) = @_; my ($wf,@bind_args) = 
$self->_make_attribute_where('a','al',$attributes); my ($group_by,@group_args)= $self->_make_attribute_group('a',$attributes); my $attribute_table = $self->_attribute_table; my $attributelist_table = $self->_attributelist_table; my $from = "$attribute_table as a, $attributelist_table as al"; my $where = <_typelist_table; my $from = "$typelist AS tl"; my (@matches,@args); for my $type (@types) { if (ref $type && $type->isa('Bio::DB::GFF::Typename')) { $primary_tag = $type->method; $source_tag = $type->source; } else { ($primary_tag,$source_tag) = split ':',$type,2; } if ($source_tag) { push @matches,"lower(tl.tag)=lower(?)"; push @args,"$primary_tag:$source_tag"; } else { push @matches,"tl.tag ILIKE ?"; push @args,"$primary_tag:%"; } } my $matches = join ' OR ',@matches; my $where = <_meta_table; if (defined $value && $self->writeable) { my $querydel = "DELETE FROM $meta WHERE name = ?"; my $query = "INSERT INTO $meta (name,value) VALUES (?,?)"; my $sthdel = $self->_prepare($querydel); my $sth = $self->_prepare($query); $sthdel->execute($variable_name); $sth->execute($variable_name,$value) or $self->throw($sth->errstr); $sth->finish; $self->{settings_cache}{$variable_name} = $value; } else { return $self->{settings_cache}{$variable_name} if exists $self->{settings_cache}{$variable_name}; my $query = "SELECT value FROM $meta as m WHERE m.name=?"; my $sth = $self->_prepare($query); # $sth->execute($variable_name) or $self->throw($sth->errstr); unless ($sth->execute($variable_name)) { my $errstr = $sth->errstr; $sth = $self->_prepare("SHOW search_path"); $sth->execute(); $errstr .= "With search_path " . $sth->fetchrow_arrayref->[0] . "\n"; $self->throw($errstr); } my ($value) = $sth->fetchrow_array; $sth->finish; return $self->{settings_cache}{$variable_name} = $value; } } # overridden because of use of REPLACE in mysql adapter ### # Replace Bio::SeqFeatureI into database. # sub replace { my $self = shift; my $object = shift; my $index_flag = shift || undef; # ?? 
# shouldn't need to do this
  # $self->_load_class($object);
  my $id       = $object->primary_id;
  my $features = $self->_feature_table;

  my $query      = "INSERT INTO $features (id,object,indexed,seqid,start,\"end\",strand,tier,bin,typeid) VALUES (?,?,?,?,?,?,?,?,?,?)";
  my $query_noid = "INSERT INTO $features (object,indexed,seqid,start,\"end\",strand,tier,bin,typeid) VALUES (?,?,?,?,?,?,?,?,?)";
  my $querydel   = "DELETE FROM $features WHERE id = ?";

  my $sthdel   = $self->_prepare($querydel);
  my $sth      = $self->_prepare($query);
  my $sth_noid = $self->_prepare($query_noid);

  my @location = $index_flag ? $self->_get_location_and_bin($object) : (undef)x6;

  my $primary_tag = $object->primary_tag;
  my $source_tag  = $object->source_tag || '';
  $primary_tag   .= ":$source_tag";
  my $typeid = $self->_typeid($primary_tag,1);

  if ($id) {
    $sthdel->execute($id);
    $sth->execute($id,encode_base64($self->freeze($object), ''),$index_flag||0,@location,$typeid) or $self->throw($sth->errstr);
  } else {
    $sth_noid->execute(encode_base64($self->freeze($object), ''),$index_flag||0,@location,$typeid) or $self->throw($sth->errstr);
  }

  my $dbh = $self->dbh;

  $object->primary_id($dbh->last_insert_id(undef, undef, undef, undef, {sequence=>$features."_id_seq"})) unless defined $id;

  # flag the stored object's own ID; last_insert_id() only identifies this
  # row when it was inserted without an explicit id
  $self->flag_for_indexing($object->primary_id) if $self->{bulk_update_in_progress};
}

=head2 types

 Title   : types
 Usage   : @type_list = $db->types
 Function: Get all the types in the database
 Returns : array of Bio::DB::GFF::Typename objects
 Args    : none
 Status  : public

=cut

# overridden because "offset" is reserved in postgres
###
# Insert a bit of DNA or protein into the database
#
sub _insert_sequence {
  my $self = shift;
  my ($seqid,$seq,$offset) = @_;
  my $id       = $self->_locationid($seqid);
  my $seqtable = $self->_sequence_table;
  my $sthdel = $self->_prepare("DELETE FROM $seqtable WHERE id = ?
AND \"offset\" = ?"); my $sth = $self->_prepare(<execute($id,$offset); $sth->execute($id,$offset,$seq) or $self->throw($sth->errstr); } # overridden because of mysql adapter's use of REPLACE ### # This subroutine flags the given primary ID for later reindexing # sub flag_for_indexing { my $self = shift; my $id = shift; my $needs_updating = $self->_update_table; my $querydel = "DELETE FROM $needs_updating WHERE id = ?"; my $query = "INSERT INTO $needs_updating VALUES (?)"; my $sthdel = $self->_prepare($querydel); my $sth = $self->_prepare($query); $sthdel->execute($id); $sth->execute($id) or $self->throw($self->dbh->errstr); } # overridden because of the different ways that mysql and postgres # handle id sequences sub _genericid { my $self = shift; my ($table,$namefield,$name,$add_if_missing) = @_; my $qualified_table = $self->_qualify($table); my $sth = $self->_prepare(<execute($name) or die $sth->errstr; my ($id) = $sth->fetchrow_array; $sth->finish; return $id if defined $id; return unless $add_if_missing; $sth = $self->_prepare(<execute($name) or die $sth->errstr; my $dbh = $self->dbh; return $dbh->last_insert_id(undef, undef, undef, undef, {sequence=>$qualified_table."_id_seq"}); } # overridden because of differences in binding between mysql and postgres adapters # given a statement handler that is expected to return rows of (id,object) # unthaw each object and return a list of 'em sub _sth2objs { my $self = shift; my $sth = shift; my @result; my ($id, $o); $sth->bind_col(1, \$id); $sth->bind_col(2, \$o, { pg_type => PG_BYTEA}); #while (my ($id,$o) = $sth->fetchrow_array) { while ($sth->fetch) { my $obj = $self->thaw(decode_base64($o) ,$id); push @result,$obj; } $sth->finish; return @result; } # given a statement handler that is expected to return rows of (id,object) # unthaw each object and return a list of 'em sub _sth2obj { my $self = shift; my $sth = shift; my ($id,$o) = $sth->fetchrow_array; return unless $o; my $obj = $self->thaw(decode_base64($o) ,$id); 
  $obj;
}

####################################################################################################
# SQL Fragment generators
####################################################################################################

# overridden because of base64 encoding needed here
###
# special-purpose store for bulk loading - write to a file rather than to the db
#
sub _dump_store {
  my $self    = shift;
  my $indexed = shift;

  my $count    = 0;
  my $store_fh = $self->dump_filehandle('feature');
  my $dbh      = $self->dbh;

  my $autoindex = $self->autoindex;

  for my $obj (@_) {
    my $id = $self->next_id;
    my ($seqid,$start,$end,$strand,$tier,$bin) = $indexed ? $self->_get_location_and_bin($obj) : (undef)x6;
    my $primary_tag = $obj->primary_tag;
    my $source_tag  = $obj->source_tag || '';
    $primary_tag   .= ":$source_tag";
    my $typeid = $self->_typeid($primary_tag,1);

    my $frozen_object = encode_base64($self->freeze($obj), '');
    # TODO: Fix this, why does frozen object start with quote but not end with one
    print $store_fh join("\t",$id,$typeid,$seqid,$start,$end,$strand,$tier,$bin,$indexed,$frozen_object),"\n";
    $obj->primary_id($id);
    $self->_update_indexes($obj) if $indexed && $autoindex;
    $count++;
  }

  # remember whether we have ever stored a non-indexed feature
  unless ($indexed or $self->{indexed_flag}++) {
    $self->subfeatures_are_indexed(0);
  }
  $count;
}

sub _enable_keys  { }  # nullop
sub _disable_keys { }  # nullop

sub _add_interval_stats_table {
  my $self           = shift;
  my $tables         = $self->table_definitions;
  my $interval_stats = $self->_interval_stats_table;
  ## create the interval stats table only if it does not
  ## already exist in this schema
  my $dbh = $self->dbh;
  my @table_exists = $dbh->selectrow_array("SELECT * FROM pg_tables WHERE tablename = '$interval_stats' AND schemaname = '".$self->namespace."'");
  if (!scalar(@table_exists)) {
    my $query = "CREATE TABLE $interval_stats $tables->{interval_stats}";
    $dbh->do($query) or $self->throw($dbh->errstr);
  }
}

sub
_fetch_indexed_features_sql {
  my $self     = shift;
  my $features = $self->_feature_table;
  return <new(-adaptor => 'DBI::mysql', -dsn => 'dbi:mysql:test');

  # get a feature from somewhere
  my $feature = Bio::SeqFeature::Generic->new(...);

  # store it
  $db->store($feature) or die "Couldn't store!";

  # primary ID of the feature is changed to indicate its primary ID
  # in the database...
  my $id = $feature->primary_id;

  # get the feature back out
  my $f  = $db->fetch($id);

  # change the feature and update it
  $f->start(100);
  $f->update($f) or die "Couldn't update!";

  # searching...
  # ...by id
  my @features = $db->fetch_many(@list_of_ids);

  # ...by name
  @features = $db->get_features_by_name('ZK909');

  # ...by alias
  @features = $db->get_features_by_alias('sma-3');

  # ...by type
  @features = $db->get_features_by_type('gene');

  # ...by location
  @features = $db->get_features_by_location(-seq_id=>'Chr1',-start=>4000,-end=>600000);

  # ...by attribute
  @features = $db->get_features_by_attribute({description => 'protein kinase'});

  # ...by the GFF "Note" field
  @result_list = $db->search_notes('kinase');

  # ...by arbitrary combinations of selectors
  @features = $db->features(-name       => $name,
                            -type       => $types,
                            -seq_id     => $seqid,
                            -start      => $start,
                            -end        => $end,
                            -attributes => $attributes);

  # ...using an iterator
  my $iterator = $db->get_seq_stream(-name       => $name,
                                     -type       => $types,
                                     -seq_id     => $seqid,
                                     -start      => $start,
                                     -end        => $end,
                                     -attributes => $attributes);

  while (my $feature = $iterator->next_seq) {
    # do something with the feature
  }

  # ...limiting the search to a particular region
  my $segment  = $db->segment('Chr1',5000=>6000);
  my @features = $segment->features(-type=>['mRNA','match']);

  # getting & storing sequence information
  # Warning: fetch_sequence() returns a string, not a PrimarySeq object
  $db->insert_sequence('Chr1','GATCCCCCGGGATTCCAAAA...');
  my $sequence = $db->fetch_sequence('Chr1',5000=>6000);

  # what feature types are defined in the database?
  my @types    = $db->types;

  # create a new feature in the database
  my $feature = $db->new_feature(-primary_tag => 'mRNA',
                                 -seq_id      => 'chr3',
                                 -start       => 10000,
                                 -end         => 11000);

=head1 DESCRIPTION

Bio::DB::SeqFeature::Store::DBI::mysql is the MySQL adaptor for
Bio::DB::SeqFeature::Store. You will not create it directly, but
instead use Bio::DB::SeqFeature::Store->new() to do so.

See L<Bio::DB::SeqFeature::Store> for complete usage instructions.

=head2 Using the Mysql adaptor

Before you can use the adaptor, you must use the mysqladmin tool to
create a database and establish a user account with write
permission. In order to use "fast" loading, the user account must have
"file" privileges.

To establish a connection to the database, call
Bio::DB::SeqFeature::Store->new(-adaptor=>'DBI::mysql',@more_args). The
additional arguments are as follows:

  Argument name       Description
  -------------       -----------

 -dsn              The database name. You can abbreviate
                   "dbi:mysql:foo" as "foo" if you wish.

 -user             Username for authentication.

 -pass             Password for authentication.

 -namespace        A prefix to attach to each table. This allows you
                   to have several virtual databases in the same
                   physical database.

 -temp             Boolean flag. If true, a temporary database
                   will be created and destroyed as soon as
                   the Store object goes out of scope. (synonym -temporary)

 -autoindex        Boolean flag. If true, features in the database will be
                   reindexed every time they change. This is the default.

 -tmpdir           Directory in which to place temporary files during "fast" loading.
                   Defaults to File::Spec->tmpdir(). (synonyms -dump_dir, -dumpdir, -tmp)

 -dbi_options      A hashref to pass to DBI->connect's 4th argument, the "attributes."
                   (synonyms -options, -dbi_attr)

 -write            Pass true to open database for writing or updating.

If successful, a new instance of
Bio::DB::SeqFeature::Store::DBI::mysql will be returned.

In addition to the standard methods supported by all well-behaved
Bio::DB::SeqFeature::Store databases, the following adaptor-specific
methods are provided.
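The -dsn abbreviation and the -namespace prefix can be illustrated without a running MySQL server. The following is a minimal sketch and is not part of the module. `expand_dsn` and `qualify_table` are hypothetical helpers that copy the logic of this adaptor's init() (adding the `dbi:mysql:` prefix) and _qualify() (producing `<namespace>_<table>` names).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: expand an abbreviated -dsn the way init() does,
# adding "dbi:<driver>:" unless the string already starts with "dbi:".
sub expand_dsn {
    my ($dsn, $driver) = @_;
    $driver //= 'mysql';
    return $dsn =~ /^dbi:/ ? $dsn : "dbi:${driver}:$dsn";
}

# Hypothetical helper: mimic the mysql adaptor's -namespace handling.
# Names are left alone when no namespace is set, or when the table name
# already starts with the namespace string.
sub qualify_table {
    my ($table, $namespace) = @_;
    return $table if !defined $namespace || index($table, $namespace) == 0;
    return "${namespace}_${table}";
}

print expand_dsn('test'), "\n";                    # dbi:mysql:test
print expand_dsn('dbi:mysql:test'), "\n";          # dbi:mysql:test
print qualify_table('feature', 'yeast'), "\n";     # yeast_feature
print qualify_table('feature'), "\n";              # feature
```

_qualify() only checks whether the table name already begins with the namespace string. As a result, a namespace that happens to be a prefix of a table name (for example `feat` with `feature`) leaves that name unprefixed. The sketch reproduces this behavior deliberately.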
These are described in the next sections. =cut use strict; use base 'Bio::DB::SeqFeature::Store'; use Bio::DB::SeqFeature::Store::DBI::Iterator; use DBI; use Memoize; use Cwd 'abs_path'; use Bio::DB::GFF::Util::Rearrange 'rearrange'; use Bio::SeqFeature::Lite; use File::Spec; use Carp 'carp','cluck','croak'; use constant DEBUG=>0; # from the MySQL documentation... # WARNING: if your sequence uses coordinates greater than 2 GB, you are out of luck! use constant MAX_INT => 2_147_483_647; use constant MIN_INT => -2_147_483_648; use constant MAX_BIN => 1_000_000_000; # size of largest feature = 1 Gb use constant MIN_BIN => 1000; # smallest bin we'll make - on a 100 Mb chromosome, there'll be 100,000 of these use constant SUMMARY_BIN_SIZE => 1000; # tier 0 == 1000 bp bins # tier 1 == 10,000 bp bins # etc. memoize('_typeid'); memoize('_locationid'); memoize('_attributeid'); memoize('dump_path'); ### # object initialization # sub init { my $self = shift; my ($dsn, $is_temporary, $autoindex, $namespace, $dump_dir, $user, $pass, $dbi_options, $writeable, $create, ) = rearrange(['DSN', ['TEMP','TEMPORARY'], 'AUTOINDEX', 'NAMESPACE', ['DUMP_DIR','DUMPDIR','TMP','TMPDIR'], 'USER', ['PASS','PASSWD','PASSWORD'], ['OPTIONS','DBI_OPTIONS','DBI_ATTR'], ['WRITE','WRITEABLE'], 'CREATE', ],@_); $dbi_options ||= {}; $writeable = 1 if $is_temporary or $dump_dir; $dsn or $self->throw("Usage: ".__PACKAGE__."->init(-dsn => \$dbh || \$dsn)"); my $dbh; if (ref $dsn) { $dbh = $dsn; } else { $dsn = "dbi:mysql:$dsn" unless $dsn =~ /^dbi:/; $dbh = DBI->connect($dsn,$user,$pass,$dbi_options) or $self->throw($DBI::errstr); $dbh->{mysql_auto_reconnect} = 1; } $self->{dbh} = $dbh; $self->{is_temp} = $is_temporary; $self->{namespace} = $namespace; $self->{writeable} = $writeable; $self->default_settings; $self->autoindex($autoindex) if defined $autoindex; $self->dumpdir($dump_dir) if $dump_dir; if ($self->is_temp) { $self->init_tmp_database(); } elsif ($create) { $self->init_database('erase'); } } 
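=pod

The constants above (MIN_BIN = 1000) and the tier comment that follows them (tier 0 == 1000 bp bins, tier 1 == 10,000 bp bins, etc.) describe a hierarchical binning index. The adaptor's own bin-assignment code is not part of this section, so the following is only a sketch under one assumption: a feature is placed in the lowest tier whose bins are wide enough to hold both its start and its end. `find_bin` is a hypothetical name, and the sketch ignores MAX_BIN.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use constant MIN_BIN => 1000;   # tier 0 bin width, as in the adaptor's constant

# Hypothetical helper: return (tier, bin) for the interval [$start, $end].
# Each tier multiplies the bin width by 10; stop at the first tier where
# the start and end coordinates fall into the same bin.
sub find_bin {
    my ($start, $end) = @_;
    my $binsize = MIN_BIN;
    my $tier    = 0;
    while (1) {
        my $bin_start = int($start / $binsize);
        my $bin_end   = int($end   / $binsize);
        return ($tier, $bin_start) if $bin_start == $bin_end;
        $binsize *= 10;
        $tier++;
    }
}

for my $range ([10_000, 10_999], [10_000, 11_000], [9_999, 10_001]) {
    my ($tier, $bin) = find_bin(@$range);
    print "$range->[0]..$range->[1]: tier=$tier bin=$bin\n";
}
# 10000..10999: tier=0 bin=10
# 10000..11000: tier=1 bin=1
# 9999..10001: tier=2 bin=0
```

Such schemes are usually motivated by range queries. Storing (tier, bin) lets an overlap search limit each tier to the few bins that cover the query range instead of scanning every feature on the sequence.

=cut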
sub writeable { shift->{writeable} } sub can_store_parentage { 1 } sub table_definitions { my $self = shift; return { feature => < < < < < < < < < <maybe_create_meta(); $self->SUPER::default_settings; $self->autoindex(1); $self->dumpdir(File::Spec->tmpdir); } ### # retrieve database handle # sub dbh { my $self = shift; my $d = $self->{dbh}; $self->{dbh} = shift if @_; $d; } sub clone { my $self = shift; $self->{dbh}{InactiveDestroy} = 1; $self->{dbh} = $self->{dbh}->clone unless $self->is_temp; } ### # get/set directory for bulk load tables # sub dumpdir { my $self = shift; my $d = $self->{dumpdir}; $self->{dumpdir} = abs_path(shift) if @_; $d; } ### # table namespace (multiple dbs in one mysql db) # sub namespace { my $self = shift; my $d = $self->{namespace}; $self->{namespace} = shift if @_; $d; } ### # Required for Pg not mysql # sub remove_namespace { return; } ### # find a path that corresponds to a dump table # sub dump_path { my $self = shift; my $table = $self->_qualify(shift); return "$self->{dumpdir}/$table.$$"; } ### # make a filehandle (writeable) that corresponds to a dump table # sub dump_filehandle { my $self = shift; my $table = shift; eval "require IO::File" unless IO::File->can('new'); my $path = $self->dump_path($table); my $fh = $self->{filehandles}{$path} ||= IO::File->new(">$path"); $fh; } ### # find the next ID for a feature (used only during bulk loading) # sub next_id { my $self = shift; $self->{max_id} ||= $self->max_id; return ++$self->{max_id}; } ### # find the maximum ID for a feature (used only during bulk loading) # sub max_id { my $self = shift; my $features = $self->_feature_table; my $sth = $self->_prepare("SELECT max(id) from $features"); $sth->execute or $self->throw($sth->errstr); my ($id) = $sth->fetchrow_array; $id; } ### # wipe database clean and reinstall schema # sub _init_database { my $self = shift; my $erase = shift; my $dbh = $self->dbh; my $tables = $self->table_definitions; for my $t (keys %$tables) { next if $t eq 
'meta';      # don't get rid of meta data!
    my $table = $self->_qualify($t);
    $dbh->do("DROP table IF EXISTS $table") if $erase;
    my $query = "CREATE TABLE IF NOT EXISTS $table $tables->{$t}";
    $self->_create_table($dbh,$query);
  }
  $self->subfeatures_are_indexed(1) if $erase;
  1;
}

sub init_tmp_database {
  my $self   = shift;
  my $dbh    = $self->dbh;
  my $tables = $self->table_definitions;
  for my $t (keys %$tables) {
    next if $t eq 'meta';  # done earlier
    my $table = $self->_qualify($t);
    my $query = "CREATE TEMPORARY TABLE $table $tables->{$t}";
    $self->_create_table($dbh,$query);
  }
  1;
}

# run each ';'-separated statement of a table definition in turn
sub _create_table {
  my $self = shift;
  my ($dbh,$query) = @_;
  for my $q (split ';',$query) {
    chomp($q);
    next unless $q =~ /\S/;
    $dbh->do("$q;\n") or $self->throw($dbh->errstr);
  }
}

sub maybe_create_meta {
  my $self = shift;
  return unless $self->writeable;
  my $meta      = $self->_meta_table;
  my $tables    = $self->table_definitions;
  my $temporary = $self->is_temp ? 'TEMPORARY' : '';
  $self->dbh->do("CREATE $temporary TABLE IF NOT EXISTS $meta $tables->{meta}");
}

###
# use temporary tables
#
sub is_temp {
  shift->{is_temp};
}

sub attributes {
  my $self = shift;
  my $dbh  = $self->dbh;
  my $attributelist_table = $self->_attributelist_table;

  my $tags = $dbh->selectcol_arrayref("SELECT tag FROM $attributelist_table")
    or $self->throw($dbh->errstr);
  return @$tags;
}

sub _store {
  my $self = shift;

  # special case for bulk updates
  return $self->_dump_store(@_) if $self->{bulk_update_in_progress};

  my $indexed = shift;
  my $count   = 0;

  my $autoindex = $self->autoindex;

  my $dbh = $self->dbh;
  local $dbh->{RaiseError} = 1;
  $self->begin_work;
  eval {
    for my $obj (@_) {
      $self->replace($obj,$indexed);
      $self->_update_indexes($obj) if $indexed && $autoindex;
      $count++;
    }
  };

  if ($@) {
    warn "Transaction aborted because $@";
    $self->rollback;
  }
  else {
    $self->commit;
  }

  # remember whether we have ever stored a non-indexed feature
  unless ($indexed or $self->{indexed_flag}++) {
    $self->subfeatures_are_indexed(0);
  }
  $count;
}

# we memoize this in order
# to avoid making zillions of calls
sub autoindex {
  my $self = shift;

  # special case for bulk update -- need to build the indexes
  # at the same time we build the main feature table
  return 1 if $self->{bulk_update_in_progress};
  my $d = $self->setting('autoindex');
  $self->setting(autoindex=>shift) if @_;
  $d;
}

sub _start_bulk_update {
  my $self = shift;
  my $dbh  = $self->dbh;
  $self->begin_work;
  $self->{bulk_update_in_progress}++;
}

sub _finish_bulk_update {
  my $self = shift;
  my $dbh  = $self->dbh;
  my $dir  = $self->{dumpdir} || '.';
  for my $table ($self->_feature_table,$self->index_tables) {
    my $fh   = $self->dump_filehandle($table);
    my $path = $self->dump_path($table);
    $fh->close;
    #print STDERR "$path\n";
    $dbh->do("LOAD DATA LOCAL INFILE '$path' REPLACE INTO TABLE $table FIELDS OPTIONALLY ENCLOSED BY '\\''")
      or $self->throw($dbh->errstr);
    unlink $path;
  }
  delete $self->{bulk_update_in_progress};
  delete $self->{filehandles};
  $self->commit;
}

###
# Add subparts to a feature. Both the feature and all subparts must already be in the database.
#
sub _add_SeqFeature {
  my $self = shift;

  # special purpose method for case when we are doing a bulk update
  return $self->_dump_add_SeqFeature(@_) if $self->{bulk_update_in_progress};

  my $parent   = shift;
  my @children = @_;

  my $dbh = $self->dbh;
  local $dbh->{RaiseError} = 1;

  my $parent2child = $self->_parent2child_table();
  my $count = 0;

  my $sth = $self->_prepare(<primary_id : $parent) or $self->throw("$parent should have a primary_id");

  $self->begin_work or $self->throw($dbh->errstr);
  eval {
    for my $child (@children) {
      my $child_id = ref $child ?
$child->primary_id : $child; defined $child_id or die "no primary ID known for $child"; $sth->execute($parent_id,$child_id); $count++; } }; if ($@) { warn "Transaction aborted because $@"; $self->rollback; } else { $self->commit; } $sth->finish; $count; } sub _fetch_SeqFeatures { my $self = shift; my $parent = shift; my @types = @_; my $parent_id = $parent->primary_id or $self->throw("$parent should have a primary_id"); my $features = $self->_feature_table; my $parent2child = $self->_parent2child_table(); my @from = ("$features as f","$parent2child as c"); my @where = ('f.id=c.child','c.id=?'); my @args = $parent_id; if (@types) { my ($from,$where,undef,@a) = $self->_types_sql(\@types,'f'); push @from,$from if $from; push @where,$where if $where; push @args,@a; } my $from = join ', ',@from; my $where = join ' AND ',@where; my $query = <_print_query($query,@args) if DEBUG || $self->debug; my $sth = $self->_prepare($query) or $self->throw($self->dbh->errstr); $sth->execute(@args) or $self->throw($sth->errstr); return $self->_sth2objs($sth); } ### # get primary sequence between start and end # sub _fetch_sequence { my $self = shift; my ($seqid,$start,$end) = @_; # backward compatibility to the old days when I liked reverse complementing # dna by specifying $start > $end my $reversed; if (defined $start && defined $end && $start > $end) { $reversed++; ($start,$end) = ($end,$start); } $start-- if defined $start; $end-- if defined $end; my $id = $self->_locationid($seqid); my $offset1 = $self->_offset_boundary($id,$start || 'left'); my $offset2 = $self->_offset_boundary($id,$end || 'right'); my $sequence_table = $self->_sequence_table; my $sql = <= ? AND s.offset <= ? 
ORDER BY s.offset END my $sth = $self->_prepare($sql); my $seq = ''; $self->_print_query($sql,$id,$offset1,$offset2) if DEBUG || $self->debug; $sth->execute($id,$offset1,$offset2) or $self->throw($sth->errstr); while (my($frag,$offset) = $sth->fetchrow_array) { substr($frag,0,$start-$offset) = '' if defined $start && $start > $offset; $seq .= $frag; } substr($seq,$end-$start+1) = '' if defined $end && $end-$start+1 < length($seq); if ($reversed) { $seq = reverse $seq; $seq =~ tr/gatcGATC/ctagCTAG/; } $sth->finish; $seq; } sub _offset_boundary { my $self = shift; my ($seqid,$position) = @_; my $sequence_table = $self->_sequence_table; my $locationlist_table = $self->_locationlist_table; my $sql; $sql = $position eq 'left' ? "SELECT min(offset) FROM $sequence_table as s WHERE s.id=?" :$position eq 'right' ? "SELECT max(offset) FROM $sequence_table as s WHERE s.id=?" :"SELECT max(offset) FROM $sequence_table as s WHERE s.id=? AND offset<=?"; my $sth = $self->_prepare($sql); my @args = $position =~ /^-?\d+$/ ? ($seqid,$position) : ($seqid); $self->_print_query($sql,@args) if DEBUG || $self->debug; $sth->execute(@args) or $self->throw($sth->errstr); my $boundary = $sth->fetchall_arrayref->[0][0]; $sth->finish; return $boundary; } ### # add namespace to tablename # sub _qualify { my $self = shift; my $table_name = shift; my $namespace = $self->namespace; return $table_name if (!defined $namespace || # is namespace already present in table name? 
index($table_name, $namespace) == 0); return "${namespace}_${table_name}"; } ### # Fetch a Bio::SeqFeatureI from database using its primary_id # sub _fetch { my $self = shift; @_ or $self->throw("usage: fetch(\$primary_id)"); my $primary_id = shift; my $features = $self->_feature_table; my $sth = $self->_prepare(<execute($primary_id) or $self->throw($sth->errstr); my $obj = $self->_sth2obj($sth); $sth->finish; $obj; } ### # Efficiently fetch a series of IDs from the database # Can pass an array or an array ref # sub _fetch_many { my $self = shift; @_ or $self->throw('usage: fetch_many($id1,$id2,$id3...)'); my $ids = join ',',map {ref($_) ? @$_ : $_} @_ or return; my $features = $self->_feature_table; my $sth = $self->_prepare(<execute() or $self->throw($sth->errstr); return $self->_sth2objs($sth); } sub _features { my $self = shift; my ($seq_id,$start,$end,$strand, $name,$class,$allow_aliases, $types, $attributes, $range_type, $fromtable, $iterator, $sources, ) = rearrange([['SEQID','SEQ_ID','REF'],'START',['STOP','END'],'STRAND', 'NAME','CLASS','ALIASES', ['TYPES','TYPE','PRIMARY_TAG'], ['ATTRIBUTES','ATTRIBUTE'], 'RANGE_TYPE', 'FROM_TABLE', 'ITERATOR', ['SOURCE','SOURCES'], ],@_); my (@from,@where,@args,@group); $range_type ||= 'overlaps'; my $features = $self->_feature_table; @from = "$features as f"; if (defined $name) { # hacky backward compatibility workaround undef $class if $class && $class eq 'Sequence'; $name = "$class:$name" if defined $class && length $class > 0; # last argument is the join field my ($from,$where,$group,@a) = $self->_name_sql($name,$allow_aliases,'f.id'); push @from,$from if $from; push @where,$where if $where; push @group,$group if $group; push @args,@a; } if (defined $seq_id) { # last argument is the name of the features table my ($from,$where,$group,@a) = $self->_location_sql($seq_id,$start,$end,$range_type,$strand,'f'); push @from,$from if $from; push @where,$where if $where; push @group,$group if $group; push @args,@a; } if 
(defined($sources)) { my @sources = ref($sources) eq 'ARRAY' ? @{$sources} : ($sources); if (defined($types)) { my @types = ref($types) eq 'ARRAY' ? @{$types} : ($types); my @final_types; foreach my $type (@types) { # *** not sure what to do if user supplies both -source and -type # where the type includes a source! if ($type =~ /:/) { push(@final_types, $type); } else { foreach my $source (@sources) { push(@final_types, $type.':'.$source); } } } $types = \@final_types; } else { $types = [map { ':'.$_ } @sources]; } } if (defined($types)) { # last argument is the name of the features table my ($from,$where,$group,@a) = $self->_types_sql($types,'f'); push @from,$from if $from; push @where,$where if $where; push @group,$group if $group; push @args,@a; } if (defined $attributes) { # last argument is the join field my ($from,$where,$group,@a) = $self->_attributes_sql($attributes,'f.id'); push @from,$from if $from; push @where,$where if $where; push @group,$group if $group; push @args,@a; } if (defined $fromtable) { # last argument is the join field my ($from,$where,$group,@a) = $self->_from_table_sql($fromtable,'f.id'); push @from,$from if $from; push @where,$where if $where; push @group,$group if $group; push @args,@a; } # if no other criteria are specified, then # only fetch indexed (i.e. top level objects) @where = 'indexed=1' unless @where; my $from = join ', ',@from; my $where = join ' AND ',map {"($_)"} @where; my $group = join ', ',@group; $group = "GROUP BY $group" if @group; my $query = <_print_query($query,@args) if DEBUG || $self->debug; my $sth = $self->_prepare($query) or $self->throw($self->dbh->errstr); $sth->execute(@args) or $self->throw($sth->errstr); return $iterator ? 
Bio::DB::SeqFeature::Store::DBI::Iterator->new($sth,$self) : $self->_sth2objs($sth); } sub _aggregate_bins { my $self = shift; my $sth = shift; my (%types,$binsize,$binstart); while (my ($type,$seqname,$bin,$count,$bins,$start,$end) = $sth->fetchrow_array) { $binsize ||= ($end-$start+1)/$bins; $binstart ||= int($start/$binsize); $types{$type}{seqname} ||= $seqname; $types{$type}{min} ||= $start; $types{$type}{max} ||= $end; $types{$type}{bins} ||= [(0) x $bins]; $types{$type}{bins}[$bin-$binstart] = $count; $types{$type}{count} += $count; } my @results; for my $type (keys %types) { my $min = $types{$type}{min}; my $max = $types{$type}{max}; my $seqid= $types{$type}{seqname}; my $f = Bio::SeqFeature::Lite->new(-seq_id => $seqid, -start => $min, -end => $max, -type => "$type:bins", -score => $types{$type}{count}, -attributes => {coverage => join ',',@{$types{$type}{bins}}}); push @results,$f; } return @results; } sub _name_sql { my $self = shift; my ($name,$allow_aliases,$join) = @_; my $name_table = $self->_name_table; my $from = "$name_table as n"; my ($match,$string) = $self->_match_sql($name); my $where = "n.id=$join AND n.name $match"; $where .= " AND n.display_name>0" unless $allow_aliases; return ($from,$where,'',$string); } sub _search_attributes { my $self = shift; my ($search_string,$attribute_names,$limit) = @_; my @words = map {quotemeta($_)} split /\s+/,$search_string; my $name_table = $self->_name_table; my $attribute_table = $self->_attribute_table; my $attributelist_table = $self->_attributelist_table; my $type_table = $self->_type_table; my $typelist_table = $self->_typelist_table; my @tags = @$attribute_names; my $tag_sql = join ' OR ',("al.tag=?") x @tags; my $perl_regexp = join '|',@words; my $sql_regexp = join ' OR ',("a.attribute_value REGEXP ?") x @words; my $sql = <_print_query($sql,@tags,@words) if DEBUG || $self->debug; my $sth = $self->_prepare($sql); $sth->execute(@tags,@words) or $self->throw($sth->errstr); my @results; while 
(my($name,$value,$type,$id) = $sth->fetchrow_array) { my (@hits) = $value =~ /$perl_regexp/ig; my @words_in_row = split /\b/,$value; my $score = int(@hits * 10); push @results,[$name,$value,$score,$type,$id]; } $sth->finish; @results = sort {$b->[2]<=>$a->[2]} @results; return @results; } sub _match_sql { my $self = shift; my $name = shift; my ($match,$string); if ($name =~ /(?:^|[^\\])[*?]/) { $name =~ s/(^|[^\\])([%_])/$1\\$2/g; $name =~ s/(^|[^\\])\*/$1%/g; $name =~ s/(^|[^\\])\?/$1_/g; $match = "LIKE ?"; $string = $name; } else { $match = "= ?"; $string = $name; } return ($match,$string); } sub _from_table_sql { my $self = shift; my ($from_table,$join) = @_; my $from = "$from_table as ft"; my $where = "ft.id=$join"; return ($from,$where,''); } sub _attributes_sql { my $self = shift; my ($attributes,$join) = @_; my ($wf,@bind_args) = $self->_make_attribute_where('a','al',$attributes); my ($group_by,@group_args)= $self->_make_attribute_group('a',$attributes); my $attribute_table = $self->_attribute_table; my $attributelist_table = $self->_attributelist_table; my $from = "$attribute_table as a use index(attribute_id), $attributelist_table as al"; my $where = <_typelist_table; my $from = "$typelist AS tl"; my (@matches,@args); for my $type (@types) { if (ref $type && $type->isa('Bio::DB::GFF::Typename')) { $primary_tag = $type->method; $source_tag = $type->source; } else { ($primary_tag,$source_tag) = split ':',$type,2; } if (defined $source_tag && length $source_tag) { if (defined $primary_tag && length($primary_tag)) { push @matches,"tl.tag=?"; push @args,"$primary_tag:$source_tag"; } else { push @matches,"tl.tag LIKE ?"; push @args,"%:$source_tag"; } } else { push @matches,"tl.tag LIKE ?"; push @args,"$primary_tag:%"; } } my $matches = join ' OR ',@matches; my $where = <_locationid_nocreate($seq_id) || 0; # zero is an invalid primary ID, so will return empty $start = MIN_INT unless defined $start; $end = MAX_INT unless defined $end; my ($bin_where,@bin_args) = 
$self->bin_where($start,$end,$location); my ($range,@range_args); if ($range_type eq 'overlaps') { $range = "$location.end>=? AND $location.start<=? AND ($bin_where)"; @range_args = ($start,$end,@bin_args); } elsif ($range_type eq 'contains') { $range = "$location.start>=? AND $location.end<=? AND ($bin_where)"; @range_args = ($start,$end,@bin_args); } elsif ($range_type eq 'contained_in') { $range = "$location.start<=? AND $location.end>=?"; @range_args = ($start,$end); } else { $self->throw("range_type must be one of 'overlaps', 'contains' or 'contained_in'"); } if (defined $strand) { $range .= " AND strand=?"; push @range_args,$strand; } my $where = <dbh; my $count = 0; my $now; # try to bring in highres time() function eval "require Time::HiRes"; my $last_time = $self->time(); # tell _delete_index() not to bother removing the index rows corresponding # to each individual feature local $self->{reindexing} = 1; $self->begin_work; eval { my $update = $from_update_table; for my $table ($self->index_tables) { my $query = $from_update_table ? "DELETE $table FROM $table,$update WHERE $table.id=$update.id" : "DELETE FROM $table"; $dbh->do($query); $self->_disable_keys($dbh,$table); } my $iterator = $self->get_seq_stream(-from_table=>$from_update_table ? $update : undef); while (my $f = $iterator->next_seq) { if (++$count %1000 == 0) { $now = $self->time(); my $elapsed = sprintf(" in %5.2fs",$now - $last_time); $last_time = $now; print STDERR "$count features indexed$elapsed...",' 'x60; print STDERR -t STDOUT && !$ENV{EMACS} ? 
"\r" : "\n"; } $self->_update_indexes($f); } }; for my $table ($self->index_tables) { $self->_enable_keys($dbh,$table); } if ($@) { warn "Couldn't complete transaction: $@"; $self->rollback; return; } else { $self->commit; return 1; } } sub optimize { my $self = shift; $self->dbh->do("ANALYZE TABLE $_") foreach $self->index_tables; } sub all_tables { my $self = shift; my @index_tables = $self->index_tables; my $features = $self->_feature_table; return ($features,@index_tables); } sub index_tables { my $self = shift; return map {$self->_qualify($_)} qw(name attribute parent2child) } sub _firstid { my $self = shift; my $features = $self->_feature_table; my $query = <_prepare($query); $sth->execute(); my ($first) = $sth->fetchrow_array; $sth->finish; $first; } sub _nextid { my $self = shift; my $lastkey = shift; my $features = $self->_feature_table; my $query = <? END my $sth=$self->_prepare($query); $sth->execute($lastkey); my ($next) = $sth->fetchrow_array; $sth->finish; $next; } sub _existsid { my $self = shift; my $key = shift; my $features = $self->_feature_table; my $query = <_prepare($query); $sth->execute($key); my ($count) = $sth->fetchrow_array; $sth->finish; $count > 0; } sub _deleteid { my $self = shift; my $key = shift; my $dbh = $self->dbh; my $parent2child = $self->_parent2child_table; my $query = "SELECT child FROM $parent2child WHERE id=?"; my $sth=$self->_prepare($query); $sth->execute($key); my $success = 0; while (my ($cid) = $sth->fetchrow_array) { # Backcheck looking for multiple parents, delete only if one is present. I'm # sure there is a nice way to left join the parent2child table onto itself # to get this in one query above, just haven't worked it out yet... my $sth2 = $self->_prepare("SELECT count(id) FROM $parent2child WHERE child=?"); $sth2->execute($cid); my ($count) = $sth2->fetchrow_array; if ($count == 1) { $self->_deleteid($cid) || warn "An error occurred while removing subfeature id=$cid.
Perhaps it was previously deleted?\n"; } } for my $table ($self->all_tables) { $success += $dbh->do("DELETE FROM $table WHERE id=$key") || 0; } return $success; } sub _clearall { my $self = shift; my $dbh = $self->dbh; for my $table ($self->all_tables) { $dbh->do("DELETE FROM $table"); } } sub _featurecount { my $self = shift; my $dbh = $self->dbh; my $features = $self->_feature_table; my $query = <_prepare($query); $sth->execute(); my ($count) = $sth->fetchrow_array; $sth->finish; $count; } sub _seq_ids { my $self = shift; my $dbh = $self->dbh; my $location = $self->_locationlist_table; my $sth = $self->_prepare("SELECT DISTINCT seqname FROM $location"); $sth->execute() or $self->throw($sth->errstr); my @result; while (my ($id) = $sth->fetchrow_array) { push @result,$id; } return @result; } sub setting { my $self = shift; my ($variable_name,$value) = @_; my $meta = $self->_meta_table; if (defined $value && $self->writeable) { my $query = <_prepare($query); $sth->execute($variable_name,$value) or $self->throw($sth->errstr); $sth->finish; $self->{settings_cache}{$variable_name} = $value; } else { return $self->{settings_cache}{$variable_name} if exists $self->{settings_cache}{$variable_name}; my $query = <_prepare($query); $sth->execute($variable_name) or $self->throw($sth->errstr); my ($value) = $sth->fetchrow_array; $sth->finish; return $self->{settings_cache}{$variable_name} = $value; } } ### # Replace Bio::SeqFeatureI into database. # sub replace { my $self = shift; my $object = shift; my $index_flag = shift || undef; # ?? shouldn't need to do this # $self->_load_class($object); my $id = $object->primary_id; my $features = $self->_feature_table; my $sth = $self->_prepare(<_get_location_and_bin($object) : (undef)x6; my $primary_tag = $object->primary_tag; my $source_tag = $object->source_tag || ''; $primary_tag .= ":$source_tag"; my $typeid = $self->_typeid($primary_tag,1); my $frozen = $self->no_blobs() ? 
0 : $self->freeze($object); $sth->execute($id,$frozen,$index_flag||0,@location,$typeid) or $self->throw($sth->errstr); my $dbh = $self->dbh; $object->primary_id($dbh->{mysql_insertid}) unless defined $id; $self->flag_for_indexing($dbh->{mysql_insertid}) if $self->{bulk_update_in_progress}; } # doesn't work with this schema, since we have to update name and attribute # tables which need object ids, which we can only know by replacing feats in # the feature table one by one sub bulk_replace { my $self = shift; my $index_flag = shift || undef; my @objects = @_; my $features = $self->_feature_table; my @insert_values; foreach my $object (@objects) { my $id = $object->primary_id; my @location = $index_flag ? $self->_get_location_and_bin($object) : (undef)x6; my $primary_tag = $object->primary_tag; my $source_tag = $object->source_tag || ''; $primary_tag .= ":$source_tag"; my $typeid = $self->_typeid($primary_tag,1); push(@insert_values, ($id,0,$index_flag||0,@location,$typeid)); } my @value_blocks; for (1..@objects) { push(@value_blocks, '(?,?,?,?,?,?,?,?,?,?)'); } my $value_blocks = join(',', @value_blocks); my $sql = qq{REPLACE INTO $features (id,object,indexed,seqid,start,end,strand,tier,bin,typeid) VALUES $value_blocks}; my $sth = $self->_prepare($sql); $sth->execute(@insert_values) or $self->throw($sth->errstr); } ### # Insert one Bio::SeqFeatureI into database. 
primary_id must be undef # sub insert { my $self = shift; my $object = shift; my $index_flag = shift || 0; $self->_load_class($object); defined $object->primary_id and $self->throw("$object already has a primary id"); my $features = $self->_feature_table; my $sth = $self->_prepare(<execute(undef,$self->freeze($object),$index_flag) or $self->throw($sth->errstr); my $dbh = $self->dbh; $object->primary_id($dbh->{mysql_insertid}); $self->flag_for_indexing($dbh->{mysql_insertid}) if $self->{bulk_update_in_progress}; } =head2 types Title : types Usage : @type_list = $db->types Function: Get all the types in the database Returns : array of Bio::DB::GFF::Typename objects Args : none Status : public =cut sub types { my $self = shift; eval "require Bio::DB::GFF::Typename" unless Bio::DB::GFF::Typename->can('new'); my $typelist = $self->_typelist_table; my $sql = <_print_query($sql) if DEBUG || $self->debug; my $sth = $self->_prepare($sql); $sth->execute() or $self->throw($sth->errstr); my @results; while (my($tag) = $sth->fetchrow_array) { push @results,Bio::DB::GFF::Typename->new($tag); } $sth->finish; return @results; } =head2 toplevel_types Title : toplevel_types Usage : @type_list = $db->toplevel_types Function: Get the toplevel types in the database Returns : array of Bio::DB::GFF::Typename objects Args : none Status : public This is similar to types() but only returns the types of INDEXED (toplevel) features. 
=cut sub toplevel_types { my $self = shift; eval "require Bio::DB::GFF::Typename" unless Bio::DB::GFF::Typename->can('new'); my $typelist = $self->_typelist_table; my $features = $self->_feature_table; my $sql = <_print_query($sql) if DEBUG || $self->debug; my $sth = $self->_prepare($sql); $sth->execute() or $self->throw($sth->errstr); my @results; while (my($tag) = $sth->fetchrow_array) { push @results,Bio::DB::GFF::Typename->new($tag); } $sth->finish; return @results; } ### # Insert a bit of DNA or protein into the database # sub _insert_sequence { my $self = shift; my ($seqid,$seq,$offset) = @_; my $id = $self->_locationid($seqid); my $sequence = $self->_sequence_table; my $sth = $self->_prepare(<execute($id,$offset,$seq) or $self->throw($sth->errstr); } ### # This subroutine flags the given primary ID for later reindexing # sub flag_for_indexing { my $self = shift; my $id = shift; my $needs_updating = $self->_update_table; my $sth = $self->_prepare("REPLACE INTO $needs_updating VALUES (?)"); $sth->execute($id) or $self->throw($self->dbh->errstr); } ### # Update indexes for given object # sub _update_indexes { my $self = shift; my $obj = shift; defined (my $id = $obj->primary_id) or return; if ($self->{bulk_update_in_progress}) { $self->_dump_update_name_index($obj,$id); $self->_dump_update_attribute_index($obj,$id); } else { $self->_update_name_index($obj,$id); $self->_update_attribute_index($obj,$id); } } sub _update_name_index { my $self = shift; my ($obj,$id) = @_; my $name = $self->_name_table; my $primary_id = $obj->primary_id; $self->_delete_index($name,$id); my ($names,$aliases) = $self->feature_names($obj); my $sth = $self->_prepare("INSERT INTO $name (id,name,display_name) VALUES (?,?,?)"); $sth->execute($id,$_,1) or $self->throw($sth->errstr) foreach @$names; $sth->execute($id,$_,0) or $self->throw($sth->errstr) foreach @$aliases; $sth->finish; } sub _update_attribute_index { my $self = shift; my ($obj,$id) = @_; my $attribute = 
$self->_attribute_table; $self->_delete_index($attribute,$id); my $sth = $self->_prepare("INSERT INTO $attribute (id,attribute_id,attribute_value) VALUES (?,?,?)"); for my $tag ($obj->get_all_tags) { my $tagid = $self->_attributeid($tag); for my $value ($obj->get_tag_values($tag)) { $sth->execute($id,$tagid,$value) or $self->throw($sth->errstr); } } $sth->finish; } sub _genericid { my $self = shift; my ($table,$namefield,$name,$add_if_missing) = @_; my $qualified_table = $self->_qualify($table); my $sth = $self->_prepare(<execute($name) or die $sth->errstr; my ($id) = $sth->fetchrow_array; $sth->finish; return $id if defined $id; return unless $add_if_missing; $sth = $self->_prepare(<execute($name) or die $sth->errstr; my $dbh = $self->dbh; return $dbh->{mysql_insertid}; } sub _typeid { shift->_genericid('typelist','tag',shift,1); } sub _locationid { shift->_genericid('locationlist','seqname',shift,1); } sub _locationid_nocreate { shift->_genericid('locationlist','seqname',shift,0); } sub _attributeid { shift->_genericid('attributelist','tag',shift,1); } sub _get_location_and_bin { my $self = shift; my $feature = shift; my $seqid = $self->_locationid($feature->seq_id||''); my $start = $feature->start; my $end = $feature->end; my $strand = $feature->strand || 0; my ($tier,$bin) = $self->get_bin($start,$end); return ($seqid,$start,$end,$strand,$tier,$bin); } sub get_bin { my $self = shift; my ($start,$end) = @_; my $binsize = MIN_BIN; my ($bin_start,$bin_end,$tier); $tier = 0; while (1) { $bin_start = int $start/$binsize; $bin_end = int $end/$binsize; last if $bin_start == $bin_end; $binsize *= 10; $tier++; } return ($tier,$bin_start); } sub bin_where { my $self = shift; my ($start,$end,$f) = @_; my (@bins,@args); my $tier = 0; my $binsize = MIN_BIN; while ($binsize <= MAX_BIN) { my $bin_start = int($start/$binsize); my $bin_end = int($end/$binsize); push @bins,"($f.tier=? AND $f.bin between ? 
AND ?)"; push @args,($tier,$bin_start,$bin_end); $binsize *= 10; $tier++; } my $query = join ("\n\t OR ",@bins); return wantarray ? ($query,@args) : substitute($query,@args); } sub _delete_index { my $self = shift; my ($table_name,$id) = @_; return if $self->{reindexing}; my $sth = $self->_prepare("DELETE FROM $table_name WHERE id=?") or $self->throw($self->dbh->errstr); $sth->execute($id); } # given a statement handle that is expected to return rows of # (id,object,typeid,seqid,start,end,strand), thaw each object (or rebuild # it if it was stored without a blob) and return a list of them sub _sth2objs { my $self = shift; my $sth = shift; my @result; while (my ($id,$o,$typeid,$seqid,$start,$end,$strand) = $sth->fetchrow_array) { my $obj; if ($o eq '0') { # rebuild a new feat object from the data stored in the db $obj = $self->_rebuild_obj($id,$typeid,$seqid,$start,$end,$strand); } else { $obj = $self->thaw($o,$id); } push @result,$obj; } $sth->finish; return @result; } # given a statement handle that is expected to return a single row of # (id,object,typeid,seqid,start,end,strand), thaw the object and return it sub _sth2obj { my $self = shift; my $sth = shift; my ($id,$o,$typeid,$seqid,$start,$end,$strand) = $sth->fetchrow_array; return unless defined $o; my $obj; if ($o eq '0') { # an object value of '0' means the feature was stored without a frozen # blob (see no_blobs() in replace()), so rebuild a new feat object from # the data stored in the db $obj = $self->_rebuild_obj($id,$typeid,$seqid,$start,$end,$strand); } else { $obj = $self->thaw($o,$id); } $obj; } sub _rebuild_obj { my ($self, $id, $typeid, $db_seqid, $start, $end, $strand) = @_; my ($type, $source, $seqid); # convert typeid to type and source if (exists $self->{_type_cache}->{$typeid}) { ($type, $source) = @{$self->{_type_cache}->{$typeid}}; } else { my $sql = qq{ SELECT `tag` FROM typelist WHERE `id` = ?
}; my $sth = $self->_prepare($sql) or $self->throw($self->dbh->errstr); $sth->execute($typeid); my $result; $sth->bind_columns(\$result); while ($sth->fetch()) { # there should be only one row returned, but we ensure to get all rows } ($type, $source) = split(':', $result); $self->{_type_cache}->{$typeid} = [$type, $source]; } # convert the db seqid to the sequence name if (exists $self->{_seqid_cache}->{$db_seqid}) { $seqid = $self->{_seqid_cache}->{$db_seqid}; } else { my $sql = qq{ SELECT `seqname` FROM locationlist WHERE `id` = ? }; my $sth = $self->_prepare($sql) or $self->throw($self->dbh->errstr); $sth->execute($db_seqid); $sth->bind_columns(\$seqid); while ($sth->fetch()) { # there should be only one row returned, but we ensure to get all rows } $self->{_seqid_cache}->{$db_seqid} = $seqid; } # get the names from name table? # get the attributes and store those in obj my $sql = qq{ SELECT attribute_id,attribute_value FROM attribute WHERE `id` = ? }; my $sth = $self->_prepare($sql) or $self->throw($self->dbh->errstr); $sth->execute($id); my ($attribute_id, $attribute_value); $sth->bind_columns(\($attribute_id, $attribute_value)); my %attribs; while ($sth->fetch()) { # convert the attribute_id to its real name my $attribute; if (exists $self->{_attribute_cache}->{$attribute_id}) { $attribute = $self->{_attribute_cache}->{$attribute_id}; } else { my $sql = qq{ SELECT `tag` FROM attributelist WHERE `id` = ? 
}; my $sth2 = $self->_prepare($sql) or $self->throw($self->dbh->errstr); $sth2->execute($attribute_id); $sth2->bind_columns(\$attribute); while ($sth2->fetch()) { # there should be only one row returned, but we ensure to get all rows } $self->{_attribute_cache}->{$attribute_id} = $attribute; } if ($source && $attribute eq 'source' && $attribute_value eq $source) { next; } $attribs{$attribute} = $attribute_value; } # if we weren't called with all the params, pull those out of the database too if ( not ( grep { defined($_) } ( $typeid, $db_seqid, $start, $end, $strand ))) { my $sql = qq{ SELECT start,end,tag,strand,seqname FROM feature,typelist,locationlist WHERE feature.typeid=typelist.id AND feature.seqid=locationlist.id AND feature.id = ? }; my $sth = $self->_prepare($sql) or $self->throw($self->dbh->errstr); $sth->execute($id); my ($feature_start, $feature_end, $feature_type, $feature_strand,$feature_seqname); $sth->bind_columns(\($feature_start, $feature_end, $feature_type, $feature_strand, $feature_seqname)); while ($sth->fetch()) { # there should be only one row returned, but we call like this to # ensure we get all rows } $start ||= $feature_start; $end ||= $feature_end; $strand ||= $feature_strand; $seqid ||= $feature_seqname; my( $feature_typename , $feature_typesource ) = split /:/ , $feature_type; $type ||= $feature_typename; $source ||= $feature_typesource; } my $obj = Bio::SeqFeature::Lite->new(-primary_id => $id, $type ? (-type => $type) : (), $source ? (-source => $source) : (), $seqid ? (-seq_id => $seqid) : (), defined $start ? (-start => $start) : (), defined $end ? (-end => $end) : (), defined $strand ? (-strand => $strand) : (), keys %attribs ?
(-attributes => \%attribs) : ()); return $obj; } sub _prepare { my $self = shift; my $query = shift; my $dbh = $self->dbh; my $sth = $dbh->prepare_cached($query, {}, 3) or $self->throw($dbh->errstr); $sth; } #################################################################################################### # SQL Fragment generators #################################################################################################### sub _attribute_table { shift->_qualify('attribute') } sub _attributelist_table { shift->_qualify('attributelist') } sub _feature_table { shift->_qualify('feature') } sub _interval_stats_table { shift->_qualify('interval_stats') } sub _location_table { shift->_qualify('location') } sub _locationlist_table { shift->_qualify('locationlist') } sub _meta_table { shift->_qualify('meta') } sub _name_table { shift->_qualify('name') } sub _parent2child_table { shift->_qualify('parent2child') } sub _sequence_table { shift->_qualify('sequence') } sub _type_table { shift->_qualify('feature') } sub _typelist_table { shift->_qualify('typelist') } sub _update_table { shift->_qualify('update_table') } sub _make_attribute_where { my $self = shift; my ($attributetable,$attributenametable,$attributes) = @_; my @args; my @sql; my $dbh = $self->dbh; foreach (keys %$attributes) { my @match_values; my @values = ref($attributes->{$_}) && ref($attributes->{$_}) eq 'ARRAY' ? @{$attributes->{$_}} : $attributes->{$_}; foreach (@values) { # convert * into % for wildcard matches s/\*/%/g; } my $match = join ' OR ',map { /%/ ? "$attributetable.attribute_value LIKE ?" : "$attributetable.attribute_value=?" } @values; push @sql,"($attributenametable.tag=? 
AND ($match))"; push @args,($_,@values); } return (join(' OR ',@sql),@args); } sub _make_attribute_group { my $self = shift; my ($table_name,$attributes) = @_; my $key_count = keys %$attributes or return; return "f.id,f.object,f.typeid,f.seqid,f.start,f.end,f.strand HAVING count(f.id)>?",$key_count-1; } sub _print_query { my $self = shift; my ($query,@args) = @_; while ($query =~ /\?/) { my $arg = $self->dbh->quote(shift @args); $query =~ s/\?/$arg/; } warn $query,"\n"; } ### # special-purpose store for bulk loading - write to a file rather than to the db # sub _dump_store { my $self = shift; my $indexed = shift; my $count = 0; my $store_fh = $self->dump_filehandle('feature'); my $dbh = $self->dbh; my $autoindex = $self->autoindex; for my $obj (@_) { my $id = $self->next_id; my ($seqid,$start,$end,$strand,$tier,$bin) = $indexed ? $self->_get_location_and_bin($obj) : (undef)x6; my $primary_tag = $obj->primary_tag; my $source_tag = $obj->source_tag || ''; $primary_tag .= ":$source_tag"; my $typeid = $self->_typeid($primary_tag,1); print $store_fh join("\t",$id,$typeid,$seqid,$start,$end,$strand,$tier,$bin,$indexed,$dbh->quote($self->freeze($obj))),"\n"; $obj->primary_id($id); $self->_update_indexes($obj) if $indexed && $autoindex; $count++; } # remember whether we have ever stored a non-indexed feature unless ($indexed or $self->{indexed_flag}++) { $self->subfeatures_are_indexed(0); } $count; } sub _dump_add_SeqFeature { my $self = shift; my $parent = shift; my @children = @_; my $dbh = $self->dbh; my $fh = $self->dump_filehandle('parent2child'); my $parent_id = (ref $parent ?
$parent->primary_id : $parent) or $self->throw("$parent should have a primary_id"); my $count = 0; for my $child_id (@children) { print $fh join("\t",$parent_id,$child_id),"\n"; $count++; } $count; } sub _dump_update_name_index { my $self = shift; my ($obj,$id) = @_; my $fh = $self->dump_filehandle('name'); my $dbh = $self->dbh; my ($names,$aliases) = $self->feature_names($obj); print $fh join("\t",$id,$dbh->quote($_),1),"\n" foreach @$names; print $fh join("\t",$id,$dbh->quote($_),0),"\n" foreach @$aliases; } sub _dump_update_attribute_index { my $self = shift; my ($obj,$id) = @_; my $fh = $self->dump_filehandle('attribute'); my $dbh = $self->dbh; for my $tag ($obj->all_tags) { my $tagid = $self->_attributeid($tag); for my $value ($obj->each_tag_value($tag)) { print $fh join("\t",$id,$tagid,$dbh->quote($value)),"\n"; } } } sub coverage_array { my $self = shift; my ($seq_name,$start,$end,$types,$bins) = rearrange([['SEQID','SEQ_ID','REF'],'START',['STOP','END'], ['TYPES','TYPE','PRIMARY_TAG'],'BINS'],@_); $bins ||= 1000; $start ||= 1; unless ($end) { my $segment = $self->segment($seq_name) or $self->throw("unknown seq_id $seq_name"); $end = $segment->end; } my $binsize = ($end-$start+1)/$bins; my $seqid = $self->_locationid_nocreate($seq_name) || 0; return [] unless $seqid; # where each bin starts my @his_bin_array = map {$start + $binsize * $_} (0..$bins-1); my @sum_bin_array = map {int(($_-1)/SUMMARY_BIN_SIZE)} @his_bin_array; my $interval_stats = $self->_interval_stats_table; my ($sth,@a); if ($types) { # pick up the type ids my ($from,$where,$group); ($from,$where,$group,@a) = $self->_types_sql($types,'b'); $where =~ s/.+AND//s; $sth = $self->_prepare(<_prepare(<execute(@a); while (my ($t,$tag) = $sth->fetchrow_array) { $report_tag ||= $tag; push @t,$t; } my %bins; my $sql = <= ? 
LIMIT 1 END ; $sth = $self->_prepare($sql); eval { for my $typeid (@t) { for (my $i=0;$i<@sum_bin_array;$i++) { my @args = ($typeid,$seqid,$sum_bin_array[$i]); $self->_print_query($sql,@args) if $self->debug; $sth->execute(@args) or $self->throw($sth->errstr); my ($bin,$cum_count) = $sth->fetchrow_array; push @{$bins{$typeid}},[$bin,$cum_count]; } } }; return unless %bins; my @merged_bins; my $firstbin = int(($start-1)/$binsize); for my $type (keys %bins) { my $arry = $bins{$type}; my $last_count = $arry->[0][1]; my $last_bin = -1; my $i = 0; my $delta; for my $b (@$arry) { my ($bin,$count) = @$b; $delta = $count - $last_count if $bin > $last_bin; $merged_bins[$i++] += $delta; $last_count = $count; $last_bin = $bin; } } return wantarray ? (\@merged_bins,$report_tag) : \@merged_bins; } sub build_summary_statistics { my $self = shift; my $interval_stats = $self->_interval_stats_table; my $dbh = $self->dbh; $self->begin_work; my $sbs = SUMMARY_BIN_SIZE; my $result = eval { $self->_add_interval_stats_table; $self->_disable_keys($dbh,$interval_stats); $dbh->do("DELETE FROM $interval_stats"); my $insert = $dbh->prepare(<throw($dbh->errstr); INSERT INTO $interval_stats (typeid,seqid,bin,cum_count) VALUES (?,?,?,?) END my $sql = $self->_fetch_indexed_features_sql; my $select = $dbh->prepare($sql) or $self->throw($dbh->errstr); my $current_bin = -1; my ($current_type,$current_seqid,$count); my $cum_count = 0; my (%residuals,$last_bin); my $le = -t \*STDERR ? 
"\r" : "\n"; print STDERR "\n"; $select->execute; while (my($typeid,$seqid,$start,$end) = $select->fetchrow_array) { print STDERR $count," features processed$le" if ++$count % 1000 == 0; my $bin = int($start/$sbs); $current_type ||= $typeid; $current_seqid ||= $seqid; # because the input is sorted by start, no more features will contribute to the # current bin so we can dispose of it if ($bin != $current_bin) { if ($seqid != $current_seqid or $typeid != $current_type) { # load all bins left over $self->_load_bins($insert,\%residuals,\$cum_count,$current_type,$current_seqid); %residuals = () ; $cum_count = 0; } else { # load all up to current one $self->_load_bins($insert,\%residuals,\$cum_count,$current_type,$current_seqid,$current_bin); } } $last_bin = $current_bin; ($current_seqid,$current_type,$current_bin) = ($seqid,$typeid,$bin); # summarize across entire spanned region my $last_bin = int(($end-1)/$sbs); for (my $b=$bin;$b<=$last_bin;$b++) { $residuals{$b}++; } } # handle tail case # load all bins left over $self->_load_bins($insert,\%residuals,\$cum_count,$current_type,$current_seqid); $self->_enable_keys($dbh,$interval_stats); 1; }; if ($result) { $self->commit } else { warn "Can't build summary statistics: $@"; $self->rollback }; print STDERR "\n"; } sub _load_bins { my $self = shift; my ($insert,$residuals,$cum_count,$type,$seqid,$stop_after) = @_; for my $b (sort {$a<=>$b} keys %$residuals) { last if defined $stop_after and $b > $stop_after; $$cum_count += $residuals->{$b}; my @args = ($type,$seqid,$b,$$cum_count); $insert->execute(@args); delete $residuals->{$b}; # no longer needed } } sub _add_interval_stats_table { my $self = shift; my $tables = $self->table_definitions; my $interval_stats = $self->_interval_stats_table; $self->dbh->do("CREATE TABLE IF NOT EXISTS $interval_stats $tables->{interval_stats}"); } sub _fetch_indexed_features_sql { my $self = shift; my $features = $self->_feature_table; return <do("ALTER TABLE $table DISABLE KEYS"); } sub 
_enable_keys { my $self = shift; my ($dbh,$table) = @_; $dbh->do("ALTER TABLE $table ENABLE KEYS"); } sub time { return Time::HiRes::time() if Time::HiRes->can('time'); return time(); } sub DESTROY { my $self = shift; if ($self->{bulk_update_in_progress}) { # be sure to remove temp files for my $table ($self->_feature_table,$self->index_tables) { my $path = $self->dump_path($table); unlink $path; } } } sub begin_work { my $self = shift; return if $self->{_in_transaction}++; my $dbh = $self->dbh; return unless $dbh->{AutoCommit}; $dbh->begin_work; } sub commit { my $self = shift; return unless $self->{_in_transaction}; delete $self->{_in_transaction}; $self->dbh->commit; } sub rollback { my $self = shift; return unless $self->{_in_transaction}; delete $self->{_in_transaction}; $self->dbh->rollback; } 1; Bio-DB-SeqFeature-1.7.4/lib/Bio/DB/SeqFeature/Store/LoadHelper.pm package Bio::DB::SeqFeature::Store::LoadHelper; =head1 NAME Bio::DB::SeqFeature::Store::LoadHelper -- Internal utility for Bio::DB::SeqFeature::Store =head1 SYNOPSIS # For internal use only. =head1 DESCRIPTION For internal use only. =head1 SEE ALSO L, L, L, L, L, L, L =head1 AUTHOR Lincoln Stein E<lt>lstein@cshl.orgE<gt>. Copyright (c) 2006 Cold Spring Harbor Laboratory. This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. =cut use strict; use DB_File; use File::Path 'rmtree'; use File::Temp 'tempdir'; use File::Spec; use Fcntl qw(O_CREAT O_RDWR); our $VERSION = '1.12'; my %DBHandles; sub new { my $class = shift; my $tmpdir = shift; my $template = 'SeqFeatureLoadHelper_XXXXXX'; my @tmpargs = $tmpdir ?
($template,DIR=>$tmpdir) : ($template); my $tmppath = tempdir(@tmpargs,CLEANUP=>1); my $self = $class->create_dbs($tmppath); $self->{tmppath} = $tmppath; return bless $self,$class; } sub DESTROY { my $self = shift; # Destroy all filehandle references # before trying to delete files and folder %DBHandles = (); undef $self->{IndexIt}; undef $self->{TopLevel}; undef $self->{Local2Global}; undef $self->{Parent2Child}; rmtree $self->{tmppath}; # File::Temp::cleanup() unless $self->{keep}; } sub create_dbs { my $self = shift; my $tmp = shift; my %self; # experiment with caching these handles in memory my $hash_options = DB_File::HASHINFO->new(); # Each of these hashes allow only unique keys for my $dbname (qw(IndexIt TopLevel Local2Global)) { unless ($DBHandles{$dbname}) { my %h; tie(%h,'DB_File',File::Spec->catfile($tmp,$dbname), O_CREAT|O_RDWR,0666,$hash_options); $DBHandles{$dbname} = \%h; } $self{$dbname} = $DBHandles{$dbname}; %{$self{$dbname}} = (); } # The Parent2Child hash allows duplicate keys, so we # create it with the R_DUP flag. 
    my $btree_options = DB_File::BTREEINFO->new();
    $btree_options->{flags} = R_DUP;
    unless ($DBHandles{'Parent2Child'}) {
        my %h;
        tie(%h,'DB_File',File::Spec->catfile($tmp,'Parent2Child'),
            O_CREAT|O_RDWR,0666,$btree_options);
        $DBHandles{'Parent2Child'} = \%h;
    }
    $self{Parent2Child} = $DBHandles{'Parent2Child'};
    %{$self{Parent2Child}} = ();
    return \%self;
}

sub indexit {
    my $self = shift;
    my $id   = shift;
    $self->{IndexIt}{$id} = shift if @_;
    return $self->{IndexIt}{$id};
}

sub toplevel {
    my $self = shift;
    my $id   = shift;
    $self->{TopLevel}{$id} = shift if @_;
    return $self->{TopLevel}{$id};
}

sub each_toplevel {
    my $self = shift;
    my ($id) = each %{$self->{TopLevel}};
    $id;
}

sub local2global {
    my $self = shift;
    my $id   = shift;
    $self->{Local2Global}{$id} = shift if @_;
    return $self->{Local2Global}{$id};
}

sub add_children {
    my $self      = shift;
    my $parent_id = shift;
    # (@children) = @_;
    $self->{Parent2Child}{$parent_id} = shift while @_;
}

sub children {
    my $self      = shift;
    my $parent_id = shift;

    my @children;

    my $db    = tied(%{$self->{Parent2Child}});
    my $key   = $parent_id;
    my $value = '';
    for (my $status = $db->seq($key,$value,R_CURSOR);
         $status == 0 && $key eq $parent_id;
         $status = $db->seq($key,$value,R_NEXT)
        ) {
        push @children,$value;
    }
    return wantarray ? @children : \@children;
}

# this acts like each() and returns each parent id and an array ref of children
sub each_family {
    my $self = shift;

    my $db = tied(%{$self->{Parent2Child}});

    if ($self->{_cursordone}) {
        undef $self->{_cursordone};
        undef $self->{_parent};
        undef $self->{_child};
        return;
    }

    # do a slightly tricky cursor search
    unless (defined $self->{_parent}) {
        return unless $db->seq($self->{_parent},$self->{_child},R_FIRST) == 0;
    }

    my $parent   = $self->{_parent};
    my @children = $self->{_child};

    my $status;
    while (($status = $db->seq($self->{_parent},$self->{_child},R_NEXT)) == 0
           && $self->{_parent} eq $parent
        ) {
        push @children,$self->{_child};
    }

    $self->{_cursordone}++ if $status != 0;

    return ($parent,\@children);
}

sub local_ids {
    my $self = shift;
    # avoid "my ... if ...", whose behaviour is undefined in Perl
    my @ids  = $self->{Local2Global} ? keys %{$self->{Local2Global}} : ();
    return \@ids;
}

sub loaded_ids {
    my $self = shift;
    my @ids  = $self->{Local2Global} ? values %{$self->{Local2Global}} : ();
    return \@ids;
}

1;
berkeleydb.pm100644000766000024 12631613605523026 24337 0ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/lib/Bio/DB/SeqFeature/Store
package Bio::DB::SeqFeature::Store::berkeleydb;
$Bio::DB::SeqFeature::Store::berkeleydb::VERSION = '1.7.4';

use strict;
use base 'Bio::DB::SeqFeature::Store';
use Bio::DB::GFF::Util::Rearrange 'rearrange';
use DB_File;
use Fcntl qw(O_RDWR O_CREAT :flock);
use IO::File;
use File::Temp 'tempdir';
use File::Path 'rmtree','mkpath';
use File::Basename;
use File::Spec;
use Carp 'carp','croak';

use constant BINSIZE => 10_000;
use constant MININT  => -999_999_999_999;
use constant MAXINT  => 999_999_999_999;

=head1 NAME

Bio::DB::SeqFeature::Store::berkeleydb -- Storage and retrieval of sequence annotation data in Berkeleydb files

=head1 SYNOPSIS

  use Bio::DB::SeqFeature::Store;

  # Create a database from the feature files located in /home/fly4.3 and store
  # the database index in the same directory:
  my $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'berkeleydb',
                                            -dir     => '/home/fly4.3');

  # Create a database that will monitor the files in /home/fly4.3, but store
  # the indexes in /var/databases/fly4.3
  $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'berkeleydb',
                                         -dir     => '/home/fly4.3',
                                         -dsn     => '/var/databases/fly4.3');

  # Create a feature database from scratch
  $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'berkeleydb',
                                         -dsn     => '/var/databases/fly4.3',
                                         -create  => 1);

  # get a feature from somewhere
  my $feature = Bio::SeqFeature::Generic->new(...);

  # store it
  $db->store($feature) or die "Couldn't store!";

  # primary ID of the feature is changed to indicate its primary ID
  # in the database...
  my $id = $feature->primary_id;

  # get the feature back out
  my $f = $db->fetch($id);

  # change the feature and update it
  $f->start(100);
  $db->update($f) or die "Couldn't update!";

  # use the GFF3 loader to do a bulk write:
  my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store   => $db,
                                                           -verbose => 0,
                                                           -fast    => 1);
  $loader->load('/home/fly4.3/dmel-all.gff');

  # searching...

  # ...by id
  my @features = $db->fetch_many(@list_of_ids);

  # ...by name
  @features = $db->get_features_by_name('ZK909');

  # ...by alias
  @features = $db->get_features_by_alias('sma-3');

  # ...by type
  @features = $db->get_features_by_type('gene');

  # ...by location
  @features = $db->get_features_by_location(-seq_id=>'Chr1',-start=>4000,-end=>600000);

  # ...by attribute
  @features = $db->get_features_by_attribute({description => 'protein kinase'});

  # ...by the GFF "Note" field
  my @result_list = $db->search_notes('kinase');

  # ...by arbitrary combinations of selectors
  @features = $db->features(-name       => $name,
                            -type       => $types,
                            -seq_id     => $seqid,
                            -start      => $start,
                            -end        => $end,
                            -attributes => $attributes);

  # ...using an iterator
  my $iterator = $db->get_seq_stream(-name       => $name,
                                     -type       => $types,
                                     -seq_id     => $seqid,
                                     -start      => $start,
                                     -end        => $end,
                                     -attributes => $attributes);

  while (my $feature = $iterator->next_seq) {
      # do something with the feature
  }

  # ...limiting the search to a particular region
  my $segment = $db->segment('Chr1',5000=>6000);
  @features   = $segment->features(-type=>['mRNA','match']);

  # what feature types are defined in the database?
  my @types = $db->types;

  # getting & storing sequence information
  # Warning: this returns a string, and not a PrimarySeq object
  $db->insert_sequence('Chr1','GATCCCCCGGGATTCCAAAA...');
  my $sequence = $db->fetch_sequence('Chr1',5000=>6000);

  # create a new feature in the database
  $feature = $db->new_feature(-primary_tag => 'mRNA',
                              -seq_id      => 'chr3',
                              -start       => 10000,
                              -end         => 11000);

=head1 DESCRIPTION

Bio::DB::SeqFeature::Store::berkeleydb is the Berkeleydb adaptor for
Bio::DB::SeqFeature::Store. You will not create it directly, but
instead use Bio::DB::SeqFeature::Store-E<gt>new() to do so.

See L for complete usage instructions.

=head2 Using the berkeleydb adaptor

The Berkeley database consists of a series of Berkeleydb index files,
and a couple of special purpose indexes.
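As a concrete picture of one of these index files, the sketch below reproduces the key/value scheme that `_update_location_index` (later in this file) writes into the `locations.idx` BTREE: one entry per 10 kb bin (`BINSIZE`) that a feature overlaps, keyed by the lowercased sequence ID, a dot and a six-digit zero-padded bin number, with the feature ID, start, end and strand packed as four native integers. The helper name `location_index_entries` is hypothetical, and the sketch only builds the pairs in memory; it does not open or write a DB_File database.

```perl
use strict;
use warnings;

use constant BINSIZE => 10_000;    # same bin width as the adaptor

# Hypothetical helper: return [key, packed value] pairs for one feature,
# following the "seqid.NNNNNN" key format of the version 2 location index.
sub location_index_entries {
    my ($id, $seq_id, $start, $end, $strand) = @_;
    my @entries;
    my $bin_min = int($start / BINSIZE);
    my $bin_max = int($end   / BINSIZE);
    for my $bin ($bin_min .. $bin_max) {
        my $key = sprintf("%s.%06d", lc $seq_id, $bin);
        push @entries, [ $key, pack("i4", $id, $start, $end, $strand) ];
    }
    return @entries;
}

# A feature on Chr1 from 5,000 to 25,000 touches bins 0, 1 and 2,
# so it is stored under three keys with identical packed values.
my @entries = location_index_entries(42, 'Chr1', 5_000, 25_000, 1);
for my $entry (@entries) {
    my ($id, $start, $end, $strand) = unpack("i4", $entry->[1]);
    print "$entry->[0] => id=$id start=$start end=$end strand=$strand\n";
}
```

Because the bin number is zero-padded, the keys sort in numeric bin order under the string comparison the index uses, which is what lets `filter_by_location` walk a cursor from `seqid.binstart` up to `seqid.binend`.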

You can create the index files from scratch by creating a new database
and calling new_feature() repeatedly, you can create the database and
then bulk populate it using the GFF3 loader, or you can monitor a
directory of preexisting GFF3 and FASTA files and rebuild the indexes
whenever any of the files changes. The last mode is probably the most
convenient. Note that the indexer will only pay attention to files
that end with .gff3, .wig and .fa.

=over 4

=item The new() constructor

The new() constructor method accepts all the arguments recognized by
Bio::DB::SeqFeature::Store, and a few additional ones.

Standard arguments:

  Name               Value
  ----               -----

  -adaptor           The name of the Adaptor class (default DBI::mysql)

  -serializer        The name of the serializer class (default Storable)

  -index_subfeatures Whether or not to make subfeatures searchable
                     (default true)

  -cache             Activate LRU caching feature -- size of cache

  -compress          Compresses features before storing them in database
                     using Compress::Zlib

Adaptor-specific arguments:

  Name               Value
  ----               -----

  -dsn               Where the index files are stored

  -dir               Where the source (GFF3, FASTA) files are stored

  -autoindex         An alias for -dir.

  -write             Pass true to open the index files for writing.

  -create            Pass true to create the index files if they don't exist
                     (implies -write=>1)

  -locking           Use advisory locking to avoid one process trying to read
                     from the database while another is updating it (may
                     not work properly over NFS).

  -temp              Pass true to create temporary index files that will
                     be deleted once the script exits.

  -verbose           Pass true to report autoindexing operations on STDERR.
                     (default is true).

Examples:

To create an empty database which will be populated using calls to
store() or new_feature(), or which will be bulk-loaded using the GFF3
loader:

  $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'berkeleydb',
                                         -dsn     => '/var/databases/fly4.3',
                                         -create  => 1);

To open a preexisting database in read-only mode:

  $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'berkeleydb',
                                         -dsn     => '/var/databases/fly4.3');

To open a preexisting database in read/write (update) mode:

  $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'berkeleydb',
                                         -dsn     => '/var/databases/fly4.3',
                                         -write   => 1);

To monitor a set of GFF3 and FASTA files located in a directory and
create/update the database indexes as needed (the indexes will be
stored in a new subdirectory named "indexes"):

  $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'berkeleydb',
                                         -dir     => '/var/databases/fly4.3');

As above, but store the source files and index files in separate
directories:

  $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'berkeleydb',
                                         -dsn     => '/var/databases/fly4.3',
                                         -dir     => '/home/gff3_files/fly4.3');

To be indexed, files must end with one of .gff3 (GFF3 format), .fa
(FASTA format) or .wig (WIG format).

B<-autoindex> is an alias for B<-dir>.

You should specify B<-locking> in a multiuser environment, including
the case in which the database is being used by a web server at the
same time another user might be updating it.

=back

See L for all the access methods supported by this adaptor. The
various methods for storing and updating features and sequences into
the database are supported, but there is no locking. If two processes
try to update the same database simultaneously, the database will
likely become corrupted.
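As the paragraph above says, this adaptor does not serialize concurrent store()/update() calls; in this file, `lock()` takes a shared lock when the database is opened and an exclusive lock only while autoindexing. If several processes may write, one option is to have every writer take the same exclusive advisory lock first. The sketch below does that with Perl's core `flock` from `Fcntl`. The helper name `with_exclusive_lock` and the lock-file location are hypothetical and not part of Bio::DB::SeqFeature::Store; like `-locking`, the lock is advisory only and may not be reliable over NFS.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: run $code while holding an exclusive advisory lock
# on $lockfile. It only prevents concurrent writes if every process that
# updates the database goes through the same lock file.
sub with_exclusive_lock {
    my ($lockfile, $code) = @_;
    open my $fh, '>>', $lockfile or die "Cannot open $lockfile: $!";
    flock($fh, LOCK_EX)          or die "Cannot lock $lockfile: $!";
    my @result = eval { $code->() };
    my $err = $@;                    # save before releasing the lock
    flock($fh, LOCK_UN);
    close $fh;
    die $err if $err;                # re-throw after cleanup
    return wantarray ? @result : $result[0];
}

# Usage: in real code the block would call $db->store(...) or
# $db->update(...); here it just returns a value.
my $dir    = tempdir(CLEANUP => 1);
my $lock   = File::Spec->catfile($dir, 'writer.lock');
my $stored = with_exclusive_lock($lock, sub { return 'stored' });
print "$stored\n";
```

Releasing the lock before re-throwing means a failed update does not leave other writers blocked.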

=cut

###
# object initialization
#
sub init {
    my $self = shift;
    my ($directory,
        $autoindex,
        $is_temporary,
        $write,
        $create,
        $verbose,
        $locking,
       ) = rearrange([['DSN','DB'],
                      [qw(DIR AUTOINDEX)],
                      ['TMP','TEMP','TEMPORARY'],
                      [qw(WRITE WRITABLE)],
                      'CREATE',
                      'VERBOSE',
                      [qw(LOCK LOCKING)],
                     ],@_);

    $verbose = 1 unless defined $verbose;

    if ($autoindex) {
        -d $autoindex or $self->throw("Invalid directory $autoindex");
        $directory ||= "$autoindex/indexes";
    }
    $directory ||= $is_temporary ? File::Spec->tmpdir : '.';

    my $pacname = __PACKAGE__;
    if ($^O =~ /mswin/i) {
        $pacname =~ s/:+/_/g;
    }
    $directory = tempdir($pacname.'_XXXXXX',
                         TMPDIR  => 1,
                         CLEANUP => 1,
                         DIR     => $directory) if $is_temporary;
    mkpath($directory);
    -d $directory or $self->throw("Invalid directory $directory");

    $create++ if $is_temporary;
    $write ||= $create;
    $self->throw("Can't write into the directory $directory")
        if $write && !-w $directory;

    $self->default_settings;
    $self->directory($directory);
    $self->temporary($is_temporary);
    $self->verbose($verbose);
    $self->locking($locking);

    $self->_delete_databases() if $create;
    if ($autoindex && -d $autoindex) {
        $self->auto_reindex($autoindex);
    }
    $self->lock('shared');

    # this step may rebless $self into a subclass
    # to preserve backward compatibility with older
    # databases while providing better performance for
    # new databases.
    $self->possibly_rebless($create);

    $self->_open_databases($write,$create,$autoindex);
    $self->_permissions($write,$create);
    return $self;
}

sub version { return 2.0 };

sub possibly_rebless {
    my $self   = shift;
    my $create = shift;
    my $do_rebless;

    if ($create) {
        $do_rebless++;
    } else { # probe database
        my %h;
        tie (%h,'DB_File',$self->_features_path,O_RDONLY,0666,$DB_HASH) or return;
        $do_rebless = $h{'.version'} >= 3.0;
    }

    if ($do_rebless) {
        eval "require Bio::DB::SeqFeature::Store::berkeleydb3";
        bless $self,'Bio::DB::SeqFeature::Store::berkeleydb3';
    }
}

sub can_store_parentage { 1 }

sub auto_reindex {
    my $self    = shift;
    my $autodir = shift;
    my $result  = $self->needs_auto_reindexing($autodir);

    if ($result && %$result) {
        $self->flag_autoindexing(1);
        $self->lock('exclusive');
        $self->reindex_wigfiles($result->{wig},$autodir) if $result->{wig};
        $self->reindex_ffffiles($result->{fff},$autodir) if $result->{fff};
        $self->reindex_gfffiles($result->{gff},$autodir) if $result->{gff};
        $self->dna_db(Bio::DB::Fasta::Subdir->new($autodir));
        $self->unlock;
        $self->flag_autoindexing(0);
    }

    else {
        $self->dna_db(Bio::DB::Fasta::Subdir->new($autodir));
    }
}

sub autoindex_flagfile {
    return File::Spec->catfile(shift->directory,'autoindex.pid');
}

sub auto_index_in_process {
    my $self      = shift;
    my $flag_file = $self->autoindex_flagfile;
    return unless -e $flag_file;

    # if flagfile exists, then check that PID still exists
    open my $fh, '<', $flag_file or $self->throw("Could not read file '$flag_file': $!");
    my $pid = <$fh>;
    close $fh;
    return 1 if kill 0=>$pid;
    warn "Autoindexing seems to be running in another process, but the process has gone away. Trying to override...";
    if (unlink $flag_file) {
        warn "Successfully removed stale PID file." if $self->verbose;
        warn "Assuming partial reindexing process. Rebuilding indexes from scratch..."
            if $self->verbose;
        my $glob = File::Spec->catfile($self->directory,'*');
        unlink glob($glob);
        return;
    } else {
        croak ("Cannot recover from apparent aborted autoindex process. Please remove files in ",
               $self->directory,
               " and allow the adaptor to reindex");
        return 1;
    }
}

sub flag_autoindexing {
    my $self      = shift;
    my $doit      = shift;
    my $flag_file = $self->autoindex_flagfile;
    if ($doit) {
        open my $fh, '>', $flag_file or $self->throw("Could not write file '$flag_file': $!");
        print $fh $$;
        close $fh;
    } else {
        unlink $flag_file;
    }
}

sub reindex_gfffiles {
    my $self    = shift;
    my $files   = shift;
    my $autodir = shift;

    warn "Reindexing GFF files...\n" if $self->verbose;
    my $exists = -e $self->_features_path;

    $self->_permissions(1,1);
    $self->_close_databases();
    $self->_open_databases(1,!$exists);
    require Bio::DB::SeqFeature::Store::GFF3Loader
        unless Bio::DB::SeqFeature::Store::GFF3Loader->can('new');
    my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store    => $self,
                                                             -sf_class => $self->seqfeature_class,
                                                             -verbose  => $self->verbose,
                                                            )
        or $self->throw("Couldn't create GFF3Loader");
    my %seen;
    $loader->load(grep {!$seen{$_}++} @$files);
    $self->_touch_timestamp;
}

sub reindex_ffffiles {
    my $self    = shift;
    my $files   = shift;
    my $autodir = shift;

    warn "Reindexing FFF files...\n" if $self->verbose;
    $self->_permissions(1,1);
    $self->_close_databases();
    $self->_open_databases(1,1);
    require Bio::DB::SeqFeature::Store::FeatureFileLoader
        unless Bio::DB::SeqFeature::Store::FeatureFileLoader->can('new');
    my $loader = Bio::DB::SeqFeature::Store::FeatureFileLoader->new(-store    => $self,
                                                                    -sf_class => $self->seqfeature_class,
                                                                    -verbose  => $self->verbose,
                                                                   )
        or $self->throw("Couldn't create FeatureFileLoader");
    my %seen;
    $loader->load(grep {!$seen{$_}++} @$files);
    $self->_touch_timestamp;
}

sub reindex_wigfiles {
    my $self    = shift;
    my $files   = shift;
    my $autodir = shift;

    warn "Reindexing wig files...\n" if $self->verbose;

    unless (Bio::Graphics::Wiggle::Loader->can('new')) {
        eval "require Bio::Graphics::Wiggle::Loader; 1"
            or return;
    }

    for my $wig (@$files) {
        warn "Reindexing $wig...\n" if $self->verbose;
        my ($wib_name) = fileparse($wig,qr/\.[^.]*/);
        my $gff3_name  = "$wib_name.gff3";

        # unlink all wib files that share the basename
        my $wib_glob = File::Spec->catfile($self->directory,"$wib_name*.wib");
        unlink glob($wib_glob);

        my $loader = Bio::Graphics::Wiggle::Loader->new($self->directory,$wib_name);
        my $fh     = IO::File->new($wig) or die "Can't open $wig: $!";
        $loader->load($fh);  # will create one or more .wib files
        $fh->close;
        my $gff3_data = $loader->featurefile('gff3','microarray_oligo',$wib_name);
        my $gff3_path = File::Spec->catfile($autodir,$gff3_name);
        $fh = IO::File->new($gff3_path,'>')
            or die "Can't open $gff3_path for writing: $!";
        $fh->print($gff3_data);
        $fh->close;
        my $conf_path = File::Spec->catfile($autodir,"$wib_name.conf");
        $fh = IO::File->new($conf_path,'>')
            or die "Can't open $conf_path for writing: $!";
        $fh->print($loader->conf_stanzas('microarray_oligo',$wib_name));
        $fh->close;
    }
}

# returns the following hashref
#    empty hash if nothing needs reindexing
#    {fasta => 1} if DNA database needs reindexing
#    {gff => [list,of,gff,paths]} if gff3 files need reindexing
#    {wig => [list,of,wig,paths]} if wig files need reindexing
#    {fff => [list,of,fff,paths]} if feature files need reindexing
sub needs_auto_reindexing {
    my $self    = shift;
    my $autodir = shift;
    my $result  = {};

    # don't allow two processes to reindex simultaneously
    $self->auto_index_in_process and croak "Autoindexing in process. Try again later";

    # first interrogate the GFF3 files, using the timestamp file
    # as modification comparison
    my (@gff3,@fff,@wig,$fasta,$fasta_index_time);
    opendir (my $D,$autodir)
        or $self->throw("Couldn't open directory $autodir for reading: $!");

    my $maxtime        = 0;
    my $timestamp_time = _mtime($self->_mtime_path) || 0;
    while (defined (my $node = readdir($D))) {
        next if $node =~ /^\./;
        my $path = File::Spec->catfile($autodir,$node);
        next unless -f $path;

        if ($path =~ /\.gff\d?$/i) {
            my $mtime = _mtime(\*_);  # not a typo
            $maxtime  = $mtime if $mtime > $maxtime;
            push @gff3,$path;
        }

        elsif ($path =~ /\.fff?$/i) {
            my $mtime = _mtime(\*_);  # not a typo
            $maxtime  = $mtime if $mtime > $maxtime;
            push @fff,$path;
        }

        elsif ($path =~ /\.wig$/i) {
            my $wig = $path;
            (my $gff_file = $wig) =~ s/\.wig$/\.gff3/i;
            next if -e $gff_file && _mtime($gff_file) > _mtime($path);
            push @wig,$wig;
            push @gff3,$gff_file;
            $maxtime = time();
        }

        elsif ($path =~ /\.(fa|fasta|dna)$/i) {
            $fasta_index_time =
                _mtime(File::Spec->catfile($self->directory,'fasta.index'))||0
                unless defined $fasta_index_time;
            $fasta++ if _mtime($path) > $fasta_index_time;
        }
    }
    closedir $D;

    $result->{gff} = \@gff3 if $maxtime > $timestamp_time;
    $result->{wig} = \@wig  if @wig;
    $result->{fff} = \@fff  if @fff;
    $result->{fasta}++ if $fasta;
    return $result;
}

sub verbose {
    my $self = shift;
    my $d = $self->{verbose};
    $self->{verbose} = shift if @_;
    return $d;
}

sub locking {
    my $self = shift;
    my $d = $self->{locking};
    $self->{locking} = shift if @_;
    return $d;
}

sub lockfile {
    my $self = shift;
    return File::Spec->catfile($self->directory,'lock');
}

sub lock {
    my $self = shift;
    my $mode = shift;
    return unless $self->locking;

    my $flag     = $mode eq 'exclusive' ? LOCK_EX : LOCK_SH;
    my $lockfile = $self->lockfile;
    my $fh       = $self->_flock_fh;
    unless ($fh) {
        my $open = -e $lockfile ? '<' : '>';
        $fh = IO::File->new($lockfile,$open) or die "Cannot open $lockfile: $!";
    }
    flock($fh,$flag);
    $self->_flock_fh($fh);
}

sub unlock {
    my $self = shift;
    return unless $self->locking;

    my $fh = $self->_flock_fh or return;
    flock($fh,LOCK_UN);
    undef $self->{flock_fh};
}

sub _flock_fh {
    my $self = shift;
    my $d = $self->{flock_fh};
    $self->{flock_fh} = shift if @_;
    $d;
}

sub _open_databases {
    my $self = shift;
    my ($write,$create,$ignore_errors) = @_;
    return if $self->db; # already open - don't reopen

    my $directory = $self->directory;
    unless (-d $directory) {  # directory does not exist
        $create or $self->throw("Directory $directory does not exist and you did not specify the -create flag");
        mkpath($directory) or $self->throw("Couldn't create database directory $directory: $!");
    }

    my $flags = O_RDONLY;
    $flags |= O_RDWR  if $write;
    $flags |= O_CREAT if $create;

    # Create the main database; this is a DB_HASH implementation
    my %h;
    my $result = tie (%h,'DB_File',$self->_features_path,$flags,0666,$DB_HASH);

    unless ($result) {
        return if $ignore_errors;  # autoindex set, so defer this
        $self->throw("Couldn't tie: ".$self->_features_path . " $!");
    }
    if ($create) {
        %h = ();
        $h{'.next_id'} = 1;
        $h{'.version'} = $self->version;
    }
    $self->db(\%h);

    $self->open_index_dbs($flags,$create);
    $self->open_parentage_db($flags,$create);
    $self->open_notes_db($write,$create);
    $self->open_seq_db($flags,$create) if -e $self->_fasta_path;
}

sub open_index_dbs {
    my $self = shift;
    my ($flags,$create) = @_;

    # Create the index databases; these are DB_BTREE implementations with duplicates allowed.
    $DB_BTREE->{flags}   = R_DUP;
    $DB_BTREE->{compare} = sub { lc($_[0]) cmp lc($_[1]) };
    for my $idx ($self->_index_files) {
        my $path = $self->_qualify("$idx.idx");
        my %db;
        my $result = tie(%db,'DB_File',$path,$flags,0666,$DB_BTREE);
        # for backward compatibility, allow a failure when trying to open the is_indexed index.
        $self->throw("Couldn't tie $path: $!") unless $result || $idx eq 'is_indexed';
        %db = () if $create;
        $self->index_db($idx=>\%db);
    }
}

sub open_parentage_db {
    my $self = shift;
    my ($flags,$create) = @_;

    # Create the parentage database
    my %p;
    tie (%p,'DB_File',$self->_parentage_path,$flags,0666,$DB_BTREE)
        or $self->throw("Couldn't tie: ".$self->_parentage_path . " $!");
    %p = () if $create;
    $self->parentage_db(\%p);
}

sub open_notes_db {
    my $self = shift;
    my ($write,$create) = @_;

    my $mode = $write  ? "+>>"
             : $create ? "+>"
             :           "<";

    my $notes_file = $self->_notes_path;
    open my $F, $mode, $notes_file or $self->throw("Could not open file '$notes_file': $!");
    $self->notes_db($F);
}

sub open_seq_db {
    my $self = shift;

    if (-e $self->_fasta_path) {
        my $dna_db = Bio::DB::Fasta::Subdir->new($self->_fasta_path)
            or $self->throw("Can't reindex sequence file: $@");
        $self->dna_db($dna_db);
    }
}

sub commit { # reindex fasta files
    my $self = shift;
    if (my $fh = $self->{fasta_fh}) {
        $fh->close;
        $self->dna_db(Bio::DB::Fasta::Subdir->new($self->{fasta_file}));
    } elsif (-d $self->directory) {
        $self->dna_db(Bio::DB::Fasta::Subdir->new($self->directory));
    }
}

sub _close_databases {
    my $self = shift;
    $self->db(undef);
    $self->dna_db(undef);
    $self->notes_db(undef);
    $self->parentage_db(undef);
    $self->index_db($_=>undef) foreach $self->_index_files;
}

# do nothing -- new() with -create=>1 will do the trick
sub _init_database { }

sub _delete_databases {
    my $self = shift;
    for my $idx ($self->_index_files) {
        my $path = $self->_qualify("$idx.idx");
        unlink $path;
    }
    unlink $self->_parentage_path;
    unlink $self->_fasta_path;
    unlink $self->_features_path;
    unlink $self->_mtime_path;
}

sub _touch_timestamp {
    my $self = shift;
    my $tsf  = $self->_mtime_path;
    open my $F, '>', $tsf or $self->throw("Could not write file '$tsf': $!");
    print $F scalar(localtime);
    close $F;
}

sub _store {
    my $self       = shift;
    my $indexed    = shift;
    my $db         = $self->db;
    my $is_indexed = $self->index_db('is_indexed');
    my $count      = 0;
    for my $obj (@_) {
        my $primary_id = $obj->primary_id;
        $self->_delete_indexes($obj,$primary_id) if $indexed && $primary_id;
        $primary_id = $db->{'.next_id'}++ unless defined $primary_id;
        $db->{$primary_id} = $self->freeze($obj);
        $is_indexed->{$primary_id} = $indexed if $is_indexed;
        $obj->primary_id($primary_id);
        $self->_update_indexes($obj) if $indexed;
        $count++;
    }
    $count;
}

sub _delete_indexes {
    my $self = shift;
    my ($obj,$id) = @_;
    # the additional "1" causes the index to be deleted
    $self->_update_name_index($obj,$id,1);
    $self->_update_type_index($obj,$id,1);
    $self->_update_location_index($obj,$id,1);
    $self->_update_attribute_index($obj,$id,1);
    $self->_update_note_index($obj,$id,1);
}

sub _fetch {
    my $self = shift;
    my $id   = shift;
    my $db   = $self->db;
    my $obj  = $self->thaw($db->{$id},$id);
    $obj;
}

sub _add_SeqFeature {
    my $self      = shift;
    my $parent    = shift;
    my @children  = @_;
    my $parent_id = (ref $parent ? $parent->primary_id : $parent)
        or $self->throw("$parent should have a primary_id");
    my $p = $self->parentage_db;
    for my $child (@children) {
        my $child_id = ref $child ? $child->primary_id : $child;
        defined $child_id or $self->throw("no primary ID known for $child");
        # find_dup() returns 0 if this parent/child pair is already stored,
        # so only new pairs are added
        $p->{$parent_id} = $child_id if tied(%$p)->find_dup($parent_id,$child_id);
    }
    return scalar @children;
}

sub _fetch_SeqFeatures {
    my $self      = shift;
    my $parent    = shift;
    my @types     = @_;
    my $parent_id = $parent->primary_id or $self->throw("$parent should have a primary_id");
    my $index     = $self->parentage_db;
    my $db        = tied %$index;

    my @children_ids = $db->get_dup($parent_id);
    my @children     = map {$self->fetch($_)} @children_ids;

    if (@types) {
        foreach (@types) {
            my ($a,$b) = split ':',$_,2;
            $_ = quotemeta($a);
            if (length $b) {
                $_ .= ":".quotemeta($b).'$';
            } else {
                $_ .= ':';
            }
        }
        my $regexp = join '|', @types;
        return grep {($_->primary_tag.':'.$_->source_tag) =~ /^($regexp)/i} @children;
    } else {
        return @children;
    }
}

sub _update_indexes {
    my $self = shift;
    my $obj  = shift;
    defined (my $id = $obj->primary_id) or return;
    $self->_update_name_index($obj,$id);
    $self->_update_type_index($obj,$id);
    $self->_update_location_index($obj,$id);
    $self->_update_attribute_index($obj,$id);
    $self->_update_note_index($obj,$id);
}

sub _update_name_index {
    my $self = shift;
    my ($obj,$id,$delete) = @_;
    my $db = $self->index_db('names') or $self->throw("Couldn't find 'names' index file");
    my ($names,$aliases) = $self->feature_names($obj);

    # little stinky - needs minor refactoring
    foreach (@$names) {
        my $key = lc $_;
        $self->update_or_delete($delete,$db,$key,$id);
    }

    foreach (@$aliases) {
        my $key = lc($_)."_2"; # the _2 indicates a secondary (alias) ID
        $self->update_or_delete($delete,$db,$key,$id);
    }
}

sub _update_type_index {
    my $self = shift;
    my ($obj,$id,$delete) = @_;
    my $db = $self->index_db('types')
        or $self->throw("Couldn't find 'types' index file");

    my $primary_tag = $obj->primary_tag;
    my $source_tag  = $obj->source_tag || '';
    return unless defined $primary_tag;

    $primary_tag .= ":$source_tag";
    my $key = lc $primary_tag;
    $self->update_or_delete($delete,$db,$key,$id);
}

# Note: this indexing scheme is space-inefficient because it stores the
# denormalized sequence ID followed by the bin in XXXXXX zero-leading
# format. It should be replaced with a binary numeric encoding and the
# BTREE {compare} attribute changed accordingly.
sub _update_location_index {
    my $self = shift;
    my ($obj,$id,$delete) = @_;
    my $db = $self->index_db('locations')
        or $self->throw("Couldn't find 'locations' index file");

    my $seq_id  = $obj->seq_id || '';
    my $start   = $obj->start  || '';
    my $end     = $obj->end    || '';
    my $strand  = $obj->strand;
    my $bin_min = int $start/BINSIZE;
    my $bin_max = int $end/BINSIZE;

    for (my $bin = $bin_min; $bin <= $bin_max; $bin++ ) {
        my $key = sprintf("%s.%06d",lc($seq_id),$bin);
        $self->update_or_delete($delete,$db,$key,pack("i4",$id,$start,$end,$strand));
    }
}

sub _update_attribute_index {
    my $self = shift;
    my ($obj,$id,$delete) = @_;
    my $db = $self->index_db('attributes')
        or $self->throw("Couldn't find 'attributes' index file");

    for my $tag ($obj->get_all_tags) {
        for my $value ($obj->get_tag_values($tag)) {
            my $key = "${tag}:${value}";
            $self->update_or_delete($delete,$db,$key,$id);
        }
    }
}

sub _update_note_index {
    my $self = shift;
    my ($obj,$id,$delete) = @_;
    return if $delete; # we don't know how to do this

    my $fh    = $self->notes_db;
    # avoid "my ... if ...", whose behaviour is undefined in Perl
    my @notes = $obj->has_tag('Note') ? $obj->get_tag_values('Note') : ();

    print $fh $_,"\t",pack("u*",$id)
        or $self->throw("An error occurred while updating note index: $!")
        foreach @notes;
}

sub update_or_delete {
    my $self = shift;
    my ($delete,$db,$key,$id) = @_;
    if ($delete) {
        tied(%$db)->del_dup($key,$id);
    } else {
        $db->{$key} = $id;
    }
}

# these methods return pointers to....
# the database that stores serialized objects
sub db {
    my $self = shift;
    my $d = $self->setting('db');
    $self->setting(db=>shift) if @_;
    $d;
}

sub parentage_db {
    my $self = shift;
    my $d = $self->setting('parentage_db');
    $self->setting(parentage_db=>shift) if @_;
    $d;
}

# the Bio::DB::Fasta object
sub dna_db {
    my $self = shift;
    my $d = $self->setting('dna_db');
    $self->setting(dna_db=>shift) if @_;
    $d;
}

# the specialized notes database
sub notes_db {
    my $self = shift;
    my $d = $self->setting('notes_db');
    $self->setting(notes_db=>shift) if @_;
    $d;
}

# the is_indexed_db
sub is_indexed_db {
    my $self = shift;
    my $d = $self->setting('is_indexed_db');
    $self->setting(is_indexed_db=>shift) if @_;
    $d;
}

# The indicated index berkeley db
sub index_db {
    my $self = shift;
    my $index_name = shift;
    my $d = $self->setting($index_name);
    $self->setting($index_name=>shift) if @_;
    $d;
}

sub _mtime {
    my $file = shift;
    my @stat = stat($file);
    return $stat[9];
}

# return names of all the indexes
sub _index_files {
    return qw(names types locations attributes is_indexed);
}

# the directory in which we store our indexes
sub directory {
    my $self = shift;
    my $d = $self->setting('directory');
    $self->setting(directory=>shift) if @_;
    $d;
}

# flag indicating that we are a temporary database
sub temporary {
    my $self = shift;
    my $d = $self->setting('temporary');
    $self->setting(temporary=>shift) if @_;
    $d;
}

# get/set the (write, create) permissions; returns the previous pair.
# The old "or return" on the getter meant the first call never stored anything.
sub _permissions {
    my $self = shift;
    my $d = $self->setting('permissions');
    if (@_) {
        my ($write,$create) = @_;
        $self->setting(permissions=>[$write,$create]);
    }
    return $d ? @$d : ();
}

# file name utilities...
sub _qualify {
    my $self = shift;
    my $file = shift;
    return $self->directory .'/' . $file;
}

sub _features_path  { shift->_qualify('features.bdb');   }
sub _parentage_path { shift->_qualify('parentage.bdb');  }
sub _type_path      { shift->_qualify('types.idx');      }
sub _location_path  { shift->_qualify('locations.idx');  }
sub _attribute_path { shift->_qualify('attributes.idx'); }
sub _notes_path     { shift->_qualify('notes.idx');      }
sub _fasta_path     { shift->_qualify('sequence.fa');    }
sub _mtime_path     { shift->_qualify('mtime.stamp');    }

###########################################
# searching
###########################################

sub _features {
    my $self = shift;
    my ($seq_id,$start,$end,$strand,
        $name,$class,$allow_aliases,
        $types,
        $attributes,
        $range_type,
        $iterator
       ) = rearrange([['SEQID','SEQ_ID','REF'],'START',['STOP','END'],'STRAND',
                      'NAME','CLASS','ALIASES',
                      ['TYPES','TYPE','PRIMARY_TAG'],
                      ['ATTRIBUTES','ATTRIBUTE'],
                      'RANGE_TYPE',
                      'ITERATOR',
                     ],@_);

    my (@from,@where,@args,@group);
    $range_type ||= 'overlaps';

    my @result;
    unless (defined $name or defined $seq_id or defined $types or defined $attributes) {
        my $is_indexed = $self->index_db('is_indexed');
        @result = $is_indexed ? grep {$is_indexed->{$_}} keys %{$self->db}
                              : grep { !/^\./ } keys %{$self->db};
    }

    my %found  = ();
    my $result = 1;

    if (defined($name)) {
        # hacky backward compatibility workaround
        undef $class if $class && $class eq 'Sequence';
        $name = "$class:$name" if defined $class && length $class > 0;
        $result &&= $self->filter_by_name($name,$allow_aliases,\%found);
    }

    if (defined $seq_id) {
        $result &&= $self->filter_by_location($seq_id,$start,$end,$strand,$range_type,\%found);
    }

    if (defined $types) {
        $result &&= $self->filter_by_type($types,\%found);
    }

    if (defined $attributes) {
        $result &&= $self->filter_by_attribute($attributes,\%found);
    }

    push @result,keys %found if $result;
    return $iterator ? Bio::DB::SeqFeature::Store::berkeleydb::Iterator->new($self,\@result)
                     : map {$self->fetch($_)} @result;
}

sub filter_by_name {
    my $self = shift;
    my ($name,$allow_aliases,$filter) = @_;

    my $index = $self->index_db('names');
    my $db    = tied(%$index);

    my ($stem,$regexp) = $self->glob_match($name);
    $stem   ||= $name;
    $regexp ||= $name;
    $regexp .= "(?:_2)?" if $allow_aliases;

    my $key = $stem;
    my $value;
    my @results;
    for (my $status = $db->seq($key,$value,R_CURSOR);
         $status == 0 and $key =~ /^$regexp$/i;
         $status = $db->seq($key,$value,R_NEXT)) {
        next if %$filter && !$filter->{$value};  # don't bother
        push @results,$value;
    }
    $self->update_filter($filter,\@results);
}

sub filter_by_type {
    my $self = shift;
    my ($types,$filter) = @_;
    my @types = ref $types eq 'ARRAY' ? @$types : $types;

    my $index = $self->index_db('types');
    my $db    = tied(%$index);

    my @results;

    for my $type (@types) {
        my ($primary_tag,$source_tag);
        if (ref $type && $type->isa('Bio::DB::GFF::Typename')) {
            $primary_tag = $type->method;
            $source_tag  = $type->source;
        } else {
            ($primary_tag,$source_tag) = split ':',$type,2;
        }
        my $match = defined $source_tag ? "^$primary_tag:$source_tag\$" : "^$primary_tag:";
        $source_tag ||= '';
        my $key = lc "$primary_tag:$source_tag";
        my $value;

        # If filter is already provided, then it is usually faster to
        # fetch each object.
        if (%$filter) {
            for my $id (keys %$filter) {
                my $obj = $self->_fetch($id) or next;
                push @results,$id if $obj->type =~ /$match/i;
            }
        }

        else {
            for (my $status = $db->seq($key,$value,R_CURSOR);
                 $status == 0 && $key =~ /$match/i;
                 $status = $db->seq($key,$value,R_NEXT)) {
                next if %$filter && !$filter->{$value};  # don't even bother
                push @results,$value;
            }
        }
    }
    $self->update_filter($filter,\@results);
}

sub filter_by_location {
    my $self = shift;
    my ($seq_id,$start,$end,$strand,$range_type,$filter) = @_;
    $strand ||= 0;

    my $index    = $self->index_db('locations');
    my $db       = tied(%$index);

    my $binstart = defined $start ? sprintf("%06d",int $start/BINSIZE) : '';
    my $binend   = defined $end   ? sprintf("%06d",int $end/BINSIZE)   : 'z';  # beyond a number

    my %seenit;
    my @results;

    $start = MININT if !defined $start;
    $end   = MAXINT if !defined $end;
    my $version_2 = $self->db_version > 1;

    if ($range_type eq 'overlaps' or $range_type eq 'contains') {
        my $key     = $version_2 ? "\L$seq_id\E.$binstart" : "\L$seq_id\E$binstart";
        my $keystop = $version_2 ? "\L$seq_id\E.$binend"   : "\L$seq_id\E$binend";
        my $value;

        for (my $status = $db->seq($key,$value,R_CURSOR);
             $status == 0 && $key le $keystop;
             $status = $db->seq($key,$value,R_NEXT)) {
            my ($id,$fstart,$fend,$fstrand) = unpack("i4",$value);
            next if $seenit{$id}++;
            next if $strand && $fstrand != $strand;
            if ($range_type eq 'overlaps') {
                next unless $fend >= $start && $fstart <= $end;
            }
            elsif ($range_type eq 'contains') {
                next unless $fstart >= $start && $fend <= $end;
            }
            next if %$filter && !$filter->{$id};  # don't bother
            push @results,$id;
        }
    }

    # for contained in, we look for features originating and terminating outside the specified range
    # this is incredibly inefficient, but fortunately the query is rare (?)
    elsif ($range_type eq 'contained_in') {
        my $key     = $version_2 ? "\L$seq_id." : "\L$seq_id";
        my $keystop = $version_2 ? "\L$seq_id\E.$binstart" : "\L$seq_id\E$binstart";
        my $value;

        # do the left part of the range
        for (my $status = $db->seq($key,$value,R_CURSOR);
             $status == 0 && $key le $keystop;
             $status = $db->seq($key,$value,R_NEXT)) {
            my ($id,$fstart,$fend,$fstrand) = unpack("i4",$value);
            next if $seenit{$id}++;
            next if $strand && $fstrand != $strand;
            next unless $fstart <= $start && $fend >= $end;
            next if %$filter && !$filter->{$id};  # don't bother
            push @results,$id;
        }

        # do the right part of the range, stopping once the cursor
        # leaves this sequence's keys (otherwise it would run on into
        # features on other sequences)
        my $prefix = $version_2 ? "\L$seq_id." : "\L$seq_id";
        $key = $version_2 ? "\L$seq_id\E.$binend" : "\L$seq_id\E$binend";
        for (my $status = $db->seq($key,$value,R_CURSOR);
             $status == 0 && index($key,$prefix) == 0;
             $status = $db->seq($key,$value,R_NEXT)) {
            my ($id,$fstart,$fend,$fstrand) = unpack("i4",$value);
            next if $seenit{$id}++;
            next if $strand && $fstrand != $strand;
            next unless $fstart <= $start && $fend >= $end;
            next if %$filter && !$filter->{$id};  # don't bother
            push @results,$id;
        }
    }

    $self->update_filter($filter,\@results);
}

sub attributes {
    my $self  = shift;
    my $index = $self->index_db('attributes');
    my %a     = map {s/:.+$//; $_=> 1} keys %$index;
    return keys %a;
}

sub filter_by_attribute {
    my $self = shift;
    my ($attributes,$filter) = @_;

    my $index = $self->index_db('attributes');
    my $db    = tied(%$index);
    my $result;

    for my $att_name (keys %$attributes) {
        my @result;
        my @search_terms = ref($attributes->{$att_name}) && ref($attributes->{$att_name}) eq 'ARRAY' ?
@{$attributes->{$att_name}} : $attributes->{$att_name}; for my $v (@search_terms) { my ($stem,$regexp) = $self->glob_match($v); $stem ||= $v; $regexp ||= $v; my $key = "\L${att_name}:${stem}\E"; my $value; for (my $status = $db->seq($key,$value,R_CURSOR); $status == 0 && $key =~ /^$att_name:$regexp$/i; $status = $db->seq($key,$value,R_NEXT)) { next if %$filter && !$filter->{$value}; # don't bother push @result,$value; } } $result ||= $self->update_filter($filter,\@result); } $result; } sub _search_attributes { my $self = shift; my ($search_string,$attribute_array,$limit) = @_; $search_string =~ tr/*?//d; my @words = map {quotemeta($_)} $search_string =~ /(\w+)/g; my $search = join '|',@words; my $index = $self->index_db('attributes'); my $db = tied(%$index); my (%results,%notes); for my $tag (@$attribute_array) { my $id; my $key = "\L$tag:\E"; for (my $status = $db->seq($key,$id,R_CURSOR); $status == 0 and $key =~ /^$tag:(.*)/i; $status = $db->seq($key,$id,R_NEXT)) { my $text = $1; next unless $text =~ /$search/; for my $w (@words) { my @hits = $text =~ /($w)/ig or next; $results{$id} += @hits; } $notes{$id} .= "$text "; } } my @results; for my $id (keys %results) { my $hits = $results{$id}; my $note = $notes{$id}; $note =~ s/\s+$//; my $relevance = 10 * $hits; my $feature = $self->fetch($id) or next; my $name = $feature->display_name or next; my $type = $feature->type; push @results,[$name,$note,$relevance,$type,$id]; } return @results; } sub search_notes { my $self = shift; my ($search_string,$limit) = @_; $search_string =~ tr/*?//d; my @results; my @words = map {quotemeta($_)} $search_string =~ /(\w+)/g; my $search = join '|',@words; my (%found,$found); my $note_index = $self->notes_db; seek($note_index,0,0); # back to start while (<$note_index>) { next unless /$search/; chomp; my ($note,$uu) = split "\t"; $found{unpack("u*",$uu)}++; last if $limit && ++$found >= $limit; } my (@features, @matches); for my $idx (keys %found) { my $feature = $self->fetch($idx) or 
next; my @values = $feature->has_tag('Note') ? $feature->get_tag_values('Note') : (); my $value = "@values"; my $hits; $hits++ while $value =~ /($search)/ig; # count the number of times we were hit push @matches,$hits; push @features,$feature; } for (my $i=0; $i<@matches; $i++) { my $feature = $features[$i]; my $matches = $matches[$i]; my $relevance = 10 * $matches; my $note; $note = join ' ',$feature->get_tag_values('Note') if $feature->has_tag('Note'); push @results,[$feature->display_name,$note,$relevance]; } return @results; } sub glob_match { my $self = shift; my $term = shift; return unless $term =~ /([^*?]*)(?:^|[^\\])?[*?]/; my $stem = $1; $term =~ s/(^|[^\\])([+\[\]^{}\$|\(\).])/$1\\$2/g; $term =~ s/(^|[^\\])\*/$1.*/g; $term =~ s/(^|[^\\])\?/$1./g; return ($stem,$term); } sub update_filter { my $self = shift; my ($filter,$results) = @_; return unless @$results; if (%$filter) { my @filtered = grep {$filter->{$_}} @$results; %$filter = map {$_=>1} @filtered; } else { %$filter = map {$_=>1} @$results; } } sub types { my $self = shift; eval "require Bio::DB::GFF::Typename" unless Bio::DB::GFF::Typename->can('new'); my $index = $self->index_db('types'); my $db = tied(%$index); return map {Bio::DB::GFF::Typename->new($_)} keys %$index; } # this is ugly sub _insert_sequence { my $self = shift; my ($seqid,$seq,$offset) = @_; my $dna_fh = $self->private_fasta_file or return; if ($offset == 0) { # start of the sequence print $dna_fh ">$seqid\n"; } print $dna_fh $seq,"\n"; } sub _fetch_sequence { my $self = shift; my ($seqid,$start,$end) = @_; my $db = $self->dna_db or return; return $db->seq($seqid,$start,$end); } sub private_fasta_file { my $self = shift; return $self->{fasta_fh} if exists $self->{fasta_fh}; $self->{fasta_file} = $self->_qualify("sequence.fa"); return $self->{fasta_fh} = IO::File->new($self->{fasta_file},">"); } sub finish_bulk_update { my $self = shift; if (my $fh = $self->{fasta_fh}) { $fh->close; $self->{fasta_db} =
Bio::DB::Fasta::Subdir->new($self->{fasta_file}); } } sub db_version { my $self = shift; my $db = $self->db; return $db->{'.version'} || 1.00; } sub DESTROY { my $self = shift; $self->_close_databases(); $self->private_fasta_file->close; rmtree($self->directory,0,1) if $self->temporary && -e $self->directory; } # TIE interface -- a little annoying because we are storing magic ".variable" # meta-variables in the same data structure as the IDs, so these variables # must be skipped. sub _firstid { my $self = shift; my $db = $self->db; my ($key,$value); while ( ($key,$value) = each %{$db}) { last unless $key =~ /^\./; } $key; } sub _nextid { my $self = shift; my $id = shift; my $db = $self->db; my ($key,$value); while ( ($key,$value) = each %$db) { last unless $key =~ /^\./; } $key; } sub _existsid { my $self = shift; my $id = shift; return exists $self->db->{$id}; } sub _deleteid { my $self = shift; my $id = shift; my $obj = $self->fetch($id) or return; $self->_delete_indexes($obj,$id); delete $self->db->{$id}; 1; } sub _clearall { my $self = shift; $self->_close_databases(); $self->_delete_databases(); my ($write,$create) = $self->_permissions; $self->_open_databases($write,$create); } sub _featurecount { my $self = shift; return scalar %{$self->db}; } package Bio::DB::SeqFeature::Store::berkeleydb::Iterator; $Bio::DB::SeqFeature::Store::berkeleydb::Iterator::VERSION = '1.7.4'; sub new { my $class = shift; my $store = shift; my $ids = shift; return bless {store => $store, ids => $ids},ref($class) || $class; } sub next_seq { my $self = shift; my $store = $self->{store} or return; my $id = shift @{$self->{ids}}; defined $id or return; return $store->fetch($id); } package Bio::DB::Fasta::Subdir; $Bio::DB::Fasta::Subdir::VERSION = '1.7.4'; use base 'Bio::DB::Fasta'; # alter calling arguments so that the index file is placed in a subdirectory # named "indexes" sub new { my ($class, $path, %opts) = @_; if (-d $path) { $opts{-index_name} = 
File::Spec->catfile($path,'indexes','fasta.index'); } return Bio::DB::Fasta->new($path, %opts); } sub _calculate_offsets { my ($self, @args) = @_; return $self->SUPER::_calculate_offsets(@args); } 1; __END__ =head1 BUGS This is an early version, so there are certainly some bugs. Please use the BioPerl bug tracking system to report bugs. =head1 SEE ALSO L, L, L, L, L, L, L, =head1 AUTHOR Lincoln Stein <lstein@cshl.org>. Copyright (c) 2006 Cold Spring Harbor Laboratory. This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. =cut GFF2Loader.pm100644000766000024 3306613605523026 24021 0ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/lib/Bio/DB/SeqFeature/Storepackage Bio::DB::SeqFeature::Store::GFF2Loader; $Bio::DB::SeqFeature::Store::GFF2Loader::VERSION = '1.7.4'; # $Id: GFF2Loader.pm 11755 2007-11-08 02:19:29Z cjfields $ =head1 NAME Bio::DB::SeqFeature::Store::GFF2Loader -- GFF2 file loader for Bio::DB::SeqFeature::Store =head1 SYNOPSIS use Bio::DB::SeqFeature::Store; use Bio::DB::SeqFeature::Store::GFF2Loader; # Open the sequence database my $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'DBI::mysql', -dsn => 'dbi:mysql:test', -write => 1 ); my $loader = Bio::DB::SeqFeature::Store::GFF2Loader->new(-store => $db, -verbose => 1, -fast => 1); $loader->load('./my_genome.gff'); =head1 DESCRIPTION The Bio::DB::SeqFeature::Store::GFF2Loader object parses GFF2-format sequence annotation files and loads them into Bio::DB::SeqFeature::Store databases. For certain combinations of SeqFeature classes and SeqFeature::Store databases it offers a "fast load" mode, which accelerates the loading of GFF2 files by a factor of 5-10. The GFF2 file format has been extended very slightly to accommodate Bio::DB::SeqFeature::Store. First, the loader recognizes a new directive: # #index-subfeatures [0|1] Note that you can place a space between the two #'s in order to prevent GFF2 validators from complaining.
If this is true, then subfeatures are indexed so that they can be retrieved with a query. See L for an explanation of this. If false, then subfeatures can only be accessed through their parent feature. The default is to index all subfeatures. Second, the loader recognizes a new attribute tag called index, which, if present, controls indexing of the current feature. Example: ctg123 . TF_binding_site 1000 1012 . + . ID=tfbs00001;index=1 You can use this to turn indexing on and off, overriding the default for a particular feature. =cut # load utility - incrementally load the store based on GFF2 file # # two modes: # slow mode -- features can occur in any order in the GFF2 file # fast mode -- all features with same ID must be contiguous in GFF2 file use strict; use Carp 'croak'; use Bio::DB::GFF::Util::Rearrange; use Text::ParseWords 'quotewords'; use base 'Bio::DB::SeqFeature::Store::GFF3Loader'; my %Special_attributes =( Gap => 1, Target => 1, Parent => 1, Name => 1, Alias => 1, ID => 1, index => 1, Index => 1, ); =head2 new Title : new Usage : $loader = Bio::DB::SeqFeature::Store::GFF2Loader->new(@options) Function: create a new parser Returns : a Bio::DB::SeqFeature::Store::GFF2Loader gff2 parser and loader Args : several - see below Status : public This method creates a new GFF2 loader and establishes its connection with a Bio::DB::SeqFeature::Store database. Arguments are -name=>$value pairs as described in this table: Name Value ---- ----- -store A writable Bio::DB::SeqFeature::Store database handle. -seqfeature_class The name of the type of Bio::SeqFeatureI object to create and store in the database (Bio::DB::SeqFeature by default) -sf_class A shorter alias for -seqfeature_class -verbose Send progress information to standard error.
-fast If true, activate fast loading (see below) -chunk_size Set the storage chunk size for nucleotide/protein sequences (default 2000 bytes) -tmp Indicate a temporary directory to use when loading non-normalized features. When you call new(), a connection to a Bio::DB::SeqFeature::Store database should already have been established and the database initialized (if appropriate). Some combinations of Bio::SeqFeatures and Bio::DB::SeqFeature::Store databases support a fast loading mode. Currently the only reliable implementation of fast loading is the combination of DBI::mysql with Bio::DB::SeqFeature. The other important restriction on fast loading is the requirement that a feature that contains subfeatures must occur in the GFF2 file before any of its subfeatures. Otherwise the subfeatures that occurred before the parent feature will not be attached to the parent correctly. This restriction does not apply to normal (slow) loading. If you use an unnormalized feature class, such as Bio::SeqFeature::Generic, then the loader needs to create a temporary database in which to cache features until all their parts and subparts have been seen. This temporary database uses the "berkeleydb" adaptor. The -tmp option specifies the directory in which that database will be created. If not present, it defaults to the system default tmp directory specified by File::Spec->tmpdir(). The -chunk_size option allows you to tune the representation of DNA/Protein sequence in the Store database. By default, sequences are split into 2000 base/residue chunks and then reassembled as needed. This avoids the problem of pulling a whole chromosome into memory in order to fetch a short subsequence from somewhere in the middle. Depending on your usage patterns, you may wish to tune this parameter using a chunk size that is larger or smaller than the default.
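As a rough illustration of the chunking idea above, the sketch below computes which fixed-size chunks overlap a 1-based, inclusive range and reassembles the requested subsequence from only those chunks. The helpers chunks_for_range() and subseq_from_chunks() are hypothetical, and the in-memory hash of chunks merely stands in for whatever storage a given Store adaptor actually uses; this is not the Store's internal implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: 0-based indices of the fixed-size chunks that
# overlap a 1-based, inclusive range [$start, $end].
sub chunks_for_range {
    my ($start, $end, $chunk_size) = @_;
    $chunk_size ||= 2000;    # the documented -chunk_size default
    my $first = int(($start - 1) / $chunk_size);
    my $last  = int(($end   - 1) / $chunk_size);
    return ($first .. $last);
}

# Hypothetical helper: rebuild a subsequence from a hash that maps
# chunk index => chunk string, touching only the overlapping chunks.
sub subseq_from_chunks {
    my ($chunks, $start, $end, $chunk_size) = @_;
    $chunk_size ||= 2000;
    my @idx    = chunks_for_range($start, $end, $chunk_size);
    my $joined = join '', map { $chunks->{$_} } @idx;
    my $offset = ($start - 1) - $idx[0] * $chunk_size;   # 0-based offset into $joined
    return substr($joined, $offset, $end - $start + 1);
}

# Usage: a 10-residue sequence stored as 4-residue chunks.
my %chunks = (0 => 'ACGT', 1 => 'ACGT', 2 => 'AC');
print subseq_from_chunks(\%chunks, 3, 6, 4), "\n";   # prints GTAC
```

Only chunks 0 and 1 are read for positions 3..6, which is the point of chunking: a short subsequence never requires the whole sequence in memory.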
=cut # sub new { } inherited =head2 load Title : load Usage : $count = $loader->load(@ARGV) Function: load the indicated files or filehandles Returns : number of feature lines loaded Args : list of files or filehandles Status : public Once the loader is created, invoke its load() method with a list of GFF2 or FASTA file paths or previously-opened filehandles in order to load them into the database. Compressed files ending with .gz, .Z and .bz2 are automatically recognized and uncompressed on the fly. Paths beginning with http: or ftp: are treated as URLs and opened using the LWP GET program (which must be on your path). FASTA files are recognized by their initial ">" character. Do not feed the loader a file that is neither GFF2 nor FASTA; I don't know what will happen, but it will probably not be what you expect. =cut # sub load { } inherited =head2 accessors The following read-only accessors return values passed or created during new(): store() the long-term Bio::DB::SeqFeature::Store object tmp_store() the temporary Bio::DB::SeqFeature::Store object used during loading sfclass() the Bio::SeqFeatureI class fast() whether fast loading is active seq_chunk_size() the sequence chunk size verbose() verbose progress messages =cut # sub store inherited # sub tmp_store inherited # sub sfclass inherited # sub fast inherited # sub seq_chunk_size inherited # sub verbose inherited =head2 Internal Methods The following methods are used internally and may be overridden by subclasses. =over 4 =item default_seqfeature_class $class = $loader->default_seqfeature_class Return the default SeqFeatureI class (Bio::DB::SeqFeature). =cut # sub default_seqfeature_class { } inherited =item subfeatures_normalized $flag = $loader->subfeatures_normalized([$new_flag]) Get or set a flag that indicates that the subfeatures are normalized. This is deduced from the SeqFeature class information.
=cut # sub subfeatures_normalized { } inherited =item subfeatures_in_table $flag = $loader->subfeatures_in_table([$new_flag]) Get or set a flag that indicates that feature/subfeature relationships are stored in a table. This is deduced from the SeqFeature class and Store information. =cut # sub subfeatures_in_table { } inherited =item load_fh $count = $loader->load_fh($filehandle) Load the GFF2 data at the other end of the filehandle and return true if successful. Internally, load_fh() invokes: start_load(); do_load($filehandle); finish_load(); =cut # sub load_fh { } inherited =item start_load, finish_load These methods are called at the start and end of a filehandle load. =cut # sub create_load_data { } #inherited # sub finish_load { } #inherited =item do_load $count = $loader->do_load($fh) This is called by load_fh() to load the GFF2 file's filehandle and return the number of lines loaded. =cut # sub do_load { } inherited =item load_line $loader->load_line($data); Load a line of a GFF2 file. You must bracket this with calls to start_load() and finish_load()! $loader->start_load(); $loader->load_line($_) while ; $loader->finish_load(); =cut # sub load_line { } # inherited =item handle_meta $loader->handle_meta($meta_directive) This method is called to handle meta-directives such as ##sequence-region. The method will receive the directive with the initial ## stripped off. =cut # sub handle_meta {} # inherited =item handle_feature $loader->handle_feature($gff2_line) This method is called to process a single GFF2 line. It manipulates information stored in a data structure called $self->{load_data}. =cut # sub handle_feature { } # inherited =item store_current_feature $loader->store_current_feature() This method is called to store the currently active feature in the database. It uses a data structure stored in $self->{load_data}.
=cut # sub store_current_feature { } inherited =item build_object_tree $loader->build_object_tree() This method gathers together features and subfeatures and builds the graph that connects them. =cut # sub build_object_tree { } # inherited =item build_object_tree_in_tables $loader->build_object_tree_in_tables() This method gathers together features and subfeatures and builds the graph that connects them, assuming that parent/child relationships will be stored in a database table. =cut # sub build_object_tree_in_tables { } # inherited =item build_object_tree_in_features $loader->build_object_tree_in_features() This method gathers together features and subfeatures and builds the graph that connects them, assuming that parent/child relationships are stored in the seqfeature objects themselves. =cut # sub build_object_tree_in_features { } # inherited =item attach_children $loader->attach_children($store,$load_data,$load_id,$feature) This recursively adds children to features and their subfeatures. It is called when subfeatures are directly contained within other features, rather than stored in a relational table. =cut # sub attach_children { } # inherited =item fetch my $feature = $loader->fetch($load_id) Given a load ID (from the ID= attribute) this method returns the feature from the temporary database or the permanent one, depending on where it is stored. =cut # sub fetch { } # inherited =item add_segment $loader->add_segment($parent,$child) This method is used to add a split location to the parent. =cut # sub add_segment { } # inherited =item parse_attributes ($reserved,$unreserved) = $loader->parse_attributes($attribute_line) This method parses the information contained in the $attribute_line into two hashrefs, one containing the values of reserved attribute tags (e.g. ID) and the other containing the values of unreserved ones. 
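A simplified sketch of the split described above, assuming the same Text::ParseWords::quotewords() call and the same reserved-tag set that this module uses. The helper split_gff2_group() is hypothetical, and it deliberately omits the Target handling and the rewriting of the first tag into Name/Alias/ID that the real parse_attributes() performs.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Assumed to mirror %Special_attributes in this module.
my %RESERVED = map { $_ => 1 } qw(Gap Target Parent Name Alias ID index Index);

# Hypothetical, simplified helper: split a GFF2 group column on semicolons
# (honoring quotes), then file each "tag value" pair as reserved or
# unreserved. A field with no separate value is treated as a Note, as in
# the real method; Target and first-tag Name/Alias/ID rewriting are omitted.
sub split_gff2_group {
    my ($group) = @_;
    my (%reserved, %unreserved);
    for my $field (grep { defined($_) && length($_) } quotewords('\s*;\s*', 0, $group)) {
        my ($tag, $value) = $field =~ /^(\S+)\s+(.+)/ ? ($1, $2) : ('Note', $field);
        my $bucket = $RESERVED{$tag} ? \%reserved : \%unreserved;
        push @{ $bucket->{$tag} }, $value;
    }
    return (\%reserved, \%unreserved);
}

my ($reserved, $unreserved) = split_gff2_group('Name abc1 ; Note "a short note" ; Parent gene1');
print join(',', sort keys %$reserved), "\n";   # prints Name,Parent
```

Because quotewords() is called with a false $keep argument, the quotes around "a short note" are stripped and the semicolon-delimited fields come back as plain "tag value" strings.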
=cut sub parse_attributes { # overridden my $self = shift; my $att = shift; my @groups = quotewords('\s*;\s*',0,$att); my (%reserved,%unreserved); my $found_name; for (@groups) { my ($tag,$value); if (/^(\S+)\s+(.+)/) { # Tag value pair ($tag,$value) = ($1,$2); } else { $tag = 'Note'; $value = $_; } if ($tag eq 'Target') { my ($target,$start,$end) = split /\s+/,$value; push @{$reserved{ID}},$target; $found_name++; if ($start <= $end) { $value .= ' +' } else { $value .= ' -' } } if (!$found_name++) { push @{$reserved{Alias}},$value; $value = "$tag:$value"; push @{$reserved{ID}},$value; $tag = 'Name'; } if ($Special_attributes{$tag}) { # reserved attribute push @{$reserved{$tag}},$value; } else { push @{$unreserved{$tag}},$value; } } return (\%reserved,\%unreserved); } =item start_or_finish_sequence $loader->start_or_finish_sequence('Chr9') This method is called at the beginning and end of a fasta section. =cut # sub start_or_finish_sequence { } inherited =item load_sequence $loader->load_sequence('gatttcccaaa') This method is called to load some amount of sequence after start_or_finish_sequence() is first called. =cut # sub load_sequence { } inherited =item open_fh my $io_file = $loader->open_fh($filehandle_or_path) This method opens the indicated file or pipe, recognizing compressed files and URLs and opening them accordingly. =cut # sub open_fh { } inherited # sub msg { } inherited =item time my $time = $loader->time This method returns the current time in seconds, using Time::HiRes if available. =cut # sub time { } inherited =item unescape my $unescaped = GFF2Loader::unescape($escaped) This is an internal utility. It is the same as CGI::Util::unescape, but doesn't change pluses into spaces and ignores unicode escapes. =cut # sub unescape { } inherited 1; __END__ =back =head1 BUGS This is an early version, so there are certainly some bugs. Please use the BioPerl bug tracking system to report bugs.
=head1 SEE ALSO L, L, L, L, L, L, L =head1 AUTHOR Lincoln Stein <lstein@cshl.org>. Copyright (c) 2006 Cold Spring Harbor Laboratory. This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. =cut GFF3Loader.pm100644000766000024 7451013605523026 24021 0ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/lib/Bio/DB/SeqFeature/Storepackage Bio::DB::SeqFeature::Store::GFF3Loader; $Bio::DB::SeqFeature::Store::GFF3Loader::VERSION = '1.7.4'; =head1 NAME Bio::DB::SeqFeature::Store::GFF3Loader -- GFF3 file loader for Bio::DB::SeqFeature::Store =head1 SYNOPSIS use Bio::DB::SeqFeature::Store; use Bio::DB::SeqFeature::Store::GFF3Loader; # Open the sequence database my $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'DBI::mysql', -dsn => 'dbi:mysql:test', -write => 1 ); my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store => $db, -verbose => 1, -fast => 1); $loader->load('./my_genome.gff3'); =head1 DESCRIPTION The Bio::DB::SeqFeature::Store::GFF3Loader object parses GFF3-format sequence annotation files and loads them into Bio::DB::SeqFeature::Store databases. For certain combinations of SeqFeature classes and SeqFeature::Store databases it offers a "fast load" mode, which accelerates the loading of GFF3 files by a factor of 5-10. The GFF3 file format has been extended very slightly to accommodate Bio::DB::SeqFeature::Store. First, the loader recognizes a new directive: # #index-subfeatures [0|1] Note that you can place a space between the two #'s in order to prevent GFF3 validators from complaining. If this is true, then subfeatures are indexed (the default) so that they can be retrieved with a query. See L for an explanation of this. If false, then subfeatures can only be accessed through their parent feature. Second, the loader recognizes a new attribute tag called index, which, if present, controls indexing of the current feature. Example: ctg123 . TF_binding_site 1000 1012 . + .
ID=tfbs00001;index=1 You can use this to turn indexing on and off, overriding the default for a particular feature. Note that the loader keeps an in-memory record of each feature that it has processed. If you find the loader running out of memory on particularly large GFF3 files, please split the input file into smaller pieces and do the load in steps. =cut # load utility - incrementally load the store based on GFF3 file # # two modes: # slow mode -- features can occur in any order in the GFF3 file # fast mode -- all features with same ID must be contiguous in GFF3 file use strict; use Carp 'croak'; use Bio::DB::GFF::Util::Rearrange; use Bio::DB::SeqFeature::Store::LoadHelper; use constant DEBUG => 0; use base 'Bio::DB::SeqFeature::Store::Loader'; my %Special_attributes =( Gap => 1, Target => 1, Parent => 1, Name => 1, Alias => 1, ID => 1, index => 1, Index => 1, ); my %Strandedness = ( '+' => 1, '-' => -1, '.' => 0, '' => 0, 0 => 0, 1 => 1, -1 => -1, +1 => 1, undef => 0, ); =head2 new Title : new Usage : $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(@options) Function: create a new parser Returns : a Bio::DB::SeqFeature::Store::GFF3Loader gff3 parser and loader Args : several - see below Status : public This method creates a new GFF3 loader and establishes its connection with a Bio::DB::SeqFeature::Store database. Arguments are -name=>$value pairs as described in this table: Name Value ---- ----- -store A writable Bio::DB::SeqFeature::Store database handle. -seqfeature_class The name of the type of Bio::SeqFeatureI object to create and store in the database (Bio::DB::SeqFeature by default) -sf_class A shorter alias for -seqfeature_class -verbose Send progress information to standard error. -fast If true, activate fast loading (see below) -chunk_size Set the storage chunk size for nucleotide/protein sequences (default 2000 bytes) -tmp Indicate a temporary directory to use when loading non-normalized features.
-ignore_seqregion Ignore ##sequence-region directives. The default is to create a feature corresponding to the directive. -noalias_target Don't create an Alias attribute for a target_id named in a Target attribute. The default is to create an Alias attribute containing the target_id found in a Target attribute. When you call new(), a connection to a Bio::DB::SeqFeature::Store database should already have been established and the database initialized (if appropriate). Some combinations of Bio::SeqFeatures and Bio::DB::SeqFeature::Store databases support a fast loading mode. Currently the only reliable implementation of fast loading is the combination of DBI::mysql with Bio::DB::SeqFeature. The other important restriction on fast loading is the requirement that a feature that contains subfeatures must occur in the GFF3 file before any of its subfeatures. Otherwise the subfeatures that occurred before the parent feature will not be attached to the parent correctly. This restriction does not apply to normal (slow) loading. If you use an unnormalized feature class, such as Bio::SeqFeature::Generic, then the loader needs to create a temporary database in which to cache features until all their parts and subparts have been seen. This temporary database uses the "berkeleydb" adaptor. The -tmp option specifies the directory in which that database will be created. If not present, it defaults to the system default tmp directory specified by File::Spec->tmpdir(). The -chunk_size option allows you to tune the representation of DNA/Protein sequence in the Store database. By default, sequences are split into 2000 base/residue chunks and then reassembled as needed. This avoids the problem of pulling a whole chromosome into memory in order to fetch a short subsequence from somewhere in the middle. Depending on your usage patterns, you may wish to tune this parameter using a chunk size that is larger or smaller than the default.
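Because fast loading requires each parent feature to appear in the GFF3 file before its subfeatures, it can help to check a file before loading it. The sketch below is a hypothetical pre-flight check, not part of this module: out_of_order_children() scans tab-separated GFF3 lines and reports every Parent= value whose ID has not yet been seen. It assumes unescaped attribute values and stops at a ##FASTA directive.

```perl
use strict;
use warnings;

# Hypothetical pre-flight check for fast loading: return every Parent= value
# that refers to an ID not yet seen earlier in the file. Assumes
# tab-separated GFF3 with unescaped attribute values.
sub out_of_order_children {
    my @lines = @_;
    my (%seen_id, @problems);
    for my $line (@lines) {
        last if $line =~ /^##FASTA/;                 # sequence section follows
        next if $line =~ /^#/ || $line !~ /\t/;      # comments, directives, junk
        my @col = split /\t/, $line;
        next unless @col >= 9;
        my %att = map { $_ =~ /^([^=]+)=(.*)$/ ? ($1, $2) : () } split /;/, $col[8];
        if (defined $att{Parent}) {
            push @problems, grep { !$seen_id{$_} } split /,/, $att{Parent};
        }
        $seen_id{ $att{ID} } = 1 if defined $att{ID};
    }
    return @problems;
}

my @gff = (
    "ctg123\t.\texon\t1300\t1500\t.\t+\t.\tParent=mRNA00001",
    "ctg123\t.\tmRNA\t1300\t9000\t.\t+\t.\tID=mRNA00001",
);
print join(',', out_of_order_children(@gff)), "\n";   # prints mRNA00001
```

An empty result suggests the file satisfies the ordering restriction; a non-empty one means the file should be reordered first or loaded in the normal (slow) mode.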
=cut sub new { my $class = shift; my $self = $class->SUPER::new(@_); my ($ignore_seqregion) = rearrange(['IGNORE_SEQREGION'],@_); $self->ignore_seqregion($ignore_seqregion); my ($noalias_target) = rearrange(['NOALIAS_TARGET'],@_); $self->noalias_target($noalias_target); $self; } =head2 ignore_seqregion $ignore_it = $loader->ignore_seqregion([$new_flag]) Get or set the ignore_seqregion flag, which, if true, will cause GFF3 ##sequence-region directives to be ignored. The default behavior is to create a feature corresponding to the region. =cut sub ignore_seqregion { my $self = shift; my $d = $self->{ignore_seqregion}; $self->{ignore_seqregion} = shift if @_; $d; } =head2 noalias_target $noalias_target = $loader->noalias_target([$new_flag]) Get or set the noalias_target flag, which, if true, will disable the creation of an Alias attribute for a target_id named in a Target attribute. The default is to create an Alias attribute containing the target_id found in a Target attribute. =cut sub noalias_target { my $self = shift; my $d = $self->{noalias_target}; $self->{noalias_target} = shift if @_; $d; } =head2 load Title : load Usage : $count = $loader->load(@ARGV) Function: load the indicated files or filehandles Returns : number of feature lines loaded Args : list of files or filehandles Status : public Once the loader is created, invoke its load() method with a list of GFF3 or FASTA file paths or previously-opened filehandles in order to load them into the database. Compressed files ending with .gz, .Z and .bz2 are automatically recognized and uncompressed on the fly. Paths beginning with http: or ftp: are treated as URLs and opened using the LWP GET program (which must be on your path). FASTA files are recognized by their initial ">" character. Do not feed the loader a file that is neither GFF3 nor FASTA; I don't know what will happen, but it will probably not be what you expect.
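The input-recognition rules listed above can be restated in a few lines. The sketch below is a hypothetical classifier, not the loader's own open_fh() logic, and it opens nothing: it reports whether a path looks like an http: or ftp: URL, which compression suffix (if any) it carries, and whether a first line beginning with ">" marks FASTA rather than GFF3. For a compressed input, $first_line is assumed to be the first line after decompression.

```perl
use strict;
use warnings;

# Hypothetical classifier restating the documented load() rules; it does
# not open anything. $first_line is the first (decompressed) line of input.
sub classify_input {
    my ($path, $first_line) = @_;
    my %kind;
    $kind{url} = $path =~ m{^(?:http|ftp):}i ? 1 : 0;
    ($kind{compression}) = $path =~ /\.(gz|Z|bz2)$/;
    $kind{compression} //= 'none';
    $kind{format} = defined $first_line && $first_line =~ /^>/ ? 'fasta' : 'gff3';
    return \%kind;
}

my $k = classify_input('ftp://example.org/genome.gff3.gz', '##gff-version 3');
print "$k->{url} $k->{compression} $k->{format}\n";   # prints 1 gz gff3
```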
=cut # sub load { } inherited =head2 accessors The following read-only accessors return values passed or created during new(): store() the long-term Bio::DB::SeqFeature::Store object tmp_store() the temporary Bio::DB::SeqFeature::Store object used during loading sfclass() the Bio::SeqFeatureI class fast() whether fast loading is active seq_chunk_size() the sequence chunk size verbose() verbose progress messages =cut # sub store inherited # sub tmp_store inherited # sub sfclass inherited # sub fast inherited # sub seq_chunk_size inherited # sub verbose inherited =head2 Internal Methods The following methods are used internally and may be overridden by subclasses. =over 4 =item default_seqfeature_class $class = $loader->default_seqfeature_class Return the default SeqFeatureI class (Bio::DB::SeqFeature). =cut # sub default_seqfeature_class { } inherited =item subfeatures_normalized $flag = $loader->subfeatures_normalized([$new_flag]) Get or set a flag that indicates that the subfeatures are normalized. This is deduced from the SeqFeature class information. =cut # sub subfeatures_normalized { } inherited =item subfeatures_in_table $flag = $loader->subfeatures_in_table([$new_flag]) Get or set a flag that indicates that feature/subfeature relationships are stored in a table. This is deduced from the SeqFeature class and Store information. =cut # sub subfeatures_in_table { } inherited =item load_fh $count = $loader->load_fh($filehandle) Load the GFF3 data at the other end of the filehandle and return true if successful. Internally, load_fh() invokes: start_load(); do_load($filehandle); finish_load(); =cut # sub load_fh { } inherited =item start_load, finish_load These methods are called at the start and end of a filehandle load. 
=cut sub create_load_data { #overridden my $self = shift; $self->SUPER::create_load_data; $self->{load_data}{TemporaryID} = "GFFLoad0000000"; $self->{load_data}{IndexSubfeatures} = $self->index_subfeatures(); $self->{load_data}{mode} = 'gff'; $self->{load_data}{Helper} = Bio::DB::SeqFeature::Store::LoadHelper->new($self->{tmpdir}); } sub finish_load { #overridden my $self = shift; $self->store_current_feature(); # during fast loading, we will have a feature left at the very end $self->start_or_finish_sequence(); # finish any half-loaded sequences $self->msg("Building object tree..."); my $start = $self->time(); $self->build_object_tree; $self->msg(sprintf "%5.2fs\n",$self->time()-$start); if ($self->fast) { $self->msg("Loading bulk data into database..."); $start = $self->time(); $self->store->finish_bulk_update; $self->msg(sprintf "%5.2fs\n",$self->time()-$start); } eval {$self->store->commit}; # don't delete load data so that caller can ask for the loaded IDs # $self->delete_load_data; } =item do_load $count = $loader->do_load($fh) This is called by load_fh() to load the GFF3 file's filehandle and return the number of lines loaded. =cut # sub do_load { } inherited =item load_line $loader->load_line($data); Load a line of a GFF3 file. You must bracket this with calls to start_load() and finish_load()! 
  $loader->start_load();
  $loader->load_line($_) while <$fh>;
  $loader->finish_load();

=cut

sub load_line { #overridden
  my $self = shift;
  my $line = shift;

  chomp($line);
  my $load_data = $self->{load_data};
  $load_data->{line}++;

  return unless $line =~ /^\S/;     # blank line

  # if it has a tab in it or looks like a chrom.sizes file, switch to gff mode
  $load_data->{mode} = 'gff' if $line =~ /\t/
                             or $line =~ /^\w+\s+\d+\s*$/;

  if ($line =~ /^\#\s?\#\s*(.+)/) {   ## meta instruction
    $load_data->{mode} = 'gff';
    $self->handle_meta($1);

  } elsif ($line =~ /^\#/) {
    $load_data->{mode} = 'gff';       # just to be safe
    return;                           # comment
  }

  elsif ($line =~ /^>\s*(\S+)/) {     # FASTA lines are coming
    $load_data->{mode} = 'fasta';
    $self->start_or_finish_sequence($1);
  }

  elsif ($load_data->{mode} eq 'fasta') {
    $self->load_sequence($line);
  }

  elsif ($load_data->{mode} eq 'gff') {
    $self->handle_feature($line);
    if (++$load_data->{count} % 1000 == 0) {
      my $now = $self->time();
      my $nl = -t STDOUT && !$ENV{EMACS} ? "\r" : "\n";
      local $^W = 0; # kill uninit variable warning
      $self->msg(sprintf("%d features loaded in %5.2fs (%5.2fs/1000 features)...%s$nl",
                         $load_data->{count},$now - $load_data->{start_time},
                         $now - $load_data->{millenium_time},
                         ' ' x 80
                 ));
      $load_data->{millenium_time} = $now;
    }
  }

  else {
    $self->throw("I don't know what to do with this line:\n$line");
  }
}

=item handle_meta

  $loader->handle_meta($meta_directive)

This method is called to handle meta-directives such as
##sequence-region. The method will receive the directive with the
initial ## stripped off.
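As a rough illustration of how a directive reaches handle_meta(), the standalone sketch below reuses the regular expressions from load_line() and handle_meta() to classify a few common directives. The helper names meta_directive() and classify_meta() are hypothetical, not part of the loader. Unlike the real method, the sketch ignores ignore_seqregion(), does no coordinate remapping, and stores nothing; it only reports what it matched.

```perl
use strict;
use warnings;

# Hypothetical helper: strip the leading "##" the way load_line() does and
# return the directive text, or undef if the line is not a directive.
# A "###" line yields "#", which handle_meta() treats as a resolution mark.
sub meta_directive {
  my $line = shift;
  return $line =~ /^\#\s?\#\s*(.+)/ ? $1 : undef;
}

# Hypothetical helper: classify a directive using the same patterns that
# handle_meta() tests, in the same order. Nothing is written anywhere.
sub classify_meta {
  my $instruction = shift;
  return { type => 'resolution' } if $instruction =~ /^#$/;
  if ($instruction =~ /sequence-region\s+(.+)\s+(-?\d+)\s+(-?\d+)/i) {
    return { type => 'sequence-region', ref => $1, start => $2, end => $3 };
  }
  if ($instruction =~ /index-subfeatures\s+(\S+)/i) {
    return { type => 'index-subfeatures', value => $1 };
  }
  return { type => 'unrecognized', text => $instruction };
}

for my $line ('##sequence-region chr1 1 248956422', '###',
              '##index-subfeatures 0', '##gff-version 3') {
  my $meta = classify_meta(meta_directive($line));
  print join(' ', map { $_ . '=' . $meta->{$_} } sort keys %$meta), "\n";
}
```

Because the sequence-region pattern is not anchored at the end and uses a greedy `(.+)`, the reference name absorbs everything up to the last two integers on the line.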
=cut

sub handle_meta {
  my $self = shift;
  my $instruction = shift;

  if ( $instruction =~ /^#$/ ) {
    $self->store_current_feature() ;    # during fast loading, we will have a feature left at the very end
    $self->start_or_finish_sequence();  # finish any half-loaded sequences
    if ( $self->store->can('handle_resolution_meta') ) {
      $self->store->handle_resolution_meta($instruction);
    }
    return;
  }

  if ($instruction =~ /sequence-region\s+(.+)\s+(-?\d+)\s+(-?\d+)/i
      && !$self->ignore_seqregion()) {
    my($ref,$start,$end,$strand) = $self->_remap($1,$2,$3,+1);
    my $feature = $self->sfclass->new(-name        => $ref,
                                      -seq_id      => $ref,
                                      -start       => $start,
                                      -end         => $end,
                                      -strand      => $strand,
                                      -primary_tag => 'region');
    $self->store->store($feature);
    return;
  }

  if ($instruction =~/index-subfeatures\s+(\S+)/i) {
    $self->{load_data}{IndexSubfeatures} = $1;
    $self->store->index_subfeatures($1);
    return;
  }

  if ( $self->store->can('handle_unrecognized_meta') ) {
    $self->store->handle_unrecognized_meta($instruction);
    return;
  }
}

=item handle_feature

  $loader->handle_feature($gff3_line)

This method is called to process a single GFF3 line. It manipulates
information stored in a data structure called $self-E<gt>{load_data}.

=cut

sub handle_feature { #overridden
  my $self     = shift;
  my $gff_line = shift;
  my $ld       = $self->{load_data};

  my $allow_whitespace = $self->allow_whitespace;

  # special case for a chrom.sizes-style line
  my @columns;
  if ($gff_line =~ /^(\w+)\s+(\d+)\s*$/) {
    @columns = ($1,undef,'chromosome',1,$2,undef,undef,undef,"Name=$1");
  } else {
    $gff_line =~ s/\s+/\t/g if $allow_whitespace;
    @columns = map {$_ eq '.' ?
                    undef : $_ } split /\t/,$gff_line;
  }

  $self->invalid_gff($gff_line) if @columns < 4;
  $self->invalid_gff($gff_line) if @columns > 9 && $allow_whitespace;
  {
    local $^W = 0;
    if (@columns > 9) { #oops, split too much due to whitespace
      $columns[8] = join(' ',@columns[8..$#columns]);
    }
  }

  my ($refname,$source,$method,$start,$end,$score,$strand,$phase,$attributes) = @columns;

  $self->invalid_gff($gff_line) unless defined $refname;
  $self->invalid_gff($gff_line) unless !defined $start || $start =~ /^[\d.-]+$/;
  $self->invalid_gff($gff_line) unless !defined $end   || $end   =~ /^[\d.-]+$/;
  $self->invalid_gff($gff_line) unless defined $method;

  $strand = $Strandedness{$strand||0};
  my ($reserved,$unreserved) = $attributes ? $self->parse_attributes($attributes) : ();

  my $name        = ($reserved->{Name} && $reserved->{Name}[0]);

  my $has_loadid  = defined $reserved->{ID}[0];

  my $feature_id  = defined $reserved->{ID}[0] ? $reserved->{ID}[0] : $ld->{TemporaryID}++;
  my @parent_ids  = defined $reserved->{Parent} ? @{$reserved->{Parent}} : ();

  my $index_it = $ld->{IndexSubfeatures};
  if (exists $reserved->{Index} || exists $reserved->{index}) {
    $index_it = $reserved->{Index}[0] || $reserved->{index}[0];
  }

  # Everything in the unreserved hash becomes an attribute, so we copy
  # some attributes over
  $unreserved->{Note}   = $reserved->{Note}   if exists $reserved->{Note};
  $unreserved->{Alias}  = $reserved->{Alias}  if exists $reserved->{Alias};
  $unreserved->{Target} = $reserved->{Target} if exists $reserved->{Target};
  $unreserved->{Gap}    = $reserved->{Gap}    if exists $reserved->{Gap};
  $unreserved->{load_id}= $reserved->{ID}     if exists $reserved->{ID};

  # mec@stowers-institute.org, wondering why not all attributes are
  # carried forward, adds ID tag in particular service of
  # round-tripping ID, which, though present in database as load_id
  # attribute, was getting lost as itself
  # $unreserved->{ID}= $reserved->{ID} if exists $reserved->{ID};

  # TEMPORARY HACKS TO SIMPLIFY DEBUGGING
  $feature_id = '' unless defined $feature_id;
  $name       = '' unless defined $name;  # prevent uninit variable warnings
  # push @{$unreserved->{Alias}},$feature_id  if $has_loadid && $feature_id ne $name;
  # If DEBUG != 0, any Parent attribute is also copied over (as 'parent_id')
  $unreserved->{parent_id} = \@parent_ids if DEBUG && @parent_ids;

  # POSSIBLY A PERMANENT HACK -- TARGETS BECOME ALIASES
  # THIS IS TO ALLOW FOR TARGET-BASED LOOKUPS
  if (exists $reserved->{Target} && !$self->{noalias_target}) {
    my %aliases = map {$_=>1} @{$unreserved->{Alias}};
    for my $t (@{$reserved->{Target}}) {
      (my $tc = $t) =~ s/\s+.*$//;  # get rid of coordinates
      $name ||= $tc;
      push @{$unreserved->{Alias}},$tc unless $name eq $tc || $aliases{$tc};
    }
  }

  ($refname,$start,$end,$strand) = $self->_remap($refname,$start,$end,$strand) or return;

  my @args = (-display_name => $name,
              -seq_id       => $refname,
              -start        => $start,
              -end          => $end,
              -strand       => $strand || 0,
              -score        => $score,
              -phase        => $phase,
              -primary_tag  => $method || 'feature',
              -source       => $source,
              -tag          => $unreserved,
              -attributes   => $unreserved,
             );

  # Here's where we handle feature lines that have the same ID (multiple locations, not
  # parent/child relationships)

  my $old_feat;

  # Current feature is the same as the previous feature, which hasn't yet been loaded
  if (defined $ld->{CurrentID} && $ld->{CurrentID} eq $feature_id) {
    $old_feat = $ld->{CurrentFeature};
  }

  # Current feature is the same as a feature that was loaded earlier
  elsif (defined(my $id = $self->{load_data}{Helper}->local2global($feature_id))) {
    $old_feat = $self->fetch($feature_id)
      or $self->warn(<{TemporaryID}++; # AND they have a Parent attribute, this causes an undesirable
  }                                         # additional layer of aggregation. Changing the ID fixes this.
    elsif (
           $old_feat->seq_id ne $refname ||
           $old_feat->start  != $start   ||
           $old_feat->end    != $end # make sure endpoints are distinct
          ) {
      $self->add_segment($old_feat,$self->sfclass->new(@args));
      return;
    }
  }

  # we get here if this is a new feature
  # first of all, store the current feature if it is there
  $self->store_current_feature() if defined $ld->{CurrentID};

  # now create the new feature
  # (index top-level features only if policy asks us to)
  my $feature = $self->sfclass->new(@args);
  $feature->object_store($self->store) if $feature->can('object_store');  # for lazy table features
  $ld->{CurrentFeature} = $feature;
  $ld->{CurrentID}      = $feature_id;

  my $top_level = !@parent_ids;
  my $has_id    = defined $reserved->{ID}[0];
  $index_it   ||= $top_level;

  my $helper = $ld->{Helper};
  $helper->indexit($feature_id=>1)  if $index_it;
  $helper->toplevel($feature_id=>1) if !$self->{fast} && $top_level;  # need to track top level features

  # remember parentage
  for my $parent (@parent_ids) {
    $helper->add_children($parent=>$feature_id);
  }

}

sub invalid_gff {
  my $self = shift;
  my $line = shift;
  $self->throw("invalid GFF line at line $self->{load_data}{line}.\n".$line);
}

=item allow_whitespace

  $allow_it = $loader->allow_whitespace([$newvalue]);

Get or set the allow_whitespace flag. If true, then GFF3 files are
allowed to be delimited with whitespace in addition to tabs.

=cut

sub allow_whitespace {
  my $self = shift;
  my $d    = $self->{allow_whitespace};
  $self->{allow_whitespace} = shift if @_;
  $d;
}

=item store_current_feature

  $loader->store_current_feature()

This method is called to store the currently active feature in the
database. It uses a data structure stored in $self-E<gt>{load_data}.

=cut

# sub store_current_feature { } inherited

=item build_object_tree

  $loader->build_object_tree()

This method gathers together features and subfeatures and builds the
graph that connects them.

=cut

###
# put objects together
#
sub build_object_tree {
  my $self = shift;
  $self->subfeatures_in_table ?
      $self->build_object_tree_in_tables
    : $self->build_object_tree_in_features;
}

=item build_object_tree_in_tables

  $loader->build_object_tree_in_tables()

This method gathers together features and subfeatures and builds the
graph that connects them, assuming that parent/child relationships
will be stored in a database table.

=cut

sub build_object_tree_in_tables {
  my $self   = shift;
  my $store  = $self->store;
  my $helper = $self->{load_data}{Helper};

  while (my ($load_id,$children) = $helper->each_family()) {

    my $parent_id = $helper->local2global($load_id);
    $self->throw("$load_id doesn't have a primary id")
      unless defined $parent_id;

    my @children  = map {$helper->local2global($_)} @$children;

    # this updates the table that keeps track of parent/child relationships,
    # but does not update the parent object -- so (start,end) had better be right!!!
    $store->add_SeqFeature($parent_id,@children);
  }
}

=item build_object_tree_in_features

  $loader->build_object_tree_in_features()

This method gathers together features and subfeatures and builds the
graph that connects them, assuming that parent/child relationships are
stored in the seqfeature objects themselves.

=cut

sub build_object_tree_in_features {
  my $self       = shift;
  my $store      = $self->store;
  my $tmp        = $self->tmp_store;
  my $ld         = $self->{load_data};
  my $normalized = $self->subfeatures_normalized;

  my $helper     = $ld->{Helper};

  while (my $load_id = $helper->each_toplevel) {
    my $feature  = $self->fetch($load_id)
      or $self->throw("$load_id (id="
                      .$helper->local2global($load_id)
                      .") should have a database entry, but doesn't");
    $self->attach_children($store,$ld,$load_id,$feature);
    # Indexed objects are updated, not created anew
    $feature->primary_id(undef) unless $helper->indexit($load_id);
    $store->store($feature);
  }
}

=item attach_children

  $loader->attach_children($store,$load_data,$load_id,$feature)

This recursively adds children to features and their subfeatures.
It is called when subfeatures are directly contained within other
features, rather than stored in a relational table.

=cut

sub attach_children {
  my $self = shift;
  my ($store,$ld,$load_id,$feature)  = @_;

  my $children   = $ld->{Helper}->children($load_id) or return;
  for my $child_id (@$children) {
    my $child = $self->fetch($child_id)
      or $self->throw("$child_id should have a database entry, but doesn't");
    $self->attach_children($store,$ld,$child_id,$child);   # recursive call
    $feature->add_SeqFeature($child);
  }
}

=item fetch

  my $feature = $loader->fetch($load_id)

Given a load ID (from the ID= attribute) this method returns the
feature from the temporary database or the permanent one, depending on
where it is stored.

=cut

sub fetch {
  my $self    = shift;
  my $load_id = shift;
  my $helper  = $self->{load_data}{Helper};
  my $id      = $helper->local2global($load_id);

  return
      ($self->subfeatures_normalized || $helper->indexit($load_id)
       ? $self->store->fetch($id)
       : $self->tmp_store->fetch($id)
      );
}

=item add_segment

  $loader->add_segment($parent,$child)

This method is used to add a split location to the parent.

=cut

sub add_segment {
  my $self = shift;
  my ($parent,$child) = @_;

  if ($parent->can('add_segment')) { # probably a lazy table feature
    my $segment_count =  $parent->can('denormalized_segment_count') ? $parent->denormalized_segment_count
                       : $parent->can('denormalized_segments')      ? $parent->denormalized_segments
                       : $parent->can('segments')                   ?
                                                                      $parent->segments
                       : 0;
    unless ($segment_count) {  # convert into a segmented object
      my $segment;
      if ($parent->can('clone')) {
        $segment = $parent->clone;
      } else {
        my %clone = %$parent;
        $segment  = bless \%clone,ref $parent;
      }
      delete $segment->{segments};
      eval {$segment->object_store(undef) };
      $segment->primary_id(undef);

      # this updates the object and expands its start and end positions without writing
      # the segments into the database as individual objects
      $parent->add_segment($segment);
    }
    $parent->add_segment($child);
    1; # for debugging
  }

  # a conventional Bio::SeqFeature::Generic object - create a split location
  else {
    my $current_location = $parent->location;
    if ($current_location->can('add_sub_Location')) {
      $current_location->add_sub_Location($child->location);
    } else {
      eval "require Bio::Location::Split" unless Bio::Location::Split->can('add_sub_Location');
      my $new_location = Bio::Location::Split->new();
      $new_location->add_sub_Location($current_location);
      $new_location->add_sub_Location($child->location);
      $parent->location($new_location);
    }
  }
}

=item parse_attributes

  ($reserved,$unreserved) = $loader->parse_attributes($attribute_line)

This method parses the information contained in the $attribute_line
into two hashrefs, one containing the values of reserved attribute
tags (e.g. ID) and the other containing the values of unreserved ones.

=cut

sub parse_attributes {
  my $self = shift;
  my $att  = shift;

  unless ($att =~ /=/) {  # ouch!
    # must be a GFF2 line
    require Bio::DB::SeqFeature::Store::GFF2Loader
      unless Bio::DB::SeqFeature::Store::GFF2Loader->can('parse_attributes');
    return $self->Bio::DB::SeqFeature::Store::GFF2Loader::parse_attributes($att);
  }

  my @pairs =  map { my ($name,$value) = split '=';
                     [$self->unescape($name) => $value];
                   } split ';',$att;
  my (%reserved,%unreserved);
  foreach (@pairs) {
    my $tag    = $_->[0];

    unless (defined $_->[1]) {
      warn "$tag does not have a value at GFF3 file line $.\n";
      next;
    }

    my @values = split ',',$_->[1];
    $_ = $self->unescape($_) for @values;
    if ($Special_attributes{$tag}) {  # reserved attribute
      push @{$reserved{$tag}},@values;
    } else {
      push @{$unreserved{$tag}},@values
    }
  }
  return (\%reserved,\%unreserved);
}

=item start_or_finish_sequence

  $loader->start_or_finish_sequence('Chr9')

This method is called at the beginning and end of a fasta section.

=cut

# sub start_or_finish_sequence { } inherited

=item load_sequence

  $loader->load_sequence('gatttcccaaa')

This method is called to load some amount of sequence after
start_or_finish_sequence() is first called.

=cut

# sub load_sequence { } inherited

=item open_fh

  my $io_file = $loader->open_fh($filehandle_or_path)

This method opens up the indicated file or pipe, using some
intelligence to recognize compressed files and URLs and doing the
right thing.

=cut

# sub open_fh { } inherited

# sub msg { } inherited

=item time

  my $time = $loader->time

This method returns the current time in seconds, using Time::HiRes if
available.

=cut

# sub time { } inherited

=item unescape

  my $unescaped = GFF3Loader::unescape($escaped)

This is an internal utility. It is the same as CGI::Util::unescape,
but doesn't change pluses into spaces and ignores unicode escapes.
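The escaping rules above, combined with the splitting done by parse_attributes(), can be sketched without BioPerl. This is a minimal sketch under stated assumptions: unescape() here is a hypothetical stand-in that only decodes %XX hex escapes, split_attributes() is a hypothetical helper, and the reserved-tag list is illustrative (taken from the tags handle_feature() looks up) rather than a copy of the module's %Special_attributes table, which is defined outside this excerpt.

```perl
use strict;
use warnings;

# Hypothetical stand-in for unescape(): decode %XX hex escapes only.
# '+' is left alone, and %uXXXX escapes are not decoded ('u' is not a hex
# digit, so the pattern never matches them).
sub unescape {
  my $s = shift;
  $s =~ s/%([0-9a-fA-F]{2})/chr(hex($1))/ge;
  return $s;
}

# ASSUMPTION: illustrative reserved-tag list, not the module's real table.
my %reserved_tags = map { $_ => 1 } qw(ID Name Parent Alias Note Target Gap Index index);

# Hypothetical helper mirroring the split order of parse_attributes():
# ';' separates pairs, '=' separates tag from value, ',' separates values.
sub split_attributes {
  my $att = shift;
  my (%reserved, %unreserved);
  for my $pair (split /;/, $att) {
    my ($tag, $value) = split /=/, $pair, 2;
    next unless defined $value;    # parse_attributes() warns, then skips
    $tag = unescape($tag);
    my @values = map { unescape($_) } split /,/, $value;
    my $bucket = $reserved_tags{$tag} ? \%reserved : \%unreserved;
    push @{ $bucket->{$tag} }, @values;
  }
  return (\%reserved, \%unreserved);
}

my ($reserved, $unreserved) =
  split_attributes('ID=gene00001;Alias=e1,e2;Note=a%3Bb+c;color=%2300FF00');
print "Note: $reserved->{Note}[0]\n";      # %3B decodes to ';' and '+' is kept
print "color: $unreserved->{color}[0]\n";
```

Splitting before unescaping is what lets an escaped `%3B` or `%2C` survive as a literal `;` or `,` inside a value. The explicit limit of 2 on the `=` split is a small deviation from parse_attributes(); for valid GFF3, where `=` inside values must be escaped, both behave the same.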
=cut

# sub unescape { } inherited

sub _remap {
  my $self = shift;
  my ($ref,$start,$end,$strand) = @_;
  my $mapper = $self->coordinate_mapper;
  return ($ref,$start,$end,$strand) unless $mapper;

  my ($newref,$coords) = $mapper->($ref,[$start,$end]);
  return unless defined $coords->[0];
  if ($coords->[0] > $coords->[1]) {
    @{$coords} = reverse(@{$coords});
    $strand *= -1;
  }
  return ($newref,@{$coords},$strand);
}

sub _indexit { # override
  my $self = shift;
  return $self->{load_data}{Helper}->indexit(@_);
}

sub _local2global { # override
  my $self = shift;
  return $self->{load_data}{Helper}->local2global(@_);
}

=item local_ids

  my $ids    = $loader->local_ids;
  my $id_cnt = @$ids;

After performing a load, this returns an array ref containing all the
load file IDs that were contained within the file just loaded.

=cut

sub local_ids { # override
  my $self = shift;
  return $self->{load_data}{Helper}->local_ids(@_);
}

=item loaded_ids

  my $ids    = $loader->loaded_ids;
  my $id_cnt = @$ids;

After performing a load, this returns an array ref containing all the
feature primary ids that were created during the load.

=cut

sub loaded_ids { # override
  my $self = shift;
  return $self->{load_data}{Helper}->loaded_ids(@_);
}

1;
__END__

=back

=head1 BUGS

This is an early version, so there are certainly some bugs. Please
use the BioPerl bug tracking system to report bugs.

=head1 SEE ALSO

L, L, L, L, L, L

=head1 AUTHOR

Lincoln Stein E<lt>lstein@cshl.orgE<gt>.

Copyright (c) 2006 Cold Spring Harbor Laboratory.

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.
=cut

package Bio::DB::SeqFeature::Store::DBI::SQLite;
$Bio::DB::SeqFeature::Store::DBI::SQLite::VERSION = '1.7.4';

#$Id$

=head1 NAME

Bio::DB::SeqFeature::Store::DBI::SQLite -- SQLite implementation of Bio::DB::SeqFeature::Store

=head1 SYNOPSIS

  use Bio::DB::SeqFeature::Store;

  # Open the sequence database
  my $db      = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::SQLite',
                                                -dsn     => '/path/to/database.db');

  # get a feature from somewhere
  my $feature = Bio::SeqFeature::Generic->new(...);

  # store it
  $db->store($feature) or die "Couldn't store!";

  # primary ID of the feature is changed to indicate its primary ID
  # in the database...
  my $id = $feature->primary_id;

  # get the feature back out
  my $f  = $db->fetch($id);

  # change the feature and update it
  $f->start(100);
  $db->update($f) or die "Couldn't update!";

  # searching...
  # ...by id
  my @features = $db->fetch_many(@list_of_ids);

  # ...by name
  @features = $db->get_features_by_name('ZK909');

  # ...by alias
  @features = $db->get_features_by_alias('sma-3');

  # ...by type
  @features = $db->get_features_by_type('gene');

  # ...by location
  @features = $db->get_features_by_location(-seq_id=>'Chr1',-start=>4000,-end=>600000);

  # ...by attribute
  @features = $db->get_features_by_attribute({description => 'protein kinase'});

  # ...by the GFF "Note" field
  @result_list = $db->search_notes('kinase');

  # ...by arbitrary combinations of selectors
  @features = $db->features(-name       => $name,
                            -type       => $types,
                            -seq_id     => $seqid,
                            -start      => $start,
                            -end        => $end,
                            -attributes => $attributes);

  # ...using an iterator
  my $iterator = $db->get_seq_stream(-name       => $name,
                                     -type       => $types,
                                     -seq_id     => $seqid,
                                     -start      => $start,
                                     -end        => $end,
                                     -attributes => $attributes);

  while (my $feature = $iterator->next_seq) {
    # do something with the feature
  }

  # ...limiting the search to a particular region
  my $segment  =
                 $db->segment('Chr1',5000=>6000);
  @features    = $segment->features(-type=>['mRNA','match']);

  # getting & storing sequence information
  $db->insert_sequence('Chr1','GATCCCCCGGGATTCCAAAA...');
  # Warning: this returns a string, and not a PrimarySeq object
  my $sequence = $db->fetch_sequence('Chr1',5000=>6000);

  # what feature types are defined in the database?
  my @types    = $db->types;

  # create a new feature in the database
  $feature     = $db->new_feature(-primary_tag => 'mRNA',
                                  -seq_id      => 'chr3',
                                  -start       => 10000,
                                  -end         => 11000);

=head1 DESCRIPTION

Bio::DB::SeqFeature::Store::DBI::SQLite is the SQLite adaptor for
Bio::DB::SeqFeature::Store. You will not create it directly, but
instead use Bio::DB::SeqFeature::Store-E<gt>new() to do so.

See L<Bio::DB::SeqFeature::Store> for complete usage instructions.

=head2 Using the SQLite adaptor

To establish a connection to the database, call
Bio::DB::SeqFeature::Store-E<gt>new(-adaptor=E<gt>'DBI::SQLite',@more_args). The
additional arguments are as follows:

  Argument name       Description
  -------------       -----------

  -dsn                The path to the SQLite database file.

  -namespace          A prefix to attach to each table. This allows you
                      to have several virtual databases in the same
                      physical database.

  -temp               Boolean flag. If true, a temporary database
                      will be created and destroyed as soon as
                      the Store object goes out of scope. (synonym -temporary)

  -create             Boolean flag. If true, the database will be
                      initialized, erasing any existing contents.

  -autoindex          Boolean flag. If true, features in the database will be
                      reindexed every time they change. This is the default.

  -fts                Boolean flag. If true, when the -create flag is true, the
                      attribute table will be created and indexed for
                      full-text search using the most recent FTS extension
                      supported by DBD::SQLite.

  -tmpdir             Directory in which to place temporary files during "fast" loading.
                      Defaults to File::Spec->tmpdir(). (synonyms -dump_dir, -dumpdir, -tmp)

  -dbi_options        A hashref to pass to DBI->connect's 4th argument, the "attributes."
                      (synonyms -options, -dbi_attr)

  -write              Pass true to open database for writing or updating.
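The -fts option depends on picking the newest FTS module that DBD::SQLite was compiled with. The adaptor's _create_attribute_fts() takes the last ENABLE_FTSn entry returned by DBD::SQLite::compile_options, which, as its own comment notes, relies on alphabetical ordering and so is only safe through FTS9. A minimal sketch of a numeric alternative follows; latest_fts_module() is a hypothetical helper that takes the option list as arguments so it can run without DBD::SQLite installed.

```perl
use strict;
use warnings;

# Hypothetical helper: given SQLite compile options such as
# ('ENABLE_FTS3', 'ENABLE_FTS4', 'THREADSAFE=1'), return the name of the
# newest full-text-search module, or undef if none is present.
# Options like ENABLE_FTS3_PARENTHESIS are ignored, as in the adaptor's grep.
sub latest_fts_module {
  my @options  = @_;
  my @versions = map { $_ =~ /^ENABLE_FTS([0-9]+)$/ ? $1 : () } @options;
  return undef unless @versions;
  # Compare version numbers numerically so that FTS10 would sort after FTS9.
  my ($newest) = sort { $b <=> $a } @versions;
  return "FTS$newest";
}

my $module = latest_fts_module('ENABLE_FTS3', 'ENABLE_FTS4', 'THREADSAFE=1');
print defined $module ? "USING $module\n" : "FTS not available\n";
```

With DBD::SQLite loaded, the helper could be called as `latest_fts_module(DBD::SQLite::compile_options())`, and its result used where the adaptor currently uses `$fts_versions[-1]`.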
If successful, a new instance of
Bio::DB::SeqFeature::Store::DBI::SQLite will be returned.

In addition to the standard methods supported by all well-behaved
Bio::DB::SeqFeature::Store databases, several adaptor-specific
methods are provided. These are described in the next sections.

=cut

use strict;

use base 'Bio::DB::SeqFeature::Store::DBI::mysql';
use Bio::DB::SeqFeature::Store::DBI::Iterator;
use DBD::SQLite;
use DBI qw(:sql_types);
use Memoize;
use Cwd qw(abs_path getcwd);
use Bio::DB::GFF::Util::Rearrange 'rearrange';
use Bio::SeqFeature::Lite;
use File::Spec;
use constant DEBUG=>0;
use constant EXPERIMENTAL_COVERAGE=>1;

# Using same limits as MySQL adaptor so I don't have to make something up.
use constant MAX_INT =>  2_147_483_647;
use constant MIN_INT => -2_147_483_648;
use constant SUMMARY_BIN_SIZE => 1000;  # we checkpoint coverage this often, about 20 meg overhead per feature type on hg
use constant USE_SPATIAL=>0;

# The binning scheme places each feature into a bin.
# Bins are variably sized as powers of two.
# For example, bins of size 2**17 (131072 bases) are numbered starting
# at 585, because the 1 + 8 + 64 + 512 larger bins are numbered first.
my (@BINS,%BINS);
{
  @BINS = map {2**$_} (17, 20, 23, 26, 29);  # TO DO: experiment with different bin sizes
  my $start=0;
  for my $size (sort {$b<=>$a} @BINS) {
    $BINS{$size} = $start;
    $start      += $BINS[-1]/$size;
  }
}

# my %BINS = (
#              2**11 => 37449,
#              2**14 => 4681,
#              2**17 => 585,
#              2**20 => 73,
#              2**23 => 9,
#              2**26 => 1,
#              2**29 => 0
#          );
# my @BINS = sort {$a<=>$b} keys %BINS;

sub calculate_bin {
  my $self = shift;
  my ($start,$end) = @_;

  my $len = $end - $start;
  for my $bin (@BINS) {
    next if $len > $bin;
    # possibly fits here
    my $binstart = int $start/$bin;
    my $binend   = int $end/$bin;
    return $binstart+$BINS{$bin} if $binstart == $binend;
  }

  die "unreasonable coordinates ",$start+1,"..$end";
}

sub search_bins {
  my $self = shift;
  my ($start,$end) = @_;
  my @results;

  for my $bin (@BINS) {
    my $binstart = int $start/$bin;
    my $binend   = int $end/$bin;
    push @results,$binstart+$BINS{$bin}..$binend+$BINS{$bin};
  }
  return @results;
}

###
# object initialization
#
sub init {
  my $self          = shift;
  my ($dsn,
      $is_temporary,
      $autoindex,
      $namespace,
      $dump_dir,
      $user,
      $pass,
      $dbi_options,
      $writeable,
      $fts,
      $create,
     ) = rearrange(['DSN',
                    ['TEMP','TEMPORARY'],
                    'AUTOINDEX',
                    'NAMESPACE',
                    ['DUMP_DIR','DUMPDIR','TMP','TMPDIR'],
                    'USER',
                    ['PASS','PASSWD','PASSWORD'],
                    ['OPTIONS','DBI_OPTIONS','DBI_ATTR'],
                    ['WRITE','WRITEABLE'],
                    'FTS',
                    'CREATE',
                   ],@_);
  $dbi_options  ||= {};
  $writeable    = 1 if $is_temporary or $dump_dir;

  $dsn or $self->throw("Usage: ".__PACKAGE__."->init(-dsn => \$dbh || \$dsn)");

  my $dbh;
  if (ref $dsn) {
    $dbh = $dsn;
  } else {
    $dsn = "dbi:SQLite:$dsn" unless $dsn =~ /^dbi:/;
    $dbh = DBI->connect($dsn,$user,$pass,$dbi_options) or $self->throw($DBI::errstr);
    $dbh->do("PRAGMA synchronous = OFF;");   # makes writes much faster
    $dbh->do("PRAGMA temp_store = MEMORY;"); # less disk I/O; some speedup
    $dbh->do("PRAGMA cache_size = 20000;");  # less disk I/O; some speedup
    # Keep track of database file location
    my $cwd = getcwd;
    my ($db_file) = ($dsn
                     =~ m/(?:db(?:name)?|database)=(.+)$/);
    $self->{dbh_file} = "$cwd/$db_file";
  }
  $self->{dbh}       = $dbh;
  $self->{fts}       = $fts;
  $self->{is_temp}   = $is_temporary;
  $self->{namespace} = $namespace;
  $self->{writeable} = $writeable;

  $self->default_settings;
  $self->autoindex($autoindex) if defined $autoindex;
  $self->dumpdir($dump_dir)    if $dump_dir;
  if ($self->is_temp) {
    $self->init_tmp_database();
  } elsif ($create) {
    $self->init_database('erase');
  }
}

sub table_definitions {
  my $self = shift;
  my $defs = {
    feature => < < < < < < < < <{'fts'}) { delete($defs->{attribute}); } unless ($self->_has_spatial_index) { $defs->{feature_location} = <{interval_stats} = <_create_spatial_index; $self->_create_attribute_fts; $self->SUPER::_init_database(@_);
}

# FIXME: ensure this works with _create_attribute_fts...
sub init_tmp_database {
  my $self  = shift;
  my $erase = shift;
  $self->_create_spatial_index;
  $self->_create_attribute_fts;
  $self->SUPER::init_tmp_database(@_);
}

sub _create_spatial_index{
  my $self = shift;
  my $dbh  = $self->dbh;
  local $dbh->{PrintError} = 0;
  $dbh->do("DROP TABLE IF EXISTS feature_index");  # spatial index
  if (USE_SPATIAL) {
    $dbh->do("CREATE VIRTUAL TABLE feature_index USING RTREE(id,seqid,bin,start,end)");
  }
}

sub _create_attribute_fts{
  my $self = shift;
  my $dbh  = $self->dbh;
  if ($self->{'fts'}) {
    my @fts_versions;
    for (@fts_versions = grep(/^ENABLE_FTS[0-9]+$/, DBD::SQLite::compile_options)) {
      s/ENABLE_//
    }
    # use the latest supported FTS version.
    # DBD::SQLite::compile_options appears to be sorted
    # alphabetically, so this should work through version FTS9.
    die 'fts not supported by this version of DBD::SQLite' if (!@fts_versions);
    $dbh->do("DROP TABLE IF EXISTS attribute");
    $dbh->do("CREATE VIRTUAL TABLE "
             . $self->_attribute_table
             . " USING "
             . $fts_versions[-1]
             . "(id, attribute_id, attribute_value)");
  }
}

###
# return 1 if an existing attribute table in the connected database is an FTS
# table, else 0
#
sub _has_fts {
  my $self = shift;
  if (!defined($self->{'has_fts'})) {
    # If the attribute table is a virtual table, assume it is an FTS
    # table. Per http://www.sqlite.org/fileformat2.html:
    # For (sqlite_master) rows that define views, triggers, and virtual
    # tables, the rootpage column is 0 or NULL.
    ($self->{'has_fts'}) = $self->dbh->selectrow_array(
        "select count(*) from sqlite_master where type = 'table' and name = '"
        . $self->_attribute_table
        . "' and (rootpage = 0 or rootpage is null);");
  }
  return $self->{'has_fts'};
}

sub _has_spatial_index {
  my $self = shift;
  return $self->{'_has_spatial_index'} if exists $self->{'_has_spatial_index'};
  my $dbh  = $self->dbh;
  my ($count) = $dbh->selectrow_array("select count(*) from sqlite_master where name='feature_index'");
  return $self->{'_has_spatial_index'} = $count;
}

sub _finish_bulk_update {
  my $self = shift;
  my $dbh  = $self->dbh;
  my $dir  = $self->{dumpdir} || '.';

  $self->begin_work;  # making this a transaction greatly improves performance

  for my $table ('feature', $self->index_tables) {
    my $fh   = $self->dump_filehandle($table);
    my $path = $self->dump_path($table);

    $fh->close;

    open $fh, '<', $path or $self->throw("Could not read file '$path': $!");
    my $qualified_table = $self->_qualify($table);

    my $sth;
    if ($table =~ /feature$/) {
      $sth = $dbh->prepare("REPLACE INTO $qualified_table VALUES (?,?,?,?,?)");

      while (<$fh>) {
        chomp();
        my ($id,$typeid,$strand,$indexed,$obj) = split(/\t/);
        $sth->bind_param(1, $id);
        $sth->bind_param(2, $typeid);
        $sth->bind_param(3, $strand);
        $sth->bind_param(4, $indexed);
        $sth->bind_param(5, pack('H*',$obj), {TYPE => SQL_BLOB});
        $sth->execute();
      }
    } else {
      my $feature_index = $self->_feature_index_table;
      if ($table =~ /parent2child$/) {
        $sth = $dbh->prepare("REPLACE INTO $qualified_table VALUES (?,?)");
      } elsif ($table =~ /$feature_index$/) {
        $sth =
          $dbh->prepare(
            $self->_has_spatial_index
              ? "REPLACE INTO $qualified_table VALUES (?,?,?,?,?)"
              : "REPLACE INTO $qualified_table (id,seqid,bin,start,end) VALUES (?,?,?,?,?)"
          );
      } else {  # attribute or name
        $sth = $dbh->prepare("REPLACE INTO $qualified_table VALUES (?,?,?)");
      }

      while (<$fh>) {
        chomp();
        $sth->execute(split(/\t/));
      }
    }
    $fh->close();
    unlink $path;
  }
  $self->commit;  # commit the transaction
  delete $self->{bulk_update_in_progress};
  delete $self->{filehandles};
}

sub index_tables {
  my $self = shift;
  my @t    = $self->SUPER::index_tables;
  return (@t,$self->_feature_index_table);
}
sub _enable_keys  { }  # nullop
sub _disable_keys { }  # nullop

sub _fetch_indexed_features_sql {
  my $self           = shift;
  my $location_table = $self->_qualify('feature_location');
  my $feature_table  = $self->_qualify('feature');
  return < $end
  my $reversed;
  if (defined $start && defined $end && $start > $end) {
    $reversed++;
    ($start,$end) = ($end,$start);
  }
  $start-- if defined $start;
  $end--   if defined $end;

  my $offset1 = $self->_offset_boundary($seqid,$start || 'left');
  my $offset2 = $self->_offset_boundary($seqid,$end   || 'right');
  my $sequence_table     = $self->_sequence_table;
  my $locationlist_table = $self->_locationlist_table;

  # CROSS JOIN gives a hint to the SQLite query optimizer -- mucho speedup!
  my $sth = $self->_prepare(<= ? AND offset <= ? ORDER BY offset END
  my $seq = '';
  $sth->execute($seqid,$offset1,$offset2) or $self->throw($sth->errstr);

  while (my($frag,$offset) = $sth->fetchrow_array) {
    substr($frag,0,$start-$offset) = '' if defined $start && $start > $offset;
    $seq .= $frag;
  }
  substr($seq,$end-$start+1) = '' if defined $end && $end-$start+1 < length($seq);
  if ($reversed) {
    $seq = reverse $seq;
    $seq =~ tr/gatcGATC/ctagCTAG/;
  }
  $sth->finish;
  $seq;
}

sub _offset_boundary {
  my $self = shift;
  my ($seqid,$position) = @_;

  my $sequence_table     = $self->_sequence_table;
  my $locationlist_table = $self->_locationlist_table;

  my $sql;
  # use "CROSS JOIN" to give a hint to the SQLite query optimizer.
  $sql = $position eq 'left'  ? "SELECT min(offset) FROM $locationlist_table as ll CROSS JOIN $sequence_table as s ON ll.id=s.id WHERE ll.seqname=?"
        :$position eq 'right' ? "SELECT max(offset) FROM $locationlist_table as ll CROSS JOIN $sequence_table as s ON ll.id=s.id WHERE ll.seqname=?"
        :"SELECT max(offset) FROM $locationlist_table as ll CROSS JOIN $sequence_table as s ON ll.id=s.id WHERE ll.seqname=? AND offset<=?";

  my $sth = $self->_prepare($sql);
  my @args = $position =~ /^-?\d+$/ ? ($seqid,$position) : ($seqid);
  $sth->execute(@args) or $self->throw($sth->errstr);
  my $boundary = $sth->fetchall_arrayref->[0][0];
  $sth->finish;
  return $boundary;
}

###
# Efficiently fetch a series of IDs from the database
# Can pass an array or an array ref
#
sub _fetch_many {
  my $self = shift;
  @_ or $self->throw('usage: fetch_many($id1,$id2,$id3...)');
  my $ids = join ',',map {ref($_) ? @$_ : $_} @_ or return;
  my $features = $self->_feature_table;

  my $sth = $self->_prepare(<execute() or $self->throw($sth->errstr);
  return $self->_sth2objs($sth);
}

sub _features {
  my $self = shift;
  my ($seq_id,$start,$end,$strand,
      $name,$class,$allow_aliases,
      $types,
      $attributes,
      $range_type,
      $fromtable,
      $iterator,
      $sources
     ) = rearrange([['SEQID','SEQ_ID','REF'],'START',['STOP','END'],'STRAND',
                    'NAME','CLASS','ALIASES',
                    ['TYPES','TYPE','PRIMARY_TAG'],
                    ['ATTRIBUTES','ATTRIBUTE'],
                    'RANGE_TYPE',
                    'FROM_TABLE',
                    'ITERATOR',
                    ['SOURCE','SOURCES']
                   ],@_);

  my (@from,@where,@args,@group);
  $range_type ||= 'overlaps';

  my $feature_table = $self->_feature_table;
  @from = "$feature_table as f";

  if (defined $name) {
    # hacky backward compatibility workaround
    undef $class if $class && $class eq 'Sequence';
    $name = "$class:$name" if defined $class && length $class > 0;
    # last argument is the join field
    my ($from,$where,$group,@a) = $self->_name_sql($name,$allow_aliases,'f.id');
    push @from,$from   if $from;
    push @where,$where if $where;
    push @group,$group if $group;
    push @args,@a;
  }

  if (defined $seq_id) {
    # last argument is
    # the name of the features table
    my ($from,$where,$group,@a) = $self->_location_sql($seq_id,$start,$end,$range_type,$strand,'f');
    push @from,$from   if $from;
    push @where,$where if $where;
    push @group,$group if $group;
    push @args,@a;
  }

  if (defined($sources)) {
    my @sources = ref($sources) eq 'ARRAY' ? @{$sources} : ($sources);
    if (defined($types)) {
      my @types = ref($types) eq 'ARRAY' ? @{$types} : ($types);
      my @final_types;
      foreach my $type (@types) {
        # *** not sure what to do if user supplies both -source and -type
        #     where the type includes a source!
        if ($type =~ /:/) {
          push(@final_types, $type);
        }
        else {
          foreach my $source (@sources) {
            push(@final_types, $type.':'.$source);
          }
        }
      }
      $types = \@final_types;
    }
    else {
      $types = [map { ':'.$_ } @sources];
    }
  }
  if (defined($types)) {
    # last argument is the name of the features table
    my ($from,$where,$group,@a) = $self->_types_sql($types,'f');
    push @from,$from   if $from;
    push @where,$where if $where;
    push @group,$group if $group;
    push @args,@a;
  }

  if (defined $attributes) {
    # last argument is the join field
    my ($from,$where,$group,@a) = $self->_attributes_sql($attributes,'f.id');
    push @from,$from   if $from;
    push @where,$where if $where;
    push @group,$group if $group;
    push @args,@a;
  }

  if (defined $fromtable) {
    # last argument is the join field
    my ($from,$where,$group,@a) = $self->_from_table_sql($fromtable,'f.id');
    push @from,$from   if $from;
    push @where,$where if $where;
    push @group,$group if $group;
    push @args,@a;
  }

  # if no other criteria are specified, then
  # only fetch indexed (i.e. top level objects)
  @where = '"indexed"=1' unless @where;

  my $from  = join ', ',@from;
  my $where = join ' AND ',map {"($_)"} @where;
  my $group = join ', ',@group;
  $group    = "GROUP BY $group" if @group;

  my $query = <_print_query($query,@args) if DEBUG || $self->debug;

  my $sth = $self->_prepare($query);
  $sth->execute(@args) or $self->throw($sth->errstr);
  return $iterator ?
Bio::DB::SeqFeature::Store::DBI::Iterator->new($sth,$self) : $self->_sth2objs($sth); } sub _make_attribute_group { my $self = shift; my ($table_name,$attributes) = @_; my $key_count = keys %$attributes or return; my $count = $key_count-1; return "f.id HAVING count(f.id)>$count"; } sub _location_sql { my $self = shift; my ($seq_id,$start,$end,$range_type,$strand,$location) = @_; # the additional join on the location_list table badly impacts performance # so we build a copy of the table in memory my $seqid = $self->_locationid_nocreate($seq_id) || 0; # zero is an invalid primary ID, so will return empty my $feature_index = $self->_feature_index_table; my $from = "$feature_index as fi"; my ($bin_where,@bin_args); if (defined $start && defined $end && !$self->_has_spatial_index) { my @bins = $self->search_bins($start,$end); $bin_where = ' AND bin in ('.join(',',@bins).')'; } $start = MIN_INT unless defined $start; $end = MAX_INT unless defined $end; my ($range,@range_args); if ($range_type eq 'overlaps') { $range = "fi.end>=? AND fi.start<=?".$bin_where; @range_args = ($start,$end,@bin_args); } elsif ($range_type eq 'contains') { $range = "fi.start>=? AND fi.end<=?".$bin_where; @range_args = ($start,$end,@bin_args); } elsif ($range_type eq 'contained_in') { $range = "fi.start<=? AND fi.end>=?"; @range_args = ($start,$end); } else { $self->throw("range_type must be one of 'overlaps', 'contains' or 'contained_in'"); } if (defined $strand) { $range .= " AND strand=?"; push @range_args,$strand; } my $where = <_has_spatial_index ? 
$self->_qualify('feature_index') : $self->_qualify('feature_location') } # Do a case-insensitive search a la the PostgreSQL adaptor sub _name_sql { my $self = shift; my ($name,$allow_aliases,$join) = @_; my $name_table = $self->_name_table; my $from = "$name_table as n"; my ($match,$string) = $self->_match_sql($name); my $where = "n.id=$join AND n.name $match COLLATE NOCASE"; $where .= " AND n.display_name>0" unless $allow_aliases; return ($from,$where,'',$string); } sub _search_attributes { my $self = shift; my ($search_string,$attribute_names,$limit) = @_; my @words = map {quotemeta($_)} split /\s+/,$search_string; my $name_table = $self->_name_table; my $attribute_table = $self->_attribute_table; my $attributelist_table = $self->_attributelist_table; my $type_table = $self->_type_table; my $typelist_table = $self->_typelist_table; my $has_fts = $self->_has_fts; my @tags = @$attribute_names; my $tag_sql = join ' OR ',("al.tag=?") x @tags; my $perl_regexp = join '|',@words; my $sql_regexp; my @wild_card_words; if ($has_fts) { $sql_regexp = "a.attribute_value MATCH ?"; @wild_card_words = join(' OR ', @words); } else { $sql_regexp = join ' OR ',("a.attribute_value LIKE ?") x @words; @wild_card_words = map { "%$_%" } @words; } # CROSS JOIN hinders performance with FTS attribute table for DBD::SQLite 1.42 my $sql = <_print_query($sql,@tags,@wild_card_words) if DEBUG || $self->debug; my $sth = $self->_prepare($sql); $sth->execute(@tags, @wild_card_words) or $self->throw($sth->errstr); my @results; while (my($name,$value,$type,$id) = $sth->fetchrow_array) { my (@hits) = $value =~ /$perl_regexp/ig; my @words_in_row = split /\b/,$value; my $score = int(@hits*100/@words/@words_in_row); push @results,[$name,$value,$score,$type,$id]; } $sth->finish; @results = sort {$b->[2]<=>$a->[2]} @results; return @results; } sub _match_sql { my $self = shift; my $name = shift; my ($match,$string); if ($name =~ /(?:^|[^\\])[*?]/) { $name =~ s/(^|[^\\])([%_])/$1\\$2/g; $name =~ 
s/(^|[^\\])\*/$1%/g; $name =~ s/(^|[^\\])\?/$1_/g; $match = "LIKE ?"; $string = $name; } else { $match = "= ? COLLATE NOCASE"; $string = $name; } return ($match,$string); } sub _attributes_sql { my $self = shift; my ($attributes,$join) = @_; my ($wf,@bind_args) = $self->_make_attribute_where('a','al',$attributes); my ($group_by,@group_args)= $self->_make_attribute_group('a',$attributes); my $attribute_table = $self->_attribute_table; my $attributelist_table = $self->_attributelist_table; my $from = "$attribute_table AS a" . ($self->_has_fts ? '' : " INDEXED BY index_attribute_id") . ", $attributelist_table AS al"; my $a_al_join = $self->_has_fts ? 'a.attribute_id MATCH al.id' : 'a.attribute_id=al.id'; my $where = <_typelist_table; my $from = "$typelist AS tl"; my (@matches,@args); for my $type (@types) { if (ref $type && $type->isa('Bio::DB::GFF::Typename')) { $primary_tag = $type->method; $source_tag = $type->source; } else { ($primary_tag,$source_tag) = split ':',$type,2; } if (length $source_tag) { push @matches,"tl.tag=? COLLATE NOCASE"; push @args,"$primary_tag:$source_tag"; } else { push @matches,"tl.tag LIKE ?"; push @args,"$primary_tag:%"; } } my $matches = join ' OR ',@matches; my $where = <dbh->do("ANALYZE $_") foreach $self->index_tables; } ### # Replace Bio::SeqFeatureI into database. # sub replace { my $self = shift; my $object = shift; my $index_flag = shift || undef; # ?? shouldn't need to do this # $self->_load_class($object); my $id = $object->primary_id; my $features = $self->_feature_table; my $sth = $self->_prepare(<_get_location_and_bin($object) : (undef)x6; my $primary_tag = $object->primary_tag; my $source_tag = $object->source_tag || ''; $primary_tag .= ":$source_tag"; my $typeid = $self->_typeid($primary_tag,1); my $frozen = $self->no_blobs() ? 
0 : $self->freeze($object); $sth->bind_param(1, $id); $sth->bind_param(2, $frozen, {TYPE => SQL_BLOB}); $sth->bind_param(3, $index_flag||0); $sth->bind_param(4, $strand); $sth->bind_param(5, $typeid); $sth->execute() or $self->throw($sth->errstr); my $dbh = $self->dbh; $object->primary_id($dbh->func('last_insert_rowid')) unless defined $id; $self->flag_for_indexing($dbh->func('last_insert_rowid')) if $self->{bulk_update_in_progress}; } # doesn't work with this schema, since we have to update name and attribute # tables which need object ids, which we can only know by replacing feats in # the feature table one by one sub bulk_replace { my $self = shift; my $index_flag = shift || undef; my @objects = @_; my $features = $self->_feature_table; my @insert_values; foreach my $object (@objects) { my $id = $object->primary_id; my (undef,undef,undef,$strand) = $index_flag ? $self->_get_location_and_bin($object) : (undef)x4; my $primary_tag = $object->primary_tag; my $source_tag = $object->source_tag || ''; $primary_tag .= ":$source_tag"; my $typeid = $self->_typeid($primary_tag,1); push(@insert_values, ($id,0,$index_flag||0,$strand,$typeid)); } my @value_blocks; for (1..@objects) { push(@value_blocks, '(?,?,?,?,?)'); } my $value_blocks = join(',', @value_blocks); my $sql = qq{REPLACE INTO $features (id,object,"indexed",strand,typeid) VALUES $value_blocks}; my $sth = $self->_prepare($sql); $sth->execute(@insert_values) or $self->throw($sth->errstr); } sub _get_location_and_bin { my $self = shift; my $obj = shift; my $seqid = $self->_locationid($obj->seq_id||''); my $start = $obj->start; my $end = $obj->end; my $strand = $obj->strand; return ($seqid,$start,$end,$strand,$self->calculate_bin($start,$end)); } ### # Insert one Bio::SeqFeatureI into database. 
primary_id must be undef # sub insert { my $self = shift; my $object = shift; my $index_flag = shift || 0; $self->_load_class($object); defined $object->primary_id and $self->throw("$object already has a primary id"); my $features = $self->_feature_table; my $sth = $self->_prepare(<execute(undef,$self->freeze($object),$index_flag) or $self->throw($sth->errstr); my $dbh = $self->dbh; $object->primary_id($dbh->func('last_insert_rowid')); $self->flag_for_indexing($dbh->func('last_insert_rowid')) if $self->{bulk_update_in_progress}; } =head2 toplevel_types Title : toplevel_types Usage : @type_list = $db->toplevel_types Function: Get the toplevel types in the database Returns : array of Bio::DB::GFF::Typename objects Args : none Status : public This is similar to types() but only returns the types of INDEXED (toplevel) features. =cut sub toplevel_types { my $self = shift; eval "require Bio::DB::GFF::Typename" unless Bio::DB::GFF::Typename->can('new'); my $typelist_table = $self->_typelist_table; my $feature_table = $self->_feature_table; my $sql = <_print_query($sql) if DEBUG || $self->debug; my $sth = $self->_prepare($sql); $sth->execute() or $self->throw($sth->errstr); my @results; while (my($tag) = $sth->fetchrow_array) { push @results,Bio::DB::GFF::Typename->new($tag); } $sth->finish; return @results; } sub _genericid { my $self = shift; my ($table,$namefield,$name,$add_if_missing) = @_; my $qualified_table = $self->_qualify($table); my $sth = $self->_prepare(<execute($name) or die $sth->errstr; my ($id) = $sth->fetchrow_array; $sth->finish; return $id if defined $id; return unless $add_if_missing; $sth = $self->_prepare(<execute($name) or die $sth->errstr; my $dbh = $self->dbh; return $dbh->func('last_insert_rowid'); } ### # special-purpose store for bulk loading - write to a file rather than to the db # sub _dump_store { my $self = shift; my $indexed = shift; my $count = 0; my $store_fh = $self->dump_filehandle('feature'); my $dbh = $self->dbh; my $autoindex = 
$self->autoindex; for my $obj (@_) { my $id = $self->next_id; my ($seqid,$start,$end,$strand) = $indexed ? $self->_get_location_and_bin($obj) : (undef)x4; my $primary_tag = $obj->primary_tag; my $source_tag = $obj->source_tag || ''; $primary_tag .= ":$source_tag"; my $typeid = $self->_typeid($primary_tag,1); # Encode BLOB in hex so we can more easily import it into SQLite print $store_fh join("\t",$id,$typeid,$strand,$indexed, unpack('H*', $self->freeze($obj))),"\n"; $obj->primary_id($id); $self->_update_indexes($obj) if $indexed && $autoindex; $count++; } # remember whether we have ever stored a non-indexed feature unless ($indexed or $self->{indexed_flag}++) { $self->subfeatures_are_indexed(0); } $count; } sub _dump_update_name_index { my $self = shift; my ($obj,$id) = @_; my $fh = $self->dump_filehandle('name'); my $dbh = $self->dbh; my ($names,$aliases) = $self->feature_names($obj); # unlike DBI::mysql, don't quote, as quotes will be quoted when loaded print $fh join("\t",$id,$_,1),"\n" foreach @$names; print $fh join("\t",$id,$_,0),"\n" foreach @$aliases; } sub _update_name_index { my $self = shift; my ($obj,$id) = @_; my $name = $self->_name_table; my $primary_id = $obj->primary_id; $self->_delete_index($name,$id); my ($names,$aliases) = $self->feature_names($obj); my $sth = $self->_prepare("INSERT INTO $name (id,name,display_name) VALUES (?,?,?)"); $sth->execute($id,$_,1) or $self->throw($sth->errstr) foreach @$names; $sth->execute($id,$_,0) or $self->throw($sth->errstr) foreach @$aliases; $sth->finish; } sub _dump_update_attribute_index { my $self = shift; my ($obj,$id) = @_; my $fh = $self->dump_filehandle('attribute'); my $dbh = $self->dbh; for my $tag ($obj->all_tags) { my $tagid = $self->_attributeid($tag); for my $value ($obj->each_tag_value($tag)) { # unlike DBI::mysql, don't quote, as quotes will be quoted when loaded print $fh join("\t",$id,$tagid,$value),"\n"; } } } sub _update_indexes { my $self = shift; my $obj = shift; defined (my $id =
$obj->primary_id) or return; $self->SUPER::_update_indexes($obj); if ($self->{bulk_update_in_progress}) { $self->_dump_update_location_index($obj,$id); } else { $self->_update_location_index($obj,$id); } } sub _update_location_index { my $self = shift; my ($obj,$id) = @_; my ($seqid,$start,$end,$strand,$bin) = $self->_get_location_and_bin($obj); my $table = $self->_feature_index_table; $self->_delete_index($table,$id); my ($sql,@args); if ($self->_has_spatial_index) { $sql = "INSERT INTO $table (id,seqid,bin,start,end) values (?,?,?,?,?)"; @args = ($id,$seqid,$bin,$start,$end); } else { $sql = "INSERT INTO $table (id,seqid,bin,start,end) values (?,?,?,?,?)"; @args = ($id,$seqid,$bin,$start,$end); } my $sth = $self->_prepare($sql); $sth->execute(@args) or $self->throw($sth->errstr); $sth->finish; } sub _dump_update_location_index { my $self = shift; my ($obj,$id) = @_; my $table = $self->_feature_index_table; my $fh = $self->dump_filehandle($table); my $dbh = $self->dbh; my ($seqid,$start,$end,$strand,$bin) = $self->_get_location_and_bin($obj); my @args = $self->_has_spatial_index ? ($id,$seqid,$bin,$start,$end) : ($id,$seqid,$bin,$start,$end); print $fh join("\t",@args),"\n"; } sub DESTROY { my $self = shift; # Disconnect database handles so temporary files can be properly deleted if (%DBI::installed_drh) { DBI->disconnect_all; %DBI::installed_drh = (); } undef $self->{dbh}; } 1; =head1 AUTHOR Nathan Weeks - Nathan.Weeks@ars.usda.gov Copyright (c) 2009 Nathan Weeks Modified 2010 to support cumulative statistics by Lincoln Stein . This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See the Bioperl license for more details.
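The wildcard handling in the SQLite adaptor's `_match_sql()` translates shell-style globs (`*`, `?`) into SQL `LIKE` patterns and escapes any literal `%` or `_` first. The standalone sketch below copies those three substitutions so the translation can be checked in isolation; `glob_to_like` is a hypothetical name, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical standalone copy of the pattern logic in _match_sql();
# returns (SQL operator, bind value) the same way the adaptor does.
sub glob_to_like {
    my $name = shift;
    # only treat * and ? as wildcards when they are not backslash-escaped
    if ($name =~ /(?:^|[^\\])[*?]/) {
        $name =~ s/(^|[^\\])([%_])/${1}\\$2/g;   # escape literal % and _
        $name =~ s/(^|[^\\])\*/${1}%/g;          # * becomes %
        $name =~ s/(^|[^\\])\?/${1}_/g;          # ? becomes _
        return ('LIKE ?', $name);
    }
    return ('= ? COLLATE NOCASE', $name);
}

for my $pattern ('ZK909', 'ZK9*', 'a_b?') {
    my ($op, $value) = glob_to_like($pattern);
    print "$pattern => $op [$value]\n";
}
```

One caveat worth checking: SQLite's `LIKE` only treats a character as an escape when the query supplies an `ESCAPE` clause, and the `n.name LIKE ? COLLATE NOCASE` condition built by `_name_sql()` does not appear to add one, so the backslash inserted before `%` or `_` may be matched literally.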
=cut NormalizedFeature.pm100644000766000024 5360713605523026 24535 0ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/lib/Bio/DB/SeqFeaturepackage Bio::DB::SeqFeature::NormalizedFeature; $Bio::DB::SeqFeature::NormalizedFeature::VERSION = '1.7.4'; =head1 NAME Bio::DB::SeqFeature::NormalizedFeature -- Normalized feature for use with Bio::DB::SeqFeature::Store =head1 SYNOPSIS use Bio::DB::SeqFeature::Store; # Open the sequence database my $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'DBI::mysql', -dsn => 'dbi:mysql:test'); my ($feature) = $db->get_features_by_name('ZK909'); my @subfeatures = $feature->get_SeqFeatures(); my @exons_only = $feature->get_SeqFeatures('exon'); # create a new object $db->seqfeature_class('Bio::DB::SeqFeature::NormalizedFeature'); my $new = $db->new_feature(-primary_tag=>'gene', -seq_id => 'chr3', -start => 10000, -end => 11000); # add a new exon $feature->add_SeqFeature($db->new_feature(-primary_tag=>'exon', -seq_id => 'chr3', -start => 5000, -end => 5551)); =head1 DESCRIPTION The Bio::DB::SeqFeature::NormalizedFeature object is an alternative representation of SeqFeatures for use with the Bio::DB::SeqFeature::Store database system. It is identical to Bio::DB::SeqFeature, except that feature/subfeature relationships are stored in the object itself rather than in a separate database table. This makes the objects somewhat less convenient to work with from SQL, but speeds up access somewhat. To use this class, pass the name of the class to the Bio::DB::SeqFeature::Store object's seqfeature_class() method. After this, $db->new_feature() will create objects of type Bio::DB::SeqFeature::NormalizedFeature.
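To make the normalized layout concrete, here is a toy in-memory model, assuming a plain hash stands in for the database: the parent keeps only its children's IDs, and reading subfeatures means resolving those IDs against the store. Scalars are treated as IDs and references as ordinary feature objects, which mirrors the idea in `get_SeqFeatures()`; the hash layout and the `subfeatures` helper are hypothetical.

```perl
use strict;
use warnings;

# Toy in-memory "store": primary_id => feature hashref (hypothetical layout).
my %store = (
    2 => { id => 2, type => 'exon', start => 100, end => 200 },
    3 => { id => 3, type => 'exon', start => 300, end => 400 },
);

# The parent keeps only subfeature IDs, not the subfeature objects.
my $gene = { id => 1, type => 'gene', start => 100, end => 400, segments => [2, 3] };

# Plain scalars are IDs to look up; references are in-memory feature objects.
sub subfeatures {
    my ($parent, $store) = @_;
    my $segs = $parent->{segments} or return;
    return map { ref($_) ? $_ : $store->{$_} } @$segs;
}

my @exons = subfeatures($gene, \%store);
print join(',', map { "$_->{type}:$_->{start}-$_->{end}" } @exons), "\n";
```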
If you are using the GFF3 loader, pass Bio::DB::SeqFeature::Store::GFF3Loader->new() the -seqfeature_class argument: use Bio::DB::SeqFeature::Store::GFF3Loader; my $store = connect_to_db_somehow(); my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new( -store=>$store, -seqfeature_class => 'Bio::DB::SeqFeature::NormalizedFeature' ); =cut use strict; use Carp 'croak'; use base 'Bio::SeqFeature::Lite'; use base 'Bio::DB::SeqFeature::NormalizedFeatureI'; use overload '""' => \&as_string, eq => \&eq, ne => \&ne, fallback => 1; use vars '$AUTOLOAD'; my $USE_OVERLOADED_NAMES = 1; # some of this is my fault and some of it is changing bioperl API *get_all_SeqFeatures = *sub_SeqFeature = *merged_segments = \&segments; ##### CLASS METHODS #### =head2 new Title : new Usage : $feature = Bio::DB::SeqFeature::NormalizedFeature->new(@args) Function: create a new feature Returns : the new seqfeature Args : see below Status : public This method creates and, if possible, stores into a database a new Bio::DB::SeqFeature::NormalizedFeature object using the specialized Bio::DB::SeqFeature class. The arguments are the same as those of Bio::SeqFeature::Generic->new() and Bio::Graphics::Feature->new(). The most important differences are the B<-store> option, which, if present, creates the object in a Bio::DB::SeqFeature::Store database, and the B<-index> option, which controls whether the feature will be indexed for retrieval (default is true). Ordinarily, you would leave indexing on when creating top-level features and turn it off only when storing subfeatures.
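The store-or-not decision made by new() can be isolated as follows. This sketch uses a hypothetical `MockStore` class that only records which method was called; as in the constructor code further down, `-index` defaults to true and `-factory` is accepted in place of `-store`.

```perl
use strict;
use warnings;

# Hypothetical mock store that only records which method was called.
package MockStore;
sub new           { bless { calls => [] }, shift }
sub store         { my $s = shift; push @{ $s->{calls} }, 'store';         1 }
sub store_noindex { my $s = shift; push @{ $s->{calls} }, 'store_noindex'; 1 }

package main;

# Same decision as NormalizedFeature::new(): -index defaults to true,
# and -factory is accepted as an alternative to -store.
sub save_feature {
    my ($feature, %args) = @_;
    my $db = $args{-store} || $args{-factory} or return;
    my $index = exists $args{-index} ? $args{-index} : 1;
    return $index ? $db->store($feature) : $db->store_noindex($feature);
}

my $db = MockStore->new;
save_feature({}, -store => $db);               # top-level feature: indexed
save_feature({}, -store => $db, -index => 0);  # subfeature: not indexed
print "@{ $db->{calls} }\n";                   # store store_noindex
```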
Arguments are as follows: -seq_id the reference sequence -start the start position of the feature -end the stop position of the feature -display_name the feature name (returned by seqname) -primary_tag the feature type (returned by primary_tag) -source the source tag -score the feature score (for GFF compatibility) -desc a description of the feature -segments a list of subfeatures (see Bio::Graphics::Feature) -subtype the type to use when creating subfeatures -strand the strand of the feature (one of -1, 0 or +1) -phase the phase of the feature (0..2) -url a URL to link to when rendered with Bio::Graphics -attributes a hashref of tag value attributes, in which the key is the tag and the value is an array reference of values -store a previously-opened Bio::DB::SeqFeature::Store object -index index this feature if true Aliases: -id an alias for -display_name -seqname an alias for -display_name -display_id an alias for -display_name -name an alias for -display_name -stop an alias for end -type an alias for primary_tag =cut sub new { my $class = shift; my %args = @_; my $db = $args{-store} || $args{-factory}; my $index = exists $args{-index} ? $args{-index} : 1; my $self = $class->SUPER::new(@_); if ($db) { if ($index) { $db->store($self); # this will set the primary_id } else { $db->store_noindex($self); # this will set the primary_id } $self->object_store($db); } $self; } =head2 Bio::SeqFeatureI methods The following Bio::SeqFeatureI methods are supported: seq_id(), start(), end(), strand(), get_SeqFeatures(), display_name(), primary_tag(), source_tag(), seq(), location(), primary_id(), overlaps(), contains(), equals(), intersection(), union(), has_tag(), remove_tag(), add_tag_value(), get_tag_values(), get_all_tags() Some methods that do not make sense in the context of a genome annotation database system, such as attach_seq(), are not supported. Please see L for more details. 
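The alias list in the argument table above amounts to a rename map onto the canonical argument names. The sketch below expresses it that way; `%ALIAS` and `normalize_args` are hypothetical illustrations of the documented aliases, not the parent class's actual argument-handling code.

```perl
use strict;
use warnings;

# Hypothetical rename map built from the documented aliases.
my %ALIAS = (
    -id         => '-display_name',
    -seqname    => '-display_name',
    -display_id => '-display_name',
    -name       => '-display_name',
    -stop       => '-end',
    -type       => '-primary_tag',
);

# Replace every aliased key with its canonical name; other keys pass through.
sub normalize_args {
    my %in = @_;
    my %out;
    while (my ($k, $v) = each %in) {
        $out{ $ALIAS{$k} // $k } = $v;
    }
    return %out;
}

my %args = normalize_args(-name => 'ZK909', -type => 'gene', -start => 1, -stop => 500);
print join(' ', map { "$_=$args{$_}" } sort keys %args), "\n";
```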
=cut sub seq { my $self = shift; require Bio::PrimarySeq unless Bio::PrimarySeq->can('new'); my ($start,$end) = ($self->start,$self->end); if ($self->strand < 0) { ($start,$end) = ($end,$start); } if (my $store = $self->object_store) { return Bio::PrimarySeq->new(-seq => $store->fetch_sequence($self->seq_id,$start,$end) || '', -id => $self->display_name); } else { return $self->SUPER::seq($self->seq_id,$start,$end); } } sub subseq { my $self = shift; my ($newstart,$newstop) = @_; my $store = $self->object_store or return; require Bio::PrimarySeq unless Bio::PrimarySeq->can('new'); my ($start,$stop) = ($self->start+$newstart-1,$self->start+$newstop-1); if ($self->strand < 0) { ($start,$stop) = ($stop,$start); } my $seq = $store->fetch_sequence($self->seq_id,$start,$stop); return Bio::PrimarySeq->new($seq); } =head2 add_SeqFeature Title : add_SeqFeature Usage : $flag = $feature->add_SeqFeature(@features) Function: Add subfeatures to the feature Returns : true if successful Args : list of Bio::SeqFeatureI objects Status : public Add one or more subfeatures to the feature. For best results, subfeatures should be of the same class as the parent feature (i.e. don't try mixing Bio::DB::SeqFeature::NormalizedFeature with other feature types). The related method add_segment() adds subfeatures without normalizing them: the parent keeps the subfeature objects themselves rather than their database IDs. =cut sub add_SeqFeature { my $self = shift; $self->_add_segment(1,@_); } =head2 update Title : update Usage : $flag = $feature->update() Function: Update feature in the database Returns : true if successful Args : none Status : public After changing any fields in the feature, call update() to write it to the database. This is not needed after add_SeqFeature(), which invokes update() automatically.
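When subfeatures are added, `_fix_boundaries()` (further down in this file) widens the parent so that it spans all of its children. It never shrinks the parent. The sketch below isolates that rule and assumes the parent already has defined coordinates (the real method falls back to large sentinel values when it does not). Segments are plain `[start, end]` pairs, and `widened_bounds` is a hypothetical name.

```perl
use strict;
use warnings;

# Sketch of the widening rule in _fix_boundaries(): the parent's start/end
# grow to cover every subfeature and never shrink.
sub widened_bounds {
    my ($parent_start, $parent_end, @segments) = @_;   # segments: [start, end]
    my ($min, $max) = ($parent_start, $parent_end);
    for my $seg (@segments) {
        $min = $seg->[0] if $seg->[0] < $min;
        $max = $seg->[1] if $seg->[1] > $max;
    }
    return ($min, $max);
}

# A feature at 10000..11000 gains an exon at 5000..5551 (coordinates as in the SYNOPSIS).
my ($start, $end) = widened_bounds(10_000, 11_000, [5000, 5551]);
print "$start..$end\n";   # 5000..11000
```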
=cut sub update { my $self = shift; my $store = $self->object_store or return; $store->store($self); } =head2 get_SeqFeatures Title : get_SeqFeatures Usage : @subfeatures = $feature->get_SeqFeatures([@types]) Function: return subfeatures of this feature Returns : list of subfeatures Args : list of subfeature primary_tags (optional) Status : public This method extends the Bio::SeqFeatureI get_SeqFeatures() slightly by allowing you to pass a list of primary_tags, in which case only subfeatures whose primary_tag is on the list will be returned. If no types are passed, all subfeatures are returned. =cut # segments can be either normalized IDs or ordinary feature objects sub get_SeqFeatures { my $self = shift; my @types = @_; my $s = $self->{segments} or return; my $store = $self->object_store; my (@ordinary,@ids); for (@$s) { if (ref ($_)) { push @ordinary,$_; } else { push @ids,$_; } } my @r = grep {$_->type_match(@types)} (@ordinary,$store->fetch_many(\@ids)); for my $r (@r) { eval {$r->object_store($store) }; } return @r; } =head2 object_store Title : object_store Usage : $store = $feature->object_store([$new_store]) Function: get or set the database handle Returns : current database handle Args : new database handle (optional) Status : public This method will get or set the Bio::DB::SeqFeature::Store object that is associated with the feature. After changing the store, you should probably unset the feature's primary_id() and call update() to ensure that the object is written into the database as a new feature.
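object_store() (like primary_id()) captures the current value before applying an optional new one, and it returns the captured value. So `$feature->object_store($new)` hands back the *previous* store, not the new one. The minimal class below reproduces that accessor shape. `Toy` is a hypothetical stand-in for the feature class.

```perl
use strict;
use warnings;

package Toy;
sub new { bless {}, shift }

# Same shape as object_store() and primary_id(): capture the current value,
# optionally overwrite it, then return the value captured before the update.
sub object_store {
    my $self = shift;
    my $d = $self->{store};
    $self->{store} = shift if @_;
    $d;
}

package main;
my $f    = Toy->new;
my $old  = $f->object_store('db1');   # undef: there was no previous store
my $prev = $f->object_store('db2');   # 'db1'
print((defined $old ? $old : 'undef'), " $prev ", $f->object_store, "\n");
```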
=cut sub object_store { my $self = shift; my $d = $self->{store}; $self->{store} = shift if @_; $d; } =head2 overloaded_names Title : overloaded_names Usage : $overload = $feature->overloaded_names([$new_overload]) Function: get or set overloading of object strings Returns : current flag Args : new flag (optional) Status : public For convenience, when objects of this class are stringified, they are represented in the form "primary_tag(display_name)". To turn this feature off, call overloaded_names() with a false value. You can invoke this on an individual feature object or on the class. Either way, the setting is global and affects all objects of this class: Bio::DB::SeqFeature::NormalizedFeature->overloaded_names(0); =cut sub overloaded_names { my $class = shift; my $d = $USE_OVERLOADED_NAMES; $USE_OVERLOADED_NAMES = shift if @_; $d; } =head2 segment Title : segment Usage : $segment = $feature->segment Function: return a Segment object corresponding to feature Returns : a Bio::DB::SeqFeature::Segment Args : none Status : public This turns the feature into a Bio::DB::SeqFeature::Segment object, which you can then use to query for overlapping features. See L. =cut sub segment { my $self = shift; return Bio::DB::SeqFeature::Segment->new($self); } ### instance methods =head2 AUTOLOADED methods @subfeatures = $feature->Exon; If you use an unknown method that begins with a capital letter, then the feature autogenerates a call to get_SeqFeatures() using the lower-cased method name as the primary_tag. In other words $feature->Exon is equivalent to: @subfeatures = $feature->get_SeqFeatures('exon') If you use an unknown method that begins with Tag_(tagname), Att_(tagname) or Is_(tagname), it is the same as calling the each_tag_value() method with that tagname. In list context, these autogenerated methods return the list of results. In scalar context, the Tag_, Att_ and Is_ methods return only the first value, whereas the capitalized subfeature methods return what get_SeqFeatures() returns in scalar context (the number of matching subfeatures).
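The dispatch rules for autoloaded methods can be reproduced in a small self-contained class. `ToyFeature`, with its hash-based attributes and subfeature list, is hypothetical. The AUTOLOAD body follows the same order as the module's: skip DESTROY, handle the Tag_/Att_/Is_ prefixes, then treat any other capitalized name as a lower-cased subfeature type.

```perl
use strict;
use warnings;

package ToyFeature;
our $AUTOLOAD;

sub new             { my ($class, %f) = @_; bless { %f }, $class }
sub each_tag_value  { my ($s, $tag) = @_; @{ $s->{attributes}{$tag} || [] } }
sub get_SeqFeatures { my ($s, $type) = @_; grep { $_->{type} eq $type } @{ $s->{subs} || [] } }

# Tag_/Att_/Is_ prefixes read attributes; any other method name with an
# initial capital fetches subfeatures of the lower-cased type.
sub AUTOLOAD {
    my ($name) = $AUTOLOAD =~ /::([^:]+)$/;
    my $self = $_[0];
    return if $name eq 'DESTROY';
    if ($name =~ /^(?:Tag|Att|Is)_(\w+)/) {
        my @result = $self->each_tag_value($1);
        return wantarray ? @result : $result[0];
    }
    return $self->get_SeqFeatures(lc $name) if $name =~ /^[A-Z]/;
    die qq(Can't locate object method "$name"\n);
}

package main;
my $f = ToyFeature->new(
    attributes => { Note => ['first', 'second'] },
    subs       => [ { type => 'exon' }, { type => 'intron' }, { type => 'exon' } ],
);
my $note  = $f->Tag_Note;   # scalar context: first value only
my @exons = $f->Exon;       # same as get_SeqFeatures('exon')
print "$note ", scalar(@exons), "\n";   # first 2
```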
=cut sub AUTOLOAD { my($pack,$func_name) = $AUTOLOAD=~/(.+)::([^:]+)$/; my $sub = $AUTOLOAD; my $self = $_[0]; # ignore DESTROY calls return if $func_name eq 'DESTROY'; # call attributes if func_name begins with "Tag_" or "Att_": if ($func_name =~ /^(Tag|Att|Is)_(\w+)/) { my @result = $self->each_tag_value($2); return wantarray ? @result : $result[0]; } # fetch subfeatures if func_name has an initial cap if ($func_name =~ /^[A-Z]/) { return $self->get_SeqFeatures(lc $func_name); } # error message of last resort $self->throw(qq(Can't locate object method "$func_name" via package "$pack")); }#' sub add_segment { my $self = shift; $self->_add_segment(0,@_); } # This adds subfeatures. It has the property of converting the # provided features into an object like itself and storing them # into the database. If the feature already has a primary id and # an object_store() method, then it is not stored into the database, # but its primary id is reused. sub _add_segment { my $self = shift; my $normalized = shift; my $store = $self->object_store; my @segments = $self->_create_subfeatures($normalized,@_); # fix boundaries $self->_fix_boundaries(\@segments); # freakish fixing of our non-standard Target attribute $self->_fix_target(\@segments); for my $seg (@segments) { my $id = $normalized ? $seg->primary_id : $seg; defined $id or $self->throw("No primary ID when there should be"); push @{$self->{segments}},$id; }; $self->update if $self->primary_id; # write us back to disk } sub _fix_boundaries { my $self = shift; my $segments = shift; my $normalized = shift; my $min_start = $self->start || 999_999_999_999; my $max_stop = $self->end || -999_999_999_999; for my $seg (@$segments) { $min_start = $seg->start if $seg->start < $min_start; $max_stop = $seg->end if $seg->end > $max_stop; } # adjust our boundaries, etc. 
$self->start($min_start) if $min_start < $self->start; $self->end($max_stop) if $max_stop > $self->end; $self->{ref} ||= $segments->[0]->seq_id; $self->{strand} ||= $segments->[0]->strand; } sub _fix_target { my $self = shift; my $segs = shift; my $normalized = shift; # ignored for now # freakish fixing of our non-standard Target attribute if (my $t = ($self->attributes('Target'))[0]) { my ($seqid,$tstart,$tend,$strand) = split /\s+/,$t; if (defined $tstart && defined $tend) { my $min_tstart = $tstart; my $max_tend = $tend; for my $seg (@$segs) { my $st = ($seg->attributes('Target'))[0] or next; (undef,$tstart,$tend) = split /\s+/,$st; next unless defined $tstart && defined $tend; $min_tstart = $tstart if $tstart < $min_tstart; $max_tend = $tend if $tend > $max_tend; } if ($min_tstart < $tstart or $max_tend > $tend) { $self->{attributes}{Target}[0] = join ' ',($seqid,$min_tstart,$max_tend,$strand||''); } } } } # undo the load_id and Target hacks on the way out sub format_attributes { my $self = shift; my $parent = shift; my $fallback_id = shift; my $load_id = $self->load_id || ''; my $targobj = ($self->attributes('Target'))[0]; # was getting an 'Use of uninitialized value with split' here, changed to cooperate -cjf 7/10/07 my ($target) = $targobj ? 
split /\s+/,($self->attributes('Target'))[0] : (''); my @tags = $self->all_tags; my @result; for my $t (@tags) { my @values = $self->each_tag_value($t); # This line prevents Alias from showing up if it matches the load id, but this is not good # @values = grep {$_ ne $load_id && $_ ne $target} @values if $t eq 'Alias'; # these are hacks, which we don't want to appear in the file next if $t eq 'load_id'; next if $t eq 'parent_id'; foreach (@values) { s/\s+$// } # get rid of trailing whitespace push @result,join '=',$self->escape($t),join(',', map {$self->escape($_)} @values) if @values; } my $id = $self->primary_id || $fallback_id; my $parent_id; if (@$parent) { $parent_id = join (',',map {$self->escape($_)} @$parent); } my $name = $self->display_name; unshift @result,"ID=".$self->escape($id) if defined $id; unshift @result,"Parent=".$parent_id if defined $parent_id; unshift @result,"Name=".$self->escape($name) if defined $name; return join ';',@result; } sub _create_subfeatures { my $self = shift; my $normalized = shift; my $type = $self->{subtype} || $self->{type}; my $ref = $self->seq_id; my $name = $self->name; my $class = $self->class; my $store = $self->object_store; my $source = $self->source; if ($normalized) { $store or $self->throw("Feature must be associated with a Bio::DB::SeqFeature::Store database before attempting to add subfeatures to a normalized object"); } my $index_subfeatures_policy = eval{$store->index_subfeatures}; my @segments; for my $seg (@_) { if (UNIVERSAL::isa($seg,ref $self)) { if (!$normalized) { # make sure the object has no lazy behavior $seg->primary_id(undef); $seg->object_store(undef); } push @segments,$seg; } elsif (ref($seg) eq 'ARRAY') { my ($start,$stop) = @{$seg}; next unless defined $start && defined $stop; # fixes an obscure bug somewhere above us my $strand = $self->{strand}; if ($start > $stop) { ($start,$stop) = ($stop,$start); $strand = -1; } push @segments,$self->new(-start => $start, -stop => $stop, -strand => 
$strand, -ref => $ref, -type => $type, -name => $name, -class => $class, -source => $source, ); } elsif (UNIVERSAL::isa($seg,'Bio::SeqFeatureI')) { my $score = $seg->can('score') ? $seg->score : undef; my $f = $self->new(-start => $seg->start, -end => $seg->end, -strand => $seg->strand, -seq_id => $seg->seq_id, -name => $seg->display_name, -primary_tag => $seg->primary_tag, -source_tag => $seg->source, -score => $score, -source => $source, ); for my $tag ($seg->get_all_tags) { my @values = $seg->get_tag_values($tag); $f->{attributes}{$tag} = \@values; } push @segments,$f; } else { croak "$seg is neither a Bio::SeqFeatureI object nor an arrayref"; } } return unless @segments; if ($normalized && $store) { # parent/child data is going to be stored in the database my @need_loading = grep {!defined $_->primary_id || $_->object_store ne $store} @segments; if (@need_loading) { my $result; if ($index_subfeatures_policy) { $result = $store->store(@need_loading); } else { $result = $store->store_noindex(@need_loading); } $result or croak "Couldn't store one or more subseqfeatures"; } } return @segments; } =head2 load_id Title : load_id Usage : $id = $feature->load_id Function: get the GFF3 load ID Returns : the GFF3 load ID (string) Args : none Status : public For features that were originally loaded by the GFF3 loader, this method returns the GFF3 load ID. This method may not be supported in future versions of the module. =cut sub load_id { return (shift->attributes('load_id'))[0]; } =head2 notes Title : notes Usage : @notes = $feature->notes Function: get contents of the GFF3 Note tag Returns : List of GFF3 Note tags Args : none Status : public For features that were originally loaded by the GFF3 loader, this method returns the contents of the Note tag as a list. This is a convenience for Bio::Graphics, which looks for notes() when it constructs a default description line.
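In `_create_subfeatures()` above, a subfeature may be given as a bare `[start, stop]` pair. If the pair is reversed, the coordinates are swapped into ascending order and the strand is forced to -1. Otherwise the parent's strand is inherited. The sketch below isolates that branch. `segment_from_pair` and the returned hash layout are hypothetical.

```perl
use strict;
use warnings;

# Sketch of the [start, stop] branch of _create_subfeatures(): a reversed
# pair is swapped into ascending order and marked as minus strand.
sub segment_from_pair {
    my ($pair, $parent_strand) = @_;
    my ($start, $stop) = @$pair;
    return unless defined $start && defined $stop;
    my $strand = $parent_strand;
    if ($start > $stop) {
        ($start, $stop) = ($stop, $start);
        $strand = -1;
    }
    return { start => $start, stop => $stop, strand => $strand };
}

my $fwd = segment_from_pair([100, 200], 1);
my $rev = segment_from_pair([500, 300], 1);
print "$fwd->{start}-$fwd->{stop} ($fwd->{strand}); ",
      "$rev->{start}-$rev->{stop} ($rev->{strand})\n";
```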
=cut sub notes { return shift->attributes('Note'); } =head2 primary_id Title : primary_id Usage : $id = $feature->primary_id([$new_id]) Function: get/set the feature's database ID Returns : the current primary ID Args : new primary ID (optional) Status : public This method gets or sets the primary ID of the feature in the underlying Bio::DB::SeqFeature::Store database. If you change this field and then call update(), it will have the effect of making a copy of the feature in the database under a new ID. =cut sub primary_id { my $self = shift; my $d = $self->{primary_id}; $self->{primary_id} = shift if @_; $d; } =head2 target Title : target Usage : $segment = $feature->target Function: return the segment corresponding to the "Target" attribute Returns : a Bio::DB::SeqFeature::Segment object (in list context, one segment per Target value) Args : none Status : public For features that are aligned with others via the GFF3 Target attribute, this returns a segment corresponding to the aligned region. The CIGAR gap string is not yet supported. =cut sub target { my $self = shift; my @targets = $self->attributes('Target'); my @result; for my $t (@targets) { my ($seqid,$start,$end,$strand) = split /\s+/,$t; $strand ||= ''; $strand = $strand eq '+' ? 1 : $strand eq '-' ? -1 : 0; push @result,Bio::DB::SeqFeature::Segment->new($self->object_store, $seqid, $start, $end, $strand); } return wantarray ? @result : $result[0]; } =head2 Internal methods =over 4 =item $feature->as_string() Internal method used to implement overloaded stringification. =item $boolean = $feature->type_match(@list_of_types) Internal method that will return true if the feature's primary_tag and source_tag match any of the list of types (in primary_tag:source_tag format) provided.
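The matching rule described for type_match() is fully case-insensitive, and a type without a `:source` part matches any source. Here it is as a plain function that takes the tags directly. Unlike the method, which returns an empty list on failure, this sketch returns 0 so the results print cleanly. With an empty type list, both this sketch and the method return false.

```perl
use strict;
use warnings;

# Standalone version of the case-insensitive type_match() logic:
# a type is "method" or "method:source"; a missing source matches any source.
sub type_match {
    my ($primary_tag, $source_tag, @types) = @_;
    my $method = lc $primary_tag;
    my $source = lc($source_tag // '');
    for my $t (@types) {
        my ($m, $s) = map { lc($_) } split /:/, $t;
        return 1 if $method eq $m && (!defined $s || $source eq $s);
    }
    return 0;
}

print join(' ',
    type_match('Exon', 'FlyBase', 'exon'),          # any source
    type_match('Exon', 'FlyBase', 'EXON:flybase'),  # case-insensitive source
    type_match('Exon', 'FlyBase', 'exon:Ensembl'),  # wrong source
), "\n";   # 1 1 0
```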
=back =cut sub as_string { my $self = shift; return overload::StrVal($self) unless $self->overloaded_names; my $name = $self->display_name || $self->load_id; $name ||= "id=".$self->primary_id if $self->primary_id; $name ||= ""; my $method = $self->primary_tag; my $source= $self->source_tag; my $type = $source ? "$method:$source" : $method; return "$type($name)"; } sub eq { my $self = shift; my $b = shift; my $store1 = $self->object_store; my $store2 = eval {$b->object_store} || ''; return $store1 eq $store2 && $self->primary_id eq $b->primary_id; } sub ne { my $self = shift; return !$self->eq(shift); } # completely case insensitive sub type_match { my $self = shift; my @types = @_; my $method = lc $self->primary_tag; my $source = lc $self->source_tag; for my $t (@types) { my ($m,$s) = map {lc $_} split /:/,$t; return 1 if $method eq $m && (!defined $s || $source eq $s); } return; } sub segments { shift->get_SeqFeatures(@_) } 1; __END__ =head1 BUGS This is an early version, so there are certainly some bugs. Please use the BioPerl bug tracking system to report bugs. =head1 SEE ALSO L, L, L, L, L, L, L =head1 AUTHOR Lincoln Stein <lstein@cshl.org>. Copyright (c) 2006 Cold Spring Harbor Laboratory. This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
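The target() method above splits a GFF3 Target value of the form `seqid start end [strand]` on whitespace and maps the strand symbol to +1, -1 or 0. Any value other than `+` or `-`, including a missing strand, becomes 0. The sketch below reproduces that parsing without building Segment objects. `parse_target` is a hypothetical name.

```perl
use strict;
use warnings;

# Sketch of how target() splits a GFF3 Target value and maps the strand
# symbol to +1/-1/0 (anything other than '+' or '-' becomes 0).
sub parse_target {
    my $t = shift;
    my ($seqid, $start, $end, $strand) = split /\s+/, $t;
    $strand ||= '';
    $strand = $strand eq '+' ? 1 : $strand eq '-' ? -1 : 0;
    return ($seqid, $start, $end, $strand);
}

my @plus = parse_target('EST23 1 21 +');
my @none = parse_target('EST23 1 21');
print "@plus | @none\n";   # EST23 1 21 1 | EST23 1 21 0
```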
=cut berkeleydb3.pm100644000766000024 5120013605523026 24367 0ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/lib/Bio/DB/SeqFeature/Storepackage Bio::DB::SeqFeature::Store::berkeleydb3; $Bio::DB::SeqFeature::Store::berkeleydb3::VERSION = '1.7.4'; # $Id: berkeleydb3.pm 15987 2009-08-18 21:08:55Z lstein $ # faster implementation of berkeleydb =head1 NAME Bio::DB::SeqFeature::Store::berkeleydb3 -- Storage and retrieval of sequence annotation data in Berkeley DB files =head1 SYNOPSIS # Create a feature database from scratch $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'berkeleydb', -dsn => '/var/databases/fly4.3', -create => 1); # get a feature from somewhere my $feature = Bio::SeqFeature::Generic->new(...); # store it $db->store($feature) or die "Couldn't store!"; =head1 DESCRIPTION This is a faster version of the berkeleydb storage adaptor for Bio::DB::SeqFeature::Store. It is used automatically when you create a new database with the original berkeleydb adaptor. When opening a database created under the original adaptor, the old code is used for backward compatibility. Please see L for full usage instructions. =head1 BUGS This is an early version, so there are certainly some bugs. Please use the BioPerl bug tracking system to report bugs. =head1 SEE ALSO L, L, L, L, L, L, L, =head1 AUTHOR Lincoln Stein E<lt>lincoln.stein@gmail.comE<gt>. Copyright (c) 2009 Ontario Institute for Cancer Research. This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
=cut use strict; use base 'Bio::DB::SeqFeature::Store::berkeleydb'; use DB_File; use Fcntl qw(O_RDWR O_CREAT :flock); use Bio::DB::GFF::Util::Rearrange 'rearrange'; # can't have more sequence ids than this use constant MAX_SEQUENCES => 1_000_000_000; # used to construct the bin key use constant C1 => 500_000_000; # limits chromosome length to 500 megabases use constant C2 => 1000*C1; # at most 1000 chromosomes use constant BINSIZE => 10_000; use constant MININT => -999_999_999_999; use constant MAXINT => 999_999_999_999; use constant SUMMARY_BIN_SIZE => 1000; sub version { return 3.0 } sub open_index_dbs { my $self = shift; my ($flags,$create) = @_; # Create the main index databases; these are DB_BTREE implementations with duplicates allowed. $DB_BTREE->{flags} = R_DUP; my $string_cmp = DB_File::BTREEINFO->new; $string_cmp->{flags} = R_DUP; $string_cmp->{compare} = sub { lc $_[0] cmp lc $_[1] }; my $numeric_cmp = DB_File::BTREEINFO->new; $numeric_cmp->{flags} = R_DUP; $numeric_cmp->{compare} = sub { $_[0] <=> $_[1] }; for my $idx ($self->_index_files) { my $path = $self->_qualify("$idx.idx"); my %db; my $dbtype = $idx eq 'locations' ? $numeric_cmp :$idx eq 'summary' ? $numeric_cmp :$idx eq 'types' ? $numeric_cmp :$idx eq 'seqids' ? $DB_HASH :$idx eq 'typeids' ? $DB_HASH :$string_cmp; tie(%db,'DB_File',$path,$flags,0666,$dbtype) or $self->throw("Couldn't tie $path: $!"); %db = () if $create; $self->index_db($idx=>\%db); } } sub seqid_db { shift->index_db('seqids') } sub typeid_db { shift->index_db('typeids') } sub _delete_databases { my $self = shift; $self->SUPER::_delete_databases; } # given a seqid (name), return its denormalized numeric representation sub seqid_id { my $self = shift; my $seqid = shift; my $db = $self->seqid_db; return $db->{lc $seqid}; } sub add_seqid { my $self = shift; my $seqid = shift; my $db = $self->seqid_db; my $key = lc $seqid; $db->{$key} = ++$db->{'.nextid'} unless exists $db->{$key}; die "Maximum number of sequence ids exceeded. 
This module can handle up to ", MAX_SEQUENCES," unique ids" if $db->{$key} > MAX_SEQUENCES; return $db->{$key}; } # given a type name (primary_tag:source_tag), return its numeric type id sub type_id { my $self = shift; my $typeid = shift; my $db = $self->typeid_db; return $db->{lc $typeid}; } sub add_typeid { my $self = shift; my $typeid = shift; my $db = $self->typeid_db; my $key = lc $typeid; $db->{$key} = ++$db->{'.nextid'} unless exists $db->{$key}; return $db->{$key}; } sub _seq_ids { my $self = shift; if (my $fa = $self->{fasta_db}) { if (my @s = eval {$fa->ids}) { return @s; } } my $l = $self->seqid_db or return; return grep {!/^\./} keys %$l; } sub _index_files { return (shift->SUPER::_index_files,'seqids','typeids','summary'); } sub _update_indexes { my $self = shift; my $obj = shift; defined (my $id = $obj->primary_id) or return; $self->SUPER::_update_indexes($obj); $self->_update_seqid_index($obj,$id); } sub _update_seqid_index { my $self = shift; my ($obj,$id,$delete) = @_; my $seq_name = $obj->seq_id; $self->add_seqid(lc $seq_name); } sub _update_type_index { my $self = shift; my ($obj,$id,$delete) = @_; my $db = $self->index_db('types') or $self->throw("Couldn't find 'types' index file"); my $key = $self->_obj_to_type($obj); my $typeid = $self->add_typeid($key); $self->update_or_delete($delete,$db,$typeid,$id); } sub _obj_to_type { my $self = shift; my $obj = shift; my $tag = $obj->primary_tag; my $source_tag = $obj->source_tag || ''; return unless defined $tag; $tag .= ":$source_tag"; return lc $tag; } sub types { my $self = shift; eval "require Bio::DB::GFF::Typename" unless Bio::DB::GFF::Typename->can('new'); my $db = $self->typeid_db; return grep {!/^\./} map {Bio::DB::GFF::Typename->new($_)} keys %$db; } sub _id2type { my $self = shift; my $wanted_id = shift; my $db = $self->typeid_db; keys %$db; # reset the each() iterator left mid-hash by a previous early return while (my($key,$id) = each %$db) { next if $key =~ /^\./; return $key if $id == $wanted_id; } return; } # return a hash ref of typeids that match a human-readable type sub
_matching_types { my $self = shift; my $types = shift; my @types = ref $types eq 'ARRAY' ? @$types : $types; my $db = $self->typeid_db; my %result; my @all_types; for my $type (@types) { my ($primary_tag,$source_tag); if (ref $type && $type->isa('Bio::DB::GFF::Typename')) { $primary_tag = $type->method; $source_tag = $type->source; } else { ($primary_tag,$source_tag) = split ':',$type,2; } if (defined $source_tag) { my $id = $db->{lc "$primary_tag:$source_tag"}; $result{$id}++ if defined $id; } else { @all_types = $self->types unless @all_types; $result{$db->{$_}}++ foreach grep {/^$primary_tag:/} @all_types; } } return \%result; } sub _update_location_index { my $self = shift; my ($obj,$id,$delete) = @_; my $db = $self->index_db('locations') or $self->throw("Couldn't find 'locations' index file"); my $seq_id = $obj->seq_id || ''; my $start = $obj->start || ''; my $end = $obj->end || ''; my $strand = $obj->strand; my $bin_min = int $start/BINSIZE; my $bin_max = int $end/BINSIZE; my $typeid = $self->add_typeid($self->_obj_to_type($obj)); my $seq_no = $self->add_seqid($seq_id); for (my $bin = $bin_min; $bin <= $bin_max; $bin++ ) { my $key = $seq_no * MAX_SEQUENCES + $bin; $self->update_or_delete($delete,$db,$key,pack("i5",$id,$start,$end,$strand,$typeid)); } } sub _features { my $self = shift; my ($seq_id,$start,$end,$strand, $name,$class,$allow_aliases, $types, $attributes, $range_type, $iterator ) = rearrange([['SEQID','SEQ_ID','REF'],'START',['STOP','END'],'STRAND', 'NAME','CLASS','ALIASES', ['TYPES','TYPE','PRIMARY_TAG'], ['ATTRIBUTES','ATTRIBUTE'], 'RANGE_TYPE', 'ITERATOR', ],@_); my (@from,@where,@args,@group); $range_type ||= 'overlaps'; my @result; unless (defined $name or defined $seq_id or defined $types or defined $attributes) { my $is_indexed = $self->index_db('is_indexed'); @result = $is_indexed ? 
grep {$is_indexed->{$_}} keys %{$self->db} : grep { !/^\./ }keys %{$self->db}; } my %found = (); my $result = 1; if (defined($name)) { # hacky backward compatibility workaround undef $class if $class && $class eq 'Sequence'; $name = "$class:$name" if defined $class && length $class > 0; $result &&= $self->filter_by_name($name,$allow_aliases,\%found); } if (defined $seq_id) { # location with or without types my $typelist = defined $types ? $self->_matching_types($types) : undef; $result &&= $self->filter_by_type_and_location( $seq_id, $start, $end, $strand, $range_type, $typelist, \%found ); } elsif (defined $types) { # types without location $result &&= $self->filter_by_type($types,\%found); } if (defined $attributes) { $result &&= $self->filter_by_attribute($attributes,\%found); } push @result,keys %found if $result; return $iterator ? Bio::DB::SeqFeature::Store::berkeleydb::Iterator->new($self,\@result) : map {$self->fetch($_)} @result; } sub filter_by_type { my $self = shift; my ($types,$filter) = @_; my @types = ref $types eq 'ARRAY' ? @$types : $types; my $index = $self->index_db('types'); my $db = tied(%$index); my @results; for my $type (@types) { my ($primary_tag,$source_tag); if (ref $type && $type->isa('Bio::DB::GFF::Typename')) { $primary_tag = $type->method; $source_tag = $type->source; } else { ($primary_tag,$source_tag) = split ':',$type,2; } $source_tag ||= ''; $primary_tag = quotemeta($primary_tag); $source_tag = quotemeta($source_tag); my $match = length $source_tag ? "^$primary_tag:$source_tag\$" : "^$primary_tag:"; my $key = lc "$primary_tag:$source_tag"; my $value; # If filter is already provided, then it is usually faster to # fetch each object. 
if (%$filter) { for my $id (keys %$filter) { my $obj = $self->_fetch($id) or next; push @results,$id if $obj->type =~ /$match/i; } } else { my $types = $self->typeid_db; my @typeids = map {$types->{$_}} grep {/$match/} keys %$types; for my $t (@typeids) { my $k = $t; for (my $status = $db->seq($k,$value,R_CURSOR); $status == 0 && $k == $t; $status = $db->seq($k,$value,R_NEXT)) { next if %$filter && !$filter->{$value}; # don't even bother push @results,$value; } } } } $self->update_filter($filter,\@results); } sub filter_by_type_and_location { my $self = shift; my ($seq_id,$start,$end,$strand,$range_type,$typelist,$filter) = @_; $strand ||= 0; my $index = $self->index_db('locations'); my $db = tied(%$index); my $binstart = defined $start ? int $start/BINSIZE : 0; my $binend = defined $end ? int $end/BINSIZE : MAX_SEQUENCES-1; my %seenit; my @results; $start = MININT if !defined $start; $end = MAXINT if !defined $end; my $seq_no = $self->seqid_id($seq_id); return unless defined $seq_no; if ($range_type eq 'overlaps' or $range_type eq 'contains') { my $keystart = $seq_no * MAX_SEQUENCES + $binstart; my $keystop = $seq_no * MAX_SEQUENCES + $binend; my $value; for (my $status = $db->seq($keystart,$value,R_CURSOR); $status == 0 && $keystart <= $keystop; $status = $db->seq($keystart,$value,R_NEXT)) { my ($id,$fstart,$fend,$fstrand,$ftype) = unpack("i5",$value); next if $seenit{$id}++; next if $strand && $fstrand != $strand; next if $typelist && !$typelist->{$ftype}; if ($range_type eq 'overlaps') { next unless $fend >= $start && $fstart <= $end; } elsif ($range_type eq 'contains') { next unless $fstart >= $start && $fend <= $end; } next if %$filter && !$filter->{$id}; # don't bother push @results,$id; } } # for contained in, we look for features originating and terminating outside the specified range # this is incredibly inefficient, but fortunately the query is rare (?) 
elsif ($range_type eq 'contained_in') { my $keystart = $seq_no * MAX_SEQUENCES; my $keystop = $seq_no * MAX_SEQUENCES + $binstart; my $value; # do the left part of the range for (my $status = $db->seq($keystart,$value,R_CURSOR); $status == 0 && $keystart <= $keystop; $status = $db->seq($keystart,$value,R_NEXT)) { my ($id,$fstart,$fend,$fstrand,$ftype) = unpack("i5",$value); next if $seenit{$id}++; next if $strand && $fstrand != $strand; next if $typelist && !$typelist->{$ftype}; next unless $fstart <= $start && $fend >= $end; next if %$filter && !$filter->{$id}; # don't bother push @results,$id; } # do the right part of the range $keystart = $seq_no*MAX_SEQUENCES+$binend; for (my $status = $db->seq($keystart,$value,R_CURSOR); $status == 0; $status = $db->seq($keystart,$value,R_NEXT)) { my ($id,$fstart,$fend,$fstrand,$ftype) = unpack("i5",$value); next if $seenit{$id}++; next if $strand && $fstrand != $strand; next unless $fstart <= $start && $fend >= $end; next if $typelist && !$typelist->{$ftype}; next if %$filter && !$filter->{$id}; # don't bother push @results,$id; } } $self->update_filter($filter,\@results); } sub build_summary_statistics { my $self = shift; my $insert = $self->index_db('summary'); %$insert = (); my $current_bin = -1; my (%residuals,$last_bin); my $le = -t \*STDERR ? "\r" : "\n"; print STDERR "\n"; # iterate through all the indexed features my $sbs = SUMMARY_BIN_SIZE; # Sadly we have to do this in two steps. In the first step, we sort # features by typeid,seqid,start. In the second step, we read through # this sorted list. 
To avoid running out of memory, we use a db_file # temporary database my $fh = File::Temp->new() or $self->throw("Could not create temporary file for sorting: $!"); my $name = $fh->filename; my %sort; my $num_cmp_tree = DB_File::BTREEINFO->new; $num_cmp_tree->{compare} = sub { $_[0] <=> $_[1] }; $num_cmp_tree->{flags} = R_DUP; my $s = tie %sort, 'DB_File', $name, O_CREAT|O_RDWR, 0666, $num_cmp_tree or $self->throw("Could not create Berkeley DB in temporary file '$name': $!"); my $index = $self->index_db('locations'); my $db = tied(%$index); my $keystart = 0; my ($value,$count); my %seenit; for (my $status = $db->seq($keystart,$value,R_CURSOR); $status == 0; $status = $db->seq($keystart,$value,R_NEXT)) { my ($id,$start,$end,$strand,$typeid) = unpack('i5',$value); next if $seenit{$id}++; print STDERR $count," features sorted$le" if ++$count % 1000 == 0; my $seqid = int($keystart / MAX_SEQUENCES); my $key = $self->_encode_summary_key($typeid,$seqid,$start-1); $sort{$key}=$end; } print STDERR "COUNT = $count\n"; my ($current_type,$current_seqid,$end); my $cum_count = 0; $keystart = 0; $count = 0; # the second step allows us to iterate through this for (my $status = $s->seq($keystart,$end,R_CURSOR); $status == 0; $status = $s->seq($keystart,$end,R_NEXT)) { print STDERR $count," features processed$le" if ++$count % 1000 == 0; my ($typeid,$seqid,$start) = $self->_decode_summary_key($keystart); my $bin = int($start/$sbs); # because the input is sorted by start, no more features will contribute to the # current bin so we can dispose of it if ($bin != $current_bin) { if ($seqid != $current_seqid or $typeid != $current_type) { # load all bins left over $self->_load_bins($insert,\%residuals,\$cum_count,$current_type,$current_seqid); %residuals = () ; $cum_count = 0; } else { # load all up to current one $self->_load_bins($insert,\%residuals,\$cum_count,$current_type,$current_seqid,$current_bin); } } $last_bin = $current_bin; ($current_seqid,$current_type,$current_bin) = 
($seqid,$typeid,$bin); # summarize across entire spanned region my $last_bin = int(($end-1)/$sbs); for (my $b=$bin;$b<=$last_bin;$b++) { $residuals{$b}++; } } # handle tail case # load all bins left over $self->_load_bins($insert,\%residuals,\$cum_count,$current_type,$current_seqid); undef %sort; undef $fh; } sub _load_bins { my $self = shift; my ($insert,$residuals,$cum_count,$typeid,$seqid,$stop_after) = @_; for my $b (sort {$a<=>$b} keys %$residuals) { last if defined $stop_after and $b > $stop_after; $$cum_count += $residuals->{$b}; my $key = $self->_encode_summary_key($typeid,$seqid,$b); $insert->{$key} = $$cum_count; delete $residuals->{$b}; # no longer needed } } sub coverage_array { my $self = shift; my ($seq_name,$start,$end,$types,$bins) = rearrange([['SEQID','SEQ_ID','REF'],'START',['STOP','END'], ['TYPES','TYPE','PRIMARY_TAG'],'BINS'],@_); $bins ||= 1000; $start ||= 1; unless ($end) { my $segment = $self->segment($seq_name) or $self->throw("unknown seq_id $seq_name"); $end = $segment->end; } my $binsize = ($end-$start+1)/$bins; my $seqid = $self->seqid_id($seq_name) || 0; return [] unless $seqid; # where each bin starts my @his_bin_array = map {$start + $binsize * $_} (0..$bins); my @sum_bin_array = map {int(($_-1)/SUMMARY_BIN_SIZE)} @his_bin_array; my $interval_stats_idx = $self->index_db('summary'); my $db = tied(%$interval_stats_idx); my $t = $self->_matching_types($types); my (%bins,$report_tag); for my $typeid (sort keys %$t) { $report_tag ||= $typeid; for (my $i=0;$i<@sum_bin_array;$i++) { my $cum_count; my $bin = $sum_bin_array[$i]; my $key = $self->_encode_summary_key($typeid,$seqid,$bin); my $status = $db->seq($key,$cum_count,R_CURSOR); next unless $status == 0; push @{$bins{$typeid}},[$bin,$cum_count]; } } my @merged_bins; my $firstbin = int(($start-1)/$binsize); for my $type (keys %bins) { my $arry = $bins{$type}; my $last_count = $arry->[0][1]-1; my $last_bin = -1; my $i = 0; my $delta; for my $b (@$arry) { my ($bin,$count) = @$b; $delta = 
$count - $last_count if $bin > $last_bin; $merged_bins[$i++] = $delta; $last_count = $count; $last_bin = $bin; } } my $returned_type = $self->_id2type($report_tag); return wantarray ? (\@merged_bins,$returned_type) : \@merged_bins; } sub _encode_summary_key { my $self = shift; my ($typeid,$seqid,$bin) = @_; $self->throw('Cannot index chromosomes larger than '.C1*SUMMARY_BIN_SIZE/1e6.' megabases') if $bin >= C1; return ($typeid-1)*C2 + ($seqid-1)*C1 + $bin; } sub _decode_summary_key { my $self = shift; my $key = shift; my $typeid = int($key/C2); my $residual = $key%C2; my $seqid = int($residual/C1); my $bin = $residual%C1; return ($typeid+1,$seqid+1,$bin); } 1; NormalizedFeatureI.pm100644000766000024 232513605523026 24615 0ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/lib/Bio/DB/SeqFeaturepackage Bio::DB::SeqFeature::NormalizedFeatureI; $Bio::DB::SeqFeature::NormalizedFeatureI::VERSION = '1.7.4'; =head1 NAME Bio::DB::SeqFeature::NormalizedFeatureI -- Interface for normalized features =head1 SYNOPSIS none =head1 DESCRIPTION This is an extremely simple interface that contains a single method, subfeatures_are_normalized(). This method returns a true value. Bio::DB::SeqFeature::Store feature classes will inherit this interface to flag that they are able to store subfeatures in a normalized way such that the subfeature is actually contained in the Bio::DB::SeqFeature::Store database and the parent feature contains only the subfeatures' primary IDs. =head1 BUGS None, but the whole class design might be flawed. =head1 SEE ALSO L, L, L, L, L, L, L =head1 AUTHOR Lincoln Stein E<lt>lstein@cshl.orgE<gt>. Copyright (c) 2006 Cold Spring Harbor Laboratory. This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
=cut sub subfeatures_are_normalized { 1 } 1; Iterator.pm100644000766000024 125013605523026 24343 0ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/lib/Bio/DB/SeqFeature/Store/DBIpackage Bio::DB::SeqFeature::Store::DBI::Iterator; $Bio::DB::SeqFeature::Store::DBI::Iterator::VERSION = '1.7.4'; =head1 NAME Bio::DB::SeqFeature::Store::DBI::Iterator - utility methods for creating and iterating over SeqFeature records =cut sub new { my $class = shift; my ($sth,$store) = @_; return bless {sth => $sth, store => $store },ref($class) || $class; } sub next_seq { my $self = shift; my $sth = $self->{sth} or return; my $store = $self->{store} or return; my $obj = $store->_sth2obj($sth); if (!$obj) { $self->{sth}->finish; undef $self->{sth}; undef $self->{store}; return; } return $obj; } 1; NormalizedTableFeatureI.pm100644000766000024 264313605523026 25570 0ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/lib/Bio/DB/SeqFeaturepackage Bio::DB::SeqFeature::NormalizedTableFeatureI; $Bio::DB::SeqFeature::NormalizedTableFeatureI::VERSION = '1.7.4'; =head1 NAME Bio::DB::SeqFeature::NormalizedTableFeatureI -- Interface for normalized features whose hierarchy is stored in a table =head1 SYNOPSIS none =head1 DESCRIPTION This is an extremely simple interface that contains a single method, subfeatures_are_stored_in_a_table(). This method returns a true value. Bio::DB::SeqFeature::Store feature classes will inherit this interface to flag that in addition to being able to store features in a normalized way, they will use the Bio::DB::SeqFeature::Store database to record their parent/child relationships. A class that inherits from NormalizedTableFeatureI will also inherit from NormalizedFeatureI, as the first is a subclass of the second. =head1 BUGS None, but the whole class design might be flawed. =head1 SEE ALSO L, L, L, L, L, L, L =head1 AUTHOR Lincoln Stein E<lt>lstein@cshl.orgE<gt>. Copyright (c) 2006 Cold Spring Harbor Laboratory.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. =cut use base 'Bio::DB::SeqFeature::NormalizedFeatureI'; sub subfeatures_are_stored_in_a_table { 1 } 1; FeatureFileLoader.pm100644000766000024 6172513605523026 25533 0ustar00cjfieldsstaff000000000000Bio-DB-SeqFeature-1.7.4/lib/Bio/DB/SeqFeature/Storepackage Bio::DB::SeqFeature::Store::FeatureFileLoader; $Bio::DB::SeqFeature::Store::FeatureFileLoader::VERSION = '1.7.4'; =head1 NAME Bio::DB::SeqFeature::Store::FeatureFileLoader -- feature file loader for Bio::DB::SeqFeature::Store =head1 SYNOPSIS use Bio::DB::SeqFeature::Store; use Bio::DB::SeqFeature::Store::FeatureFileLoader; # Open the sequence database my $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'DBI::mysql', -dsn => 'dbi:mysql:test', -write => 1 ); my $loader = Bio::DB::SeqFeature::Store::FeatureFileLoader->new(-store => $db, -verbose => 1, -fast => 1); $loader->load('./my_genome.fff'); =head1 DESCRIPTION The Bio::DB::SeqFeature::Store::FeatureFileLoader object parses FeatureFile-format sequence annotation files and loads them into Bio::DB::SeqFeature::Store databases. For certain combinations of SeqFeature classes and SeqFeature::Store databases it offers a "fast load" mode that accelerates database loading by a factor of 5-10.
FeatureFile Format (.fff) is very simple: mRNA B0511.1 Chr1:1..100 Type=UTR;Note="putative primase" mRNA B0511.1 Chr1:101..200,300..400,500..800 Type=CDS mRNA B0511.1 Chr1:801..1000 Type=UTR reference = Chr3 Cosmid B0511 516..619 Cosmid B0511 3185..3294 Cosmid B0511 10946..11208 Cosmid B0511 13126..13511 Cosmid B0511 11394..11539 EST yk260e10.5 15569..15724 EST yk672a12.5 537..618,3187..3294 EST yk595e6.5 552..618 EST yk595e6.5 3187..3294 EST yk846e07.3 11015..11208 EST yk53c10 yk53c10.3 15000..15500,15700..15800 yk53c10.5 18892..19154 EST yk53c10.5 16032..16105 SwissProt PECANEX 13153-13656 Note="Swedish fish" FGENESH "Predicted gene 1" 1-205,518-616,661-735,3187-3365,3436-3846 "Pfam domain" # file ends There are up to four columns of WHITESPACE-delimited (not necessarily tab-delimited) text. Embedded whitespace must be escaped using shell escaping rules (quoting the column or backslashing whitespace). Column 1: The feature type. You may use type:subtype as a convention for method:source. Column 2: The feature name/ID. Column 3: The position of this feature in base pair coordinates. Ranges can be given as either start-end or start..end. A chromosome position can be specified using the format "reference:start..end". A discontinuous feature can be specified by giving multiple ranges separated by commas. Minus-strand features are indicated by specifying a start > end. Column 4: Comment/attribute field. A single Note can be given, or a series of attribute=value pairs, separated by spaces or semicolons, as in "score=23;type=transmembrane". =head2 Specifying Positions and Ranges A feature position is specified using a sequence ID (a GenBank accession number, a chromosome name, a contig, or any other meaningful reference system), followed by a colon and a position range. Ranges are two integers separated by two dots or a hyphen. Examples: "Chr1:516..11208", "ctgA:1-5000". Negative coordinates are allowed, as in "Chr1:-187..1000".
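The range syntax can be illustrated with a short Perl sketch. parse_location() is a hypothetical helper, not part of this module: it reuses the range regular expression from handle_feature(), lets a sequence ID carry forward to later ranges (the split-location and default-reference conventions described below), and turns start > end into a minus-strand range with start <= end. Unlike the loader, which settles on one strand per feature, the sketch records a strand for each range.

```perl
use strict;
use warnings;

# parse_location() is a hypothetical helper, not part of the loader's API.
# It turns a FeatureFile position column into [seqid, start, end, strand]
# records, using the same range regexp as handle_feature().
sub parse_location {
    my ($spec, $default_ref) = @_;
    my $ref = $default_ref;    # e.g. the value of a "reference=" line
    my @parts;
    for my $range (split /(?:,| )\s*/, $spec) {
        my ($seqid, $start, $end) =
            $range =~ /(?:(\w+):)?(-?\d+)(?:-|\.\.)(-?\d+)/ or next;
        $ref = $seqid if defined $seqid;    # a seqid carries over to later ranges
        my $strand = $start <= $end ? '+' : '-';    # start > end means minus strand
        ($start, $end) = ($end, $start) if $start > $end;
        push @parts, [$ref, $start, $end, $strand];
    }
    return @parts;
}

# Usage: print one tab-separated record per range
for my $p (parse_location('Chr1:516..619,3185..3294')) {
    print join("\t", @$p), "\n";
}
```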
A discontinuous range ("split location") uses commas to separate the ranges. For example: Gene B0511.1 Chr1:516..619,3185..3294,10946..11208 In the case of a split location, the sequence ID only has to appear in front of the first range. Alternatively, a split location can be indicated by repeating the feature's type and name on multiple adjacent lines: Gene B0511.1 Chr1:516..619 Gene B0511.1 Chr1:3185..3294 Gene B0511.1 Chr1:10946..11208 If all the locations are on the same reference sequence, you can specify a default chromosome using a "reference=E<lt>seqidE<gt>" line: reference=Chr1 Gene B0511.1 516..619 Gene B0511.1 3185..3294 Gene B0511.1 10946..11208 The default seqid is in effect until the next "reference" line appears. =head2 Feature Tags Tags can be added to features by adding a fourth column consisting of "tag=value" pairs: Gene B0511.1 Chr1:516..619,3185..3294 Note="Putative primase" Tags and their values can take any form you want, and multiple tags can be separated by semicolons. You can also repeat a tag: Gene B0511.1 Chr1:516..619,3185..3294 GO_Term=GO:100;GO_Term=GO:2087 Several tags have special meanings: Tag Meaning --- ------- Type The primary tag for a subfeature. Score The score of a feature or subfeature. Phase The phase of a feature or subfeature. URL A URL to link to (via the Bio::Graphics library). Note A note to attach to the feature for display by the Bio::Graphics library. For example, in the common case of an mRNA, you can use the "Type" tag to divide the mRNA into its UTR and CDS parts: mRNA B0511.1 Chr1:1..100 Type=UTR mRNA B0511.1 Chr1:101..200,300..400,500..800 Type=CDS mRNA B0511.1 Chr1:801..1000 Type=UTR The top-level feature's primary tag will be "mRNA", and its subparts will have types UTR and CDS as indicated. Additional tags that are placed in the first line of the feature will be applied to the top level.
In this example, the note "Putative primase" will be applied to the mRNA at the top level of the feature: mRNA B0511.1 Chr1:1..100 Type=UTR;Note="Putative primase" mRNA B0511.1 Chr1:101..200,300..400,500..800 Type=CDS mRNA B0511.1 Chr1:801..1000 Type=UTR =head2 Feature Groups Features can be grouped so that they are rendered by the "group" glyph. To start a group, create a two-column feature entry showing the group type and a name for the group. Follow this with a list of feature entries with a blank type. For example: EST yk53c10 yk53c10.3 15000-15500,15700-15800 yk53c10.5 18892-19154 This example declares that the ESTs named yk53c10.3 and yk53c10.5 belong to the same group named yk53c10. =head2 Comments and the #include Directive Lines that begin with the # sign are treated as comments and ignored. When a # sign appears within a line, everything to the right of the symbol is also ignored, unless it looks like an HTML fragment or an HTML color, e.g.: # this is ignored [Example] glyph = generic # this comment is ignored bgcolor = #FF0000 link = http://www.google.com/search?q=$name#results Be careful, because the processing of # signs uses a regexp heuristic. To be safe, always put a space after the # sign to make sure it is treated as a comment. The special comment "#include 'filename'" acts like the C preprocessor's #include directive and will insert the contents of the named file at the position where it occurs. Relative paths will be treated relative to the file in which the #include occurs. Nested #include directives are allowed: #include "/usr/local/share/my_directives.txt" #include 'my_directives.txt' #include chromosome3_features.gff3 You can enclose the file path in single or double quotes as shown above. If there are no spaces in the filename, the quotes are optional. Include file processing is not very smart. Avoid creating circular #include references. You have been warned!
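The #include rules above (optional quotes, paths relative to the including file) can be sketched in Perl as follows, assuming Unix-style paths. resolve_include() and check_include_cycle() are hypothetical helpers, not the loader's own code. The second one shows one way to refuse the circular references the text warns about, and it is deliberately strict: it also rejects including the same file twice.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;

# resolve_include() is a hypothetical helper, not the loader's own code.
# Given the file containing an "#include" line and the line itself, it
# returns the path to include. Quotes are optional, and relative paths are
# taken relative to the directory of the including file.
sub resolve_include {
    my ($including_file, $line) = @_;
    return unless $line =~ /^\#include\s+(?:"([^"]+)"|'([^']+)'|(\S+))/;
    my ($target) = grep { defined } ($1, $2, $3);
    return $target if File::Spec->file_name_is_absolute($target);
    return File::Spec->catfile(dirname($including_file), $target);
}

# Hypothetical guard: remember every file already included and die on a
# repeat. This catches cycles, but it also rejects a file that is included
# twice without forming a cycle.
sub check_include_cycle {
    my ($seen, $path) = @_;
    die "circular #include of $path\n" if $seen->{$path}++;
    return $path;
}
```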
=head2 Caveats Note that this loader always creates denormalized features such that subfeatures and their parents are stored as one big database object. The GFF3 format and its loader are usually preferred for both space and execution efficiency. =head1 METHODS =cut use strict; use Carp 'croak'; use File::Spec; use Text::ParseWords 'shellwords','quotewords'; use base 'Bio::DB::SeqFeature::Store::Loader'; =head2 new Title : new Usage : $loader = Bio::DB::SeqFeature::Store::FeatureFileLoader->new(@options) Function: create a new parser Returns : a Bio::DB::SeqFeature::Store::FeatureFileLoader parser and loader Args : several - see below Status : public This method creates a new FeatureFile loader and establishes its connection with a Bio::DB::SeqFeature::Store database. Arguments are -name=E<gt>$value pairs as described in this table: Name Value ---- ----- -store A writable Bio::DB::SeqFeature::Store database handle. -seqfeature_class The name of the type of Bio::SeqFeatureI object to create and store in the database (Bio::Graphics::Feature by default for this loader) -sf_class A shorter alias for -seqfeature_class -verbose Send progress information to standard error. -fast If true, activate fast loading (see below) -chunk_size Set the storage chunk size for nucleotide/protein sequences (default 2000 bytes) -tmp Indicate a temporary directory to use when loading non-normalized features. When you call new(), a connection to a Bio::DB::SeqFeature::Store database should already have been established and the database initialized (if appropriate). Some combinations of Bio::SeqFeatures and Bio::DB::SeqFeature::Store databases support a fast loading mode. Currently the only reliable implementation of fast loading is the combination of DBI::mysql with Bio::DB::SeqFeature. The other important restriction on fast loading is the requirement that a feature that contains subfeatures must occur in the FeatureFile before any of its subfeatures.
Otherwise the subfeatures that occurred before the parent feature will not be attached to the parent correctly. This restriction does not apply to normal (slow) loading. If you use an unnormalized feature class, such as Bio::SeqFeature::Generic, then the loader needs to create a temporary database in which to cache features until all their parts and subparts have been seen. This temporary database uses the "bdb" adaptor. The -tmp option specifies the directory in which that database will be created. If not present, it defaults to the system default tmp directory specified by File::Spec-E<gt>tmpdir(). The -chunk_size option allows you to tune the representation of DNA/protein sequence in the Store database. By default, sequences are split into 2000 base/residue chunks and then reassembled as needed. This avoids the problem of pulling a whole chromosome into memory in order to fetch a short subsequence from somewhere in the middle. Depending on your usage patterns, you may wish to tune this parameter using a chunk size that is larger or smaller than the default. =cut # sub new {} inherited =head2 load Title : load Usage : $count = $loader->load(@ARGV) Function: load the indicated files or filehandles Returns : number of feature lines loaded Args : list of files or filehandles Status : public Once the loader is created, invoke its load() method with a list of FeatureFile or FASTA file paths or previously-opened filehandles in order to load them into the database. Compressed files ending with .gz, .Z and .bz2 are automatically recognized and uncompressed on the fly. Paths beginning with http: or ftp: are treated as URLs and opened using the LWP GET program (which must be on your path). FASTA files are recognized by their initial "E<gt>" character. Do not feed the loader a file that is neither FeatureFile nor FASTA; I don't know what will happen, but it will probably not be what you expect.
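The input rules for load() can be summarized as a small classifier. classify_input() is hypothetical and only restates the documented rules (the http:/ftp: URL prefixes, the .gz/.Z/.bz2 suffixes, and the leading ">" that marks FASTA). It does not open, download or decompress anything, and it is not how the loader is implemented internally.

```perl
use strict;
use warnings;

# classify_input() is hypothetical: it only restates the documented rules
# for load() inputs and does not open, download or decompress anything.
sub classify_input {
    my ($path, $first_line) = @_;
    my %info;
    # URLs are fetched with LWP's GET program according to the docs
    $info{remote} = $path =~ m{^(?:http|ftp):} ? 1 : 0;
    # compressed inputs are recognized purely by their file suffix
    $info{compression} = $path =~ /\.(gz|Z|bz2)$/ ? $1 : '';
    # FASTA is recognized by a leading ">"; anything else is treated as FeatureFile
    $info{format} = (defined $first_line && $first_line =~ /^>/) ? 'fasta' : 'fff';
    return \%info;
}
```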
=cut # sub load {} inherited =head2 accessors The following read-only accessors return values passed or created during new(): store() the long-term Bio::DB::SeqFeature::Store object tmp_store() the temporary Bio::DB::SeqFeature::Store object used during loading sfclass() the Bio::SeqFeatureI class fast() whether fast loading is active seq_chunk_size() the sequence chunk size verbose() verbose progress messages =cut # sub store {} inherited # sub tmp_store {} inherited # sub sfclass {} inherited # sub fast {} inherited # sub seq_chunk_size {} inherited # sub verbose {} inherited =head2 default_seqfeature_class $class = $loader->default_seqfeature_class Return the default SeqFeatureI class (Bio::Graphics::Feature). =cut sub default_seqfeature_class { #override my $self = shift; return 'Bio::Graphics::Feature'; } =head2 load_fh $count = $loader->load_fh($filehandle) Load the FeatureFile data at the other end of the filehandle and return true if successful. Internally, load_fh() invokes: start_load(); do_load($filehandle); finish_load(); =cut # sub load_fh { } inherited =head2 start_load, finish_load These methods are called at the start and end of a filehandle load. =cut sub create_load_data { my $self = shift; $self->SUPER::create_load_data(); $self->{load_data}{mode} = 'fff'; $self->{load_data}{CurrentGroup} = undef; } sub finish_load { my $self = shift; $self->_store_group; $self->SUPER::finish_load; } =head2 load_line $loader->load_line($data); Load a line of a FeatureFile file. You must bracket this with calls to start_load() and finish_load()! 

  $loader->start_load();
  $loader->load_line($_) while <$filehandle>;
  $loader->finish_load();

=cut

sub load_line {
  my $self = shift;
  my $line = shift;

  chomp($line);
  return unless $line =~ /\S/;  # blank line
  my $load_data = $self->{load_data};

  $load_data->{mode} = 'fff' if $line =~ /\s/;  # if it has any whitespace in
                                                # it, then back to fff mode

  if ($line =~ /^\#\s?\#\s*([\#]+)/) {  ## meta instruction
    $load_data->{mode} = 'fff';
    $self->handle_meta($1);

  } elsif ($line =~ /^\#/) {
    $load_data->{mode} = 'fff';  # just to be safe
    return;                      # comment

  } elsif ($line =~ /^>\s*(\S+)/) {  # FASTA lines are coming
    $load_data->{mode} = 'fasta';
    $self->start_or_finish_sequence($1);

  } elsif ($load_data->{mode} eq 'fasta') {
    $self->load_sequence($line);

  } elsif ($load_data->{mode} eq 'fff') {
    $self->handle_feature($line);
    if (++$load_data->{count} % 1000 == 0) {
      my $now = $self->time();
      my $nl  = -t STDOUT && !$ENV{EMACS} ? "\r" : "\n";
      $self->msg(sprintf("%d features loaded in %5.2fs...$nl",
                         $load_data->{count},
                         $now - $load_data->{start_time}));
      $load_data->{start_time} = $now;
    }

  } else {
    $self->throw("I don't know what to do with this line:\n$line");
  }
}

=head2 handle_meta

  $loader->handle_meta($meta_directive)

This method is called to handle meta-directives such as
##sequence-region. The method will receive the directive with the
initial ## stripped off.

=cut

# sub handle_meta { } inherited

=head2 handle_feature

  $loader->handle_feature($featurefile_line)

This method is called to process a single FeatureFile line. It
manipulates information stored in a data structure called
$self-E<gt>{load_data}.
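
The coordinate handling in handle_feature() below is compact, so here is a standalone sketch that reuses its two regular expressions. The subroutine parse_bounds is hypothetical and is not a method of this loader. It does the following:

=over

=item * Splits the bounds column on commas or spaces.

=item * Accepts ref:start-end or start..end segments, and lets a ref: prefix override the default reference.

=item * Infers the strand from the first segment when none was supplied.

=item * Swaps coordinates so that start never exceeds end.

=back

Unlike handle_feature(), it silently skips segments that do not match the pattern.

```perl

    use strict;
    use warnings;

    # Hypothetical standalone sketch of the bounds handling in
    # handle_feature(). Returns ($reference, $strand, [start,end], ...).
    sub parse_bounds {
        my ($bounds, $reference, $strand) = @_;
        my @parts;
        for my $segment (split /(?:,| )\s*/, $bounds) {
            # Same pattern as handle_feature(): optional "ref:" prefix,
            # then start and end separated by "-" or "..".
            my ($ref, $start, $end) =
                $segment =~ /(?:(\w+):)?(-?\d+)(?:-|\.\.)(-?\d+)/
                or next;    # unlike handle_feature(), skip bad segments

            # The first segment decides the strand unless one was given.
            $strand ||= $start <= $end ? '+' : '-';

            # Normalize so that start <= end.
            ($start, $end) = ($end, $start) if $start > $end;

            # A "ref:" prefix overrides the default reference.
            $reference = $ref if defined $ref;
            push @parts, [$start, $end];
        }
        return ($reference, $strand, @parts);
    }

    my ($ref, $strand, @parts) = parse_bounds('Chr9:500..200,150..100', 'ChrUN');
    print "$ref $strand ", join(' ', map { $_->[0] . '-' . $_->[1] } @parts), "\n";

```

The sample call prints "Chr9 - 200-500 100-150". The first segment runs from 500 down to 200, so the strand is inferred as "-", and both segments are stored with start before end.
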

=cut

sub handle_feature {
  my $self = shift;
  local $_ = shift;

  my $ld = $self->{load_data};

  # handle reference line
  if (/^reference\s*=\s*(.+)/) {
    $ld->{reference} = $1;
    return;
  }

  # parse data lines
  my @tokens = quotewords('\s+',1,$_);
  for (0..2) { # remove quotes from everything but last column
    next unless defined $tokens[$_];
    $tokens[$_] =~ s/^"//;
    $tokens[$_] =~ s/"$//;
  }

  if (@tokens < 3) {  # short line; assume a group identifier
    $self->store_current_feature();
    my $type = shift @tokens;
    my $name = shift @tokens;
    $ld->{CurrentGroup} = $self->_make_indexed_feature($name,$type,'',{_ff_group=>1});
    $self->_indexit($name => 1);
    return;
  }

  my($type,$name,$strand,$bounds,$attributes);

  if ($tokens[2] =~ /^([+-.]|[+-]?[01])$/) { # old version
    ($type,$name,$strand,$bounds,$attributes) = @tokens;
  } else {                                   # new version
    ($type,$name,$bounds,$attributes) = @tokens;
  }

  # handle case of there only being one value in the last column,
  # in which case we treat it the same as Note="value"
  my $attr = $self->parse_attributes($attributes);

  # @parts is an array of ([ref,start,end],[ref,start,end],...)
  my @parts = map { [/(?:(\w+):)?(-?\d+)(?:-|\.\.)(-?\d+)/]} split /(?:,| )\s*/,$bounds;

  # deal with groups -- a group is ending if $type is defined
  # and CurrentGroup is set
  if ($type && $ld->{CurrentGroup}) {
    $self->_store_group();
  }

  $type = '' unless defined $type;
  $name = '' unless defined $name;
  $type ||= $ld->{CurrentGroup}->primary_tag if $ld->{CurrentGroup};

  my $reference = $ld->{reference} || 'ChrUN';
  foreach (@parts) {
    if (defined $_ && ref($_) eq 'ARRAY'
        && defined $_->[1]
        && defined $_->[2]) {
      $strand ||= $_->[1] <= $_->[2] ? '+' : '-';
      ($_->[1],$_->[2]) = ($_->[2],$_->[1]) if $_->[1] > $_->[2];
    }
    $reference = $_->[0] if defined $_->[0];
    $_ = [@{$_}[1,2]]; # strip off the reference.
  }

  # now @parts is an array of [start,end] and $reference contains the seqid

  # apply coordinate mapper
  if ($self->{coordinate_mapper} && $reference) {
    my @remapped = $self->{coordinate_mapper}->($reference,@parts);
    ($reference,@parts) = @remapped if @remapped;
  }

  # either create a new feature or add a segment to it
  my $feature = $ld->{CurrentFeature};

  $ld->{OldPartType} = $ld->{PartType};
  if (exists $attr->{Type} || exists $attr->{type}) {
    $ld->{PartType} = $attr->{Type}[0] || $attr->{type}[0];
  } else {
    $ld->{PartType} = $type;
  }

  if ($feature) {
    local $^W = 0;  # avoid uninit warning when display_name() is called

    # if this is a different feature from what we have now, then we
    # store the current one, and create a new one
    if ($feature->display_name ne $name ||
        $feature->method ne $type) {
      $self->store_current_feature;  # new feature, store old one
      undef $feature;
    } else {  # create a new multipart feature
      $self->_multilevel_feature($feature,$ld->{OldPartType})
        unless $feature->get_SeqFeatures;
      my $part = $self->_make_feature($name,
                                      $ld->{PartType},
                                      $strand,
                                      $attr,
                                      $reference,
                                      @{$parts[0]});
      $feature->add_SeqFeature($part);
    }
  }

  $feature ||= $self->_make_indexed_feature($name,
                                            $type,  # side effect is to set CurrentFeature
                                            $strand,
                                            $attr,
                                            $reference,
                                            @{$parts[0]});

  # add more segments to the current feature
  if (@parts > 1) {
    for my $part (@parts) {
      $type ||= $feature->primary_tag;
      my $sp = $self->_make_feature($name,
                                    $ld->{PartType},
                                    $strand,
                                    $attr,
                                    $reference,
                                    @{$part});
      $feature->add_SeqFeature($sp);
    }
  }
}

sub _multilevel_feature {  # turn a single-level feature into a multilevel one
  my $self = shift;
  my $f    = shift;
  my $type = shift;

  my %attributes = $f->attributes;
  $attributes{Score} = [$f->score] if defined $f->score;
  $attributes{Phase} = [$f->phase] if defined $f->phase;
  my @args = ($f->display_name,
              $type||$f->type,
              $f->strand,
              \%attributes,
              $f->seq_id,
              $f->start,
              $f->end);
  my $subpart = $self->_make_feature(@args);
  $f->add_SeqFeature($subpart);
}

sub _make_indexed_feature {
  my $self = shift;
  my $f    = $self->_make_feature(@_);
  my $name = $f->display_name;
  $self->{load_data}{CurrentFeature} = $f;
  $self->{load_data}{CurrentID}      = $name;
  $self->_indexit($name => 1);
  return $f;
}

sub _make_feature {
  my $self = shift;
  my ($name,$type,$strand,$attributes,$ref,$start,$end) = @_;

  # some basic error checking
  $self->throw("syntax error at line $.: '$_'")
    if ($ref && !defined $start)
    or ($ref && !defined $end)
    or ($start && $start !~ /^[-\d]+$/)
    or ($end   && $end   !~ /^[-\d]+$/)
    or !defined $type
    or !defined $name;

  $strand ||= '';

  my @args = (-name       => $name,
              -strand     => $strand eq '+' ?  1
                           : $strand eq '-' ? -1
                           : $strand eq ''  ?  0
                           : $strand eq '.' ?  0
                           : $strand ==  1  ?  1
                           : $strand == -1  ? -1
                           : 0,
              -attributes => $attributes,
  );

  if (my ($method,$source) = $type =~ /(\S+):(\S+)/) {
    push @args,(-primary_tag => $method,
                -source      => $source);
  } else {
    push @args,(-primary_tag => $type);
  }

  push @args,(-seq_id => $ref)   if defined $ref;
  push @args,(-start  => $start) if defined $start;
  push @args,(-end    => $end)   if defined $end;

  # pull out special attributes
  if (my $score = $attributes->{Score} || $attributes->{score}) {
    push @args,(-score => $score->[0]);
    delete $attributes->{$_} foreach qw(Score score);
  }

  if (my $note = $attributes->{Note} || $attributes->{note}) {
    push @args,(-desc => join '; ',@$note);
    delete $attributes->{$_} foreach qw(Note note);
  }

  if (my $url = $attributes->{url} || $attributes->{Url}) {
    push @args,(-url => $url->[0]);
    delete $attributes->{$_} foreach qw (Url url);
  }

  if (my $phase = $attributes->{phase} || $attributes->{Phase}) {
    push @args,(-phase => $phase->[0]);
    delete $attributes->{$_} foreach qw (Phase phase);
  }

  $self->_indexit($name=>1)
    if $self->index_subfeatures && $name;

  return $self->sfclass->new(@args);
}

=head2 store_current_feature

  $loader->store_current_feature()

This method is called to store the currently active feature in the
database. It uses a data structure stored in $self-E<gt>{load_data}.

=cut

sub store_current_feature {  # overridden
  my $self = shift;

  # handle open groups
  # if there is an open group, then we simply add the current
  # feature to the group.
  my $ld = $self->{load_data};
  if ($ld->{CurrentGroup} && $ld->{CurrentFeature}) {
    $ld->{CurrentGroup}->add_SeqFeature($ld->{CurrentFeature})
      unless $ld->{CurrentGroup} eq $ld->{CurrentFeature};  # paranoia - shouldn't happen
    return;
  } else {
    $self->SUPER::store_current_feature();
  }
}

sub _store_group {
  my $self  = shift;
  my $ld    = $self->{load_data};
  my $group = $ld->{CurrentGroup} or return;
  # if there is an unattached feature, then add it
  $self->store_current_feature() if $ld->{CurrentFeature};
  $ld->{CurrentFeature} = $group;
  $ld->{CurrentID}      = $group->display_name;
  $self->_indexit($ld->{CurrentID} => 1);
  undef $ld->{CurrentGroup};
  $self->store_current_feature();
}

=head2 build_object_tree

 $loader->build_object_tree()

This method gathers together features and subfeatures and builds the
graph that connects them.

=cut

###
# put objects together
#
sub build_object_tree {
  croak "We shouldn't be building an object tree in the FeatureFileLoader";
}

=head2 build_object_tree_in_tables

 $loader->build_object_tree_in_tables()

This method gathers together features and subfeatures and builds the
graph that connects them, assuming that parent/child relationships
will be stored in a database table.

=cut

sub build_object_tree_in_tables {
  croak "We shouldn't be building an object tree in the FeatureFileLoader";
}

=head2 build_object_tree_in_features

 $loader->build_object_tree_in_features()

This method gathers together features and subfeatures and builds the
graph that connects them, assuming that parent/child relationships are
stored in the seqfeature objects themselves.

=cut

sub build_object_tree_in_features {
  croak "We shouldn't be building an object tree in the FeatureFileLoader";
}

=head2 attach_children

 $loader->attach_children($store,$load_data,$load_id,$feature)

This recursively adds children to features and their subfeatures. It is
called when subfeatures are directly contained within other features,
rather than stored in a relational table.

=cut

sub attach_children {
  croak "We shouldn't be attaching children in the FeatureFileLoader!";
}

=head2 parse_attributes

 $attributes = $loader->parse_attributes($attribute_line)

This method parses the information contained in the $attribute_line
into a hash reference whose keys are attribute tags and whose values
are array references of attribute values. A token that is not in
tag=value form is treated as an implicit Note, and the "description"
tag is also stored under Note.

=cut

sub parse_attributes {
  my $self = shift;
  my $att  = shift;

  $att ||= '';  # to prevent uninit variable warnings from quotewords()
  my @pairs = quotewords('[;\s]',1,$att);
  my %attributes;
  for my $pair (@pairs) {
    unless ($pair =~ /=/) {
      push @{$attributes{Note}},(quotewords('',0,$pair))[0] || $pair;
    } else {
      my ($tag,$value) = quotewords('\s*=\s*',0,$pair);
      $tag = 'Note' if $tag eq 'description';
      push @{$attributes{$tag}},$value;
    }
  }
  return \%attributes;
}

=head2 start_or_finish_sequence

  $loader->start_or_finish_sequence('Chr9')

This method is called at the beginning and end of a fasta section.

=cut

1;

__END__

=head1 BUGS

This is an early version, so there are certainly some bugs. Please
use the BioPerl bug tracking system to report bugs.

=head1 SEE ALSO

L, L, L, L, L, L

=head1 AUTHOR

Lincoln Stein E<lt>lstein@cshl.orgE<gt>.

Copyright (c) 2006 Cold Spring Harbor Laboratory.

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.

=cut