GNU LESSER GENERAL PUBLIC LICENSE Version 2.1, February 1999 Copyright (C) 1991, 1999 Free Software Foundation, Inc. 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. [This is the first released version of the Lesser GPL. It also counts as the successor of the GNU Library Public License, version 2, hence the version number 2.1.] Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public Licenses are intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This license, the Lesser General Public License, applies to some specially designated software packages--typically libraries--of the Free Software Foundation and other authors who decide to use it. You can use it too, but we suggest you first think carefully about whether this license or the ordinary General Public License is the better strategy to use in any particular case, based on the explanations below. When we speak of free software, we are referring to freedom of use, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish); that you receive source code or can get it if you want it; that you can change the software and use pieces of it in new free programs; and that you are informed that you can do these things. To protect your rights, we need to make restrictions that forbid distributors to deny you these rights or to ask you to surrender these rights. 
These restrictions translate to certain responsibilities for you if you distribute copies of the library or if you modify it. For example, if you distribute copies of the library, whether gratis or for a fee, you must give the recipients all the rights that we gave you. You must make sure that they, too, receive or can get the source code. If you link other code with the library, you must provide complete object files to the recipients, so that they can relink them with the library after making changes to the library and recompiling it. And you must show them these terms so they know their rights. We protect your rights with a two-step method: (1) we copyright the library, and (2) we offer you this license, which gives you legal permission to copy, distribute and/or modify the library. To protect each distributor, we want to make it very clear that there is no warranty for the free library. Also, if the library is modified by someone else and passed on, the recipients should know that what they have is not the original version, so that the original author's reputation will not be affected by problems that might be introduced by others. Finally, software patents pose a constant threat to the existence of any free program. We wish to make sure that a company cannot effectively restrict the users of a free program by obtaining a restrictive license from a patent holder. Therefore, we insist that any patent license obtained for a version of the library must be consistent with the full freedom of use specified in this license. Most GNU software, including some libraries, is covered by the ordinary GNU General Public License. This license, the GNU Lesser General Public License, applies to certain designated libraries, and is quite different from the ordinary General Public License. We use this license for certain libraries in order to permit linking those libraries into non-free programs. 
When a program is linked with a library, whether statically or using a shared library, the combination of the two is legally speaking a combined work, a derivative of the original library. The ordinary General Public License therefore permits such linking only if the entire combination fits its criteria of freedom. The Lesser General Public License permits more lax criteria for linking other code with the library. We call this license the "Lesser" General Public License because it does Less to protect the user's freedom than the ordinary General Public License. It also provides other free software developers Less of an advantage over competing non-free programs. These disadvantages are the reason we use the ordinary General Public License for many libraries. However, the Lesser license provides advantages in certain special circumstances. For example, on rare occasions, there may be a special need to encourage the widest possible use of a certain library, so that it becomes a de-facto standard. To achieve this, non-free programs must be allowed to use the library. A more frequent case is that a free library does the same job as widely used non-free libraries. In this case, there is little to gain by limiting the free library to free software only, so we use the Lesser General Public License. In other cases, permission to use a particular library in non-free programs enables a greater number of people to use a large body of free software. For example, permission to use the GNU C Library in non-free programs enables many more people to use the whole GNU operating system, as well as its variant, the GNU/Linux operating system. Although the Lesser General Public License is Less protective of the users' freedom, it does ensure that the user of a program that is linked with the Library has the freedom and the wherewithal to run that program using a modified version of the Library. The precise terms and conditions for copying, distribution and modification follow. 
Pay close attention to the difference between a "work based on the library" and a "work that uses the library". The former contains code derived from the library, whereas the latter must be combined with the library in order to run. GNU LESSER GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License Agreement applies to any software library or other program which contains a notice placed by the copyright holder or other authorized party saying it may be distributed under the terms of this Lesser General Public License (also called "this License"). Each licensee is addressed as "you". A "library" means a collection of software functions and/or data prepared so as to be conveniently linked with application programs (which use some of those functions and data) to form executables. The "Library", below, refers to any such software library or work which has been distributed under these terms. A "work based on the Library" means either the Library or any derivative work under copyright law: that is to say, a work containing the Library or a portion of it, either verbatim or with modifications and/or translated straightforwardly into another language. (Hereinafter, translation is included without limitation in the term "modification".) "Source code" for a work means the preferred form of the work for making modifications to it. For a library, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the library. Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running a program using the Library is not restricted, and output from such a program is covered only if its contents constitute a work based on the Library (independent of the use of the Library in a tool for writing it). 
Whether that is true depends on what the Library does and what the program that uses the Library does. 1. You may copy and distribute verbatim copies of the Library's complete source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and distribute a copy of this License along with the Library. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Library or any portion of it, thus forming a work based on the Library, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) The modified work must itself be a software library. b) You must cause the files modified to carry prominent notices stating that you changed the files and the date of any change. c) You must cause the whole of the work to be licensed at no charge to all third parties under the terms of this License. d) If a facility in the modified Library refers to a function or a table of data to be supplied by an application program that uses the facility, other than as an argument passed when the facility is invoked, then you must make a good faith effort to ensure that, in the event an application does not supply such function or table, the facility still operates, and performs whatever part of its purpose remains meaningful. (For example, a function in a library to compute square roots has a purpose that is entirely well-defined independent of the application. Therefore, Subsection 2d requires that any application-supplied function or table used by this function must be optional: if the application does not supply it, the square root function must still compute square roots.) 
These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Library, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Library, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Library. In addition, mere aggregation of another work not based on the Library with the Library (or with a work based on the Library) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may opt to apply the terms of the ordinary GNU General Public License instead of this License to a given copy of the Library. To do this, you must alter all the notices that refer to this License, so that they refer to the ordinary GNU General Public License, version 2, instead of to this License. (If a newer version than version 2 of the ordinary GNU General Public License has appeared, then you can specify that version instead if you wish.) Do not make any other change in these notices. Once this change is made in a given copy, it is irreversible for that copy, so the ordinary GNU General Public License applies to all subsequent copies and derivative works made from that copy. This option is useful when you wish to copy part of the code of the Library into a program that is not a library. 4. 
You may copy and distribute the Library (or a portion or derivative of it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange. If distribution of object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place satisfies the requirement to distribute the source code, even though third parties are not compelled to copy the source along with the object code. 5. A program that contains no derivative of any portion of the Library, but is designed to work with the Library by being compiled or linked with it, is called a "work that uses the Library". Such a work, in isolation, is not a derivative work of the Library, and therefore falls outside the scope of this License. However, linking a "work that uses the Library" with the Library creates an executable that is a derivative of the Library (because it contains portions of the Library), rather than a "work that uses the library". The executable is therefore covered by this License. Section 6 states terms for distribution of such executables. When a "work that uses the Library" uses material from a header file that is part of the Library, the object code for the work may be a derivative work of the Library even though the source code is not. Whether this is true is especially significant if the work can be linked without the Library, or if the work is itself a library. The threshold for this to be true is not precisely defined by law. 
If such an object file uses only numerical parameters, data structure layouts and accessors, and small macros and small inline functions (ten lines or less in length), then the use of the object file is unrestricted, regardless of whether it is legally a derivative work. (Executables containing this object code plus portions of the Library will still fall under Section 6.) Otherwise, if the work is a derivative of the Library, you may distribute the object code for the work under the terms of Section 6. Any executables containing that work also fall under Section 6, whether or not they are linked directly with the Library itself. 6. As an exception to the Sections above, you may also combine or link a "work that uses the Library" with the Library to produce a work containing portions of the Library, and distribute that work under terms of your choice, provided that the terms permit modification of the work for the customer's own use and reverse engineering for debugging such modifications. You must give prominent notice with each copy of the work that the Library is used in it and that the Library and its use are covered by this License. You must supply a copy of this License. If the work during execution displays copyright notices, you must include the copyright notice for the Library among them, as well as a reference directing the user to the copy of this License. Also, you must do one of these things: a) Accompany the work with the complete corresponding machine-readable source code for the Library including whatever changes were used in the work (which must be distributed under Sections 1 and 2 above); and, if the work is an executable linked with the Library, with the complete machine-readable "work that uses the Library", as object code and/or source code, so that the user can modify the Library and then relink to produce a modified executable containing the modified Library. 
(It is understood that the user who changes the contents of definitions files in the Library will not necessarily be able to recompile the application to use the modified definitions.) b) Use a suitable shared library mechanism for linking with the Library. A suitable mechanism is one that (1) uses at run time a copy of the library already present on the user's computer system, rather than copying library functions into the executable, and (2) will operate properly with a modified version of the library, if the user installs one, as long as the modified version is interface-compatible with the version that the work was made with. c) Accompany the work with a written offer, valid for at least three years, to give the same user the materials specified in Subsection 6a, above, for a charge no more than the cost of performing this distribution. d) If distribution of the work is made by offering access to copy from a designated place, offer equivalent access to copy the above specified materials from the same place. e) Verify that the user has already received a copy of these materials or that you have already sent this user a copy. For an executable, the required form of the "work that uses the Library" must include any data and utility programs needed for reproducing the executable from it. However, as a special exception, the materials to be distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. It may happen that this requirement contradicts the license restrictions of other proprietary libraries that do not normally accompany the operating system. Such a contradiction means you cannot use both them and the Library together in an executable that you distribute. 7. 
You may place library facilities that are a work based on the Library side-by-side in a single library together with other library facilities not covered by this License, and distribute such a combined library, provided that the separate distribution of the work based on the Library and of the other library facilities is otherwise permitted, and provided that you do these two things: a) Accompany the combined library with a copy of the same work based on the Library, uncombined with any other library facilities. This must be distributed under the terms of the Sections above. b) Give prominent notice with the combined library of the fact that part of it is a work based on the Library, and explaining where to find the accompanying uncombined form of the same work. 8. You may not copy, modify, sublicense, link with, or distribute the Library except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense, link with, or distribute the Library is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 9. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Library or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Library (or any work based on the Library), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Library or works based on it. 10. Each time you redistribute the Library (or any work based on the Library), the recipient automatically receives a license from the original licensor to copy, distribute, link with or modify the Library subject to these terms and conditions. 
You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties with this License. 11. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Library at all. For example, if a patent license would not permit royalty-free redistribution of the Library by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Library. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply, and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 12. 
If the distribution and/or use of the Library is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Library under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 13. The Free Software Foundation may publish revised and/or new versions of the Lesser General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Library specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Library does not specify a license version number, you may choose any version ever published by the Free Software Foundation. 14. If you wish to incorporate parts of the Library into other free programs whose distribution conditions are incompatible with these, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW. 
EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 16. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Libraries If you develop a new library, and you want it to be of the greatest possible use to the public, we recommend making it free software that everyone can redistribute and change. You can do so by permitting redistribution under these terms (or, alternatively, under the terms of the ordinary General Public License). To apply these terms, attach the following notices to the library. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. 
<one line to give the library's name and a brief idea of what it does.> Copyright (C) <year> <name of author>

This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

Also add information on how to contact you by electronic and paper mail. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the library, if necessary. Here is a sample; alter the names:

Yoyodyne, Inc., hereby disclaims all copyright interest in the library `Frob' (a library for tweaking knobs) written by James Random Hacker.

<signature of Ty Coon>, 1 April 1990 Ty Coon, President of Vice

That's all there is to it!

GNU GENERAL PUBLIC LICENSE Version 2, June 1991 Copyright (C) 1989, 1991 Free Software Foundation, Inc. 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. 
(Some other Free Software Foundation software is covered by the GNU Library General Public License instead.) You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. 
The precise terms and conditions for copying, distribution and modification follow. GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. 
You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. 
Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. 
However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. 
If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. 
If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. 
EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. Copyright (C) This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. 
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA Also add information on how to contact you by electronic and paper mail. If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright (C) year name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker. , 1 April 1989 Ty Coon, President of Vice This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Library General Public License instead of this License. 
bio-2.0.3/README_DEV.rdoc

= README.DEV

Copyright:: Copyright (C) 2005, 2006 Toshiaki Katayama
Copyright:: Copyright (C) 2006, 2008 Jan Aerts
Copyright:: Copyright (C) 2011, 2019 Naohisa Goto

= HOW TO CONTRIBUTE TO THE BIORUBY PROJECT?

There are many possible ways to contribute to the BioRuby project, such as:

* Join the discussion on the BioRuby mailing list
* Send a bug report or write a bug fix patch
* Add and correct documentation
* Develop code for new features, etc.

All of these are welcome! This document mainly focuses on the last option, how to contribute your code to the BioRuby distribution. This may also be helpful when you send large patches for existing code.

We would like to include your contribution as long as the scope of your module meets the field of bioinformatics.

== Git

BioRuby is now under git source control at http://github.com/bioruby/bioruby. There are two basic ways to contribute: with patches or pull requests. Both are explained on the bioruby wiki at http://bioruby.open-bio.org/wiki.

=== Preparation before sending patches or pull requests

Before sending patches or pull requests, rewriting history and reordering or selecting patches are recommended. See "Creating the perfect patch series" in the Git User's Manual. http://www.kernel.org/pub/software/scm/git/docs/user-manual.html#patch-series

=== Sending your contribution

==== With patches

You can send patches with git-format-patch. For a smaller change, unified diff (diff -u) without using git can also be accepted.

==== With pull requests

We are happy if your commits can be pulled with fast-forward. For that purpose, using git-rebase before sending a pull request is recommended. See "Keeping a patch series up to date using git rebase" in the Git User's Manual.
http://www.kernel.org/pub/software/scm/git/docs/user-manual.html#using-git-rebase

=== Notes for the treatment of contributions in the blessed repository

==== Merging policy

We do not always merge your commits as is. We may edit, rewrite, reorder, select, and/or mix your commits before and/or after merging to the blessed repository.

==== Git commit management policy

We want to keep the commit history linear as far as possible, because a linear history makes it easier to find problems and regressions in commits. See "Why bisecting merge commits can be harder than bisecting linear history" in the Git User's Manual. http://www.kernel.org/pub/software/scm/git/docs/user-manual.html#bisect-merges

Note that the above policy is only for the main 'blessed' repository, and it does not aim to restrict each user's fork.

= LICENSE

If you would like your module to be included in the BioRuby distribution, you need to give us the right to change the license of your module to make it compatible with other modules in BioRuby.

BioRuby was previously distributed under the LGPL license, but now is distributed under the same terms as Ruby.

= CODING STYLE

You will need to follow the typical coding styles of the BioRuby modules:

== Use the following naming conventions

* CamelCase for module and class names
* '_'-separated_lowercase for method names
* '_'-separated_lowercase for variable names
* all UPPERCASE for constants

== Indentation must not include tabs

* Use 2 spaces for indentation.
* Don't replace spaces with tabs.

== Parentheses in the method definition line should be written

* Good: def example(str, ary)
* Discouraged: def example str, ary

== Comments

Don't use =begin and =end blocks for comments. If you need to add comments, include them in the RDoc documentation.

== Documentation should be written in the RDoc format in the source code

The RDoc format is becoming the popular standard for Ruby documentation. We are now in transition from the previously used RD format to the RDoc format in API documentation.
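
The coding style rules above (naming, 2-space indentation, parentheses in method definitions, RDoc comments instead of =begin/=end) can be sketched in one small class. Note that Bio::BaseCounter, VALID_BASES and base_counts are hypothetical names made up for this illustration; they are not part of BioRuby.

```ruby
module Bio
  # == Description
  #
  # Hypothetical example class (NOT part of BioRuby) that counts
  # nucleotides in a sequence string. It only illustrates the style rules.
  class BaseCounter
    # Bases that are counted (constants are all UPPERCASE)
    VALID_BASES = %w[a c g t].freeze

    # Lowercased sequence string (String)
    attr_reader :sequence

    # Creates a new counter (parentheses are written in the def line).
    # ---
    # *Arguments*:
    # * (required) _str_: String
    # *Returns*:: Bio::BaseCounter object
    def initialize(str)
      @sequence = str.downcase
    end

    # Counts each valid base in the sequence.
    # ---
    # *Returns*:: Hash (base => number of occurrences)
    def base_counts
      counts = Hash.new(0)
      @sequence.each_char do |base|
        counts[base] += 1 if VALID_BASES.include?(base)
      end
      counts
    end
  end # BaseCounter
end # Bio

counter = Bio::BaseCounter.new('ATGCaa')
p counter.base_counts
```

Module and class names are CamelCase, method and variable names are lowercase with underscores, and every comment is an RDoc comment attached to the element it describes.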
Additional tutorial documentation and working examples are encouraged with your contribution. You may use the header part of the file for this purpose as demonstrated in the previous section.

== Standard documentation

=== of files

Each file should start with a header, which covers the following topics:

* copyright
* license
* description of the file (_not_ the classes; see below)
* any references, if appropriate

The header should be formatted as follows:

  #
  # = bio/db/hoge.rb - Hoge database parser classes
  #
  # Copyright:: Copyright (C) 2001, 2003-2005 Bio R. Hacker ,
  # Copyright:: Copyright (C) 2006 Chem R. Hacker
  #
  # License:: The Ruby License
  #
  # == Description
  #
  # This file contains classes that implement an interface to the Hoge database.
  #
  # == References
  #
  # * Hoge F. et al., The Hoge database, Nucleic. Acid. Res. 123:100--123 (2030)
  # * http://hoge.db/
  #

  require 'foo'

  module Bio

    autoload :Bar, 'bio/bar'

  class Hoge
    :
  end # Hoge

  end # Bio

=== of classes and methods within those files

Classes and methods should be documented in a standardized format, as in the following example (from lib/bio/sequence.rb):

  # == Description
  #
  # Bio::Sequence objects represent annotated sequences in bioruby.
  # A Bio::Sequence object is a wrapper around the actual sequence,
  # represented as either a Bio::Sequence::NA or a Bio::Sequence::AA object.
  # For most users, this encapsulation will be completely transparent.
  # Bio::Sequence responds to all methods defined for Bio::Sequence::NA/AA
  # objects using the same arguments and returning the same values (even though
  # these methods are not documented specifically for Bio::Sequence).
  #
  # == Usage
  #
  #   require 'bio'
  #
  #   # Create a nucleic or amino acid sequence
  #   dna = Bio::Sequence.auto('atgcatgcATGCATGCAAAA')
  #   rna = Bio::Sequence.auto('augcaugcaugcaugcaaaa')
  #   aa = Bio::Sequence.auto('ACDEFGHIKLMNPQRSTVWYU')
  #
  #   # Print in FASTA format
  #   puts dna.output(:fasta)
  #
  #   # Print all codons
  #   dna.window_search(3,3) do |codon|
  #     puts codon
  #   end
  #
  class Sequence

    # Create a new Bio::Sequence object
    #
    #   s = Bio::Sequence.new('atgc')
    #   puts s # => 'atgc'
    #
    # Note that this method does not initialize the contained sequence
    # as any kind of bioruby object, only as a simple string
    #
    #   puts s.seq.class # => String
    #
    # See Bio::Sequence#na, Bio::Sequence#aa, and Bio::Sequence#auto
    # for methods to transform the basic String of a just created
    # Bio::Sequence object to a proper bioruby object
    # ---
    # *Arguments*:
    # * (required) _str_: String or Bio::Sequence::NA/AA object
    # *Returns*:: Bio::Sequence object
    def initialize(str)
      @seq = str
    end

    # The sequence identifier. For example, for a sequence
    # of Genbank origin, this is the accession number.
    attr_accessor :entry_id

    # An Array of Bio::Feature objects
    attr_accessor :features
  end # Sequence

Preceding the class definition (class Sequence), there is at least a description and a usage example. Please use the +Description+ and +Usage+ headings. If appropriate, refer to other classes that interact with or are related to the class.

The code in the usage example should, if possible, be in a format that a user can copy-and-paste into a new script to run. It should illustrate the most important uses of the class. If possible and if it would not clutter up the example too much, try to provide any input data directly in the usage example, instead of referring to ARGV or ARGF for input.

  dna = Bio::Sequence.auto('atgcatgcATGCATGCAAAA')

Otherwise, describe the input briefly, for example:

  # input should be string consisting of nucleotides
  dna = Bio::Sequence.auto(ARGF.read)

Methods should be preceded by a comment that describes what the method does, including any relevant usage examples. (In contrast to the documentation for the class itself, headings are not required.) In addition, any arguments should be listed, as well as the type of thing that is returned by the method. The format of this information is as follows:

  # ---
  # *Arguments*:
  # * (required) _str_: String or Bio::Sequence::NA
  # * (optional) _nr_: a number that means something
  # *Returns*:: true or false

Attribute accessors can be preceded by a short description.

  # P-value (Float)
  attr_reader :pvalue

For writing rdoc documentation, putting two or more attributes in a line (such as attr_reader :evalue, :pvalue) is strongly discouraged.

Methods that look like attributes can also be preceded by a short description.

  # Scientific name (String)
  def scientific_name
    #...
  end

  # Scientific name (String)
  def scientific_name=(str)
    #...
  end

== Exception handling

Don't use

  $stderr.puts "WARNING"

in your code. Instead, try to avoid printing error messages. For fatal errors, use +raise+ with an appropriate message.

Kernel#warn should only be used to notify programmers of incompatible changes. Typically it may be used for deprecated or obsolete usage of a method. For example,

  warn "The Foo#bar method is obsoleted. Use Foo#baz instead."

== Testing code should use 'test/unit'

Unit tests should come with your modules, so that you can verify that each method does what you intended. The test code is useful to make maintenance easy and ensure stability. The use of

  if __FILE__ == $0

is deprecated.

== Using autoload

To speed up the initial load time, we have replaced most uses of 'require' with 'autoload' since BioRuby version 0.7.
During this change, we have found some tips:

You should not separate the same namespace into several files.

* For example, if you have separated definitions of the Bio::Foo class into two files (e.g. 'bio/foo.rb' and 'bio/bar.rb'), you need to resolve the dependencies (including the load order) yourself.

* If you have defined Bio::Foo in 'bio/foo.rb' and Bio::Foo::Bar in 'bio/foo/bar.rb', add the following line in the 'bio/foo.rb' file:

    autoload :Bar, 'bio/foo/bar'

You should not put several top level namespaces in one file.

* For example, if you have Bio::A, Bio::B and Bio::C in the file 'bio/foo.rb', you need

    autoload :A, 'bio/foo'
    autoload :B, 'bio/foo'
    autoload :C, 'bio/foo'

  to load the module automatically (instead of require 'bio/foo'). In this case, you should put them under the new namespace like Bio::Foo::A, Bio::Foo::B and Bio::Foo::C in the file 'bio/foo', then use

    autoload :Foo, 'bio/foo'

  so autoload can be written in 1 line.

= NAMESPACE

Your module should be located under the top-level module Bio and put under the 'bioruby/lib/bio' directory. The class/module names and the file names should be short and descriptive.

There are already several sub directories in 'bioruby/lib':

  bio/*.rb -- general and widely used basic classes
  bio/appl/ -- wrapper and parser for the external applications
  bio/data/ -- basic biological data
  bio/db/ -- flatfile database entry parsers
  bio/io/ -- I/O interfaces for files, RDB, web services etc.
  bio/util/ -- utilities and algorithms for bioinformatics

If your module doesn't match any of the above, please propose an appropriate directory name when you contribute. Please let the staff discuss namespaces (class names) and API (method names) before committing a new module or making changes to existing modules.

= MAINTENANCE

Finally, please maintain the code you've contributed. Please let us know (on the bioruby list) before you commit, so that users can discuss the change.
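
The autoload advice in "Using autoload" can be tried out with a minimal, self-contained sketch. It assumes a hypothetical namespace Demo standing in for Bio (so that nothing collides with an installed BioRuby) and writes a made-up file 'demo/foo/bar.rb' into a temporary directory; only Module#autoload itself is the technique from the text.

```ruby
require 'tmpdir'
require 'fileutils'

# Build a throwaway library tree: <tmp>/demo/foo/bar.rb
# (hypothetical file; it plays the role of 'bio/foo/bar.rb')
dir = Dir.mktmpdir
FileUtils.mkdir_p(File.join(dir, 'demo', 'foo'))
File.write(File.join(dir, 'demo', 'foo', 'bar.rb'), <<~RUBY)
  module Demo
    class Foo
      class Bar
        def hello
          'loaded on first use'
        end
      end
    end
  end
RUBY
$LOAD_PATH.unshift(dir)

# This part plays the role of 'bio/foo.rb': the parent namespace
# registers its child with a single autoload line.
module Demo
  class Foo
    autoload :Bar, 'demo/foo/bar'
  end
end

# The file is not loaded until the constant is first referenced.
loaded_before = $LOADED_FEATURES.any? { |f| f.end_with?('demo/foo/bar.rb') }
message = Demo::Foo::Bar.new.hello
loaded_after = $LOADED_FEATURES.any? { |f| f.end_with?('demo/foo/bar.rb') }

puts "loaded before first use: #{loaded_before}"
puts "message: #{message}"
puts "loaded after first use: #{loaded_after}"

FileUtils.remove_entry(dir)
```

Keeping one namespace per file, as recommended above, means each parent needs only one autoload line per child and the load order resolves itself.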
= RUBY VERSION and IMPLEMENTATION We are mainly using Ruby MRI (Matz' Ruby Implementation, or Matz' Ruby Interpreter). Please confirm that your code is running on current stable release versions of Ruby MRI. See README.rdoc and RELEASE_NOTES.rdoc for recommended Ruby versions. It is welcome to support JRuby, Rubinius, etc, in addition to Ruby MRI. Of course, it is strongly encouraged to write code that is not affected by differences between Ruby versions and/or implementations, as far as possible. Although we no longer support Ruby 1.8, it might be useful if your code could also run on Ruby 1.8.7 in addition to supported Ruby versions. = OS and ARCHITECTURE We hope BioRuby can be run on both UNIX (and UNIX-like OS) and Microsoft Windows. bio-2.0.3/BSDL0000644000175000017500000000240214141516614012263 0ustar nileshnileshCopyright (C) 1993-2013 Yukihiro Matsumoto. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. bio-2.0.3/README.rdoc0000644000175000017500000002103014141516614013420 0ustar nileshnilesh-- = README.rdoc - README for BioRuby Copyright:: Copyright (C) 2001-2007 Toshiaki Katayama , Copyright (C) 2008 Jan Aerts Copyright (C) 2011-2019 Naohisa Goto License:: The Ruby License * The above statement is limited to this file. See below about BioRuby's copyright and license. ++ = BioRuby Copyright (C) 2001-2019 Toshiaki Katayama BioRuby is an open source Ruby library for developing bioinformatics software. Object oriented scripting language Ruby has many features suitable for bioinformatics research, for example, clear syntax to express complex objects, regular expressions for text handling as powerful as Perl's, a wide variety of libraries including web service etc. As the syntax of the Ruby language is simple and very clean, we believe that it is easy to learn for beginners, easy to use for biologists, and also powerful enough for the software developers. In BioRuby, you can retrieve biological database entries from flat files, internet web servers and local relational databases. These database entries can be parsed to extract information you need. Biological sequences can be treated with the fulfilling methods of the Ruby's String class and with regular expressions. Daily tools like Blast, Fasta, Hmmer and many other software packages for biological analysis can be executed within the BioRuby script, and the results can be fully parsed to extract the portion you need. 
BioRuby supports major biological database formats and provides many ways for accessing them through flatfile indexing, web services etc. Various web services can be easily utilized by BioRuby. == FOR MORE INFORMATION See RELEASE_NOTES.rdoc for news and important changes in this version. === Documents in this distribution ==== Release notes, important changes and issues README.rdoc:: This file. General information and installation procedure. RELEASE_NOTES.rdoc:: News and important changes in this release. KNOWN_ISSUES.rdoc:: Known issues and bugs in BioRuby. doc/RELEASE_NOTES-*.rdoc:: Release notes for old versions. doc/Changes-1.3.rdoc:: News and incompatible changes from 1.2.1 to 1.3.0. doc/Changes-0.7.rd:: News and incompatible changes from 0.6.4 to 1.2.1. ==== Tutorials and other useful information doc/Tutorial.rd:: BioRuby Tutorial. doc/Tutorial.rd.html:: HTML version of Tutorial.rd. ==== BioRuby development ChangeLog:: History of changes. doc/ChangeLog-*:: ChangeLog for old versions. doc/ChangeLog-before-1.4.2:: changes before 1.4.2. doc/ChangeLog-before-1.3.1:: changes before 1.3.1. README_DEV.rdoc:: Describes ways to contribute to the BioRuby project, including coding styles and documentation guidelines. ==== Documents written in Japanese doc/Tutorial.rd.ja:: BioRuby Tutorial written in Japanese. doc/Tutorial.rd.ja.html:: HTML version of Tutorial.rd.ja. ==== Sample codes In sample/, There are many sample codes and demo scripts. === WWW BioRuby's official website is at http://bioruby.org/. You will find links to related resources including downloads, mailing lists, Wiki documentation etc. in the top page. * http://bioruby.org/ Mirror site is available, hosted on Open Bioinformatics Foundation (OBF). * http://bioruby.open-bio.org/ == WHERE TO OBTAIN === WWW The stable release is freely available from the BioRuby website. 
* http://bioruby.org/archive/

=== RubyGems

{RubyGems (packaging system for Ruby)}[http://rubygems.org/] version of the BioRuby package is also available for easy installation.

* https://rubygems.org/gems/bio

=== git

If you need the latest development version, this is provided at

* https://github.com/bioruby/bioruby

and can be obtained by the following procedure:

  % git clone git://github.com/bioruby/bioruby.git

== REQUIREMENTS

* Ruby 2.0.0 or later -- http://www.ruby-lang.org/
  * Ruby 2.4.6, 2.5.5, 2.6.3 or later is recommended.
  * See KNOWN_ISSUES.rdoc for Ruby version specific problems.

== OPTIONAL REQUIREMENTS

Some optional libraries can be utilized to extend BioRuby's functionality. If your needs meet the following conditions, install them by using RubyGems, or download and install from the following web sites.

Creating faster flatfile index using Berkeley DB:

* {GitHub:ruby-bdb}[https://github.com/knu/ruby-bdb] (which took over {bdb}[https://github.com/ruby-bdb/bdb]) (No RubyGems available)
  * {Oracle Berkeley DB}[http://www.oracle.com/technetwork/database/berkeleydb/index.html] and C compiler will be required.

== INSTALL

=== INSTALL by using RubyGems (recommended)

If you are using RubyGems, just type

  % gem install bio

Alternatively, manually download bio-X.X.X.gem from http://bioruby.org/archive/ and install it by using the gem command.

=== Running self-test

To check if bioruby works fine on a machine, self-test codes are bundled. Note that some tests may need internet connection.

To run tests,

  % ruby test/runner.rb

For those familiar with Rake,

  % rake test

also works.

Before reporting test failure, please check KNOWN_ISSUES.rdoc about known platform-dependent issues. We are happy if you write patches to solve the issues.
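
Before installing or running the self-test, it may help to confirm that the running interpreter meets the minimum stated in REQUIREMENTS. The following is only a sketch: MINIMUM_RUBY and ruby_supported? are hypothetical names introduced here, and the check uses Gem::Version from RubyGems rather than anything provided by BioRuby.

```ruby
require 'rubygems'

# Minimum version taken from the REQUIREMENTS section ("Ruby 2.0.0 or later").
MINIMUM_RUBY = Gem::Version.new('2.0.0')

# Returns true if the given version string satisfies the minimum.
# Gem::Version compares version segments numerically, not as plain strings.
def ruby_supported?(version_string)
  Gem::Version.new(version_string) >= MINIMUM_RUBY
end

if ruby_supported?(RUBY_VERSION)
  puts "Ruby #{RUBY_VERSION} meets the minimum requirement (#{MINIMUM_RUBY})."
else
  puts "Ruby #{RUBY_VERSION} is older than #{MINIMUM_RUBY}; see KNOWN_ISSUES.rdoc."
end
```

A plain string comparison such as RUBY_VERSION >= '2.0.0' would give wrong answers once a version segment reaches two digits, which is why a version object is used here.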
== SETUP

If you want to use the OBDA (Open Bio Database Access) to obtain database entries, copy a sample configuration file in the BioRuby distribution

  bioruby-x.x.x/etc/bioinformatics/seqdatabase.ini

to

  /etc/bioinformatics/seqdatabase.ini

(system wide configuration) or

  ~/.bioinformatics/seqdatabase.ini

(personal configuration) and change the contents according to your preference. For more information on the OBDA, see http://obda.open-bio.org/ .

== USAGE

You can load all BioRuby classes just by requiring 'bio.rb'. All the BioRuby classes and modules are located under the module name 'Bio' to separate the name space.

  #!/usr/bin/env ruby
  require 'bio'

You can also read other documentation in the 'doc' directory.

  bioruby-x.x.x/doc/

== PLUGIN (Biogem)

Many plugins (called Biogem) are now available. See http://biogems.info/ for a list of plugins and related software utilizing BioRuby.

* http://biogems.info/

Plugins (Biogems) listed below were formerly included in BioRuby and have been split into separate packages to reduce complexity and external dependencies.

* {bio-shell}[https://rubygems.org/gems/bio-shell]
* {bio-executables}[https://rubygems.org/gems/bio-executables]
* {bio-blast-xmlparser}[https://rubygems.org/gems/bio-blast-xmlparser]
* {bioruby-phyloxml}[https://rubygems.org/gems/bioruby-phyloxml]
  * NOTE: Please uninstall bio-phyloxml, which was created as a preliminary trial of splitting a module in 2012 and has not been maintained since.
* {bio-biosql}[https://rubygems.org/gems/bio-biosql]

Plugins (Biogems) listed below may be useful for running existing code.

* {bio-old-biofetch-emulator}[https://rubygems.org/gems/bio-old-biofetch-emulator] -- Emulates deprecated BioRuby's BioFetch server by using other existing web services.

To develop your own plugin, see "Plugins" pages of BioRuby Wiki.
* http://bioruby.open-bio.org/wiki/Plugins

=== Recommended Plugins (gems)

For existing BioRuby users, it is recommended to install the following gems:

bio-shell :: If you use the BioRuby Shell.
bio-executables :: If you use br_bio* commands.
bio-old-biofetch-emulator :: If you run existing code using BioFetch, including sample and demo code in sample/.
bio-blast-xmlparser :: If you treat BLAST XML result files and Expat XML parser (with development files) is installed in your system.
bioruby-phyloxml :: If you use Bio::PhyloXML and Libxml2 (with development files) is installed in your system.

Note that it is NOT recommended to install bio-biosql unless you have really used Bio::SQL, because it depends on older versions of ActiveRecord and ActiveSupport that may not run on recent Ruby versions.

== LICENSE

BioRuby can be freely distributed under the same terms as Ruby. See the file COPYING (or COPYING.ja written in Japanese).

As written in the file COPYING, see the file LEGAL for files distributed under a different license.

== REFERENCE

If you use BioRuby in academic research, please consider citing the following publication.

* BioRuby: Bioinformatics software for the Ruby programming language. Naohisa Goto, Pjotr Prins, Mitsuteru Nakao, Raoul Bonnal, Jan Aerts and Toshiaki Katayama. Bioinformatics (2010) 26(20): 2617-2619.
  * {doi: 10.1093/bioinformatics/btq475}[http://bioinformatics.oxfordjournals.org/content/26/20/2617]
  * {PMID: 20739307}[http://www.ncbi.nlm.nih.gov/pubmed/20739307]

== CONTACT

Current staff of the BioRuby project can be reached by sending e-mail to .

bio-2.0.3/bioruby.gemspec

# This file is automatically generated from bioruby.gemspec.erb and
# should NOT be edited by hand.
#
Gem::Specification.new do |s|
  s.name = 'bio'
  s.version = "2.0.3"

  s.author = "BioRuby project"
  s.email = "staff@bioruby.org"
  s.homepage = "http://bioruby.org/"
  s.license = "Ruby"

  s.summary = "Bioinformatics library"
  s.description = "BioRuby is a library for bioinformatics (biology + information science)."

  s.platform = Gem::Platform::RUBY
  s.files = [
    ".travis.yml", "BSDL", "COPYING", "COPYING.ja", "ChangeLog", "GPL", "Gemfile", "KNOWN_ISSUES.rdoc", "LEGAL", "LGPL", "README.rdoc", "README_DEV.rdoc", "RELEASE_NOTES.rdoc", "Rakefile", "appveyor.yml", "bioruby.gemspec", "bioruby.gemspec.erb", "doc/ChangeLog-1.4.3", "doc/ChangeLog-1.5.0", "doc/ChangeLog-before-1.3.1", "doc/ChangeLog-before-1.4.2", "doc/Changes-0.7.rd", "doc/Changes-1.3.rdoc", "doc/RELEASE_NOTES-1.4.0.rdoc", "doc/RELEASE_NOTES-1.4.1.rdoc", "doc/RELEASE_NOTES-1.4.2.rdoc", "doc/RELEASE_NOTES-1.4.3.rdoc", "doc/RELEASE_NOTES-1.5.0.rdoc", "doc/Tutorial.rd", "doc/Tutorial.rd.html", "doc/Tutorial.rd.ja", "doc/Tutorial.rd.ja.html", "doc/bioruby.css", "etc/bioinformatics/seqdatabase.ini", "gemfiles/Gemfile.travis-jruby1.8", "gemfiles/Gemfile.travis-jruby1.9", "gemfiles/Gemfile.travis-rbx", "gemfiles/Gemfile.travis-ruby1.8", "gemfiles/Gemfile.travis-ruby1.9", "gemfiles/Gemfile.windows", "gemfiles/modify-Gemfile.rb", "gemfiles/prepare-gemspec.rb", "lib/bio.rb", "lib/bio/alignment.rb", "lib/bio/appl/bl2seq/report.rb", "lib/bio/appl/blast.rb", "lib/bio/appl/blast/format0.rb", "lib/bio/appl/blast/format8.rb", "lib/bio/appl/blast/genomenet.rb", "lib/bio/appl/blast/ncbioptions.rb", "lib/bio/appl/blast/remote.rb", "lib/bio/appl/blast/report.rb", "lib/bio/appl/blast/rexml.rb", "lib/bio/appl/blast/rpsblast.rb", "lib/bio/appl/blast/wublast.rb", "lib/bio/appl/blat/report.rb", "lib/bio/appl/clustalw.rb", "lib/bio/appl/clustalw/report.rb", "lib/bio/appl/emboss.rb", "lib/bio/appl/fasta.rb", "lib/bio/appl/fasta/format10.rb", "lib/bio/appl/gcg/msf.rb", "lib/bio/appl/gcg/seq.rb", "lib/bio/appl/genscan/report.rb",
"lib/bio/appl/hmmer.rb", "lib/bio/appl/hmmer/report.rb", "lib/bio/appl/iprscan/report.rb", "lib/bio/appl/mafft.rb", "lib/bio/appl/mafft/report.rb", "lib/bio/appl/meme/mast.rb", "lib/bio/appl/meme/mast/report.rb", "lib/bio/appl/meme/motif.rb", "lib/bio/appl/muscle.rb", "lib/bio/appl/paml/baseml.rb", "lib/bio/appl/paml/baseml/report.rb", "lib/bio/appl/paml/codeml.rb", "lib/bio/appl/paml/codeml/rates.rb", "lib/bio/appl/paml/codeml/report.rb", "lib/bio/appl/paml/common.rb", "lib/bio/appl/paml/common_report.rb", "lib/bio/appl/paml/yn00.rb", "lib/bio/appl/paml/yn00/report.rb", "lib/bio/appl/phylip/alignment.rb", "lib/bio/appl/phylip/distance_matrix.rb", "lib/bio/appl/probcons.rb", "lib/bio/appl/psort.rb", "lib/bio/appl/psort/report.rb", "lib/bio/appl/pts1.rb", "lib/bio/appl/sim4.rb", "lib/bio/appl/sim4/report.rb", "lib/bio/appl/sosui/report.rb", "lib/bio/appl/spidey/report.rb", "lib/bio/appl/targetp/report.rb", "lib/bio/appl/tcoffee.rb", "lib/bio/appl/tmhmm/report.rb", "lib/bio/command.rb", "lib/bio/compat/features.rb", "lib/bio/compat/references.rb", "lib/bio/data/aa.rb", "lib/bio/data/codontable.rb", "lib/bio/data/na.rb", "lib/bio/db.rb", "lib/bio/db/aaindex.rb", "lib/bio/db/embl/common.rb", "lib/bio/db/embl/embl.rb", "lib/bio/db/embl/embl_to_biosequence.rb", "lib/bio/db/embl/format_embl.rb", "lib/bio/db/embl/sptr.rb", "lib/bio/db/embl/swissprot.rb", "lib/bio/db/embl/trembl.rb", "lib/bio/db/embl/uniprot.rb", "lib/bio/db/embl/uniprotkb.rb", "lib/bio/db/fantom.rb", "lib/bio/db/fasta.rb", "lib/bio/db/fasta/defline.rb", "lib/bio/db/fasta/fasta_to_biosequence.rb", "lib/bio/db/fasta/format_fasta.rb", "lib/bio/db/fasta/format_qual.rb", "lib/bio/db/fasta/qual.rb", "lib/bio/db/fasta/qual_to_biosequence.rb", "lib/bio/db/fastq.rb", "lib/bio/db/fastq/fastq_to_biosequence.rb", "lib/bio/db/fastq/format_fastq.rb", "lib/bio/db/genbank/common.rb", "lib/bio/db/genbank/ddbj.rb", "lib/bio/db/genbank/format_genbank.rb", "lib/bio/db/genbank/genbank.rb", 
"lib/bio/db/genbank/genbank_to_biosequence.rb", "lib/bio/db/genbank/genpept.rb", "lib/bio/db/genbank/refseq.rb", "lib/bio/db/gff.rb", "lib/bio/db/go.rb", "lib/bio/db/kegg/brite.rb", "lib/bio/db/kegg/common.rb", "lib/bio/db/kegg/compound.rb", "lib/bio/db/kegg/drug.rb", "lib/bio/db/kegg/enzyme.rb", "lib/bio/db/kegg/expression.rb", "lib/bio/db/kegg/genes.rb", "lib/bio/db/kegg/genome.rb", "lib/bio/db/kegg/glycan.rb", "lib/bio/db/kegg/keggtab.rb", "lib/bio/db/kegg/kgml.rb", "lib/bio/db/kegg/module.rb", "lib/bio/db/kegg/orthology.rb", "lib/bio/db/kegg/pathway.rb", "lib/bio/db/kegg/reaction.rb", "lib/bio/db/lasergene.rb", "lib/bio/db/litdb.rb", "lib/bio/db/medline.rb", "lib/bio/db/nbrf.rb", "lib/bio/db/newick.rb", "lib/bio/db/nexus.rb", "lib/bio/db/pdb.rb", "lib/bio/db/pdb/atom.rb", "lib/bio/db/pdb/chain.rb", "lib/bio/db/pdb/chemicalcomponent.rb", "lib/bio/db/pdb/model.rb", "lib/bio/db/pdb/pdb.rb", "lib/bio/db/pdb/residue.rb", "lib/bio/db/pdb/utils.rb", "lib/bio/db/prosite.rb", "lib/bio/db/rebase.rb", "lib/bio/db/sanger_chromatogram/abif.rb", "lib/bio/db/sanger_chromatogram/chromatogram.rb", "lib/bio/db/sanger_chromatogram/chromatogram_to_biosequence.rb", "lib/bio/db/sanger_chromatogram/scf.rb", "lib/bio/db/soft.rb", "lib/bio/db/transfac.rb", "lib/bio/feature.rb", "lib/bio/io/das.rb", "lib/bio/io/fastacmd.rb", "lib/bio/io/fetch.rb", "lib/bio/io/flatfile.rb", "lib/bio/io/flatfile/autodetection.rb", "lib/bio/io/flatfile/bdb.rb", "lib/bio/io/flatfile/buffer.rb", "lib/bio/io/flatfile/index.rb", "lib/bio/io/flatfile/indexer.rb", "lib/bio/io/flatfile/splitter.rb", "lib/bio/io/hinv.rb", "lib/bio/io/ncbirest.rb", "lib/bio/io/pubmed.rb", "lib/bio/io/registry.rb", "lib/bio/io/togows.rb", "lib/bio/location.rb", "lib/bio/map.rb", "lib/bio/pathway.rb", "lib/bio/reference.rb", "lib/bio/sequence.rb", "lib/bio/sequence/aa.rb", "lib/bio/sequence/adapter.rb", "lib/bio/sequence/common.rb", "lib/bio/sequence/compat.rb", "lib/bio/sequence/dblink.rb", "lib/bio/sequence/format.rb", 
"lib/bio/sequence/format_raw.rb", "lib/bio/sequence/generic.rb", "lib/bio/sequence/na.rb", "lib/bio/sequence/quality_score.rb", "lib/bio/sequence/sequence_masker.rb", "lib/bio/tree.rb", "lib/bio/tree/output.rb", "lib/bio/util/color_scheme.rb", "lib/bio/util/color_scheme/buried.rb", "lib/bio/util/color_scheme/helix.rb", "lib/bio/util/color_scheme/hydropathy.rb", "lib/bio/util/color_scheme/nucleotide.rb", "lib/bio/util/color_scheme/strand.rb", "lib/bio/util/color_scheme/taylor.rb", "lib/bio/util/color_scheme/turn.rb", "lib/bio/util/color_scheme/zappo.rb", "lib/bio/util/contingency_table.rb", "lib/bio/util/restriction_enzyme.rb", "lib/bio/util/restriction_enzyme/analysis.rb", "lib/bio/util/restriction_enzyme/analysis_basic.rb", "lib/bio/util/restriction_enzyme/cut_symbol.rb", "lib/bio/util/restriction_enzyme/dense_int_array.rb", "lib/bio/util/restriction_enzyme/double_stranded.rb", "lib/bio/util/restriction_enzyme/double_stranded/aligned_strands.rb", "lib/bio/util/restriction_enzyme/double_stranded/cut_location_pair.rb", "lib/bio/util/restriction_enzyme/double_stranded/cut_location_pair_in_enzyme_notation.rb", "lib/bio/util/restriction_enzyme/double_stranded/cut_locations.rb", "lib/bio/util/restriction_enzyme/double_stranded/cut_locations_in_enzyme_notation.rb", "lib/bio/util/restriction_enzyme/enzymes.yaml", "lib/bio/util/restriction_enzyme/range/cut_range.rb", "lib/bio/util/restriction_enzyme/range/cut_ranges.rb", "lib/bio/util/restriction_enzyme/range/horizontal_cut_range.rb", "lib/bio/util/restriction_enzyme/range/sequence_range.rb", "lib/bio/util/restriction_enzyme/range/sequence_range/calculated_cuts.rb", "lib/bio/util/restriction_enzyme/range/sequence_range/fragment.rb", "lib/bio/util/restriction_enzyme/range/sequence_range/fragments.rb", "lib/bio/util/restriction_enzyme/range/vertical_cut_range.rb", "lib/bio/util/restriction_enzyme/single_strand.rb", "lib/bio/util/restriction_enzyme/single_strand/cut_locations_in_enzyme_notation.rb", 
"lib/bio/util/restriction_enzyme/single_strand_complement.rb", "lib/bio/util/restriction_enzyme/sorted_num_array.rb", "lib/bio/util/restriction_enzyme/string_formatting.rb", "lib/bio/util/sirna.rb", "lib/bio/version.rb", "sample/any2fasta.rb", "sample/benchmark_clustalw_report.rb", "sample/biofetch.rb", "sample/color_scheme_aa.rb", "sample/color_scheme_na.rb", "sample/demo_aaindex.rb", "sample/demo_aminoacid.rb", "sample/demo_bl2seq_report.rb", "sample/demo_blast_report.rb", "sample/demo_codontable.rb", "sample/demo_das.rb", "sample/demo_fasta_remote.rb", "sample/demo_fastaformat.rb", "sample/demo_genbank.rb", "sample/demo_genscan_report.rb", "sample/demo_gff1.rb", "sample/demo_go.rb", "sample/demo_hmmer_report.rb", "sample/demo_kegg_compound.rb", "sample/demo_kegg_drug.rb", "sample/demo_kegg_genome.rb", "sample/demo_kegg_glycan.rb", "sample/demo_kegg_orthology.rb", "sample/demo_kegg_reaction.rb", "sample/demo_litdb.rb", "sample/demo_locations.rb", "sample/demo_ncbi_rest.rb", "sample/demo_nucleicacid.rb", "sample/demo_pathway.rb", "sample/demo_prosite.rb", "sample/demo_psort.rb", "sample/demo_psort_report.rb", "sample/demo_pubmed.rb", "sample/demo_sequence.rb", "sample/demo_sirna.rb", "sample/demo_sosui_report.rb", "sample/demo_targetp_report.rb", "sample/demo_tmhmm_report.rb", "sample/enzymes.rb", "sample/fasta2tab.rb", "sample/fastagrep.rb", "sample/fastasort.rb", "sample/fastq2html.cwl", "sample/fastq2html.rb", "sample/fastq2html.testdata.yaml", "sample/fsplit.rb", "sample/gb2fasta.rb", "sample/gb2tab.rb", "sample/gbtab2mysql.rb", "sample/genes2nuc.rb", "sample/genes2pep.rb", "sample/genes2tab.rb", "sample/genome2rb.rb", "sample/genome2tab.rb", "sample/goslim.rb", "sample/gt2fasta.rb", "sample/na2aa.cwl", "sample/na2aa.rb", "sample/na2aa.testdata.yaml", "sample/pmfetch.rb", "sample/pmsearch.rb", "sample/rev_comp.cwl", "sample/rev_comp.rb", "sample/rev_comp.testdata.yaml", "sample/seqdatabase.ini", "sample/ssearch2tab.rb", "sample/tdiary.rb", 
"sample/test_restriction_enzyme_long.rb", "sample/tfastx2tab.rb", "sample/vs-genes.rb", "test/bioruby_test_helper.rb", "test/data/HMMER/hmmpfam.out", "test/data/HMMER/hmmsearch.out", "test/data/KEGG/1.1.1.1.enzyme", "test/data/KEGG/C00025.compound", "test/data/KEGG/D00063.drug", "test/data/KEGG/G00024.glycan", "test/data/KEGG/G01366.glycan", "test/data/KEGG/K02338.orthology", "test/data/KEGG/M00118.module", "test/data/KEGG/R00006.reaction", "test/data/KEGG/T00005.genome", "test/data/KEGG/T00070.genome", "test/data/KEGG/b0529.gene", "test/data/KEGG/ec00072.pathway", "test/data/KEGG/hsa00790.pathway", "test/data/KEGG/ko00312.pathway", "test/data/KEGG/map00030.pathway", "test/data/KEGG/map00052.pathway", "test/data/KEGG/rn00250.pathway", "test/data/KEGG/test.kgml", "test/data/SOSUI/sample.report", "test/data/TMHMM/sample.report", "test/data/aaindex/DAYM780301", "test/data/aaindex/PRAM900102", "test/data/bl2seq/cd8a_cd8b_blastp.bl2seq", "test/data/bl2seq/cd8a_p53_e-5blastp.bl2seq", "test/data/blast/2.2.15.blastp.m7", "test/data/blast/b0002.faa", "test/data/blast/b0002.faa.m0", "test/data/blast/b0002.faa.m7", "test/data/blast/b0002.faa.m8", "test/data/blast/blastp-multi.m7", "test/data/clustalw/example1-seqnos.aln", "test/data/clustalw/example1.aln", "test/data/command/echoarg2.bat", "test/data/command/echoarg2.sh", "test/data/embl/AB090716.embl", "test/data/embl/AB090716.embl.rel89", "test/data/fasta/EFTU_BACSU.fasta", "test/data/fasta/example1.txt", "test/data/fasta/example2.txt", "test/data/fastq/README.txt", "test/data/fastq/error_diff_ids.fastq", "test/data/fastq/error_double_qual.fastq", "test/data/fastq/error_double_seq.fastq", "test/data/fastq/error_long_qual.fastq", "test/data/fastq/error_no_qual.fastq", "test/data/fastq/error_qual_del.fastq", "test/data/fastq/error_qual_escape.fastq", "test/data/fastq/error_qual_null.fastq", "test/data/fastq/error_qual_space.fastq", "test/data/fastq/error_qual_tab.fastq", "test/data/fastq/error_qual_unit_sep.fastq", 
"test/data/fastq/error_qual_vtab.fastq", "test/data/fastq/error_short_qual.fastq", "test/data/fastq/error_spaces.fastq", "test/data/fastq/error_tabs.fastq", "test/data/fastq/error_trunc_at_plus.fastq", "test/data/fastq/error_trunc_at_qual.fastq", "test/data/fastq/error_trunc_at_seq.fastq", "test/data/fastq/error_trunc_in_plus.fastq", "test/data/fastq/error_trunc_in_qual.fastq", "test/data/fastq/error_trunc_in_seq.fastq", "test/data/fastq/error_trunc_in_title.fastq", "test/data/fastq/illumina_full_range_as_illumina.fastq", "test/data/fastq/illumina_full_range_as_sanger.fastq", "test/data/fastq/illumina_full_range_as_solexa.fastq", "test/data/fastq/illumina_full_range_original_illumina.fastq", "test/data/fastq/longreads_as_illumina.fastq", "test/data/fastq/longreads_as_sanger.fastq", "test/data/fastq/longreads_as_solexa.fastq", "test/data/fastq/longreads_original_sanger.fastq", "test/data/fastq/misc_dna_as_illumina.fastq", "test/data/fastq/misc_dna_as_sanger.fastq", "test/data/fastq/misc_dna_as_solexa.fastq", "test/data/fastq/misc_dna_original_sanger.fastq", "test/data/fastq/misc_rna_as_illumina.fastq", "test/data/fastq/misc_rna_as_sanger.fastq", "test/data/fastq/misc_rna_as_solexa.fastq", "test/data/fastq/misc_rna_original_sanger.fastq", "test/data/fastq/sanger_full_range_as_illumina.fastq", "test/data/fastq/sanger_full_range_as_sanger.fastq", "test/data/fastq/sanger_full_range_as_solexa.fastq", "test/data/fastq/sanger_full_range_original_sanger.fastq", "test/data/fastq/solexa_full_range_as_illumina.fastq", "test/data/fastq/solexa_full_range_as_sanger.fastq", "test/data/fastq/solexa_full_range_as_solexa.fastq", "test/data/fastq/solexa_full_range_original_solexa.fastq", "test/data/fastq/wrapping_as_illumina.fastq", "test/data/fastq/wrapping_as_sanger.fastq", "test/data/fastq/wrapping_as_solexa.fastq", "test/data/fastq/wrapping_original_sanger.fastq", "test/data/gcg/pileup-aa.msf", "test/data/genbank/CAA35997.gp", "test/data/genbank/SCU49845.gb", 
"test/data/genscan/sample.report", "test/data/go/selected_component.ontology", "test/data/go/selected_gene_association.sgd", "test/data/go/selected_wikipedia2go", "test/data/iprscan/merged.raw", "test/data/iprscan/merged.txt", "test/data/litdb/1717226.litdb", "test/data/medline/20146148_modified.medline", "test/data/meme/db", "test/data/meme/mast", "test/data/meme/mast.out", "test/data/meme/meme.out", "test/data/paml/codeml/control_file.txt", "test/data/paml/codeml/models/aa.aln", "test/data/paml/codeml/models/aa.dnd", "test/data/paml/codeml/models/aa.ph", "test/data/paml/codeml/models/alignment.phy", "test/data/paml/codeml/models/results0-3.txt", "test/data/paml/codeml/models/results7-8.txt", "test/data/paml/codeml/output.txt", "test/data/paml/codeml/rates", "test/data/pir/CRAB_ANAPL.pir", "test/data/prosite/prosite.dat", "test/data/refseq/nm_126355.entret", "test/data/rpsblast/misc.rpsblast", "test/data/sanger_chromatogram/test_chromatogram_abif.ab1", "test/data/sanger_chromatogram/test_chromatogram_scf_v2.scf", "test/data/sanger_chromatogram/test_chromatogram_scf_v3.scf", "test/data/sim4/complement-A4.sim4", "test/data/sim4/simple-A4.sim4", "test/data/sim4/simple2-A4.sim4", "test/data/soft/GDS100_partial.soft", "test/data/soft/GSE3457_family_partial.soft", "test/data/uniprot/p53_human.uniprot", "test/functional/bio/sequence/test_output_embl.rb", "test/functional/bio/test_command.rb", "test/network/bio/appl/blast/test_remote.rb", "test/network/bio/appl/test_blast.rb", "test/network/bio/appl/test_pts1.rb", "test/network/bio/db/kegg/test_genes_hsa7422.rb", "test/network/bio/io/test_pubmed.rb", "test/network/bio/io/test_togows.rb", "test/network/bio/test_command.rb", "test/runner.rb", "test/unit/bio/appl/bl2seq/test_report.rb", "test/unit/bio/appl/blast/test_ncbioptions.rb", "test/unit/bio/appl/blast/test_report.rb", "test/unit/bio/appl/blast/test_rpsblast.rb", "test/unit/bio/appl/clustalw/test_report.rb", "test/unit/bio/appl/gcg/test_msf.rb", 
"test/unit/bio/appl/genscan/test_report.rb", "test/unit/bio/appl/hmmer/test_report.rb", "test/unit/bio/appl/iprscan/test_report.rb", "test/unit/bio/appl/mafft/test_report.rb", "test/unit/bio/appl/meme/mast/test_report.rb", "test/unit/bio/appl/meme/test_mast.rb", "test/unit/bio/appl/meme/test_motif.rb", "test/unit/bio/appl/paml/codeml/test_rates.rb", "test/unit/bio/appl/paml/codeml/test_report.rb", "test/unit/bio/appl/paml/codeml/test_report_single.rb", "test/unit/bio/appl/paml/test_codeml.rb", "test/unit/bio/appl/sim4/test_report.rb", "test/unit/bio/appl/sosui/test_report.rb", "test/unit/bio/appl/targetp/test_report.rb", "test/unit/bio/appl/test_blast.rb", "test/unit/bio/appl/test_fasta.rb", "test/unit/bio/appl/test_pts1.rb", "test/unit/bio/appl/tmhmm/test_report.rb", "test/unit/bio/data/test_aa.rb", "test/unit/bio/data/test_codontable.rb", "test/unit/bio/data/test_na.rb", "test/unit/bio/db/embl/test_common.rb", "test/unit/bio/db/embl/test_embl.rb", "test/unit/bio/db/embl/test_embl_rel89.rb", "test/unit/bio/db/embl/test_embl_to_bioseq.rb", "test/unit/bio/db/embl/test_uniprot.rb", "test/unit/bio/db/embl/test_uniprotkb.rb", "test/unit/bio/db/embl/test_uniprotkb_new_part.rb", "test/unit/bio/db/fasta/test_defline.rb", "test/unit/bio/db/fasta/test_defline_misc.rb", "test/unit/bio/db/fasta/test_format_qual.rb", "test/unit/bio/db/genbank/test_common.rb", "test/unit/bio/db/genbank/test_genbank.rb", "test/unit/bio/db/genbank/test_genpept.rb", "test/unit/bio/db/kegg/test_compound.rb", "test/unit/bio/db/kegg/test_drug.rb", "test/unit/bio/db/kegg/test_enzyme.rb", "test/unit/bio/db/kegg/test_genes.rb", "test/unit/bio/db/kegg/test_genome.rb", "test/unit/bio/db/kegg/test_glycan.rb", "test/unit/bio/db/kegg/test_kgml.rb", "test/unit/bio/db/kegg/test_module.rb", "test/unit/bio/db/kegg/test_orthology.rb", "test/unit/bio/db/kegg/test_pathway.rb", "test/unit/bio/db/kegg/test_reaction.rb", "test/unit/bio/db/pdb/test_pdb.rb", "test/unit/bio/db/sanger_chromatogram/test_abif.rb", 
"test/unit/bio/db/sanger_chromatogram/test_scf.rb", "test/unit/bio/db/test_aaindex.rb", "test/unit/bio/db/test_fasta.rb", "test/unit/bio/db/test_fastq.rb", "test/unit/bio/db/test_gff.rb", "test/unit/bio/db/test_go.rb", "test/unit/bio/db/test_lasergene.rb", "test/unit/bio/db/test_litdb.rb", "test/unit/bio/db/test_medline.rb", "test/unit/bio/db/test_nbrf.rb", "test/unit/bio/db/test_newick.rb", "test/unit/bio/db/test_nexus.rb", "test/unit/bio/db/test_prosite.rb", "test/unit/bio/db/test_qual.rb", "test/unit/bio/db/test_rebase.rb", "test/unit/bio/db/test_soft.rb", "test/unit/bio/io/flatfile/test_autodetection.rb", "test/unit/bio/io/flatfile/test_buffer.rb", "test/unit/bio/io/flatfile/test_splitter.rb", "test/unit/bio/io/test_fastacmd.rb", "test/unit/bio/io/test_flatfile.rb", "test/unit/bio/io/test_togows.rb", "test/unit/bio/sequence/test_aa.rb", "test/unit/bio/sequence/test_common.rb", "test/unit/bio/sequence/test_compat.rb", "test/unit/bio/sequence/test_dblink.rb", "test/unit/bio/sequence/test_na.rb", "test/unit/bio/sequence/test_quality_score.rb", "test/unit/bio/sequence/test_ruby3.rb", "test/unit/bio/sequence/test_sequence_masker.rb", "test/unit/bio/test_alignment.rb", "test/unit/bio/test_command.rb", "test/unit/bio/test_db.rb", "test/unit/bio/test_feature.rb", "test/unit/bio/test_location.rb", "test/unit/bio/test_map.rb", "test/unit/bio/test_pathway.rb", "test/unit/bio/test_reference.rb", "test/unit/bio/test_sequence.rb", "test/unit/bio/test_tree.rb", "test/unit/bio/util/restriction_enzyme/analysis/test_calculated_cuts.rb", "test/unit/bio/util/restriction_enzyme/analysis/test_cut_ranges.rb", "test/unit/bio/util/restriction_enzyme/analysis/test_sequence_range.rb", "test/unit/bio/util/restriction_enzyme/double_stranded/test_aligned_strands.rb", "test/unit/bio/util/restriction_enzyme/double_stranded/test_cut_location_pair.rb", "test/unit/bio/util/restriction_enzyme/double_stranded/test_cut_location_pair_in_enzyme_notation.rb", 
"test/unit/bio/util/restriction_enzyme/double_stranded/test_cut_locations.rb", "test/unit/bio/util/restriction_enzyme/double_stranded/test_cut_locations_in_enzyme_notation.rb", "test/unit/bio/util/restriction_enzyme/single_strand/test_cut_locations_in_enzyme_notation.rb", "test/unit/bio/util/restriction_enzyme/test_analysis.rb", "test/unit/bio/util/restriction_enzyme/test_cut_symbol.rb", "test/unit/bio/util/restriction_enzyme/test_dense_int_array.rb", "test/unit/bio/util/restriction_enzyme/test_double_stranded.rb", "test/unit/bio/util/restriction_enzyme/test_single_strand.rb", "test/unit/bio/util/restriction_enzyme/test_single_strand_complement.rb", "test/unit/bio/util/restriction_enzyme/test_sorted_num_array.rb", "test/unit/bio/util/restriction_enzyme/test_string_formatting.rb", "test/unit/bio/util/test_color_scheme.rb", "test/unit/bio/util/test_contingency_table.rb", "test/unit/bio/util/test_restriction_enzyme.rb", "test/unit/bio/util/test_sirna.rb" ] s.extra_rdoc_files = [ "KNOWN_ISSUES.rdoc", "README.rdoc", "README_DEV.rdoc", "RELEASE_NOTES.rdoc", "doc/Changes-1.3.rdoc", "doc/RELEASE_NOTES-1.4.0.rdoc", "doc/RELEASE_NOTES-1.4.1.rdoc", "doc/RELEASE_NOTES-1.4.2.rdoc", "doc/RELEASE_NOTES-1.4.3.rdoc", "doc/RELEASE_NOTES-1.5.0.rdoc" ] s.rdoc_options << '--main' << 'README.rdoc' s.rdoc_options << '--title' << 'BioRuby API documentation' s.rdoc_options << '--exclude' << '\.yaml\z' s.rdoc_options << '--line-numbers' << '--inline-source' s.require_path = 'lib' end bio-2.0.3/COPYING0000644000175000017500000000471114141516614012654 0ustar nileshnileshBioRuby is copyrighted free software by Toshiaki Katayama . You can redistribute it and/or modify it under either the terms of the 2-clause BSDL (see the file BSDL), or the conditions below: 1. You may make and give away verbatim copies of the source form of the software without restriction, provided that you duplicate all of the original copyright notices and associated disclaimers. 2. 
     You may modify your copy of the software in any way, provided that
     you do at least ONE of the following:

       a) place your modifications in the Public Domain or otherwise
          make them Freely Available, such as by posting said
          modifications to Usenet or an equivalent medium, or by allowing
          the author to include your modifications in the software.

       b) use the modified software only within your corporation or
          organization.

       c) give non-standard binaries non-standard names, with
          instructions on where to get the original software distribution.

       d) make other distribution arrangements with the author.

  3. You may distribute the software in object code or binary form,
     provided that you do at least ONE of the following:

       a) distribute the binaries and library files of the software,
          together with instructions (in the manual page or equivalent)
          on where to get the original distribution.

       b) accompany the distribution with the machine-readable source of
          the software.

       c) give non-standard binaries non-standard names, with
          instructions on where to get the original software distribution.

       d) make other distribution arrangements with the author.

  4. You may modify and include the part of the software into any other
     software (possibly commercial). But some files in the distribution
     are not written by the author, so that they are not under these terms.

     For the list of those files and their copying conditions, see the
     file LEGAL.

  5. The scripts and library files supplied as input to or produced as
     output from the software do not automatically fall under the
     copyright of the software, but belong to whomever generated them,
     and may be sold commercially, and may be aggregated with this
     software.

  6. THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR
     IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
     WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
     PURPOSE.
bio-2.0.3/Rakefile0000644000175000017500000001761614141516614013276 0ustar nileshnilesh
#
# = Rakefile - helper for development and packaging
#
# Copyright:: Copyright (C) 2009, 2012 Naohisa Goto
# License:: The Ruby License
#

require 'rubygems'
require 'erb'
require 'pathname'
require 'fileutils'
require 'tmpdir'
require 'rake/testtask'
require 'rake/packagetask'

begin
  require 'rubygems/package_task'
rescue LoadError
  # old RubyGems/Rake version
  require 'rake/gempackagetask'
end

begin
  require 'rdoc/task'
rescue LoadError
  # old RDoc/Rake version
  require 'rake/rdoctask'
end

# workaround for new module name
unless defined? Rake::GemPackageTask then
  Rake::GemPackageTask = Gem::PackageTask
end

load "./lib/bio/version.rb"
BIO_VERSION_RB_LOADED = true

# Version string for tar.gz, tar.bz2, or zip archive.
# If nil, use the value in lib/bio.rb
# Note that gem version is always determined from bioruby.gemspec.erb.
version = ENV['BIORUBY_VERSION'] || Bio::BIORUBY_VERSION.join(".")
version = nil if version.to_s.empty?
extraversion = ENV['BIORUBY_EXTRA_VERSION'] || Bio::BIORUBY_EXTRA_VERSION
extraversion = nil if extraversion.to_s.empty?
BIORUBY_VERSION = version
BIORUBY_EXTRA_VERSION = extraversion

task :default => "see-env"

Rake::TestTask.new do |t|
  t.test_files = FileList["test/{unit,functional}/**/test_*.rb"]
end

Rake::TestTask.new do |t|
  t.name = :"test-all"
  t.test_files = FileList["test/{unit,functional,network}/**/test_*.rb"]
end

Rake::TestTask.new do |t|
  t.name = :"test-network"
  t.test_files = FileList["test/network/**/test_*.rb"]
end

# files not included in gem but included in tar archive
tar_additional_files = []

GEM_SPEC_FILE = "bioruby.gemspec"
GEM_SPEC_TEMPLATE_FILE = "bioruby.gemspec.erb"

# gets gem spec string
current_gem_spec_string = File.read(GEM_SPEC_FILE) rescue nil
next_gem_spec_string = File.open(GEM_SPEC_TEMPLATE_FILE, "rb") do |f|
  ERB.new(f.read).result
end

# gets gem spec object
current_spec = eval(current_gem_spec_string || '')
next_spec = eval(next_gem_spec_string)
spec = (current_spec || next_spec)

# adds notice of automatically generated file
next_gem_spec_string = "# This file is automatically generated from #{GEM_SPEC_TEMPLATE_FILE} and\n# should NOT be edited by hand.\n# \n" + next_gem_spec_string

# compares current gemspec file and newly generated gemspec string
if current_gem_spec_string &&
    current_gem_spec_string != next_gem_spec_string then
  #Rake::Task[GEM_SPEC_FILE].invoke
  flag_update_gemspec = true
else
  flag_update_gemspec = false
end

desc "Update gem spec file"
task :gemspec => GEM_SPEC_FILE

desc "Force update gem spec file"
task :regemspec do
  #rm GEM_SPEC_FILE, :force => true
  Rake::Task[GEM_SPEC_FILE].execute(nil)
end

desc "Update #{GEM_SPEC_FILE}"
file GEM_SPEC_FILE => [ GEM_SPEC_TEMPLATE_FILE, 'Rakefile',
                        'lib/bio/version.rb' ] do |t|
  puts "creates #{GEM_SPEC_FILE}"
  File.open(t.name, 'wb') do |w|
    w.print next_gem_spec_string
  end
end

task :package => [ GEM_SPEC_FILE ] do
  Rake::Task[:regemspec].invoke if flag_update_gemspec
end

pkg_dir = "pkg"
tar_version = (BIORUBY_VERSION || spec.version) + BIORUBY_EXTRA_VERSION.to_s
tar_basename = "bioruby-#{tar_version}"
tar_filename = "#{tar_basename}.tar.gz"
tar_pkg_filepath = File.join(pkg_dir, tar_filename)
gem_filename = spec.full_name + ".gem"
gem_pkg_filepath = File.join(pkg_dir, gem_filename)

Rake::PackageTask.new("bioruby") do |pkg|
  #pkg.package_dir = "./pkg"
  pkg.need_tar_gz = true
  pkg.package_files.import(spec.files)
  pkg.package_files.include(*tar_additional_files)
  pkg.version = tar_version
end

Rake::GemPackageTask.new(spec) do |pkg|
  #pkg.package_dir = "./pkg"
end

Rake::RDocTask.new do |r|
  r.rdoc_dir = "rdoc"
  r.rdoc_files.include(*spec.extra_rdoc_files)
  r.rdoc_files.import(spec.files.find_all {|x| /\Alib\/.+\.rb\z/ =~ x})
  #r.rdoc_files.exclude /\.yaml\z/
  opts = spec.rdoc_options.to_a.dup
  if i = opts.index('--main') then
    main = opts[i + 1]
    opts.delete_at(i)
    opts.delete_at(i)
  else
    main = 'README.rdoc'
  end
  r.main = main
  r.options = opts
end

# Tutorial files
TUTORIAL_RD = 'doc/Tutorial.rd'
TUTORIAL_RD_JA = 'doc/Tutorial.rd.ja'
TUTORIAL_RD_HTML = TUTORIAL_RD + '.html'
TUTORIAL_RD_JA_HTML = TUTORIAL_RD_JA + '.html'

HTMLFILES_TUTORIAL = [ TUTORIAL_RD_HTML, TUTORIAL_RD_JA_HTML ]

# Formatting RD to html.
def rd2html(src, dst)
  title = File.basename(src)
  sh "rd2 -r rd/rd2html-lib.rb --with-css=bioruby.css --html-title=#{title} #{src} > #{dst}"
end

# Tutorial.rd to Tutorial.rd.html
file TUTORIAL_RD_HTML => TUTORIAL_RD do |t|
  rd2html(t.prerequisites[0], t.name)
end

# Tutorial.rd.ja to Tutorial.rd.ja.html
file TUTORIAL_RD_JA_HTML => TUTORIAL_RD_JA do |t|
  rd2html(t.prerequisites[0], t.name)
end

desc "Update doc/Tutorial*.html"
task :tutorial2html => HTMLFILES_TUTORIAL

desc "Force update doc/Tutorial*.html"
task :retutorial2html do
  # safe_unlink HTMLFILES_TUTORIAL
  HTMLFILES_TUTORIAL.each do |x|
    Rake::Task[x].execute(nil)
  end
end

# ChangeLog
desc "Force update ChangeLog using git log"
task :rechangelog do
  # The tag name in the command line should be changed
  # after releasing new version, updating ChangeLog,
  # and doing "git mv ChangeLog doc/ChangeLog-X.X.X".
  sh "git log --stat --summary 1.5.0..HEAD > ChangeLog"
end

# define mktmpdir
if true then
  # Note: arg is a subset of Dir.mktmpdir
  def mktmpdir(prefix)
    ## prepare temporary directory for testing
    top = Pathname.new(File.join(Dir.pwd, "tmp")).cleanpath.to_s
    begin
      Dir.mkdir(top)
    rescue Errno::EEXIST
    end

    ## prepare working directory
    flag = false
    dirname = nil
    ret = nil
    begin
      10.times do |n|
        # following 3 lines are copied from Ruby 1.9.3's tmpdir.rb and modified
        t = Time.now.strftime("%Y%m%d")
        path = "#{prefix}#{t}-#{$$}-#{rand(0x100000000).to_s(36)}"
        path << "-#{n}" if n > 0
        begin
          dirname = File.join(top, path)
          flag = Dir.mkdir(dirname)
          break if flag
        rescue SystemCallError
        end
      end
      raise "Couldn't create a directory under #{top}." unless flag
      ret = yield(dirname)
    ensure
      FileUtils.remove_entry_secure(dirname, true) if flag and dirname
    end
    ret
  end #def mktmpdir
## Currently, Dir.mktmpdir isn't used because of JRuby's behavior.
elsif Dir.respond_to?(:mktmpdir) then
  def self.mktmpdir(*arg, &block)
    Dir.mktmpdir(*arg, &block)
  end
else
  load "lib/bio/command.rb"
  def mktmpdir(*arg, &block)
    Bio::Command.mktmpdir(*arg, &block)
  end
end

def chdir_with_message(dir)
  $stderr.puts("chdir #{dir}")
  Dir.chdir(dir)
end

# run in different directory
def work_in_another_directory
  pwd = Dir.pwd
  ret = false
  mktmpdir("bioruby") do |dirname|
    begin
      chdir_with_message(dirname)
      ret = yield(dirname)
    ensure
      chdir_with_message(pwd)
    end
  end
  ret
end

desc "task specified with BIORUBY_RAKE_DEFAULT_TASK (default \"test\")"
task :"see-env" do
  t = ENV["BIORUBY_RAKE_DEFAULT_TASK"]
  if t then
    Rake::Task[t].invoke
  else
    Rake::Task[:test].invoke
  end
end

desc "test installed bioruby on system"
task :"installed-test" do
  data_path = File.join(Dir.pwd, "test/data")
  test_runner = File.join(Dir.pwd, "test/runner.rb")

  data_path = Pathname.new(data_path).cleanpath.to_s
  test_runner = Pathname.new(test_runner).cleanpath.to_s

  ENV["BIORUBY_TEST_DATA"] = data_path
  ENV["BIORUBY_TEST_LIB"] = ""
  ENV["BIORUBY_TEST_GEM"] = nil
  work_in_another_directory do |dirname|
    ruby("-rbio", test_runner)
  end
end

desc "test installed bioruby gem version #{spec.version.to_s}"
task :"gem-test" do
  data_path = File.join(Dir.pwd, "test/data")
  test_runner = File.join(Dir.pwd, "test/runner.rb")

  data_path = Pathname.new(data_path).cleanpath.to_s
  test_runner = Pathname.new(test_runner).cleanpath.to_s

  ENV["BIORUBY_TEST_DATA"] = data_path
  ENV["BIORUBY_TEST_LIB"] = nil
  ENV["BIORUBY_TEST_GEM"] = spec.version.to_s

  work_in_another_directory do |dirname|
    ruby(test_runner)
  end
end

bio-2.0.3/RELEASE_NOTES.rdoc0000644000175000017500000001613414141516614014524 0ustar nileshnilesh
= BioRuby 2.0.3 RELEASE NOTES

Some bug fixes have been made in BioRuby 2.0.3 after the release of 2.0.2.

== Bug fixes

* Fix Ruby 3.0.0 Bio::Sequence::* issue.
  (https://github.com/bioruby/bioruby/issues/137 )
* Fix typo
  (https://github.com/bioruby/bioruby/pull/145 )

== Incompatible changes

=== Bio::Sequence::* incompatible changes since Ruby 3.0.0

Since Ruby 3.0.0, the following methods in Bio::Sequence::NA,
Bio::Sequence::AA, and Bio::Sequence::Generic return or yield a String
instance instead of a Bio::Sequence::* instance.

* dump
* scrub

For details about Ruby 3.0.0 incompatible changes, see
{News for Ruby 3.0.0}[https://github.com/ruby/ruby/blob/v3_0_0/NEWS.md].

= BioRuby 2.0.2 RELEASE NOTES

Some bug fixes have been made in BioRuby 2.0.2 after the release of 2.0.1.

== Bug fixes

* Fix NameError in Bio::Sequence#output(:embl)
  (https://github.com/bioruby/bioruby/issues/135 )
* Suppress warning: Gem::Specification#has_rdoc= is deprecated
  (https://github.com/bioruby/bioruby/issues/138 )
* Fix misspelled URL in README.rdoc

== Known issues

A known issue about Ruby 3.0 is added to KNOWN_ISSUES.rdoc.
The issue will be fixed in the near future.

= BioRuby 2.0.1 RELEASE NOTES

Some bug fixes and improvements have been made in BioRuby 2.0.1 since the
release of version 2.0.0.

== Bug fixes

* Bio::GFF::GFF2::Record.parse did not return the correct object.
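
As an aside on the Ruby 3.0.0 change described in the 2.0.3 notes above,
the underlying behavior can be observed without BioRuby installed. The
sketch below is only an illustration: MySeq and result_class are
hypothetical names introduced here, with MySeq standing in for a String
subclass such as Bio::Sequence::NA. It prints the class of the object
returned by dump and scrub; on Ruby 3.0.0 or later both are expected to
report String, while older Ruby versions may report the subclass.

```ruby
# MySeq is a hypothetical stand-in for a String subclass such as
# Bio::Sequence::NA; it is not part of BioRuby.
class MySeq < String
end

# Returns the name of the class of the object produced by calling the
# given method (e.g. "dump" or "scrub") on a MySeq instance.
def result_class(meth)
  MySeq.new("atgc").public_send(meth).class.name
end

# Report the result class for the two methods named in the notes above.
%w[dump scrub].each do |meth|
  puts "#{meth}: #{result_class(meth)}"
end
```

Code that needs the subclass back on Ruby 3.0.0 or later can wrap the
result again, for example with MySeq.new(seq.scrub) in this sketch.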
== Improvement of sample scripts

The following scripts in the sample/ directory are newly added.

* color_scheme_aa.rb: Example of Bio::ColorScheme for an amino acid sequence.
* fastq2html.rb: Visualization of FASTQ sequences, colored by quality scores.
* rev_comp.rb: Shows reverse-complement sequences of the given sequences.

The following scripts are modified to fix bugs and/or to improve features.

* na2aa.rb: Completely rewritten to fix bug. Shows translated sequences.
* color_scheme_na.rb: Added support for various sequence formats.

=== CWL (Common Workflow Language) workflow files are added

CWL (Common Workflow Language) workflow files are added for some sample scripts. The usage of each sample script will be clarified with the CWL workflow files.

Two types of files are prepared for the CWL workflow engine. *.cwl is a workflow definition file for each sample script. *.testdata.yaml describes sample input data for each CWL workflow.

In this version, cwl files for the 3 sample scripts are added.

* fastq2html.rb: fastq2html.cwl with fastq2html.testdata.yaml
* na2aa.rb: na2aa.cwl with na2aa.testdata.yaml
* rev_comp.rb: rev_comp.cwl with rev_comp.testdata.yaml

= BioRuby 2.0.0 RELEASE NOTES

A lot of changes have been made in BioRuby 2.0.0 since the release of version 1.5.x. This document describes important and/or incompatible changes since the BioRuby 1.5.0 release.

For known problems, see KNOWN_ISSUES.rdoc.

== Features moved to separate gems

Some features are moved to separate gems to reduce complexity and/or to avoid external library dependencies in BioRuby core.

=== BioRuby Shell is moved to "bio-shell"

BioRuby Shell is split to "bio-shell" gem.

=== Executable files are moved to "bio-executables"

To avoid unexpected loading of executable files by some Rails software, all executable commands are moved to "bio-executables" gem (except the "bioruby" command that is included in the above "bio-shell" gem).

=== Fast BLAST XML result parser by using Expat XML Parser is moved to "bio-blast-xmlparser"

Fast BLAST XML result parser by using Expat XML Parser is split to "bio-blast-xmlparser" gem, because of external C library dependency. Please install "bio-blast-xmlparser" gem if possible. If it is installed, BioRuby automatically uses it.

=== Bio::PhyloXML is moved to "bioruby-phyloxml"

Bio::PhyloXML is split to "bioruby-phyloxml" gem.

NOTE: Please uninstall the "bio-phyloxml" gem, which was created as a preliminary trial of splitting a module in 2012 and has not been maintained since.

=== Bio::SQL is moved to "bio-biosql"

Bio::SQL is split to "bio-biosql" gem.

== New features and improvements

=== HTTPS is used to access NCBI web services

As you may know, NCBI announced that all HTTP resources will be switched to HTTPS on September 30, 2016. To follow the transition, all URLs for accessing NCBI E-utilities in BioRuby are changed to use HTTPS.

In BioRuby, the following classes/modules are affected.

* Bio::NCBI::REST and descending classes
* Bio::PubMed

In some rare cases (especially when building Ruby and/or OpenSSL by yourself from source code), Ruby does not include SSL/TLS support, or Ruby fails to detect SSL root certificates. In such cases, you may need to reinstall or upgrade Ruby, OpenSSL (or alternatives), and/or SSL root certificates with appropriate configuration options. Alternatively, installing binary packages is generally a good idea.

=== KEGG::GENES#diseases and related methods are added

The following methods are added to KEGG::GENES, contributed by @kojix2.

* networks_as_strings
* diseases_as_strings
* diseases_as_hash
* diseases
* drug_targets_as_strings

=== Pre-calculated ambiguity codon tables in Bio::CodonTable

Pre-calculated ambiguity codon tables are added, contributed by Tomoaki NISHIYAMA.

== Bug fixes

* Fixed a parser bug in Bio::Fasta::Report, FASTA output (-m 10) parser, contributed by William Van Etten and Mark Wilkinson via GitHub.
* HTTPS is used to access GenomeNet BLAST web service, contributed by @ramadis via GitHub.
* Bio::AAindex documentation fix, suggested by @kojix2 via GitHub.
* Suppress warning messages in Ruby 2.4 and later.

== Incompatible changes

=== Bio::Taxonomy is removed and merged to Bio::PhyloXML::Taxonomy

Bio::Taxonomy in lib/bio/db/phyloxml/phyloxml_elements.rb was written for PhyloXML in 2009. At that time, it was intended to become the general taxonomy data class in BioRuby. However, no efforts have been made to improve the Bio::Taxonomy class, and it still remains a PhyloXML specific class.

Because Bio::PhyloXML is split as a different Gem (Biogem) package, we now decide to remove Bio::Taxonomy and merge it to Bio::PhyloXML::Taxonomy.

In code that uses Bio::Taxonomy directly, changing Bio::Taxonomy to Bio::PhyloXML::Taxonomy or adding the following monkey patch may be needed.

  module Bio
    unless defined? Taxonomy
      Taxonomy = Bio::PhyloXML::Taxonomy
    end
  end

In the future, Bio::Taxonomy might be added as a general taxonomy data class. The new Bio::Taxonomy might be incompatible with the current Bio::Taxonomy.

=== Some features are moved to separate gems

Some features are split to separate gems and removed from this "bio" gem. See the above "Features moved to separate gems" topics for details.

== Known issues

The following issues are added or updated. See KNOWN_ISSUES.rdoc for other already known issues.

== Other important news

=== Ruby 1.8 is no longer supported

Ruby 1.8.x is no longer supported. Though unsupported, some components may still run on Ruby 1.8.7. Please use Ruby 1.8.7 at your own risk with this version of BioRuby.

=== Installation without RubyGems is no longer supported

Installation by using setup.rb without RubyGems is no longer supported, and setup.rb is no longer included in BioRuby distribution.
bio-2.0.3/Gemfile0000644000175000017500000000010614141516614013106 0ustar nileshnileshsource "https://rubygems.org" gem "rake" gem "rdoc" gem "test-unit" bio-2.0.3/gemfiles/0000755000175000017500000000000014141516614013411 5ustar nileshnileshbio-2.0.3/gemfiles/Gemfile.travis-ruby1.80000644000175000017500000000015714141516614017424 0ustar nileshnileshsource "https://rubygems.org" gem "rake", "~>10.4" # rdoc 4.3.0 requires Ruby >= 1.9.3 gem "rdoc", "~>4.2.0" bio-2.0.3/gemfiles/Gemfile.windows0000644000175000017500000000010614141516614016372 0ustar nileshnileshsource "https://rubygems.org" gem "rake" gem "rdoc" gem "test-unit" bio-2.0.3/gemfiles/Gemfile.travis-jruby1.80000644000175000017500000000015714141516614017576 0ustar nileshnileshsource "https://rubygems.org" gem "rake", "~>10.4" # rdoc 4.3.0 requires Ruby >= 1.9.3 gem "rdoc", "~>4.2.0" bio-2.0.3/gemfiles/prepare-gemspec.rb0000644000175000017500000000114714141516614017020 0ustar nileshnilesh# require 'pathname' require 'fileutils' envname_default_task = 'BIORUBY_RAKE_DEFAULT_TASK' gem_dir = Pathname.new(File.join(File.dirname(__FILE__), '..')).realpath case t = ENV[envname_default_task] when 'gem-test' # do nothing else $stderr.print "#{$0}: skipped: ENV[#{envname_default_task}]=#{t.inspect}\n" exit(0) end # update bundler to avoid Bundler's bug fixed in the latest version $stderr.puts "gem update bundler" system("gem update bundler") $stderr.puts "cd #{gem_dir}" Dir.chdir(gem_dir) args = [ 'bioruby.gemspec', '.gemspec' ] $stderr.puts(['cp', *args].join(" ")) FileUtils.cp(*args) bio-2.0.3/gemfiles/modify-Gemfile.rb0000644000175000017500000000113614141516614016574 0ustar nileshnilesh# require 'pathname' envname_default_task = 'BIORUBY_RAKE_DEFAULT_TASK' gem_dir = Pathname.new(File.join(File.dirname(__FILE__), '..')).realpath case t = ENV[envname_default_task] when 'gem-test' # do nothing else $stderr.print "#{$0}: skipped: ENV[#{envname_default_task}]=#{t.inspect}\n" exit(0) end target = ENV['BUNDLE_GEMFILE'] 
unless target then $stderr.puts("Error: env BUNDLE_GEMFILE is not set.") end File.open(target, 'a') do |w| $stderr.puts "Add a line to #{target}" $stderr.puts "gem 'bio', :path => '#{gem_dir}'" w.puts "" w.puts "gem 'bio', :path => '#{gem_dir}'" end bio-2.0.3/gemfiles/Gemfile.travis-rbx0000644000175000017500000000020014141516614016774 0ustar nileshnileshsource "https://rubygems.org" gem "rake" gem "rdoc" platforms :rbx do gem 'racc' gem 'rubysl', '~> 2.0' gem 'psych' end bio-2.0.3/gemfiles/Gemfile.travis-ruby1.90000644000175000017500000000006614141516614017424 0ustar nileshnileshsource "https://rubygems.org" gem "rake" gem "rdoc" bio-2.0.3/gemfiles/Gemfile.travis-jruby1.90000644000175000017500000000006614141516614017576 0ustar nileshnileshsource "https://rubygems.org" gem "rake" gem "rdoc" bio-2.0.3/doc/0000755000175000017500000000000014141516614012363 5ustar nileshnileshbio-2.0.3/doc/Changes-0.7.rd0000644000175000017500000003141014141516614014563 0ustar nileshnilesh= Incompatible and important changes since the BioRuby 0.6.4 release A lot of changes have been made to the BioRuby after the version 0.6.4 is released. --- Ruby 1.6 series are no longer supported. We use autoload functionality and many standard (bundled) libraries (such as SOAP, open-uri, pp etc.) only in Ruby >1.8.2. --- BioRuby will be loaded about 30 times faster than before. As we changed to use autoload instead of require, time required to start up the BioRuby library made surprisingly faster. Other changes (including newly introduced BioRuby shell etc.) made in this series will be described in this file. == New features --- BioRuby shell A new command line user interface for the BioRuby is now included. You can invoke the shell by % bioruby --- UnitTest Test::Unit now covers wide range of the BioRuby library. You can run them by % ruby test/runner.rb or % ruby install.rb config % ruby install.rb setup % ruby install.rb test during the installation procedure. 

--- Documents

README, README.DEV, doc/Tutorial.rd, doc/Tutorial.rd.ja etc. are updated or newly added.

== Incompatible changes

--- Bio::Sequence

Bio::Sequence is completely refactored to be a container class for any sequence annotations. Functionalities are separated into several files under the lib/bio/sequence/ directory as

* common.rb : module provides common methods for NA and AA sequences
* compat.rb : methods for backward compatibility
* aa.rb : Bio::Sequence::AA class
* na.rb : Bio::Sequence::NA class
* format.rb : module for format conversion

Bio::Sequence is no longer a sub-class of String. Instead, Bio::Sequence::NA and AA inherit String directly.

* Bio::Sequence::NA#gc_percent returns integer instead of float
* Bio::Sequence::NA#gc (was aliased to gc_percent) is removed

Previously, GC% was rounded to one decimal place. However, it is not clear how many digits should be kept when rounding the value, and as the GC% is a rough measure by its nature, we have changed it to return the integer part only. If you need a precise value, you can calculate it from the values returned by the 'composition' method by your own criteria.

Also, the 'gc' method is removed because the method name is ambiguous about the value it represents.

* Bio::Sequence#blast
* Bio::Sequence#fasta

These two methods are removed. Use Bio::Blast and Bio::Fasta to execute BLAST and FASTA search.

--- Bio::NucleicAcid

Bio::NucleicAcid::Names and Bio::NucleicAcid::Weight no longer exist.

Bio::NucleicAcid::Names is renamed to Bio::NucleicAcid::Data::NAMES and can be accessed by Bio::NucleicAcid#names, Bio::NucleicAcid.names methods and Bio::NucleicAcid::NAMES hash as the Data module is included.

Bio::NucleicAcid::Weight is renamed to Bio::NucleicAcid::Data::WEIGHT and can be accessed by Bio::NucleicAcid#weight, Bio::NucleicAcid.weight methods and Bio::NucleicAcid::WEIGHT hash as the Data module is included.

--- Bio::AminoAcid

Bio::AminoAcid::Names and Bio::AminoAcid::Weight no longer exist.

Bio::AminoAcid::Names is renamed to Bio::AminoAcid::Data::NAMES and can be accessed by Bio::AminoAcid#names, Bio::AminoAcid.names methods and Bio::AminoAcid::NAMES hash as the Data module is included.

Bio::AminoAcid::Weight is renamed to Bio::AminoAcid::Data::WEIGHT and can be accessed by Bio::AminoAcid#weight, Bio::AminoAcid.weight methods and Bio::AminoAcid::WEIGHT hash as the Data module is included.

--- Bio::CodonTable

Bio::CodonTable::Tables, Bio::CodonTable::Definitions, Bio::CodonTable::Starts, and Bio::CodonTable::Stops are renamed to Bio::CodonTable::TABLES, Bio::CodonTable::DEFINITIONS, Bio::CodonTable::STARTS, and Bio::CodonTable::STOPS respectively.

--- Bio::KEGG::Microarrays, Bio::KEGG::Microarray

* lib/bio/db/kegg/microarray.rb is renamed to lib/bio/db/kegg/expression.rb
* Bio::KEGG::Microarray is renamed to Bio::KEGG::EXPRESSION
* Bio::KEGG::Microarrays is removed

Bio::KEGG::Microarrays was intended to store a series of microarray expressions as a Hash of Array -like data structure,

  gene1 => [exp1, exp2, exp3, ... ]
  gene2 => [exp1, exp2, exp3, ... ]

however, it is not utilized well and a more suitable container class can be proposed. Until then, this class is removed.

#
# Following changes are suspended for a while (not yet introduced for now)
#
# --- Bio::Pathway
#
#   * Bio::Pathway#nodes returns an Array of the node objects instead of
#     the number of the node objects.
#   * Bio::Pathway#edges returns an Array of the edge objects instead of
#     the number of the edge objects.
#

--- Bio::GenBank

Bio::GenBank#gc is removed as the value can be calculated by the Bio::Sequence::NA#gc method and the method is also changed to return integer instead of float.

Bio::GenBank#varnacular_name is renamed to Bio::GenBank#vernacular_name as it was a typo.

--- Bio::GenBank::Common

* lib/bio/db/genbank/common.rb is removed.
  Renamed to Bio::NCBIDB::Common to simplify the autoload dependency.

--- Bio::EMBL::Common

* lib/bio/db/embl/common.rb is removed.
  Renamed to Bio::EMBLDB::Common to simplify the autoload dependency.

--- Bio::KEGG::GENES

* lib/bio/db/kegg/genes.rb linkdb method is changed to return a Hash of an Array of entry IDs instead of a Hash of an entry ID string.

--- Bio::TRANSFAC

* Bio::TFMATRIX is renamed to Bio::TRANSFAC::MATRIX
* Bio::TFSITE is renamed to Bio::TRANSFAC::SITE
* Bio::TFFACTOR is renamed to Bio::TRANSFAC::FACTOR
* Bio::TFCELL is renamed to Bio::TRANSFAC::CELL
* Bio::TFCLASS is renamed to Bio::TRANSFAC::CLASS
* Bio::TFGENE is renamed to Bio::TRANSFAC::GENE

--- Bio::GFF

* Bio::GFF2 is renamed to Bio::GFF::GFF2
* Bio::GFF3 is renamed to Bio::GFF::GFF3

--- Bio::Alignment

In 0.7.0:

* Old Bio::Alignment class is renamed to Bio::Alignment::OriginalAlignment. Now, new Bio::Alignment is a module. However, you need not worry much, because most of the class methods that previously existed are defined to delegate to the new Bio::Alignment::OriginalAlignment class, for keeping backward compatibility.
* New classes and modules are introduced. Please refer to the RDoc.
* each_site and some methods changed to return Bio::Alignment::Site, which inherits Array (previously returned Array).
* consensus_iupac now returns only standard bases 'a', 'c', 'g', 't', 'm', 'r', 'w', 's', 'y', 'k', 'v', 'h', 'd', 'b', 'n', or nil (in SiteMethods#consensus_iupac) or '?' (or missing_char, in EnumerableExtension#consensus_iupac). Note that consensus_iupac now does not return u and invalid letters not defined in IUPAC standard even if all bases are equal.
* There are more and more changes to be written...

In 1.1.0:

* Bio::Alignment::ClustalWFormatter is removed and methods in this module are renamed and moved to Bio::Alignment::Output.

--- Bio::PDB

In 0.7.0:

* Bio::PDB::Atom is removed. Instead, please use Bio::PDB::Record::ATOM and Bio::PDB::Record::HETATM.
* Bio::PDB::FieldDef is removed and Bio::PDB::Record is completely changed. Now, records are changed from hash to Struct objects. (Note that method_missing is no longer used.)
* In records, "do_parse" is now automatically called. Users don't need to call do_parse explicitly. (0.7.0 feature: "inspect" does not call do_parse.) (0.7.1 feature: "inspect" calls do_parse.)
* In the "MODEL" record, model_serial is changed to serial.
* In records, record_type is changed to record_name.
* In most records that contain real numbers, return values are changed to float instead of string.
* Pdb_AChar, Pdb_Atom, Pdb_Character, Pdb_Continuation, Pdb_Date, Pdb_IDcode, Pdb_Integer, Pdb_LString, Pdb_List, Pdb_Real, Pdb_Residue_name, Pdb_SList, Pdb_Specification_list, Pdb_String, Pdb_StringRJ and Pdb_SymOP are moved under Bio::PDB::DataType.
* There are more and more changes to be written...

In 0.7.1:

* Heterogens and HETATMs are completely separated from residues and ATOMs. HETATMs (Bio::PDB::Record::HETATM objects) are stored in Bio::PDB::Heterogen (which inherits Bio::PDB::Residue).
* Waters (resName=="HOH") are treated as normal heterogens. Model#solvents is still available but it will be deprecated.
* In Bio::PDB::Chain, adding "LIGAND" to the heterogen id is no longer available. Instead, please use Chain#get_heterogen_by_id method. In addition, Bio::{PDB|PDB::Model::PDB::Chain}#heterogens, #each_heterogen, #find_heterogen, Bio::{PDB|PDB::Model::PDB::Chain::PDB::Heterogen}#hetatms, #each_hetatm, #find_hetatm methods are added.
* Bio::PDB#seqres returns Bio::Sequence::NA object if the chain seems to be a nucleic acid sequence.
* There are more and more changes to be written...

In 1.1.0:

* In Bio::PDB::ATOM#name, #resName, #iCode, and #charge, whitespaces are stripped during initializing.
* In Bio::PDB::ATOM#segID, whitespaces are right-stripped during initializing.
* In Bio::PDB::ATOM#element, whitespaces are left-stripped during initializing.
* Bio::PDB::HETATM#name, #resName, #iCode, #charge, #segID, and #element are also subject to the above changes, because Bio::PDB::HETATM inherits Bio::PDB::ATOM.
* Bio::PDB::Residue#[] and Bio::PDB::Heterogen#[] are changed to use the name field for selecting atoms, because the element field is not useful for selecting atoms and is not used in many pdb files.
* Bio::PDB#record is changed to return an empty array instead of nil for a nonexistent record.

--- Bio::FlatFile

In 0.7.2:

* Bio::FlatFile.open, Bio::FlatFile.auto and Bio::FlatFile.new are changed not to accept the last argument to specify raw mode, e.g. :raw => true, :raw => false, true or false. Instead, please use Bio::FlatFile#raw= method after creating a new object.
* Now, first argument of Bio::FlatFile.open, which shall be a database class or nil, can be omitted, and you can do Bio::FlatFile.open(filename, ...). Note that Bio::FlatFile.open(dbclass, filename, ...) is still available.
* Bio::FlatFile#io is obsoleted. Please use Bio::FlatFile#to_io instead.
* When reading GenBank or GenPept files, comments at the head of the file before the first "LOCUS" lines are now skipped by default. When reading other file formats, white space characters are skipped.
* File format autodetection routine is completely rewritten. If it fails to determine a data format which was previously determined, please report it to us with the data.
* Internal structure is now completely changed. Code that depends on the internal structure (which is not recommended) would not work.

In 1.1.0:

* Bio::FlatFile#entry_start_pos and #entry_ended_pos are enabled only when Bio::FlatFile#entry_pos_flag is true.

--- Bio::ClustalW, Bio::MAFFT, Bio::Sim4

In 1.1.0:

* Bio::(ClustalW|MAFFT|Sim4)#option is changed to #options.
* Bio::ClustalW::errorlog and Bio::(MAFFT|Sim4)#log are removed. No replacements/alternatives are available.

--- Bio::ClustalW, Bio::MAFFT

In 1.1.0:

* Bio::(ClustalW|MAFFT)#query_align, #query_string, #query_by_filename are changed not to get second (and third, ...) arguments.
* Bio::(ClustalW|MAFFT)#query, #query_string, #query_by_filename are changed not to try to guess whether given data is nucleotide or protein.
* Return value of Bio::(ClustalW|MAFFT)#query with no arguments is changed. If the program exits normally (exit status is 0), returns true. Otherwise, returns false.

--- Bio::MAFFT

In 1.1.0:

* Bio::MAFFT#output is changed to return a string of multi-fasta formatted text instead of Array of Bio::FastaFormat objects. To get an array of Bio::FastaFormat objects, please use report.data instead.

--- Bio::MAFFT::Report

In 1.1.0:

* Bio::MAFFT::Report#initialize is changed to get a string of multi-fasta formatted text instead of Array.

--- Bio::BLAST::Default::Report, Bio::BLAST::Default::Report::Hit, Bio::BLAST::Default::Report::HSP, Bio::BLAST::WU::Report, Bio::BLAST::WU::Report::Hit, Bio::BLAST::WU::Report::HSP

In 1.1.0:

* Hit#evalue, HSP#evalue, WU::Hit#pvalue, and WU::HSP#pvalue are changed to return a Float object instead of a String object.
* Report#expect, Hit#bit_score, and HSP#bit_score are changed to return a Float object or nil instead of a String object or nil.
* Following methods are changed to return an integer value or nil instead of a string or nil: score, percent_identity, percent_positive, percent_gaps.

--- BioRuby Shell

In 1.1.0:

* Shell commands seq, ent, obj are renamed to getseq, getent, getobj, respectively.

=== Deleted files

: lib/bio/db/genbank.rb
: lib/bio/db/embl.rb

  These files are removed as we changed to use autoload. You can safely replace

    require 'bio/db/genbank'

  or

    require 'bio/db/embl'

  in your code to

    require 'bio'

  and this change will also speed up loading time even if you only need one of the sub classes under the genbank/ or embl/ directory.

: lib/bio/extend.rb

  This file contained some additional methods to String and Array classes.
The methods added to Array are already included in Ruby itself since the version 1.8, and the methods added to String are moved to the BioRuby shell (lib/bio/shell/plugin/seq.rb). bio-2.0.3/doc/RELEASE_NOTES-1.5.0.rdoc0000644000175000017500000002363014141516614015727 0ustar nileshnilesh= BioRuby 1.5.0 RELEASE NOTES A lot of changes have been made to the BioRuby 1.5.0 after the version 1.4.3 is released. This document describes important and/or incompatible changes since the BioRuby 1.4.3 release. For known problems, see KNOWN_ISSUES.rdoc. == NEWS === Full support of Ruby 2.0.0, 2.1, and 2.2 Ruby 2.0.0, 2.1, and 2.2 are now recommended Ruby versions for running BioRuby codes. === Support of Ruby 1.8 will be stopped This release is the final BioRuby version that can be run on Ruby 1.8. === License is updated to the new Ruby's License BioRuby is distributed under the same license as Ruby's. In October 2011, Ruby's License was changed from a dual license with GPLv2 to a dual license with 2-clause BSDL. Since BioRuby 1.5.0, we have updated to the new version of Ruby's License. For details about the license, see COPYING or COPYING.ja and BSDL. In addition, please do not forget to see LEGAL for exception files that are subjected to different licenses. === Semantic Versioning will be introduced We will adopt the Semantic Versioning since the next release version, which will be BioRuby 1.5.1. This means that BioRuby 1.5.0 is NOT subject to the Semantic Versioning. == New features and improvements === New method Bio::FastaFormat#first_name Bio::FastaFormat#first_name method is added to get the first word in the definition line. This method was proposed by Ben J. Woodcroft. === Accuracy of Bio::SiRNA Accuracy of siRNA designing algorithms in Bio::SiRNA is improved, contributed by meso_cacase. === Speed up of Bio::ClustalW::Report Running speed of Bio::ClustalW::Report is optimized, contributed by Andrew Grimm. 
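
As an illustration of the Bio::FastaFormat#first_name addition noted above, "the first word in the definition line" can be sketched in plain Ruby. The helper name first_word_of_definition is hypothetical and the code is not BioRuby's implementation; it assumes that words are separated by whitespace and that a leading ">" may be present.

```ruby
# Hypothetical helper (not part of BioRuby): returns the first
# whitespace-delimited word of a FASTA definition line, or nil when
# the line contains no word at all.
def first_word_of_definition(definition)
  # Drop a leading ">" if present, then take the first run of
  # non-whitespace characters.
  definition.to_s.sub(/\A>/, "")[/\S+/]
end

puts first_word_of_definition(">sp|P12345|TEST_HUMAN Example protein")
```

With the gem installed, the documented way to get this value is the first_name method itself; the sketch only shows the idea behind it.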

=== Many warning messages are squashed

Most warning messages when running ruby with "-w" option, e.g. "assigned but unused variable", "instance variable @xxx not initialized", are suppressed. Fixes are contributed by Kenichi Kamiya, Andrew Grimm, and BioRuby core members.

=== Refactoring of codes

Many existing codes are reviewed and refactored. Patches are contributed by Iain Barnett, Kenichi Kamiya, and BioRuby core members.

== Bug fixes

=== Bugs due to remote server changes

==== Bio::PubMed

Bio::PubMed#search, query, and pmfetch are re-implemented by using NCBI E-Utilities. They were broken because unofficial API was used. Paul Leader reported the bug and contributed to the discussion.

==== Bio::Hinv

Bio::Hinv did not work because the API server URL was changed.

==== Bio::TogoWS::REST

* Bio::TogoWS::REST#search with offset and limit did not work due to TogoWS server spec change about URI escape.
* Bio::TogoWS::REST#convert did not work because of the spec change of TogoWS REST API.

=== Bio::Fetch

Bio::Fetch with default parameters did not work because BioRuby's default BioFetch server had been down. We have decided not to restore the service. For smooth migration of codes using BioRuby's BioFetch server, we provide "bio-old-biofetch-emulator" gem. See below "Incompatible changes" for details. The bug was reported and discussed by Jose Irizarry, Robert A. Heiler, and others.

=== BioSQL

* Only do gsub on bio_ref.reference.authors if it exists.
* Missing require when generating genbank output for BioSQL sequence.

Contributed by Brynjar Smari Bjarnason.

=== Bugs found in data format parsers

* Bio::PDB#seqres SEQRES serNum digits were extended in PDB v3.2 (2008). Thanks to a researcher who sent the patch.
* Bio::Blast::Default::Report parse error when subject sequence contains spaces. Edward Rice reported the bug.
* Bio::UniProtKB#gene_name raised NoMethodError when gene_names method returns nil. It should return nil. Candidate fix sent by Jose Irizarry.
* Bio::PhyloXML::Parser.open_uri did not return block return value when giving a block.

=== Other bugs

* lib/bio/shell/plugin/seq.rb: String#step and #skip (extended by bioruby shell) did not yield the last part of the string due to a change from Ruby 1.8 to 1.9.
* Documentation and typo fixes. Contributed by many persons, including Iain Barnett and ctSkennerton.

== Renamed features

=== Bio::SPTR, Bio::UniProt, Bio::SwissProt, and Bio::TrEMBL => Bio::UniProtKB

The classes for parsing UniProtKB (former SwissProt and TrEMBL) data, Bio::SPTR, Bio::UniProt, Bio::SwissProt, and Bio::TrEMBL, are unified into the new class name Bio::UniProtKB, and old names are deprecated.

For keeping backward compatibility, old class names Bio::SPTR, Bio::UniProt, Bio::SwissProt, and Bio::TrEMBL are still available, but a warning message will be shown when using the old class names. These old class names will be deleted in the future.

The file containing Bio::UniProtKB class definition is also changed to lib/bio/db/embl/uniprotkb.rb. For keeping backward compatibility, old files sptr.rb, uniprot.rb, tremble.rb are still kept, but they will be removed in the future.

== Deprecated features

=== Bio::RefSeq, Bio::DDBJ

Bio::RefSeq and Bio::DDBJ are deprecated because they are only an alias of Bio::GenBank. A warning message will be shown when loading the classes and initializing new instances. Please use Bio::GenBank instead.

lib/bio/db/genbank/ddbj.rb and lib/bio/db/genbank/refseq.rb which contain Bio::DDBJ and Bio::RefSeq, respectively, are also deprecated. For keeping backward compatibility, old files are still kept, but they will be removed in the future.

== Removed features

=== Bio::SOAPWSDL

Bio::SOAPWSDL (lib/bio/io/soapwsdl.rb) is removed because SOAP4R (SOAP/WSDL library in Ruby) is no longer bundled with Ruby since Ruby 1.9. For Ruby 1.9 or later, some gems of SOAP4R are available, but we think they are not well-maintained. Moreover, many SOAP servers have been retired (see below).
So, we give up maintaining Bio::SOAPWSDL and all SOAP client classes in BioRuby. === Bio::EBI::SOAP Bio::EBI::SOAP (lib/bio/io/ebisoap.rb) is removed because Bio::SOAPWSDL is removed. === Bio::KEGG::API Bio::KEGG::API is removed because KEGG API SOAP service was discontinued in December 31, 2012. See http://www.kegg.jp/kegg/rest/ for the announcement of service discontinuation. === Bio::DBGET Bio::DBGET is removed because it only supports old original DBGET protocols that was discontinued in 2004. Note that the DBGET is still available via the web. See http://www.genome.jp/en/gn_dbget.html for details. === Bio::Ensembl Bio::Ensembl is removed because it does not work after the renewal of Ensembl web site in 2008. Instead, bio-ensembl gem which supports recent ensembl API is available. === Bio::DDBJ::XML, Bio::DDBJ::REST Bio::DDBJ::XML and Bio::DDBJ::REST are removed because DDBJ Web API (WABI) web services were suspended in 2012 and then they were completely renewed with incompatible APIs in 2013. === Bio::HGC::HiGet Bio::HGC::HiGet (lib/bio/io/higet.rb) is removed because the HiGet web server http://higet.hgc.jp/ have been down since 2011, and we think that the server will not be restored again. === Bio::NCBI::SOAP Bio::NCBI::SOAP is removed because it always raises error during the parsing of WSDL files provided by NCBI. In addition, NCBI announced that the SOAP web service for the E-utilities will be terminated on July 1, 2015. Instead, Bio::NCBI::REST, REST client for the NCBI E-utility web service, is available. === Bio::KEGG::Taxonomy Bio::KEGG::Taxonomy is removed because it does not work correctly. It raises error, it falls into infinite loop, or it returns apparently broken data. Moreover, KEGG closed public FTP site and the file "taxonomy" could only be obtained by paid subscribers. 

=== Bio.method_missing

Bio.method_missing, which aims to provide shortcuts of Bio::Shell methods with shorter names without typing "Shell", is removed because most of the methods raise errors, mainly due to bypassing of the initialization procedure. In addition, we now think that the use of method_missing should generally be avoided unless it is really necessary.

=== extconf.rb

extconf.rb, an alternative way to install BioRuby to the system, is removed to avoid potential confusion. Nowadays, extconf.rb is usually only used for building native extensions, but no native extensions are included in this release. Use gem or setup.rb to install BioRuby.

== Incompatible changes

Also see the above "Renamed features", "Deprecated features", and "Removed features" sections.

=== Bio::Fetch

The BioRuby default BioFetch server http://bioruby.org/cgi-bin/biofetch.rb, which was the default server for Bio::Fetch before BioRuby 1.4, is deprecated. Due to the service stop, default server URL in Bio::Fetch is removed, and we decide not to give any server URL by default for Bio::Fetch.

As an alternative, new class Bio::Fetch::EBI which uses the EBI Dbfetch server is added. When changing codes from Bio::Fetch to Bio::Fetch::EBI, be careful of the differences of database names, default and available data formats between the former BioRuby BioFetch server and the EBI Dbfetch server.

Methods directly affected are:

* Bio::Fetch.new (Bio::Fetch#initialize) does not have default server URL, and URL of a server must always be explicitly given as the first argument.
* Bio::Fetch.query is removed.

For the purpose of running old codes, it is recommended to install bio-old-biofetch-emulator gem. The bio-old-biofetch-emulator gem emulates old BioRuby's default BioFetch server by using other existing web services. See https://rubygems.org/gems/bio-old-biofetch-emulator for details.
We think many codes can run with no changes by simply installing the gem and adding "-r bio-old-biofetch-emulator" into the command-line when executing ruby.

== Known issues

The following issues are added or updated. See KNOWN_ISSUES.rdoc for other already known issues.

=== Bio::PDB

Bio::PDB should be updated to follow PDB format version 3.3.

=== Bio::Blast::Report

NCBI announces that they are making a new version of BLAST XML data format. BioRuby should support it.

=== Bio::Blast::Default::Report

Bio::Blast::Default::Report currently supports legacy BLAST only. It may be better to support BLAST+ text output format, although NCBI do not recommend to do so because the format is unstable.

bio-2.0.3/doc/RELEASE_NOTES-1.4.2.rdoc0000644000175000017500000001224014141516614015723 0ustar nileshnilesh
= BioRuby 1.4.2 RELEASE NOTES

A lot of changes have been made in BioRuby 1.4.2 since the release of version 1.4.1. This document describes important and/or incompatible changes since the BioRuby 1.4.1 release.

For known problems, see KNOWN_ISSUES.rdoc.

== New features

=== Speed-up of Bio::RestrictionEnzyme::Analysis.cut

The running speed of Bio::RestrictionEnzyme::Analysis.cut is significantly increased. The new code is 50 to 80 fold faster than the previous code when cutting 1Mbp sequence running on Ruby 1.9.2p180. The code is written by Tomoaki NISHIYAMA and Naohisa Goto.

=== New classes Bio::DDBJ::REST, REST interface for DDBJ web service

For DDBJ Web API for Biology (WABI) web service, in addition to SOAP, REST (REpresentational State Transfer) interface is added as Bio::DDBJ::REST. Currently, only selected APIs are implemented.

=== Bio::Blast with remote DDBJ server uses REST instead of SOAP

Bio::Blast with remote DDBJ server uses REST instead of SOAP, because Soap4r (SOAP library for Ruby) does not work well with Ruby 1.9. We can now use remote DDBJ BLAST server with Ruby 1.9.

=== Tutorial is updated

The Tutorial.rd is updated by Pjotr Prins and Michael O'Keefe.
=== Many unit tests are added Added many unit tests for Bio::GenBank, Bio::GenPept, Bio::NBRF, Bio::PDB and so on. Most of them are developed by Kazuhiro Hayashi during the Google Summer of Code 2010. === Other new features * New method Bio::Fastq#to_s for convenience. Note that the use of the method may cause loss of performance. To get each input sequence entry as-is, consider using Bio::FlatFile#entry_raw. To output fastq format data, consider using Bio::Sequence#output(:fastq). * New methods Bio::NCBI::REST::EFetch.nucleotide and protein, to get data from "nucleotide" and "protein" database respectively. Because NCBI changed not to accept "gb" format for the database "sequence", the two new methods are added for convenience. * In BioRuby Shell, efetch method uses the above new methods. * In GenomeNet remote BLAST execution, database "mine-aa" and "mine-nt" with KEGG organism codes are now supported. * Support for Ruby 1.9.2 / 1.9.3 is improved. == Bug fixes === Bio::Blast * Failure of remote BLAST execution is fixed, due to the changes in GenomeNet and DDBJ. * When executing remote BLAST with "genomenet" server, options "-b" and "-v" are now correctly used to limit the number of hits to be reported. === Bio::SPTR (Bio::UniProt) * Due to the UniProtKB format changes, ID, DE, and WEB RESOURCE of CC lines were not correctly parsed. See also below about incompatible change of the fix. === Other bug fixes * Bio::Reference#pubmed_url is updated to follow recent NCBI changes. * Fixed: Bio::Newick#reparse failure. * Fixed: In Bio::MEDLINE#reference, doi field should be filled. * Fixed: Bio::Reference#endnote fails when url is not set. * Fixed: Bio::FastaFormat#query passes nil to the given factory object. * Fixed: In BioRuby Shell, efetch() with no additional arguments fails because of the NCBI site changes. * Fixed: In BioRuby Shell, getent() fails when EMBOSS seqret is not found. * Fixed: In BioRuby Shell, demo() fails due to the above two issues. 
== Incompatible changes === Bio::Sequence#output(:fastq) In Fastq output formatter, default width value is changed from 70 to nil. The nil means "without wrapping". The new default behavior without wrapping is generally good with many recent applications that read fastq. === Bio::SPTR CC line topic "WEB RESOURCE" In the return value of Bio::SPTR#cc('WEB RESOURCE'), "NAME" and "NOTE" are now renamed to "Name" and "Note", respectively. The change is due to the UniProt format change since UniProtKB release 12.2 of 11-Sep-2007. (See http://www.uniprot.org/docs/sp_news.htm#rel12.2 for details.) Note that "Name" and "Note" are used even when parsing older format. The change would also affect Marshal.dump (and YAML.dump) data. === Bio::Blast with the remote GenomeNet server When executing remote BLAST with "genomenet" server, options "-b" and "-v" are now correctly used to limit the number of hits to be reported. In 1.4.1 and before, "-B" and "-V" were mistakenly used for the purpose. === Bio::Blast with the remote DDBJ server Bio::Blast with remote DDBJ server uses REST instead of SOAP. === Bio::RestrictionEnzyme internal data structure change Due to the speedup, internal data structure of the following classes are changed: Bio::RestrictionEnzyme::Range::SequenceRange, Bio::RestrictionEnzyme::Range::SequenceRange::CalculatedCuts, Bio::RestrictionEnzyme::Range::SequenceRange::Fragment. This indicates that Marshal.dump (and YAML.dump) data generated by older versions cannot be loaded by the new version, and vice versa, although public APIs of the classes keep compatibility. == Known issues The following issues are added or updated. See KNOWN_ISSUES.rdoc for other already known issues. * Bio::SPTR should be updated to follow UniProtKB format changes. * Problems observed only with Ruby 1.8.5 or earlier will not be fixed. * Descriptions about very old RubyGems 0.8.11 or earlier and about CVS repository are moved from README.rdoc. 
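
The change of the default width in Bio::Sequence#output(:fastq), listed under the incompatible changes above, can be pictured with a small plain-Ruby sketch. It does not call BioRuby at all; the helper name wrap_sequence is hypothetical and only illustrates the difference between a width of 70 and a width of nil (no wrapping).

```ruby
# Hypothetical helper, not part of BioRuby: wrap a sequence string at
# +width+ characters per line, or return it unwrapped when width is nil.
def wrap_sequence(seq, width = nil)
  return seq.dup if width.nil?
  seq.scan(/.{1,#{width}}/).join("\n")
end

seq = "ACGT" * 40                    # 160 characters
old_style = wrap_sequence(seq, 70)   # wrapped: lines of 70, 70 and 20 characters
new_style = wrap_sequence(seq)       # unwrapped: a single line of 160 characters
```

Readers that expect each FASTQ record to span exactly four lines are the main reason an unwrapped default tends to be the safer choice.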
== Other important news

* Required ruby version is now Ruby 1.8.6 or later (except 1.9.0).

= BioRuby 1.4.0 RELEASE NOTES

Many changes have been made to BioRuby since version 1.3.1 was released. This document describes important and/or incompatible changes since the BioRuby 1.3.1 release.

== New features

=== PhyloXML support

Support for reading and writing the PhyloXML file format is added. New classes Bio::PhyloXML::Parser and Bio::PhyloXML::Writer are used to read and write a PhyloXML file, respectively.

The code is developed by Diana Jaunzeikare, mentored by Christian M Zmasek and co-mentors, supported by Google Summer of Code 2009 in collaboration with the National Evolutionary Synthesis Center (NESCent).

=== FASTQ file format support

Support for reading and writing the FASTQ file format is added. All three FASTQ format variants are supported.

To read a FASTQ file, Bio::FlatFile can be used. File format auto-detection of the FASTQ format is supported (although the format variant must be specified later by the user if quality scores are needed). New class Bio::Fastq is the parser class for the FASTQ format. An object of the Bio::Fastq class can be converted to a Bio::Sequence object with the "to_biosequence" method. Bio::Sequence#output now supports output of the FASTQ format.

The code is written by Naohisa Goto, with the help of discussions on the open-bio-l mailing list. The prototype of the Bio::Fastq class was first developed during the BioHackathon 2009 held in Okinawa.

=== DNA chromatogram support

Support for reading DNA chromatogram files is added. SCF and ABIF file formats are supported. The code is developed by Anthony Underwood.
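
Returning to the FASTQ support described above: the format variant has to be specified by the user because the same quality string decodes to different scores under different ASCII offsets. The sketch below is plain Ruby, not BioRuby code; the constant and method names are hypothetical. It covers only the two Phred-based variants (offset 33 for fastq-sanger, offset 64 for fastq-illumina); fastq-solexa also uses offset 64 but a different score scale, which is left out here.

```ruby
# Plain-Ruby sketch (not BioRuby code): decode a FASTQ quality string
# into integer scores for a given ASCII offset.
FASTQ_OFFSETS = { sanger: 33, illumina: 64 }.freeze

def decode_quality(quality, variant)
  offset = FASTQ_OFFSETS.fetch(variant)
  quality.each_byte.map { |byte| byte - offset }
end

sanger_scores   = decode_quality("II5+", :sanger)    # => [40, 40, 20, 10]
illumina_scores = decode_quality("hhTJ", :illumina)  # => [40, 40, 20, 10]

# The same string read with the wrong offset gives very different numbers.
misread = decode_quality("hhTJ", :sanger)            # => [71, 71, 51, 41]
```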
=== MEME (motif-based sequence analysis tools) support

Support for running MAST (Motif Alignment & Search Tool, part of the MEME Suite of motif-based sequence analysis tools) and parsing its results is added. The code is developed by Adam Kraut.

=== Improvement of KEGG parser classes

Some new methods are added to parse new fields added to some KEGG file formats. Unit tests for KEGG parsers are also added and improved. In addition, return value types of some methods are changed to unify APIs among KEGG parser classes. See incompatible changes below for details. Most of them are contributed by Kozo Nishida.

=== Many sample scripts are added

Many sample scripts demonstrating the usage of classes are added. They were moved from the primitive test code that was written in the library files under the "if __FILE__ == $0" convention.

=== Unit tests can test installed BioRuby

The mechanism for loading the library and finding test data in the unit tests is changed, and the library path and test data path can be specified with environment variables. BIORUBY_TEST_LIB is the path to be added to Ruby's $LOAD_PATH. For example, to test BioRuby installed in /usr/local/lib/site_ruby/1.8, run

  env BIORUBY_TEST_LIB=/usr/local/lib/site_ruby/1.8 ruby test/runner.rb

BIORUBY_TEST_DATA is the path of the test data, and BIORUBY_TEST_DEBUG is a flag to turn on debugging of the tests.

== Deprecated features

=== ChangeLog is replaced by git log

ChangeLog is replaced by the output of the git-log command, and the ChangeLog before the 1.3.1 release is moved to doc/ChangeLog-before-1.3.1.

=== "if __FILE__ == $0" convention

Primitive test code in the "if __FILE__ == $0" convention is removed, and the code is moved to the sample scripts named sample/demo_*.rb (except some older or deprecated files).

== Incompatible changes

=== Bio::NCBI::REST

NCBI announces that all Entrez E-utility requests must contain email and tool parameters, and requests without them will return an error after June 2010.
To set default email address and tool name, following methods are added. * Bio::NCBI.default_email=(email) * Bio::NCBI.default_tool=(tool_name) For every query, Bio::NCBI::REST checks the email and tool parameters and raises error if they are empty. IMPORTANT NOTE: No default email address is preset in BioRuby. Programmers using BioRuby must set their own email address or implement to get user's email address in some way (from input form, configuration file, etc). Default tool name is set as "#{$0} (bioruby/#{Bio::BIORUBY_VERSION_ID})". For example, if you run "ruby my_script.rb" with BioRuby 1.4.0, the value is "my_script.rb (bioruby/1.4.0)". === Bio::KEGG ==== dblinks method In Bio::KEGG::COMPOUND, DRUG, ENZYME, GLYCAN and ORTHOLOGY, the method dblinks is changed to return a Hash. Each key of the hash is a database name and its value is an array of entry IDs in the database. If old behavior (returns raw entry lines as an array of strings) is needed, use dblinks_as_strings. ==== pathways method In Bio::KEGG::COMPOUND, DRUG, ENZYME, GENES, GLYCAN and REACTION, the method pathways is changed to return a Hash. Each key of the hash is a pathway ID and its value is the description of the pathway. In Bio::KEGG::GENES, if old behavior (returns pathway IDs as an Array) is needed, use pathways.keys. In Bio::KEGG::COMPOUND, DRUG, ENZYME, GLYCAN, and REACTION, if old behavior (returns raw entry lines as an array of strings) is needed, use pathways_as_strings. Note that Bio::KEGG::ORTHOLOGY#pathways is not changed (returns an array containing pathway IDs). ==== orthologs method In Bio::KEGG::ENZYME, GENES, GLYCAN and REACTION, the method orthologs is changed to return a Hash. Each key of the hash is a ortholog ID and its value is the name of the ortholog. If old behavior (returns raw entry lines as an array of strings) is needed, use orthologs_as_strings. 
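
The move from raw entry lines to a Hash, described above for dblinks, pathways and orthologs, can be sketched in plain Ruby. This is not BioRuby's parser: the two-column line layout and the IDs and names below are made up for illustration only.

```ruby
# Plain-Ruby sketch (not BioRuby code). The raw line layout, "ID  name",
# and the entries themselves are hypothetical.
raw_lines = [
  "X00001  example ortholog one",
  "X00002  example ortholog two",
]

# Old style: an array of raw strings.
# New style: a Hash whose keys are IDs and whose values are names.
as_hash = raw_lines.each_with_object({}) do |line, hash|
  id, name = line.split(/\s+/, 2)
  hash[id] = name
end

ids_only = as_hash.keys   # comparable to an "array of IDs" return value
```

Code that only needs the IDs can call keys on the Hash, which mirrors the advice given above for Bio::KEGG::GENES#pathways.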
==== genes method

Bio::KEGG::ENZYME#genes and Bio::KEGG::ORTHOLOGY#genes are changed to return a Hash that is the same as Bio::KEGG::ORTHOLOGY#genes_as_hash. If the old behavior (returns raw entry lines as an array of strings) is needed, use genes_as_strings.

==== Bio::KEGG::REACTION#rpairs

Bio::KEGG::REACTION#rpairs is changed to return a Hash. Each key of the hash is a KEGG Rpair ID and its value is an array containing name and type. If the old behavior (returns as tokens) is needed, use rpairs_as_tokens.

==== Bio::KEGG::ORTHOLOGY

Bio::KEGG::ORTHOLOGY#dblinks_as_hash does not lower-case database names.

=== Bio::RestrictionEnzyme

Format validation when creating an object is turned off for efficiency.

== Known problems

See KNOWN_ISSUES.rdoc for details.

= BioRuby 1.4.3 RELEASE NOTES

Many changes have been made to BioRuby since version 1.4.2 was released. This document describes important and/or incompatible changes since the BioRuby 1.4.2 release.

For known problems, see KNOWN_ISSUES.rdoc.

== New features

=== Bio::KEGG::KGML

* New class Bio::KEGG::KGML::Graphics for storing a graphics element. In instances of the class, the "coords" attribute is now available.
* New class Bio::KEGG::KGML::Substrate for storing a substrate element.
* New class Bio::KEGG::KGML::Product for storing a product element.
* New method Bio::KEGG::KGML::Reaction#id.
* Improve RDoc documentation.
* Unit tests are added.
* There are incompatible changes. See Incompatible changes below.

== Improvements

=== Portability running on JRuby and Rubinius

Many failures and errors running on JRuby and Rubinius are resolved. Some of them are due to BioRuby bugs, and some of them are due to JRuby or Rubinius bugs.

Artem Tarasov reported bugs in BioRuby and submitted bug reports to Rubinius. Clayton Wheeler and Naohisa Goto fixed bugs in BioRuby and submitted bug reports to JRuby.
=== Testing on Travis CI BioRuby is now using Travis CI (http://travis-ci.org/), a hosted continuous integration service for the open source community. == Bug fixes === Strange behavior related with "circular require" is fixed Fixed: In previous versions, some bioruby files may be required more than two times, and this sometimes causes strange behavior, depending on the order of files in the disk. In particular, unit tests running on JRuby sometimes crashes with strange errors. In BioRuby 1.4.3, almost all require and autoload lines are revised and are changed to avoid circular require. This also fixes crash on JRuby due to JRuby's autoload bug. === Other bug fixes * Fixed: Genomenet remote BLAST does not work. * Fixed: Bio::KEGG::KGML ignores "coords" field. * Fixed: Bio::NucleicAcid.to_re("s") typo * To suppress rare failure of chi-square equiprobability tests for Bio::Sequence::Common#randomize, test code changed to retry up to 10 times if the chi-square test fails. The assertion fails if the chi-square test fails 10 consecutive times, and this strongly suggests bugs in codes or in the random number generator. * Fixed: Bio::EMBL#os raises RuntimeError. The fix includes incompatible change. See below "Incompatible changes". * Fixed: bin/bioruby: Failed to save object with error message "can't convert Symbol into String" on Ruby 1.9. == Incompatible changes and removed features === Bio::FlatFile use binmode (binary mode) when opening a file In Bio::FlatFile.open and Bio::FlatFile.auto, binmode (binary mode) is used by default when opening a file, unless text mode is explicitly specified with open mode string or with options. Due to the change, files using CR+LF line separator might not be read correctly. 
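
The CR+LF caveat above can be illustrated with plain Ruby, without BioRuby. In binary mode no newline conversion happens on any platform, so each line keeps its trailing "\r\n" and the caller has to strip it. The file content below is a made-up two-line example.

```ruby
require "tempfile"

# Plain-Ruby sketch (not BioRuby code): read a CR+LF file in binary mode
# and strip the line terminators explicitly.
lines = Tempfile.create("crlf-demo") do |file|
  file.binmode
  file.write("LOCUS  DEMO\r\n//\r\n")
  file.flush
  File.open(file.path, "rb") { |io| io.each_line.to_a }
end

kept_cr = lines.all? { |line| line.end_with?("\r\n") }  # "\r" survives in binmode
cleaned = lines.map(&:chomp)  # String#chomp removes a trailing "\n", "\r" or "\r\n"
```

Converting such files to LF line endings beforehand, or explicitly requesting text mode as the note above mentions, are the alternatives to stripping the terminators by hand.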
=== Broader FASTQ file recognition

Because the PacBio RS sequencer may produce kilobase-long reads, the read buffer size for file format detection (default 31 lines) may not be sufficient to find the second id line starting with "+". The regular expression for FASTQ is therefore shortened to check only the first id line starting with "@".

=== Bio::KEGG::KGML

* Bio::KEGG::KGML::Reaction#substrates and Bio::KEGG::KGML::Reaction#products are changed to return an array containing Bio::KEGG::KGML::Substrate and Bio::KEGG::KGML::Product objects, respectively. The change enables us to get the IDs of substrates and products, which were thrown away in previous versions.
* Most attribute methods that were different from the KGML attribute names are renamed to names compatible with the KGML attribute names. Old method names are changed to aliases of them and marked as deprecated. The old names will be removed in the future.
  * Bio::KEGG::KGML::Entry#id (old name: entry_id)
  * Bio::KEGG::KGML::Entry#type (old name: category)
  * Bio::KEGG::KGML::Relation#entry1 (old name: node1)
  * Bio::KEGG::KGML::Relation#entry2 (old name: node2)
  * Bio::KEGG::KGML::Relation#type (old name: rel)
  * Bio::KEGG::KGML::Reaction#name (old name: entry_id)
  * Bio::KEGG::KGML::Reaction#type (old name: direction)
* The following attribute methods are deprecated because two or more graphics elements may exist in an entry element. They will be removed in the future. Instead, please use instance methods of Bio::KEGG::KGML::Graphics, which can be obtained from the Bio::KEGG::KGML::Entry#graphics attribute.
  * Bio::KEGG::KGML::Entry#label
  * Bio::KEGG::KGML::Entry#shape
  * Bio::KEGG::KGML::Entry#x
  * Bio::KEGG::KGML::Entry#y
  * Bio::KEGG::KGML::Entry#width
  * Bio::KEGG::KGML::Entry#height
  * Bio::KEGG::KGML::Entry#fgcolor
  * Bio::KEGG::KGML::Entry#bgcolor

=== Bio::EMBL#os

Bio::EMBL#os, which returns the parsed result of the EMBL OS line, no longer splits the content at commas, and it no longer raises an error even if the OS line is not in the "Genus species (name)" format. The changes may affect the parsing of old EMBL files which contain two or more species names in an OS line.

Note that Bio::EMBL#os returns an Array containing several Hash objects, and the argument is always ignored. The return value type and the meaning of the argument might be changed in the future.

=== Tests

* Tests using network connections are moved under test/network/. To invoke these tests, run "rake test-network".
* BIORUBY_TEST_LIB environment variable
  * The directory name specified with BIORUBY_TEST_LIB is always added at the top of $LOAD_PATH even if it is already included in the middle of $LOAD_PATH.
  * When BIORUBY_TEST_LIB is empty, it no longer adds an empty string to $LOAD_PATH.
  * BIORUBY_TEST_LIB is ignored when BIORUBY_TEST_GEM is set.
* BIORUBY_TEST_GEM environment variable
  * New environment variable BIORUBY_TEST_GEM for testing an installed bio-X.X.X gem. A version number can be specified. See the following examples with and without the version number:
    * % env BIORUBY_TEST_GEM=1.4.2.5000 ruby test/runner.rb
    * % env BIORUBY_TEST_GEM="" ruby test/runner.rb

=== Other removed features

* rdoc.zsh is removed because it has not been used for a long time.

== Known issues

The following issues are added or updated. See KNOWN_ISSUES.rdoc for other already known issues.

=== JRuby

On JRuby, errors may be raised due to the following unfixed bugs in JRuby.
* {JRUBY-6195}[http://jira.codehaus.org/browse/JRUBY-6195] Process.spawn (and related methods) ignore option hash * {JRUBY-6818}[http://jira.codehaus.org/browse/JRUBY-6818] Kernel.exec, Process.spawn (and IO.popen etc.) raise error when program is an array containing two strings With older version of JRuby, you may be bothered by the following bugs that have already been fixed in the head of JRuby. * {JRUBY-6658}[http://jira.codehaus.org/browse/JRUBY-6658] Problem when setting up an autoload entry, defining a class via require, then redefining the autoload entry * {JRUBY-6666}[http://jira.codehaus.org/browse/JRUBY-6666] Open3.popen3 failing due to missing handling for [path, argv[0]] array * {JRUBY-6819}[http://jira.codehaus.org/browse/JRUBY-6819] java.lang.ArrayIndexOutOfBoundsException in String#each_line Due to JRUBY-5678 (resolved issue) and the difference of behavior between CRuby and JRuby written in the comments of the issue tracking page, when running BioRuby on JRuby with sudo or root rights, TMPDIR environment variable should be set to a directory that is not world-writable. Currently, the workaround is needed for running BioRuby tests with JRuby on Travis-CI. * {JRUBY-5678}[http://jira.codehaus.org/browse/JRUBY-5678] tmpdir cannot be delete when jruby has sudo/root rights === Rubinius According to Travis-CI, unit tests have failed on 1.9 mode of Rubinius. With older version of Rubinius, you may be bothered by the following bugs that have already been fixed in the head of Rubinius. * {Rubinius Issue #1693}[https://github.com/rubinius/rubinius/issues/1693] String#split gives incorrect output when splitting by /^/ * {Rubinius Issue #1724}[https://github.com/rubinius/rubinius/issues/1724] Creating Struct class with length attribute === DDBJ Web API related classes (Bio::DDBJ::*, Bio::BLAST::Remote::DDBJ) DDBJ Web API is stopping after their system replacement in March 2012. 
(See the announcement, though it is written only in Japanese: http://www.ddbj.nig.ac.jp/replace/rp120601-j.html)

Due to the stop of the DDBJ Web API, Bio::DDBJ::* and Bio::BLAST::Remote::DDBJ, which use the web API, can no longer be used.

=== SOAP4R with Ruby 1.9

soap4r-ruby1.9 may raise an "uninitialized constant XML::SaxParser" error with some combinations of XML parser libraries. It seems this is a bug in soap4r-ruby1.9.

=begin

  # $Id:$

  Copyright (C) 2001-2003, 2005, 2006 Toshiaki Katayama
  Copyright (C) 2005, 2006 Naohisa Goto

= How to use BioRuby

BioRuby is an open-source bioinformatics library for Ruby, a full-featured object-oriented scripting language developed in Japan. With powerful text processing inherited from Perl, a simple and easy-to-understand syntax, and clean object-oriented features, Ruby has come into wide use. For details about Ruby, see the website (()) or commercially available books.

== Getting started

To use BioRuby, you need to install Ruby and BioRuby.

=== Installing Ruby

Ruby is normally preinstalled on Mac OS X and recent UNIX systems. For Windows, a one-click installer and ActiveScriptRuby are available. If Ruby is not installed yet, install it by referring to

* (())
* (())

To check which version of Ruby is installed on your computer, enter the command

  % ruby -v

The version is then displayed, for example:

  ruby 1.8.2 (2004-12-25) [powerpc-darwin7.7.0]

Version 1.8.5 or later is recommended.

For the classes and methods that come standard with Ruby, see the Ruby reference manual.

* (())
* (())

To read help on the command line, the ri command bundled with Ruby and the Japanese-language refe command are handy.

* (())

=== Installing RubyGems

Download the latest version from the RubyGems page.

* (())

Unpack and install it.

  % tar zxvf rubygems-x.x.x.tar.gz
  % cd rubygems-x.x.x
  % ruby setup.rb

=== Installing BioRuby

To install BioRuby, get the latest version from (()) and proceed as follows (note 1). Please also read the bundled README file. Compared with BioPerl, which can take a whole day if you are not used to it, installing BioRuby should finish quickly.

  % wget http://bioruby.org/archive/bioruby-x.x.x.tar.gz
  % tar zxvf bioruby-x.x.x.tar.gz
  % cd bioruby-x.x.x
  % su
  # ruby setup.rb

If RubyGems is available,

  % gem install bio

is all that is needed. After that, as described in the README file, it is a good idea to copy the file

  bioruby-x.x.x/etc/bioinformatics/seqdatabase.ini

to ~/.bioinformatics in your home directory. With RubyGems, the file should be under
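  /usr/local/lib/ruby/gems/1.8/gems/bio-x.x.x/

or a similar location.

  % mkdir ~/.bioinformatics
  % cp bioruby-x.x.x/etc/bioinformatics/seqdatabase.ini ~/.bioinformatics

Emacs users may also want to install misc/ruby-mode.el, which is bundled with the Ruby source.

  % mkdir -p ~/lib/lisp/ruby
  % cp ruby-x.x.x/misc/ruby-mode.el ~/lib/lisp/ruby

Then add the following settings to ~/.emacs.

  ; subdirs settings
  (let ((default-directory "~/lib/lisp"))
    (normal-top-level-add-subdirs-to-load-path))

  ; ruby-mode settings
  (autoload 'ruby-mode "ruby-mode" "Mode for editing ruby source files")
  (add-to-list 'auto-mode-alist '("\\.rb$" . ruby-mode))
  (add-to-list 'interpreter-mode-alist '("ruby" . ruby-mode))

== BioRuby shell

In BioRuby 0.7 and later, simple operations can be performed with the bioruby command that is installed together with BioRuby. The bioruby command is built on irb, the interactive shell bundled with Ruby, so anything Ruby and BioRuby can do can be run freely.

  % bioruby project1

A directory with the name given as the argument is created, and the analysis is carried out inside it. In the example above, a directory named project1 is created, together with the following subdirectories and files.

  data/            place for the user's analysis files
  plugin/          place for additional plugins as needed
  session/         place where settings, objects, history, etc. are saved
  session/config   file storing the user's settings
  session/history  file storing the history of commands entered by the user
  session/object   file storing persisted objects

Of these, the data directory may be freely modified by the user. By looking at the session/history file, you can check when and what operations were performed.

From the second time on, you can start the shell in the same way as the first time,

  % bioruby project1

or move into the created directory and start it without arguments.

  % cd project1
  % bioruby

There are also script files created by the script command and configuration files for Rails created by the web command; these are described later as needed.

The BioRuby shell loads several convenient libraries by default. For example, where the readline library is available, method and variable names should be completed with the Tab key. open-uri, pp, yaml and others are also loaded from the start.

=== Creating nucleotide and amino acid sequences

--- getseq(str)

With the getseq command (note 2), you can create a nucleotide or amino acid sequence from a string. Whether the sequence is nucleotide or amino acid is determined automatically by whether the ATGC content is 90% or more. Here, the resulting nucleotide sequence is assigned to a variable named dna.

  bioruby> dna = getseq("atgcatgcaaaa")

To check the content of the variable, use Ruby's puts method.

  bioruby> puts dna
  atgcatgcaaaa

If you give a file name as the argument, you can also get a sequence from a local file. Major sequence formats such as GenBank, EMBL, UniProt and FASTA are detected automatically (detection is based on the entry content, not on the file name or extension). The following reads a UniProt format entry from a file.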
With this method, only the first entry is read when the file contains multiple entries.

  bioruby> cdc2 = getseq("p04551.sp")
  bioruby> puts cdc2
  MENYQKVEKIGEGTYGVVYKARHKLSGRIVAMKKIRLEDESEGVPSTAIREISLLKEVNDENNRSN...(omitted)

If you know the database name and the entry name, the sequence can be fetched automatically over the Internet.

  bioruby> psaB = getseq("genbank:AB044425")
  bioruby> puts psaB
  actgaccctgttcatattcgtcctattgctcacgcgatttgggatccgcactttggccaaccagca...(omitted)

Which database an entry is fetched from, and by which method, can be specified for each database in the OBDA configuration file ~/.bioinformatics/seqdatabase.ini, which is shared with BioPerl and others (described later). Sequence retrieval with the EMBOSS seqret command is also supported, so entries can be fetched with EMBOSS USA notation as well. See the EMBOSS manual and set up ~/.embossrc appropriately.

Whichever method is used, the sequence returned by the getseq command is an object of the generic sequence class Bio::Sequence (note 3).

Whether a sequence has been judged to be nucleotide or amino acid can be checked with the moltype method:

  bioruby> p cdc2.moltype
  Bio::Sequence::AA

  bioruby> p psaB.moltype
  Bio::Sequence::NA

If the automatic detection is wrong, you can force a conversion with the na and aa methods. Note that these methods modify the original object in place.

  bioruby> dna.aa
  bioruby> p dna.moltype
  Bio::Sequence::AA

  bioruby> dna.na
  bioruby> p dna.moltype
  Bio::Sequence::NA

Alternatively, you can force a conversion with the to_naseq and to_aaseq methods.

  bioruby> pep = dna.to_aaseq

The objects returned by to_naseq and to_aaseq belong to the Bio::Sequence::NA class for DNA sequences and the Bio::Sequence::AA class for amino acid sequences, respectively. Which class a sequence belongs to can be checked with Ruby's class method:

  bioruby> p pep.class
  Bio::Sequence::AA

To get an object of either the Bio::Sequence::NA or the Bio::Sequence::AA class without forcing a conversion, use the seq method (note 4).

  bioruby> pep2 = cdc2.seq
  bioruby> p pep2.class
  Bio::Sequence::AA

The results of methods such as complement and translate, explained below, are objects of the Bio::Sequence::NA class when the method is expected to return a nucleotide sequence, and of the Bio::Sequence::AA class when it is expected to return an amino acid sequence.

The nucleotide and amino acid sequence classes inherit from String, Ruby's string class. Objects of the Bio::Sequence class are also designed to behave, on the surface, like String objects. Therefore every operation that can be applied to a Ruby string is available, such as checking the length with length, concatenating with +, or repeating with *. This can be regarded as one of the strengths of object orientation.

  bioruby> puts dna.length
  12

  bioruby> puts dna + dna
  atgcatgcaaaaatgcatgcaaaa

  bioruby> puts dna * 5
  atgcatgcaaaaatgcatgcaaaaatgcatgcaaaaatgcatgcaaaaatgcatgcaaaa

:complement
  To get the complementary strand of a nucleotide sequence, call its complement method.

    bioruby> puts dna.complement
    ttttgcatgcat

:translate
  To translate a nucleotide sequence into an amino acid sequence, use the translate method. Here the translated amino acid sequence is assigned to a variable named pep.

    bioruby> pep = dna.translate
    bioruby> puts pep
    MHAK

  To translate in a different frame, do the following.

    bioruby> puts dna.translate(2)
    CMQ
    bioruby> puts dna.translate(3)
    ACK

:molecular_weight
  The molecular weight is shown with the molecular_weight method.

    bioruby> puts dna.molecular_weight
    3718.66444

    bioruby> puts pep.molecular_weight
    485.605

--- seqstat(seq)

With the seqstat command, information such as composition is displayed all at once.

  bioruby> seqstat(dna)

  * * * Sequence statistics * * *

  5'->3' sequence   : atgcatgcaaaa
  3'->5' sequence   : ttttgcatgcat
  Translation   1   : MHAK
  Translation   2   : CMQ
  Translation   3   : ACK
  Translation  -1   : FCMH
  Translation  -2   : FAC
  Translation  -3   : LHA
  Length            : 12 bp
  GC percent        : 33 %
  Composition       : a -     6 ( 50.00 %)
                      c -     2 ( 16.67 %)
                      g -     2 ( 16.67 %)
                      t -     2 ( 16.67 %)
  Codon usage       :

   *---------------------------------------------*
   |       |              2nd              |     |
   |  1st  |-------------------------------| 3rd |
   |       |  U    |  C    |  A    |  G    |     |
   |-------+-------+-------+-------+-------+-----|
   | U   U |F  0.0%|S  0.0%|Y  0.0%|C  0.0%|  u  |
   | U   U |F  0.0%|S  0.0%|Y  0.0%|C  0.0%|  c  |
   | U   U |L  0.0%|S  0.0%|*  0.0%|*  0.0%|  a  |
   |  UUU  |L  0.0%|S  0.0%|*  0.0%|W  0.0%|  g  |
   |-------+-------+-------+-------+-------+-----|
   |  CCCC |L  0.0%|P  0.0%|H 25.0%|R  0.0%|  u  |
   | C     |L  0.0%|P  0.0%|H  0.0%|R  0.0%|  c  |
   | C     |L  0.0%|P  0.0%|Q  0.0%|R  0.0%|  a  |
   |  CCCC |L  0.0%|P  0.0%|Q  0.0%|R  0.0%|  g  |
   |-------+-------+-------+-------+-------+-----|
   |   A   |I  0.0%|T  0.0%|N  0.0%|S  0.0%|  u  |
   |  A A  |I  0.0%|T  0.0%|N  0.0%|S  0.0%|  c  |
   | AAAAA |I  0.0%|T  0.0%|K 25.0%|R  0.0%|  a  |
   | A   A |M 25.0%|T  0.0%|K  0.0%|R  0.0%|  g  |
   |-------+-------+-------+-------+-------+-----|
   |  GGGG |V  0.0%|A  0.0%|D  0.0%|G  0.0%|  u  |
   | G     |V  0.0%|A  0.0%|D  0.0%|G  0.0%|  c  |
   | G GGG |V  0.0%|A 25.0%|E  0.0%|G  0.0%|  a  |
   |  GG G |V  0.0%|A  0.0%|E  0.0%|G  0.0%|  g  |
   *---------------------------------------------*

  Molecular weight  : 3718.66444
  Protein weight    : 485.605
  //

For an amino acid sequence, the output looks like this.

  bioruby> seqstat(pep)

  * * *
  Sequence statistics * * *

  N->C sequence     : MHAK
  Length            : 4 aa
  Composition       : A Ala - 1 ( 25.00 %) alanine
                      H His - 1 ( 25.00 %) histidine
                      K Lys - 1 ( 25.00 %) lysine
                      M Met - 1 ( 25.00 %) methionine
  Protein weight    : 485.605
  //

:composition
  The composition shown by seqstat can be obtained with the composition method. The result is returned as a Hash rather than a string, so to simply display it, use the p command instead of puts.

    bioruby> p dna.composition
    {"a"=>6, "c"=>2, "g"=>2, "t"=>2}

==== Other methods for nucleotide and amino acid sequences

Many other operations are available for nucleotide and amino acid sequences.

:subseq(from, to)
  To extract a subsequence, use the subseq method.

    bioruby> puts dna.subseq(1, 3)
    atg

  Strings in Ruby and many other programming languages count the first character as 0, but the subseq method counts from 1.

    bioruby> puts dna[0, 3]
    atg

  Choose between subseq and str[], the slice method of Ruby's String class, as appropriate.

:window_search(len, step)
  With the window_search method, you can easily iterate over the subsequences of a long sequence. To process a DNA sequence codon by codon, cut out 3 characters while shifting by 3 characters, as follows.

    bioruby> dna.window_search(3, 3) do |codon|
    bioruby+   puts "#{codon}\t#{codon.translate}"
    bioruby+ end
    atg     M
    cat     H
    gca     A
    aaa     K

  To chop a genome sequence into 11000 bp pieces that overlap by 1000 bp at their ends and format them as FASTA, do the following.

    bioruby> seq.window_search(11000, 10000) do |subseq|
    bioruby+   puts subseq.to_fasta
    bioruby+ end

  The leftover sequence at the 3' end, which is shorter than 10000 bp, is obtained as the return value, so capture and display it separately if needed.

    bioruby> i = 1
    bioruby> remainder = seq.window_search(11000, 10000) do |subseq|
    bioruby+   puts subseq.to_fasta("segment #{i*10000}", 60)
    bioruby+   i += 1
    bioruby+ end
    bioruby> puts remainder.to_fasta("segment #{i*10000}", 60)

:splicing(position)
  To cut out part of a nucleotide sequence using a position string as used in GenBank and similar formats, use the splicing method.

    bioruby> puts dna
    atgcatgcaaaa
    bioruby> puts dna.splicing("join(1..3,7..9)")
    atggca

:randomize
  The randomize method generates a random sequence that preserves the composition of the original sequence.

    bioruby> puts dna.randomize
    agcaatagatac

:to_re
  The to_re method converts a nucleotide sequence containing ambiguous base codes into a regular expression whose patterns consist only of atgc.

    bioruby> ambiguous = getseq("atgcyatgcatgcatgc")

    bioruby> p ambiguous.to_re
    /atgc[tc]atgcatgcatgc/

    bioruby> puts ambiguous.to_re
    (?-mix:atgc[tc]atgcatgcatgc)

  Because the seq method treats a sequence as an amino acid sequence when its ATGC content is 90% or less, a sequence containing many ambiguous bases must be passed through the to_naseq
  method to convert it explicitly into a Bio::Sequence::NA object.

    bioruby> s = getseq("atgcrywskmbvhdn").to_naseq
    bioruby> p s.to_re
    /atgc[ag][tc][at][gc][tg][ac][tgc][agc][atc][atg][atgc]/

    bioruby> puts s.to_re
    (?-mix:atgc[ag][tc][at][gc][tg][ac][tgc][agc][atc][atg][atgc])

:names
  Rarely needed, but this method converts a sequence into base names or amino acid names.

    bioruby> p dna.names
    ["adenine", "thymine", "guanine", "cytosine", "adenine", "thymine",
    "guanine", "cytosine", "adenine", "adenine", "adenine", "adenine"]

    bioruby> p pep.names
    ["methionine", "histidine", "alanine", "lysine"]

:codes
  A method similar to names that converts an amino acid sequence into three-letter codes.

    bioruby> p pep.codes
    ["Met", "His", "Ala", "Lys"]

:gc_percent
  The GC content of a nucleotide sequence is obtained with the gc_percent method.

    bioruby> p dna.gc_percent
    33

:to_fasta
  To convert to FASTA format, use the to_fasta method.

    bioruby> puts dna.to_fasta("dna sequence")
    >dna sequence
    atgcatgcaaaa

=== Handling base and amino acid codes and codon tables

This section introduces the aminoacids, nucleicacids, codontables and codontable commands for getting amino acid, base and codon tables.

--- aminoacids

The list of amino acids can be displayed with the aminoacids command.

  bioruby> aminoacids
  ?       Pyl     pyrrolysine
  A       Ala     alanine
  B       Asx     asparagine/aspartic acid
  C       Cys     cysteine
  D       Asp     aspartic acid
  E       Glu     glutamic acid
  F       Phe     phenylalanine
  G       Gly     glycine
  H       His     histidine
  I       Ile     isoleucine
  K       Lys     lysine
  L       Leu     leucine
  M       Met     methionine
  N       Asn     asparagine
  P       Pro     proline
  Q       Gln     glutamine
  R       Arg     arginine
  S       Ser     serine
  T       Thr     threonine
  U       Sec     selenocysteine
  V       Val     valine
  W       Trp     tryptophan
  Y       Tyr     tyrosine
  Z       Glx     glutamine/glutamic acid

The return value is a hash that maps each short notation to the corresponding longer notation.

  bioruby> aa = aminoacids
  bioruby> puts aa["G"]
  Gly
  bioruby> puts aa["Gly"]
  glycine

--- nucleicacids

The list of bases can be displayed with the nucleicacids command.

  bioruby> nucleicacids
  a       a       Adenine
  t       t       Thymine
  g       g       Guanine
  c       c       Cytosine
  u       u       Uracil
  r       [ag]    puRine
  y       [tc]    pYrimidine
  w       [at]    Weak
  s       [gc]    Strong
  k       [tg]    Keto
  m       [ac]    aroMatic
  b       [tgc]   not A
  v       [agc]   not T
  h       [atc]   not G
  d       [atg]   not C
  n       [atgc]

The return value is a hash that maps each one-letter base code to the bases it stands for.

  bioruby> na = nucleicacids
  bioruby> puts na["r"]
  [ag]

--- codontables

The list of codon tables can be displayed with the codontables command.

  bioruby> codontables
  1       Standard (Eukaryote)
  2       Vertebrate Mitochondrial
  3       Yeast Mitochondorial
  4       Mold, Protozoan, Coelenterate Mitochondrial and Mycoplasma/Spiroplasma
  5       Invertebrate Mitochondrial
  6       Ciliate Macronuclear and Dasycladacean
  9       Echinoderm Mitochondrial
  10      Euplotid Nuclear
  11      Bacteria
  12      Alternative Yeast Nuclear
  13      Ascidian Mitochondrial
  14      Flatworm Mitochondrial
  15      Blepharisma Macronuclear
  16      Chlorophycean Mitochondrial
  21      Trematode Mitochondrial
  22      Scenedesmus obliquus mitochondrial
  23      Thraustochytrium Mitochondrial

The return value is a hash that maps table numbers to names.

  bioruby> ct = codontables
  bioruby> puts ct[3]
  Yeast Mitochondorial

--- codontable(num)

A codon table itself can be displayed with the codontable command.

  bioruby> codontable(11)

   = Codon table 11 : Bacteria

     hydrophilic: H K R (basic), S T Y Q N S (polar), D E (acidic)
     hydrophobic: F L I M V P A C W G (nonpolar)

   *---------------------------------------------*
   |       |              2nd              |     |
   |  1st  |-------------------------------| 3rd |
   |       |  U    |  C    |  A    |  G    |     |
   |-------+-------+-------+-------+-------+-----|
   | U   U | Phe F | Ser S | Tyr Y | Cys C |  u  |
   | U   U | Phe F | Ser S | Tyr Y | Cys C |  c  |
   | U   U | Leu L | Ser S | STOP  | STOP  |  a  |
   |  UUU  | Leu L | Ser S | STOP  | Trp W |  g  |
   |-------+-------+-------+-------+-------+-----|
   |  CCCC | Leu L | Pro P | His H | Arg R |  u  |
   | C     | Leu L | Pro P | His H | Arg R |  c  |
   | C     | Leu L | Pro P | Gln Q | Arg R |  a  |
   |  CCCC | Leu L | Pro P | Gln Q | Arg R |  g  |
   |-------+-------+-------+-------+-------+-----|
   |   A   | Ile I | Thr T | Asn N | Ser S |  u  |
   |  A A  | Ile I | Thr T | Asn N | Ser S |  c  |
   | AAAAA | Ile I | Thr T | Lys K | Arg R |  a  |
   | A   A | Met M | Thr T | Lys K | Arg R |  g  |
   |-------+-------+-------+-------+-------+-----|
   |  GGGG | Val V | Ala A | Asp D | Gly G |  u  |
   | G     | Val V | Ala A | Asp D | Gly G |  c  |
   | G GGG | Val V | Ala A | Glu E | Gly G |  a  |
   |  GG G | Val V | Ala A | Glu E | Gly G |  g  |
   *---------------------------------------------*

The return value is an object of the Bio::CodonTable class. Besides converting between codons and amino acids, it also provides data such as the following.

  bioruby> ct = codontable(2)
  bioruby> p ct["atg"]
  "M"

:definition
  Description of the codon table definition

    bioruby> puts ct.definition
    Vertebrate Mitochondrial

:start
  List of start codons

    bioruby> p ct.start
    ["att", "atc", "ata", "atg", "gtg"]

:stop
  List of stop codons

    bioruby> p ct.stop
    ["taa", "tag", "aga", "agg"]

:revtrans
  Look up the codons that encode an amino acid

    bioruby> p ct.revtrans("V")
    ["gtc", "gtg", "gtt", "gta"]

=== Flat file entries

This section shows how to handle database entries and flat files themselves. Within the GenBank database, the file gbphg.seq, which contains the phage entries, is small, so it is used as the example here.

  % wget ftp://ftp.hgc.jp/pub/mirror/ncbi/genbank/gbphg.seq.gz
  % gunzip gbphg.seq.gz

--- getent(str)

The getseq command retrieves a sequence; to retrieve a whole entry rather than only the sequence, use the getent command (note 2). As with getseq, the getent command can use OBDA, EMBOSS, NCBI, EBI and TogoWS databases (note 5). For configuration, see the description of the getseq command.

  bioruby> entry = getent("genbank:AB044425")
  bioruby> puts entry
  LOCUS       AB044425                1494 bp    DNA     linear   PLN 28-APR-2001
  DEFINITION  Volvox carteri f. kawasakiensis chloroplast psaB gene for
              photosystem I P700 chlorophyll a apoprotein A2,
              strain:NIES-732.
  (omitted)

The getent command accepts a string of the form db:entry_id, an EMBOSS USA,
a file, or an IO, and returns the text of a single database entry as a string.
It is not limited to sequence databases; entries from many kinds of databases
are supported.

--- flatparse(str)

To parse a retrieved entry and extract the data you want, use the flatparse
command.

  bioruby> entry = getent("gbphg.seq")
  bioruby> gb = flatparse(entry)
  bioruby> puts gb.entry_id
  AB000833
  bioruby> puts gb.definition
  Bacteriophage Mu DNA for ORF1, sheath protein gpL, ORF2, ORF3, complete cds.
  bioruby> puts gb.naseq
  acggtcagacgtttggcccgaccaccgggatgaggctgacgcaggtcagaaatctttgtgacgacaaccgtatcaat
  (omitted)

--- getobj(str)

The getobj command (※2) is equivalent to retrieving an entry as a string with
getent and converting it into a parsed object with flatparse. It accepts the
same arguments as getent. In short: use getseq to get a sequence, getent to
get an entry, and getobj to get a parsed object.

  bioruby> gb = getobj("gbphg.seq")
  bioruby> puts gb.entry_id
  AB000833

--- flatfile(file)

The getent command handles only one entry. To open a local file and process
each entry in turn, use the flatfile command.

  bioruby> flatfile("gbphg.seq") do |entry|
  bioruby+   # do something on entry
  bioruby+ end

Without a block, it returns the first entry in the file.

  bioruby> entry = flatfile("gbphg.seq")
  bioruby> gb = flatparse(entry)
  bioruby> puts gb.entry_id

--- flatauto(file)

To process each entry in turn in the parsed form that flatparse produces, use
the flatauto command instead of flatfile.

  bioruby> flatauto("gbphg.seq") do |entry|
  bioruby+   print entry.entry_id
  bioruby+   puts  entry.definition
  bioruby+ end

As with flatfile, without a block it takes the first entry in the file and
returns it as a parsed object.

  bioruby> gb = flatauto("gbphg.seq")
  bioruby> puts gb.entry_id

=== Indexing flat files

Similar to EMBOSS dbiflat, there is an indexing mechanism called BioFlat that
is shared by BioRuby, BioPerl, and others. Once an index has been created,
entries can be retrieved quickly and easily, which makes it simple to build
your own personal database.

--- flatindex(db_name, *source_file_list)

The following creates an index, under the database name mydb, for the entries
in the GenBank phage sequence file gbphg.seq.

  bioruby> flatindex("mydb", "gbphg.seq")
  Creating BioFlat index (.bioruby/bioflat/mydb) ...
  done

--- flatsearch(db_name, entry_id)

To retrieve an entry from the mydb database created above, use the flatsearch
command.

  bioruby> entry = flatsearch("mydb", "AB004561")
  bioruby> puts entry
  LOCUS       AB004561                2878 bp    DNA     linear   PHG 20-MAY-1998
  DEFINITION  Bacteriophage phiU gene for integrase, complete cds, integration
              site.
  ACCESSION   AB004561
  (omitted)

=== Converting sequences from various databases to FASTA format and saving them

FASTA is the standard format for sequence data. The first line, which begins
with the ">" character, holds a description of the sequence, and the sequence
itself follows from the second line on. Whitespace inside the sequence is
ignored.

  >entry_id definition ...
  ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT
  ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT

The first word of the description line is usually the sequence ID, but NCBI
BLAST databases structure this line in a more elaborate way.

* (())
* (())
* FASTA format (Wikipedia) (())

BioRuby database entry classes provide common methods for the entry ID, the
sequence, and the definition.

* entry_id - returns the entry ID
* definition - returns the definition line
* seq - returns the sequence

With these common methods it is easy to write a program that converts an entry
from any sequence database to FASTA format.

  entry.seq.to_fasta("#{entry.entry_id} #{entry.definition}", 60)

In addition, BioRuby can detect the format of the input database
automatically, so for GenBank, UniProt, and many other major sequence
databases you only need to give the file name to convert it to FASTA format.

--- flatfasta(fasta_file, *source_file_list)

This command generates the specified FASTA file from a list of input database
file names. Here several GenBank files are converted to FASTA format and saved
to a file named myfasta.fa.

  bioruby> flatfasta("myfasta.fa", "gbphg.seq", "gbvrl1.seq", "gbvrl2.seq")
  Saving fasta file (myfasta.fa) ...
    converting -- gbphg.seq
    converting -- gbvrl1.seq
    converting -- gbvrl2.seq
  done

=== Generating scripts

A sequence of working steps can also be saved as a script.

  bioruby> script
  -- 8< -- 8< -- 8< --  Script  -- 8< -- 8< -- 8< --
  bioruby> seq = getseq("gbphg.seq")
  bioruby> p seq
  bioruby> p seq.translate
  bioruby> script
  -- >8 -- >8 -- >8 --  Script  -- >8 -- >8 -- >8 --
  Saving script (script.rb) ...
  done

The generated script.rb looks like this.

  #!/usr/bin/env bioruby

  seq = getseq("gbphg.seq")
  p seq
  p seq.translate

The script can be run with the bioruby command.

  % bioruby script.rb

=== Simple shell functions

--- cd(dir)

Changes the current directory.

  bioruby> cd "/tmp"
  "/tmp"

To return to the home directory, run cd without an argument.

  bioruby> cd
  "/home/k"

--- pwd

Displays the current directory.

  bioruby> pwd
  "/home/k"

--- dir

Lists the files in the current directory.

  bioruby> dir
     UGO  Date                                 Byte  File
  ------  ----------------------------  -----------  ------------
   40700  Tue Dec 06 07:07:35 JST 2005         1768  "Desktop"
   40755  Tue Nov 29 16:55:20 JST 2005         2176  "bin"
  100644  Sat Oct 15 03:01:00 JST 2005     42599518  "gbphg.seq"
  (omitted)

  bioruby> dir "gbphg.seq"
     UGO  Date                                 Byte  File
  ------  ----------------------------  -----------  ------------
  100644  Sat Oct 15 03:01:00 JST 2005     42599518  "gbphg.seq"

--- head(file, lines = 10)

Displays the first 10 lines of a text file or object.

  bioruby> head "gbphg.seq"
  GBPHG.SEQ            Genetic Sequence Data Bank
                            October 15 2005

                  NCBI-GenBank Flat File Release 150.0

                            Phage Sequences

      2713 loci,    16892737 bases, from     2713 reported sequences

The number of lines to display can also be specified.

  bioruby> head "gbphg.seq", 2
  GBPHG.SEQ            Genetic Sequence Data Bank
                            October 15 2005

You can also look at the beginning of a variable that holds text.

  bioruby> entry = getent("gbphg.seq")
  bioruby> head entry, 2
  GBPHG.SEQ            Genetic Sequence Data Bank
                            October 15 2005

--- disp(obj)

Displays the contents of a text file or object in a pager. The pager used here
can be changed with the pager command (described later).

  bioruby> disp "gbphg.seq"
  bioruby> disp entry
  bioruby> disp [1, 2, 3] * 4

=== Variables

--- ls

Lists the variables (objects) created during the session.

  bioruby> ls
  ["entry", "seq"]

  bioruby> a = 123
  bioruby> ls
  ["a", "entry", "seq"]

--- rm(symbol)

Deletes a variable.

  bioruby> rm "a"

  bioruby> ls
  ["entry", "seq"]

--- savefile(filename, object)

Saves the contents held in a variable to a text file.

  bioruby> savefile "testfile.txt", entry
  Saving data (testfile.txt) ... done

  bioruby> disp "testfile.txt"

=== Settings

As a persistence mechanism, the history, objects, and personal settings are
saved in the session directory when the BioRuby shell exits, and they are
loaded automatically the next time it starts.

--- config

Displays the BioRuby shell settings.

  bioruby> config
  message = "...BioRuby in the shell..."
  marshal = [4, 8]
  color   = false
  pager   = nil
  echo    = false

echo toggles whether evaluated values are displayed. When it is on, the value
of each expression is printed even without puts or p. The irb command has this
on by default, but the bioruby command often handles very long strings such as
long sequences and entries, so the default there is off.

  bioruby> config :echo
  Echo on
    ==> nil

  bioruby> config :echo
  Echo off

color toggles whether output such as codon tables is shown in color where
possible. When color is on, the prompt is also colored, so you can tell the
current state.

  bioruby> config :color
  bioruby> codontable
  (colored)

The setting flips each time the command is run.

  bioruby> config :color
  bioruby> codontable
  (not colored)

message changes the splash message shown when the BioRuby shell starts to a
different string. It can be useful to record which analysis project the
directory is for.

  bioruby> config :message, "Kumamushi genome project"

  K u m a m u s h i   g e n o m e   p r o j e c t

    Version : BioRuby 0.8.0 / Ruby 1.8.4

To restore the default string, run it without the message argument.

  bioruby> config :message

splash toggles whether the splash message shown at BioRuby shell startup is
animated. This setting also flips each time the command is run.

  bioruby> config :splash
  Splash on

--- pager(command)

Switches the pager that the disp command actually uses.

  bioruby> pager "lv"
  Pager is set to 'lv'

  bioruby> pager "less -S"
  Pager is set to 'less -S'

To disable the pager, run the command without an argument.

  bioruby> pager
  Pager is set to 'off'

If the pager is off and the command is run without an argument, the value of
the PAGER environment variable is used.

  bioruby> pager
  Pager is set to 'less'

=== Gene ASCII art

--- doublehelix(sequence)

As a bonus feature, a DNA sequence can be displayed as ASCII art. Let's show
an arbitrary nucleotide sequence (dna below) as something like a double helix.

  bioruby> dna = getseq("atgc" * 10).randomize
  bioruby> doublehelix dna
       ta
      t--a
     a---t
    a----t
   a----t
  t---a
  g--c
   cg
   gc
  a--t
  g---c
   c----g
    c----g
  (omitted)

=== Gene music

--- midifile(midifile, sequence)

As another bonus feature, a DNA sequence can be converted to a MIDI file.
Generate midifile.mid from an arbitrary nucleotide sequence seq and play it
in a MIDI player.

  bioruby> midifile("midifile.mid", seq)
  Saving MIDI file (midifile.mid) ...
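  done

This concludes the description of the BioRuby shell. The rest of this document
describes the BioRuby library itself.

== Processing nucleotide and amino acid sequences (Bio::Sequence class)

The Bio::Sequence class supports a wide range of operations on sequences.
As a simple example, we take the short nucleotide sequence atgcatgcaaaa and
compute its complement, extract a subsequence, calculate the base composition,
translate it to amino acids, and calculate the molecular weight. When
translating, you can specify the frame (the base at which translation starts)
and choose which of the codon tables defined in codontable.rb to use (see (())
for the codon table numbers).

  #!/usr/bin/env ruby

  require 'bio'

  seq = Bio::Sequence::NA.new("atgcatgcaaaa")

  puts seq                            # original sequence
  puts seq.complement                 # complementary sequence (Bio::Sequence::NA)
  puts seq.subseq(3,8)                # from the 3rd to the 8th base

  p seq.gc_percent                    # percentage of G and C bases (Integer)
  p seq.composition                   # composition of all bases (Hash)

  puts seq.translate                  # translated sequence (Bio::Sequence::AA)
  puts seq.translate(2)               # translate from the 2nd character (normally 1)
  puts seq.translate(1,9)             # use codon table 9

  p seq.translate.codes               # amino acids as three-letter codes (Array)
  p seq.translate.names               # amino acids as names (Array)
  p seq.translate.composition         # amino acid composition (Hash)
  p seq.translate.molecular_weight    # molecular weight (Float)

  puts seq.complement.translate       # translation of the complement

print, puts, and p are standard Ruby methods for writing to the screen.
Compared with the basic print, puts appends a newline automatically, and p
shows objects other than strings and numbers in a human-readable form, so use
whichever fits. In addition, the pp method, available after require 'pp',
gives output that is easier to read than p.

A nucleotide sequence is an object of the Bio::Sequence::NA class and an amino
acid sequence is an object of the Bio::Sequence::AA class. Both inherit from
the Bio::Sequence class, so most methods are shared.

Furthermore, Bio::Sequence::NA and Bio::Sequence::AA inherit from Ruby's
String class, so String methods can be used as well. For example, to extract
a subsequence you can use String's [] method in addition to the
subseq(from,to) method of Bio::Sequence.

Note that Ruby strings count the first character as position 0. For example,

  puts seq.subseq(1, 3)
  puts seq[0, 3]

both print atg, the first three characters of seq.

Thus, when using String methods you must subtract 1 from the position numbers
normally used in biology, which count the first character as position 1. (The
subseq method does this internally. It also raises an exception if either from
or to is 0 or less.)

Running the steps so far in the BioRuby shell looks like this.

  # The next line can also be written as seq = seq("atgcatgcaaaa")
  bioruby> seq = Bio::Sequence::NA.new("atgcatgcaaaa")
  # Show the generated sequence
  bioruby> puts seq
  atgcatgcaaaa
  # Show the complementary sequence
  bioruby> puts seq.complement
  ttttgcatgcat
  # Show a subsequence (from the 3rd to the 8th base)
  bioruby> puts seq.subseq(3,8)
  gcatgc
  # Show the GC% of the sequence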
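  bioruby> p seq.gc_percent
  33
  # Show the composition of the sequence
  bioruby> p seq.composition
  {"a"=>6, "c"=>2, "g"=>2, "t"=>2}
  # Translate to an amino acid sequence
  bioruby> puts seq.translate
  MHAK
  # Translate starting from the 2nd base
  bioruby> puts seq.translate(2)
  CMQ
  # Translate using codon table 9
  bioruby> puts seq.translate(1,9)
  MHAN
  # Show the translated amino acid sequence as three-letter codes
  bioruby> p seq.translate.codes
  ["Met", "His", "Ala", "Lys"]
  # Show the translated amino acid sequence as amino acid names
  bioruby> p seq.translate.names
  ["methionine", "histidine", "alanine", "lysine"]
  # Show the composition of the translated amino acid sequence
  bioruby> p seq.translate.composition
  {"K"=>1, "A"=>1, "M"=>1, "H"=>1}
  # Show the molecular weight of the translated amino acid sequence
  bioruby> p seq.translate.molecular_weight
  485.605
  # Translate the complementary sequence
  bioruby> puts seq.complement.translate
  FCMH
  # Subsequence (from the 1st to the 3rd base)
  bioruby> puts seq.subseq(1, 3)
  atg
  # Subsequence (from the 1st to the 3rd base)
  bioruby> puts seq[0, 3]
  atg

The window_search(window_size, step_size) method slides a window along the
sequence and processes each subsequence in turn. Ruby's blocks, one of the
language's strengths, let you write the "process for each one" part concisely
and clearly. In the examples below, the block is executed repeatedly with each
subsequence assigned to the variable subseq.

* Calculate and display the average GC% for every 100 bases (shifting one
  base at a time)

    seq.window_search(100) do |subseq|
      puts subseq.gc_percent
    end

The subsequence received in the block is also an object of the same class as
the original, Bio::Sequence::NA or Bio::Sequence::AA, so every method of the
sequence classes can be called on it.

The second argument specifies the step size, which makes things like the
following possible.

* Shift one codon at a time and translate each 15 bases into a 5-residue
  peptide

    seq.window_search(15, 3) do |subseq|
      puts subseq.translate
    end

In addition, the method returns as its own return value the leftover
subsequence at the right end that is shorter than the step size, so the
following is also fairly easy.

* Cut a genome sequence into 10000 bp pieces and format them as FASTA,
  overlapping the ends by 1000 bp, and receive and display separately the
  3' end that is shorter than 10000 bp

    i = 1
    remainder = seq.window_search(10000, 9000) do |subseq|
      puts subseq.to_fasta("segment #{i}", 60)
      i += 1
    end
    puts remainder.to_fasta("segment #{i}", 60)

Making the window size equal to the step size gives a non-overlapping window
search, which suggests further applications.

* Count codon usage

    codon_usage = Hash.new(0)
    seq.window_search(3, 3) do |subseq|
      codon_usage[subseq] += 1
    end

* Calculate the molecular weight 10 residues at a time

    seq.window_search(10, 10) do |subseq|
      puts subseq.molecular_weight
    end

In practice, a Bio::Sequence::NA object is created from a string read from a
file or obtained from a database. For example,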
  #!/usr/bin/env ruby

  require 'bio'

  input_seq = ARGF.read       # read all lines of the files given as arguments

  my_naseq = Bio::Sequence::NA.new(input_seq)
  my_aaseq = my_naseq.translate

  puts my_aaseq

Save this program as na2aa.rb. If the nucleotide sequence

  gtggcgatctttccgaaagcgatgactggagcgaagaaccaaagcagtgacatttgtctg
  atgccgcacgtaggcctgataagacgcggacagcgtcgcatcaggcatcttgtgcaaatg
  tcggatgcggcgtga

is written to a file my_naseq.txt, reading and translating it gives

  % ./na2aa.rb my_naseq.txt
  VAIFPKAMTGAKNQSSDICLMPHVGLIRRGQRRIRHLVQMSDAA*

Incidentally, an example this small can be shortened to one line.

  % ruby -r bio -e 'p Bio::Sequence::NA.new($<.read).translate' my_naseq.txt

Creating a file every time is tedious, however, so next we obtain the
information we need from a database.

== Parsing GenBank (Bio::GenBank class)

Prepare a file in GenBank format (if you do not have one at hand, download a
.seq file from ftp://ftp.ncbi.nih.gov/genbank/).

  % wget ftp://ftp.hgc.jp/pub/mirror/ncbi/genbank/gbphg.seq.gz
  % gunzip gbphg.seq.gz

First, let's extract the ID, the description, and the sequence from each entry
and convert them to FASTA format.

Bio::GenBank::DELIMITER is a constant defined in the GenBank class. It saves
you from having to remember the entry delimiter, which differs from database
to database (for GenBank it is //).

  #!/usr/bin/env ruby

  require 'bio'

  while entry = gets(Bio::GenBank::DELIMITER)
    gb = Bio::GenBank.new(entry)      # GenBank object

    print ">#{gb.accession} "         # ACCESSION number
    puts gb.definition                # DEFINITION line
    puts gb.naseq                     # nucleotide sequence (Sequence::NA object)
  end

However, this way of writing depends on the data structure of GenBank files.
With Bio::FlatFile, the class that handles data input from files, you can
write the same thing without worrying about delimiters.

  #!/usr/bin/env ruby

  require 'bio'

  ff = Bio::FlatFile.new(Bio::GenBank, ARGF)
  ff.each_entry do |gb|
    definition = "#{gb.accession} #{gb.definition}"
    puts gb.naseq.to_fasta(definition, 60)
  end

Even when reading data in a different format, such as a FASTA file,

  #!/usr/bin/env ruby

  require 'bio'

  ff = Bio::FlatFile.new(Bio::FastaFormat, ARGF)
  ff.each_entry do |f|
    puts "definition : " + f.definition
    puts "nalen      : " + f.nalen.to_s
    puts "naseq      : " + f.naseq
  end

the code can be written in much the same way.

The open method of each Bio::DB class can do the same thing. For example,

  #!/usr/bin/env ruby

  require 'bio'

  ff = Bio::GenBank.open("gbvrl1.seq")
  ff.each_entry do |gb|
    definition = "#{gb.accession} #{gb.definition}"
    puts
 gb.naseq.to_fasta(definition, 60)
  end

is also possible (although this style is not used very often).

Next, we parse the complex FEATURES section of GenBank and extract the
information we need. First, let's extract and display the amino acid sequence
only for features that have a /translation="amino acid sequence" qualifier.

  #!/usr/bin/env ruby

  require 'bio'

  ff = Bio::FlatFile.new(Bio::GenBank, ARGF)

  # For each GenBank entry
  ff.each_entry do |gb|

    # Process the FEATURES elements one by one
    gb.features.each do |feature|

      # Convert all qualifiers of the feature to a hash
      hash = feature.to_hash

      # Only when the qualifiers include translation
      if hash['translation']
        # Show the accession number of the entry and the translated sequence
        puts ">#{gb.accession} "
        puts hash['translation']
      end
    end
  end

Going further, let's splice the nucleotide sequence of the entry according to
the position written in the feature, translate it, and display the result
alongside the sequence written in /translation= so that the two can be
compared.

  #!/usr/bin/env ruby

  require 'bio'

  ff = Bio::FlatFile.new(Bio::GenBank, ARGF)

  # For each GenBank entry
  ff.each_entry do |gb|

    # Show the ACCESSION number and the organism name
    puts "### #{gb.accession} - #{gb.organism}"

    # Process the FEATURES elements one by one
    gb.features.each do |feature|

      # Take the position of the feature (join ... and so on)
      position = feature.position

      # Convert all qualifiers of the feature to a hash
      hash = feature.to_hash

      # Skip if there is no /translation=
      next unless hash['translation']

      # Collect gene names and similar information from qualifiers
      # such as /gene= and /product=
      gene_info = [
        hash['gene'], hash['product'], hash['note'], hash['function']
      ].compact.join(', ')
      puts "## #{gene_info}"

      # Nucleotide sequence (spliced according to the position)
      puts ">NA splicing('#{position}')"
      puts gb.naseq.splicing(position)

      # Amino acid sequence (translated from the spliced nucleotide sequence)
      puts ">AA translated by splicing('#{position}').translate"
      puts gb.naseq.splicing(position).translate

      # Amino acid sequence (as written in /translation=)
      puts ">AA original translation"
      puts hash['translation']
    end
  end

If the codon table in use differs from the default (universal), if the first
codon is something other than "atg", if selenocysteine is present, or if
BioRuby has a bug, the two amino acid sequences shown by the example above
will differ.

The Bio::Sequence#splicing method used in this example is a powerful method
that cuts a subsequence out of a nucleotide sequence based on the Location
notation used in the GenBank, EMBL, and DDBJ formats.

Besides a Location string as used in GenBank and similar formats, the splicing
method also accepts a BioRuby Bio::Locations object as its argument, but the
familiar Location string is usually easier to understand.
For details on the Location string format and
Bio::Locations, see bio/location.rb in BioRuby.

* Example of a Location string as used in a feature of GenBank-format data

    naseq.splicing('join(2035..2050,complement(1775..1818),13..345)')

* It may also be converted to a Locations object first and then passed

    locs = Bio::Locations.new('join((8298.8300)..10206,1..855)')
    naseq.splicing(locs)

Incidentally, the splicing method can also be used on an amino acid sequence
(Bio::Sequence::AA) to extract a subsequence.

* Cut out part of an amino acid sequence (a signal peptide, for example)

    aaseq.splicing('21..119')

=== Databases other than GenBank

In BioRuby, databases other than GenBank are handled in basically the same
way: pass the text of one database entry to the class for that database, and
the parsed result comes back as an object.

To take entries one at a time from a database flat file and obtain parsed
objects, use Bio::FlatFile, which appeared earlier. The argument to
Bio::FlatFile.new is the name of the BioRuby class for the database (such as
Bio::GenBank or Bio::KEGG::GENES).

  ff = Bio::FlatFile.new(Bio::DatabaseClassName, ARGF)

Better still, the FlatFile class can in fact detect the database format
automatically, so

  ff = Bio::FlatFile.auto(ARGF)

is the simplest way.

  #!/usr/bin/env ruby

  require 'bio'

  ff = Bio::FlatFile.auto(ARGF)

  ff.each_entry do |entry|
    p entry.entry_id          # ID of the entry
    p entry.definition        # description of the entry
    p entry.seq               # for sequence databases
  end

  ff.close

Furthermore, to avoid forgetting to close the opened database, it is better to
use a Ruby block and write it as follows.

  #!/usr/bin/env ruby

  require 'bio'

  Bio::FlatFile.auto(ARGF) do |ff|
    ff.each_entry do |entry|
      p entry.entry_id          # ID of the entry
      p entry.definition        # description of the entry
      p entry.seq               # for sequence databases
    end
  end

The methods for extracting each part of an entry from a parsed object differ
from database to database. For common items, such as

* entry_id method → returns the ID of the entry
* definition method → returns the definition line of the entry
* reference method → returns a reference object
* organism method → the organism name
* seq, naseq, or aaseq method → returns the corresponding sequence object

we try to keep the methods uniform, but not every method is implemented for
every database (see bio/db.rb for the guidelines). The details also differ
between database parsers, so follow the documentation of each one.

As a rule, a method whose name is plural returns its objects as an array.
For example, a class with a references method returns several Bio::Reference
objects in an Array, whereas another class may have only the singular
reference method and return a single Bio::Reference object.

== Parsing PDB (Bio::PDB class)

Bio::PDB is the class for reading the PDB format. The PDB database is
distributed in three formats, PDB, mmCIF, and XML (PDBML), and of these
BioRuby supports the PDB format.

The PDB
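format specification is described in the Protein Data Bank Contents Guide
below.

* (())

=== Reading PDB data

If one PDB entry is stored in a file named 1bl8.pdb, you can use Ruby's file
reading facilities,

  entry = File.read("1bl8.pdb")

to assign the contents of the entry, as a string, to the variable entry. To
parse the contents of the entry, write

  pdb = Bio::PDB.new(entry)

The entry is now a Bio::PDB object, from which any of its data can be
extracted.

The PDB format can also be detected automatically by Bio::FlatFile, but files
that contain more than one entry are not currently supported. To read just one
entry with Bio::FlatFile, write

  pdb = Bio::FlatFile.auto("1bl8.pdb") { |ff| ff.next_entry }

Either way, the variable pdb ends up with the same result.

=== Hierarchical structure of the objects

Each PDB entry has an ID consisting of four alphanumeric characters. To get
the ID from a Bio::PDB object, use the entry_id method.

  p pdb.entry_id   # => "1BL8"

Summary information about the entry can also be obtained with the
corresponding methods.

  p pdb.definition # => "POTASSIUM CHANNEL (KCSA) FROM STREPTOMYCES LIVIDANS"
  p pdb.keywords   # => ["POTASSIUM CHANNEL", "INTEGRAL MEMBRANE PROTEIN"]

Information such as the depositors, the literature, and the experimental
method is also available (through the authors, jrnl, and method methods,
respectively).

In PDB data, one line basically forms one record. A mechanism called
continuation exists for storing data that does not fit on one line across
several lines, but the basic rule is one record per line.

The first six characters of each line are the name (record name) that
indicates the kind of data on that line. In BioRuby there is basically one
class for each record: the Bio::PDB::Record::HEADER class for HEADER records,
the Bio::PDB::Record::TITLE class for TITLE records, and so on. The REMARK and
JRNL records, however, each come in several formats, so several classes are
provided for them.

The simplest way to access a record is the record method.

  pdb.record("HELIX")

This returns all HELIX records in the PDB entry as an array of
Bio::PDB::Record::HELIX objects.

With this in mind, the following sections look at how to handle the data
structures for the three-dimensional structure, which is the main content of
a PDB entry.

==== Atoms: Bio::PDB::Record::ATOM and Bio::PDB::Record::HETATM classes

A PDB entry contains the three-dimensional structure of proteins, nucleic
acids (DNA, RNA), and other molecules, specifically the 3D coordinates of
their atoms.

The coordinates of protein and nucleic acid atoms are stored in ATOM records.
The corresponding class is Bio::PDB::Record::ATOM.

The coordinates of atoms other than those of proteins and nucleic acids are
stored in HETATM records. The corresponding class is Bio::PDB::Record::HETATM.

The HETATM class inherits from the ATOM class, so the methods of ATOM and
HETATM are used in exactly the same way.

==== Amino acid residues (or bases): Bio::PDB::Residue class

Bio::PDB::Residue groups atoms by amino acid or by base. A Bio::PDB::Residue
object contains one or more Bio::PDB::Record::ATOM objects.

==== Compounds: Bio::PDB::Heterogen class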
The atoms of molecules other than proteins and nucleic acids are basically
grouped by molecule into Bio::PDB::Heterogen. A Bio::PDB::Heterogen object
contains one or more Bio::PDB::Record::HETATM objects.

==== Chains: Bio::PDB::Chain class

Bio::PDB::Chain is a data structure that holds one protein or nucleic acid
made up of several Bio::PDB::Residue objects, together with one or more other
molecules made up of several Bio::PDB::Heterogen objects.

In most cases a chain holds only one of the two kinds: either a protein or
nucleic acid (Bio::PDB::Residue) or other molecules (Bio::PDB::Heterogen).
PDB entries that contain only one Chain appear to hold both in some cases.

Each Chain has an ID consisting of one alphanumeric character (for PDB entries
that contain only one Chain, it may be a space character).

==== Models: Bio::PDB::Model

A Bio::PDB::Model is a collection of one or more Bio::PDB::Chain objects.
X-ray crystal structures usually have a single Model, whereas NMR structures
may have several. When several Models exist, each Model is given a serial
number.

A collection of one or more Models, in turn, makes up a Bio::PDB object.

=== Methods for accessing atoms

Bio::PDB#each_atom is an iterator that visits every ATOM one at a time.

  pdb.each_atom do |atom|
    p atom.xyz
  end

The each_atom method can also be used on Model, Chain, and Residue objects,
where it works as an iterator over all ATOMs inside that Model, Chain, or
Residue.

Bio::PDB#atoms returns all ATOMs as an array.

  p pdb.atoms.size        # => 2820; the number of ATOMs contained

Like each_atom, the atoms method can also be used on Model, Chain, and
Residue objects.

  pdb.chains.each do |chain|
    p chain.atoms.size    # => the number of ATOMs in each Chain
  end

Bio::PDB#each_hetatm is an iterator that visits every HETATM one at a time.

  pdb.each_hetatm do |hetatm|
    p hetatm.xyz
  end

Bio::PDB#hetatms returns all HETATMs as an array.

  p pdb.hetatms.size

As with atoms, these methods can also be used on Model, Chain, and Heterogen
objects.

==== Using the Bio::PDB::Record::ATOM and Bio::PDB::Record::HETATM classes

ATOM is the class for atoms that make up proteins and nucleic acids (DNA and
RNA), and HETATM is the class for all other atoms. Because HETATM inherits
from the ATOM class, the methods of the two classes are used in exactly the
same way.

  p atom.serial       # serial number
  p atom.name         # name
  p atom.altLoc       # Alternate location indicator
  p atom.resName      # amino acid, base, or compound name
  p atom.chainID      # ID of the Chain
  p atom.resSeq       # sequence number of the amino acid residue
  p atom.iCode        # Code for insertion of residues
  p atom.x            # X coordinate
  p atom.y            # Y coordinate
  p atom.z            # Z coordinate
  p atom.occupancy    # Occupancy
  p atom.tempFactor   # Temperature factor
  p atom.segID        # Segment identifier
  p atom.element      # Element
 symbol
  p atom.charge       # Charge on the atom

As a rule, these method names follow the Protein Data Bank Contents Guide.
This is why names such as resName and resSeq use CamelCase. For the meaning of
the data returned by each method, refer to the specification.

Several other convenient methods are also provided. The xyz method returns the
coordinates as a three-dimensional vector. It returns an object of the
Bio::PDB::Coordinate class, which inherits from Ruby's Vector class and is
specialized for three-dimensional vectors. (Note: creating a class that
inherits from Vector does not seem to be recommended, so the specification may
change in the future to return an object of the Vector class.)

  p atom.xyz

Since the result is a vector, you can compute sums, differences, inner
products, and so on.

  # Calculate the distance between two atoms
  p (atom1.xyz - atom2.xyz).r  # r returns the magnitude of the vector

  # Calculate the inner product
  p atom1.xyz.inner_product(atom2.xyz)

In addition, the ter, sigatm, and anisou methods return the TER, SIGATM, and
ANISOU records that correspond to the atom.

=== Methods for accessing amino acid residues (Residue)

Bio::PDB#each_residue is an iterator that visits every Residue in order.
The each_residue method can also be used on Model and Chain objects, where it
works as an iterator over all Residues contained in that Model or Chain.

  pdb.each_residue do |residue|
    p residue.resName
  end

Bio::PDB#residues returns all Residues as an array. Like each_residue, it can
also be used on Model and Chain objects.

  p pdb.residues.size

=== Methods for accessing compounds (Heterogen)

Bio::PDB#each_heterogen is an iterator that visits every Heterogen in order,
and Bio::PDB#heterogens returns all Heterogens as an array.

  pdb.each_heterogen do |heterogen|
    p heterogen.resName
  end

  p pdb.heterogens.size

Like the Residue methods, these methods can also be used on Model and Chain
objects.

=== Methods for accessing Chains and Models

Similarly, Bio::PDB#each_chain is an iterator that visits every Chain in
order, and Bio::PDB#chains returns all Chains as an array. These methods can
also be used on Model objects.

Bio::PDB#each_model is an iterator that visits every Model in order, and
Bio::PDB#models returns all Models as an array.

=== Reading PDB Chemical Component Dictionary data

The Bio::PDB::ChemicalComponent class is a parser for the PDB Chemical
Component Dictionary (formerly the HET Group Dictionary).

For the PDB Chemical Component Dictionary, see the following page.

* (())

The data can be downloaded from the following location.

* (())

This class parses one entry, which starts with RESIDUE and ends with a blank
line (only the PDB format is supported).

Automatic file format detection by Bio::FlatFile is supported. The class
itself has no facility for looking up a compound by ID. Index creation with
br_bioflat.rb is supported, so use that if needed.
  Bio::FlatFile.auto("het_dictionary.txt") do |ff|
    ff.each do |het|
      p het.entry_id  # ID
      p het.hetnam    # HETNAM record (name of the compound)
      p het.hetsyn    # HETSYN record (array of synonyms of the compound)
      p het.formul    # FORMUL record (formula of the compound)
      p het.conect    # CONECT record
    end
  end

The last method, conect, returns the bonds of the compound as a Hash.
For example, the entry for ethanol looks like this:

  RESIDUE EOH 9
  CONECT C1 4 C2 O 1H1 2H1
  CONECT C2 4 C1 1H2 2H2 3H2
  CONECT O 2 C1 HO
  CONECT 1H1 1 C1
  CONECT 2H1 1 C1
  CONECT 1H2 1 C2
  CONECT 2H2 1 C2
  CONECT 3H2 1 C2
  CONECT HO 1 O
  END
  HET EOH 9
  HETNAM EOH ETHANOL
  FORMUL EOH C2 H6 O1

Calling the conect method on this entry returns the following Hash.

  { "C1"  => [ "C2", "O", "1H1", "2H1" ],
    "C2"  => [ "C1", "1H2", "2H2", "3H2" ],
    "O"   => [ "C1", "HO" ],
    "1H1" => [ "C1" ],
    "1H2" => [ "C2" ],
    "2H1" => [ "C1" ],
    "2H2" => [ "C2" ],
    "3H2" => [ "C2" ],
    "HO"  => [ "O" ] }

Running the steps so far in the BioRuby shell looks like this.

  # Fetch PDB entry 1bl8 over the network
  bioruby> ent_1bl8 = getent("pdb:1bl8")
  # Check the contents of the entry
  bioruby> head ent_1bl8
  # Save the entry to a file
  bioruby> savefile("1bl8.pdb", ent_1bl8)
  # Check the contents of the saved file
  bioruby> disp "1bl8.pdb"
  # Parse the PDB entry
  bioruby> pdb_1bl8 = flatparse(ent_1bl8)
  # Show the entry ID of the PDB entry
  bioruby> pdb_1bl8.entry_id
  # Instead of getent("pdb:1bl8") followed by flatparse, this also works
  bioruby> obj_1bl8 = getobj("pdb:1bl8")
  bioruby> obj_1bl8.entry_id
  # Show the residue name of each HETEROGEN
  bioruby> pdb_1bl8.each_heterogen { |heterogen| p heterogen.resName }

  # Fetch the PDB Chemical Component Dictionary
  bioruby> het_dic = open("http://deposit.pdb.org/het_dictionary.txt").read
  # Check the size of the fetched file in bytes
  bioruby> het_dic.size
  # Save the fetched file
  bioruby> savefile("data/het_dictionary.txt", het_dic)
  # Check the contents of the file
  bioruby> disp "data/het_dictionary.txt"
  # Index it for searching and create a database named het_dic
  bioruby> flatindex("het_dic", "data/het_dictionary.txt")
  # Search for the ethanol entry, whose ID is EOH
  bioruby> ethanol = flatsearch("het_dic", "EOH")
  # Parse the retrieved entry
  bioruby> osake = flatparse(ethanol)
  # Show the table of bonds between atoms
  bioruby> osake.conect

== Alignments (Bio::Alignment class)

The Bio::Alignment class is a container for storing sequence alignments.
It supports operations similar to Ruby's Hash and Array, and its design
follows BioPerl's Bio::SimpleAlign
closely. Basic usage is shown below.

  require 'bio'

  seqs = [ 'atgca', 'aagca', 'acgca', 'acgcg' ]
  seqs = seqs.collect{ |x| Bio::Sequence::NA.new(x) }

  # Create an alignment object
  a = Bio::Alignment.new(seqs)

  # Show the consensus sequence
  p a.consensus             # ==> "a?gc?"

  # Show the consensus sequence using IUPAC ambiguous bases
  p a.consensus_iupac       # ==> "ahgcr"

  # Iterate over each sequence
  a.each { |x| p x }
    # ==>
    #    "atgca"
    #    "aagca"
    #    "acgca"
    #    "acgcg"

  # Iterate over each site
  a.each_site { |x| p x }
    # ==>
    #    ["a", "a", "a", "a"]
    #    ["t", "a", "c", "c"]
    #    ["g", "g", "g", "g"]
    #    ["c", "c", "c", "c"]
    #    ["a", "a", "a", "g"]

  # Perform an alignment using Clustal W.
  # The 'clustalw' command must be installed on the system.
  factory = Bio::ClustalW.new
  a2 = a.do_align(factory)

== Homology search with FASTA (Bio::Fasta class)

This section shows how to run a FASTA homology search for a FASTA-format
sequence file query.pep, either on your own machine (local) or on a server on
the Internet (remote). In the local case, SSEARCH and similar programs can be
used in the same way.

=== Local search

Make sure that FASTA is installed. The example below assumes that the command
is named fasta34 and is installed in a directory on your path.

* (())

Prepare a FASTA-format database file target.pep to search against and a
FASTA-format file query.pep containing several query sequences.

In this example, a FASTA search is run for each query sequence and only hits
with an evalue below 0.0001 are displayed.

  #!/usr/bin/env ruby

  require 'bio'

  # Create the factory object that runs FASTA (ssearch etc. also work)
  factory = Bio::Fasta.local('fasta34', ARGV.pop)

  # Read the flat file as a list of FastaFormat objects
  ff = Bio::FlatFile.new(Bio::FastaFormat, ARGF)

  # For the FastaFormat object of each entry
  ff.each do |entry|
    # Print the comment line that starts with '>' to standard error
    # as a progress indicator
    $stderr.puts "Searching ...
" + entry.definition

    # Run the FASTA homology search; the result is a Fasta::Report object
    report = factory.query(entry)

    # For each hit
    report.each do |hit|
      # If the evalue is below 0.0001
      if hit.evalue < 0.0001
        # Show the evalue, the names, and the overlap region
        print "#{hit.query_id} : evalue #{hit.evalue}\t#{hit.target_id} at "
        p hit.lap_at
      end
    end
  end

Here factory is the execution environment, created in advance so that FASTA
can be run repeatedly.

If the script above is saved as search.rb, run it as follows, giving the file
names of the query sequences and the database sequences as arguments.

  % ruby search.rb query.pep target.pep > search.out

To pass options to the FASTA command, give the FASTA command-line options as
the third argument. The ktup value alone, however, is specified through a
method. For example, to set the ktup value to 1 and obtain the top 10 hits,
the options are as follows.

  factory = Bio::Fasta.local('fasta34', 'target.pep', '-b 10')
  factory.ktup = 1

The return value of Bio::Fasta#query and related methods is a
Bio::Fasta::Report object. From this Report object, nearly everything in the
FASTA output can be extracted through various methods. For example, the main
information about a hit, such as its scores, is available through

  report.each do |hit|
    puts hit.evalue           # E-value
    puts hit.sw               # Smith-Waterman score (*)
    puts hit.identity         # % identity
    puts hit.overlap          # length of the overlapping region
    puts hit.query_id         # ID of the query sequence
    puts hit.query_def        # comment of the query sequence
    puts hit.query_len        # length of the query sequence
    puts hit.query_seq        # query sequence
    puts hit.target_id        # ID of the hit sequence
    puts hit.target_def       # comment of the hit sequence
    puts hit.target_len       # length of the hit sequence
    puts hit.target_seq       # hit sequence
    puts hit.query_start      # start position of the homologous region in the query
    puts hit.query_end        # end position of the homologous region in the query
    puts hit.target_start     # start position of the homologous region in the target
    puts hit.target_end       # end position of the homologous region in the target
    puts hit.lap_at           # array of the four positions above
  end

and similar methods. Many of these methods are shared with the
Bio::Blast::Report class described later. If you need other methods, or
methods that extract FASTA-specific values, see the documentation of the
Bio::Fasta::Report class.

If you need the raw, unparsed output of the fasta command,

  report = factory.query(entry)
  puts factory.output

you can obtain it with the output method of the factory object after running
the query method.

=== Remote search

At present only searches on GenomeNet (fasta.genome.jp) are supported. In the
remote case the available target databases are fixed, but in every other
respect Bio::Fasta.remote and Bio::Fasta.local can be used in the same way.

Target databases available on GenomeNet:

* Amino acid sequence databases
  * nr-aa,
 genes, vgenes.pep, swissprot, swissprot-upd, pir, prf, pdbstr

* Nucleotide sequence databases
  * nr-nt, genbank-nonst, gbnonst-upd, dbest, dbgss, htgs, dbsts,
    embl-nonst, embnonst-upd, genes-nt, genome, vgenes.nuc

First choose the database you want to search from this list. The program is
determined by the type of the query sequence and the type of the database.

* When the query is an amino acid sequence
  * If the target is an amino acid sequence database, program is 'fasta'
  * If the target is a nucleotide sequence database, program is 'tfasta'

* When the query is a nucleotide sequence
  * If the target is a nucleotide sequence database, program is 'fasta'
  * (If the target is an amino acid sequence database, searching is probably
    not possible?)

Once the combination of program and database is decided, create a factory with

  program = 'fasta'
  database = 'genes'

  factory = Bio::Fasta.remote(program, database)

and run the search with factory.query and related methods, just as in the
local case.

== Homology search with BLAST (Bio::Blast class)

BLAST is also supported both locally and on GenomeNet (blast.genome.jp).
The API is shared with Bio::Fasta as far as possible, so in many cases simply
rewriting the examples above to use Bio::Blast is enough.

For example, the earlier search.rb runs in the same way with only this change:

  # Create the factory object that runs BLAST
  factory = Bio::Blast.local('blastp', ARGV.pop)

Similarly, to run BLAST on GenomeNet, use Bio::Blast.remote. In this case the
values for program differ from those for FASTA.

* When the query is an amino acid sequence
  * If the target is an amino acid sequence database, program is 'blastp'
  * If the target is a nucleotide sequence database, program is 'tblastn'

* When the query is a nucleotide sequence
  * If the target is an amino acid sequence database, program is 'blastx'
  * If the target is a nucleotide sequence database, program is 'blastn'
  * (To translate both the query and the database in six frames, use
    'tblastx')

Specify the one that applies.

With BLAST, the XML output format produced by the "-m 7" option carries more
information, so Bio::Blast uses XML output when XMLParser or REXML, XML
libraries for Ruby, is available. If both are available, XMLParser is
preferred because it is faster. Note that REXML has been bundled with Ruby
itself since Ruby 1.8.0. If no XML library is installed, the tab-delimited
output format of "-m 8" is used instead. That format provides only limited
data, however, so the XML output of "-m 7" is recommended.

As we have already seen, the Hit objects of Bio::Fasta::Report and
Bio::Blast::Report share several methods. BLAST-specific methods that are
likely to be used often include bit_score and midline.

  report.each do |hit|
    puts hit.bit_score        # bit score (*)
    puts hit.query_seq        # query sequence
    puts hit.midline          # midline string of the alignment (*)
    puts hit.target_seq       # hit sequence

    puts hit.evalue           # E-value
    puts hit.identity         # % identity
    puts hit.overlap          #
 length of the overlapping region
    puts hit.query_id         # ID of the query sequence
    puts hit.query_def        # comment of the query sequence
    puts hit.query_len        # length of the query sequence
    puts hit.target_id        # ID of the hit sequence
    puts hit.target_def       # comment of the hit sequence
    puts hit.target_len       # length of the hit sequence
    puts hit.query_start      # start position of the homologous region in the query
    puts hit.query_end        # end position of the homologous region in the query
    puts hit.target_start     # start position of the homologous region in the target
    puts hit.target_end       # end position of the homologous region in the target
    puts hit.lap_at           # array of the four positions above
  end

For API compatibility with FASTA and for convenience, some information such as
scores is returned by the Hit from the values of its first Hsp (High-scoring
segment pair).

A Bio::Blast::Report object has a hierarchical data structure that directly
mirrors the data structure of the BLAST output, as shown below. Specifically,

* @iterations of a Bio::Blast::Report object holds
  * an Array of Bio::Blast::Report::Iteration objects;
    @hits of a Bio::Blast::Report::Iteration object holds
    * an Array of Bio::Blast::Report::Hit objects;
      @hsps of a Bio::Blast::Report::Hit object holds
      * an Array of Bio::Blast::Report::Hsp objects

and each level has methods for extracting its internal values. For details of
these methods, or if you need values such as the statistics of the BLAST run,
see the documentation and test code in bio/appl/blast/*.rb.

=== Parsing existing BLAST output files

If the result files of a BLAST run have already been saved and you want to
analyze them, you want to create Bio::Blast::Report objects without creating
a Bio::Blast object. The Bio::Blast.reports method does this. The supported
formats are the default output format ("-m 0") and the XML output of the
"-m 7" option.

  #!/usr/bin/env ruby

  require 'bio'

  # Parse the BLAST output in order and yield Bio::Blast::Report objects
  Bio::Blast.reports(ARGF) do |report|
    puts "Hits for " + report.query_def + " against " + report.db
    report.each do |hit|
      print hit.target_id, "\t", hit.evalue, "\n" if hit.evalue < 0.001
    end
  end

If you write a script like this as hits_under_0.001.rb and run it as

  % ./hits_under_0.001.rb *.xml

the BLAST result files *.xml given as arguments are processed in order.

The XML format produced may differ depending on the BLAST version, the OS, and
so on, and occasionally the XML parser does not work well. In that case,
install BLAST 2.2.5 or later, or try a different combination of options such
as -D and -m.

=== Adding a remote search site

Note: this section is for advanced users. Where possible, it is better to use
a web service such as SOAP.

BLAST searches are offered by NCBI and many other sites, but at present
BioRuby supports only GenomeNet. Any such site can be supported as long as it
is possible to:

* call its CGI (handling the command-line options as that site requires)
*
retrieve the blast output in an output format for which BioRuby has a
  parser, such as -m 8

all you need to do is define a method that receives the query and passes the
search result to Bio::Blast::Report.new. Specifically, if you register this
method as a private method of Bio::Blast with a name like "exec_MYSITE",
you can call it by giving the site name as the fourth argument:

    factory = Bio::Blast.remote(program, db, option, 'MYSITE')

Once it is finished, please send it to the BioRuby project and we will
gladly incorporate it.

== Generating a reference list from PubMed (Bio::PubMed class)

The next example searches PubMed, the literature database of NCBI, and
creates a reference list.

    #!/usr/bin/env ruby

    require 'bio'

    ARGV.each do |id|
      entry = Bio::PubMed.query(id)     # class method that fetches from PubMed
      medline = Bio::MEDLINE.new(entry) # Bio::MEDLINE object
      reference = medline.reference     # Bio::Reference object
      puts reference.bibtex             # output in BibTeX format
    end

Save this script under any name you like, such as pmfetch.rb, and run

    % ./pmfetch.rb 11024183 10592278 10592173

with the PubMed IDs (PMID) of the papers you want to cite as arguments.
It should access NCBI, parse the MEDLINE format, and output the entries
converted to BibTeX format.

There is also a function for searching by keyword.

    #!/usr/bin/env ruby

    require 'bio'

    # Joins the list of keywords given on the command line into one string
    keywords = ARGV.join(' ')

    # Searches PubMed by keyword
    entries = Bio::PubMed.search(keywords)

    entries.each do |entry|
      medline = Bio::MEDLINE.new(entry) # Bio::MEDLINE object
      reference = medline.reference     # Bio::Reference object
      puts reference.bibtex             # output in BibTeX format
    end

Save this script under any name you like, such as pmsearch.rb, and run

    % ./pmsearch.rb genome bioinformatics

with the keywords you want to search for as arguments. It searches PubMed by
keyword and outputs the list of matching papers in BibTeX format.

Nowadays NCBI recommends using its web application called E-Utils, so from
now on it is better to use the Bio::PubMed.esearch and Bio::PubMed.efetch
methods.

    #!/usr/bin/env ruby

    require 'bio'

    keywords = ARGV.join(' ')

    options = {
      'maxdate' => '2003/05/31',
      'retmax' => 1000,
    }

    entries = Bio::PubMed.esearch(keywords, options)

    Bio::PubMed.efetch(entries).each do |entry|
      medline = Bio::MEDLINE.new(entry)
      reference = medline.reference
      puts reference.bibtex
    end

This script works almost the same way as pmsearch.rb above. In addition,
because it uses NCBI E-Utils, you can specify the date range of the search,
the maximum number of hits and so on, which makes it more capable. See (())
for the arguments that can be given as options.

Incidentally, the bibtex method is used here to convert to BibTeX format,
but as described later the bibitem 
method can also be used, and the formats of several journals are supported
as well through methods such as nature and nar (although text decoration
such as emphasis and italics is not available).

=== Notes on using BibTeX

This is a brief summary of how to use the BibTeX-format list collected in
the examples above with TeX. Collect the references you are likely to cite
in a genoinfo.bib file, for example with

    % ./pmfetch.rb 10592173 >> genoinfo.bib
    % ./pmsearch.rb genome bioinformatics >> genoinfo.bib

then write a file hoge.tex such as

    \documentclass{jarticle}
    \begin{document}
    \bibliographystyle{plain}
    Blah blah, the KEGG database~\cite{PMID:10592173} is such and such.
    \bibliography{genoinfo}
    \end{document}

and run

    % platex hoge
    % bibtex hoge   # → processes genoinfo.bib
    % platex hoge   # → creates the reference list
    % platex hoge   # → reference numbers

and hoge.dvi will be produced.

=== Notes on using bibitem

If you do not want to create a separate .bib file for the references, use
the output of the Reference#bibitem method. In pmfetch.rb or pmsearch.rb
above, rewrite the line

    puts reference.bibtex

as

    puts reference.bibitem

and enclose the output as follows:

    \documentclass{jarticle}
    \begin{document}
    Blah blah, the KEGG database~\cite{PMID:10592173} is such and such.

    \begin{thebibliography}{00}

    \bibitem{PMID:10592173}
    Kanehisa, M., Goto, S.
    KEGG: kyoto encyclopedia of genes and genomes.,
    {\em Nucleic Acids Res}, 28(1):27--30, 2000.
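
    \end{thebibliography}
    \end{document}

that is, surround it with \begin{thebibliography}. If this file is hoge.tex,
processing it twice

    % platex hoge   # → creates the reference list
    % platex hoge   # → reference numbers

completes the job.

= OBDA

OBDA (Open Bio Database Access) is a common method for accessing sequence
databases, established by the Open Bioinformatics Foundation. It was created
at the BioHackathons held in Arizona and Cape Town in January and February
2002, with the participation of members of the BioPerl, BioJava, BioPython,
BioRuby and other projects.

* BioRegistry (Directory)
  * A mechanism for specifying, for each database, where and how to
    retrieve sequences

* BioFlat
  * Indexing of flat files using a binary tree or BDB

* BioFetch
  * A server and client for retrieving entries from databases over HTTP

* BioSQL
  * A schema for storing sequence data in relational databases such as
    MySQL and PostgreSQL, and methods for retrieving entries

See (()) for details. The specifications are kept in the CVS repository at
cvs.open-bio.org. They can also be viewed from (()).

== BioRegistry

BioRegistry is a mechanism that lets you retrieve data almost without being
aware of the access method in use, by specifying in a configuration file how
entries of each database are retrieved.
The configuration files are looked up in the following order of priority:

* a file specified (by a method parameter)
* ~/.bioinformatics/seqdatabase.ini
* /etc/bioinformatics/seqdatabase.ini
* http://www.open-bio.org/registry/seqdatabase.ini

The last one, the configuration at open-bio.org, is consulted only when no
local configuration file is found.

The current BioRuby implementation reads all local configuration files, and
when several settings with the same name exist, only the first one found is
used. With this, for example, you can override in ~/.bioinformatics/ only
those settings placed in /etc/bioinformatics/ by the system administrator
that you personally want to change. A sample seqdatabase.ini file is included
in the bioruby source, so please refer to it.

The content of the configuration file is written in what is called the
stanza format.

    [database name]
    protocol=protocol name
    location=server name

You write an entry like this for each database. The database name is a label
for your own use, so any name that is easy to understand will do, and it
appears to be fine if it differs from the name of the actual database.
The specification proposes that when there are several databases with the
same name, connections are tried in the order in which they are written,
but BioRuby does not support this at present.

Depending on the type of protocol, additional options other than location
(such as the MySQL user name) have to be given. At present the following
protocols are defined in the specification:

* index-flat
* index-berkeleydb
* biofetch
* biosql
* bsane-corba
* xembl

At present only index-flat, index-berkeleydb, biofetch and biosql can be
used in BioRuby. Also, the specifications of BioRegistry and of each protocol
change from time to time, and BioRuby may not have kept up with them.

To use BioRegistry, first create a Bio::Registry object. The configuration
files are read when you do so.

    reg = Bio::Registry.new

    # 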
connect to the server using the database name written in the configuration file
    serv = reg.get_database('genbank')

    # retrieve an entry by specifying its ID
    entry = serv.get_by_id('AA2CG')

Here serv is a server object corresponding to the protocol specified in the
[genbank] section of the configuration file, and it should be an instance of
Bio::SQL, Bio::Fetch or similar (nil if the database name was not found).

After that you call get_by_id, the entry retrieval method common to OBDA, or
methods specific to each server object, so please refer to the explanations
of BioFetch and BioSQL below.

== BioFlat

BioFlat is a mechanism for creating an index of flat files and retrieving
entries quickly. There are two kinds of index: index-flat, which does not
depend on any Ruby extension library, and index-berkeleydb, which uses
Berkeley DB (bdb). To use index-berkeleydb, you need to install the Ruby
extension library called BDB separately. To create an index, use the
br_bioflat.rb command that comes with the bioruby package:

    % br_bioflat.rb --makeindex database_name [--format class_name] file_name

BioRuby has automatic recognition of data formats, so the --format option
can be omitted, but if a format is not recognized correctly, specify the
class name of the corresponding BioRuby database class. To search, run

    % bioflat database_name entry_ID

As a concrete example, creating an index of the GenBank gbbct*.seq files and
searching it looks like this:

    % bioflat --makeindex my_bctdb --format GenBank gbbct*.seq
    % bioflat my_bctdb A16STM262

If the Ruby bdb extension module (see http://raa.ruby-lang.org/project/bdb/
for details) is installed, the index can be created using Berkeley DB.
In that case, specify "--makeindex-bdb" instead of "--makeindex":

    % bioflat --makeindex-bdb database_name [--format class_name] file_name

== BioFetch

BioFetch is a specification for retrieving database entries from a server
through CGI. It defines the names of the CGI options that the server
accepts, the error codes and so on. The client uses HTTP to specify the
database, ID, format and so on, and retrieves the entry.

The BioRuby project has implemented a BioFetch server with the DBGET system
of GenomeNet as its back end, and operates it at bioruby.org. The source
code of this server is in the sample/ directory of BioRuby. At present there
are only two BioFetch servers: this one at bioruby.org and the one at EBI.

There are several ways to retrieve entries with BioFetch.

(1) Search from a web browser (open the following page)

      http://bioruby.org/cgi-bin/biofetch.rb

(2) Use the br_biofetch.rb command that comes with BioRuby

      % br_biofetch.rb db_name entry_id

(3) Use the Bio::Fetch class directly in a script

      serv = Bio::Fetch.new(server_url)
      entry = serv.fetch(db_name, entry_id)

(4) Use the Bio::Fetch class indirectly through BioRegistry in a script

      reg = Bio::Registry.new
      serv = reg.get_database('genbank')
      entry = 
serv.get_by_id('AA2CG')

If you want to use (4), you need to specify something like

    [genbank]
    protocol=biofetch
    location=http://bioruby.org/cgi-bin/biofetch.rb
    biodbname=genbank

in seqdatabase.ini.

=== An example combining BioFetch with Bio::KEGG::GENES and Bio::AAindex1

The following program uses BioFetch to fetch the bacteriorhodopsin gene
(VNG1467G) of the archaeon Halobacterium from the KEGG GENES database,
fetches in the same way an alpha-helix index (BURA740101) from AAindex, the
amino acid index database, and uses it to run a window search with a width
of 15 residues.

    #!/usr/bin/env ruby

    require 'bio'

    entry = Bio::Fetch.query('hal', 'VNG1467G')
    aaseq = Bio::KEGG::GENES.new(entry).aaseq

    entry = Bio::Fetch.query('aax1', 'BURA740101')
    helix = Bio::AAindex1.new(entry).index

    position = 1
    win_size = 15

    aaseq.window_search(win_size) do |subseq|
      score = subseq.total(helix)
      puts [ position, score ].join("\t")
      position += 1
    end

The class method Bio::Fetch.query used here is a dedicated shortcut that
implicitly uses the BioFetch server at bioruby.org. (Internally this server
retrieves the data from GenomeNet. The query method is used deliberately
here, partly because entries of the KEGG/GENES database hal and of the
AAindex database aax1 cannot be retrieved from other BioFetch servers.)

== BioSQL

to be written...

== Using the BioRuby sample programs

The BioRuby package contains several sample programs under the samples/
directory. Some of them are old and there are far from enough of them, so
contributions of practical and interesting samples are welcome.

to be written...

== Further information

Another tutorial-style document is BioRuby in Anger, which is available on
the BioRuby Wiki.

== Footnotes

* (※1) In BioRuby 1.2.1 and earlier versions, use install.rb instead of
  setup.rb. You also need to go through the following three steps.

    % ruby install.rb config
    % ruby install.rb setup
    # ruby install.rb install

* (※2) In BioRuby 1.0.0 and earlier versions, use the seq, ent and obj
  commands instead of the getseq, getent and getobj commands.

* (※3) In BioRuby 0.7.1 and earlier versions, the object belongs to either
  the Bio::Sequence::NA class or the Bio::Sequence::AA class.
  You can find out which class a sequence belongs to with Ruby's class method:

    bioruby> p cdc2.class
    Bio::Sequence::AA

    bioruby> p psaB.class
    Bio::Sequence::NA

  If the automatic detection is wrong, you can force a conversion with the
  to_naseq and to_aaseq methods.

* (※4) Depending on the kind of data that was read, the seq method may
  return an object of the Bio::Sequence::Generic class, which is for
  sequences that are neither nucleotide nor amino acid, or of the String
  class.

* (※5) NCBI, EBI and TogoWS became available from the getseq, getent and
  getobj commands without special configuration in BioRuby 1.3.0 and later.

=end

bio-2.0.3/doc/Tutorial.rd0000644000175000017500000013665614141516614014536 0ustar nileshnilesh
# This document is generated with a version of rd2html (part of Hiki)
#
#   rd2 Tutorial.rd
#
# or with style sheet:
#
#   rd2 -r rd/rd2html-lib.rb --with-css=bioruby.css Tutorial.rd > Tutorial.rd.html
#
# in Debian:
#
#   rd2 -r rd/rd2html-lib --with-css="../lib/bio/shell/rails/vendor/plugins/bioruby/generators/bioruby/templates/bioruby.css" Tutorial.rd > Tutorial.rd.html
#
# A common problem is tabs in the text file! TABs are not allowed.
#
# To add tests run Toshiaki's bioruby shell and paste in the query plus
# results.
#
# To run the embedded Ruby doctests you can use the rubydoctest tool, though
# it needs a little conversion. 
Like: # # cat Tutorial.rd | sed -e "s,bioruby>,>>," | sed "s,==>,=>," > Tutorial.rd.tmp # rubydoctest Tutorial.rd.tmp # # alternatively, the Ruby way is # # ruby -p -e '$_.sub!(/bioruby\>/, ">>"); $_.sub!(/\=\=\>/, "=>")' Tutorial.rd > Tutorial.rd.tmp # rubydoctest Tutorial.rd.tmp # # Rubydoctest is useful to verify an example in this document (still) works # # bioruby> $: << '../lib' # make sure rubydoctest finds bioruby/lib =begin #doctest Testing bioruby = BioRuby Tutorial * Copyright (C) 2001-2003 KATAYAMA Toshiaki * Copyright (C) 2005-2011 Pjotr Prins, Naohisa Goto and others This document was last modified: 2011/10/14 Current editor: Michael O'Keefe The latest version resides in the GIT source code repository: ./doc/(()). == Introduction This is a tutorial for using Bioruby. A basic knowledge of Ruby is required. If you want to know more about the programming language, we recommend the latest Ruby book (()) by Dave Thomas and Andy Hunt - the first edition can be read online (()). For BioRuby you need to install Ruby and the BioRuby package on your computer You can check whether Ruby is installed on your computer and what version it has with the % ruby -v command. You should see something like: ruby 1.9.2p290 (2011-07-09 revision 32553) [i686-linux] If you see no such thing you'll have to install Ruby using your installation manager. For more information see the (()) website. With Ruby download and install Bioruby using the links on the (()) website. The recommended installation is via RubyGems: gem install bio See also the Bioruby (()). A lot of BioRuby's documentation exists in the source code and unit tests. To really dive in you will need the latest source code tree. The embedded rdoc documentation can be viewed online at (()). But first lets start! == Trying Bioruby Bioruby comes with its own shell. 
After unpacking the sources, run one of the following commands:

    bioruby

or, from the source tree

    cd bioruby
    ruby -I lib bin/bioruby

and you should see a prompt

    bioruby>

Now test the following:

    bioruby> require 'bio'
    bioruby> seq = Bio::Sequence::NA.new("atgcatgcaaaa")
    ==> "atgcatgcaaaa"

    bioruby> seq.complement
    ==> "ttttgcatgcat"

See the BioRuby shell section below for more tweaking. If you have trouble
running the examples, also check the section below on troubleshooting. You
can also post a question to the mailing list. BioRuby developers usually try
to help.

== Working with nucleic / amino acid sequences (Bio::Sequence class)

The Bio::Sequence class allows the usual sequence transformations and
translations. In the example below the DNA sequence "atgcatgcaaaa" is
converted into the complementary strand and spliced into a subsequence;
next, the nucleic acid composition is calculated and the sequence is
translated into the amino acid sequence, the molecular weight is calculated,
and so on. When translating into amino acid sequences, the frame can be
specified and optionally the codon table selected (as defined in
codontable.rb).

    bioruby> seq = Bio::Sequence::NA.new("atgcatgcaaaa")
    ==> "atgcatgcaaaa"

    # complementary sequence (Bio::Sequence::NA object)
    bioruby> seq.complement
    ==> "ttttgcatgcat"

    bioruby> seq.subseq(3,8) # gets subsequence of positions 3 to 8 (starting from 1)
    ==> "gcatgc"
    bioruby> seq.gc_percent
    ==> 33
    bioruby> seq.composition
    ==> {"a"=>6, "c"=>2, "g"=>2, "t"=>2}
    bioruby> seq.translate
    ==> "MHAK"
    bioruby> seq.translate(2)        # translate from frame 2
    ==> "CMQ"
    bioruby> seq.translate(1,11)     # codon table 11
    ==> "MHAK"
    bioruby> seq.translate.codes
    ==> ["Met", "His", "Ala", "Lys"]
    bioruby> seq.translate.names
    ==> ["methionine", "histidine", "alanine", "lysine"]
    bioruby> seq.translate.composition
    ==> {"K"=>1, "A"=>1, "M"=>1, "H"=>1}
    bioruby> seq.translate.molecular_weight
    ==> 485.605
    bioruby> seq.complement.translate
    ==> "FCMH"

Get a random sequence with the same NA count:

    bioruby> counts = {'a'=>seq.count('a'),'c'=>seq.count('c'),'g'=>seq.count('g'),'t'=>seq.count('t')}
    ==> {"a"=>6, "c"=>2, "g"=>2, "t"=>2}
    bioruby!> randomseq = Bio::Sequence::NA.randomize(counts)
    ==!> "aaacatgaagtc"

    bioruby!> print counts
    a6c2g2t2
    bioruby!> p counts
    {"a"=>6, "c"=>2, "g"=>2, "t"=>2}

The p, print and puts methods are standard Ruby ways of outputting to
the screen. If you want to know more about standard Ruby commands you
can use the 'ri' command on the command line (or the help command in
Windows). For example

    % ri puts
    % ri p
    % ri File.open

Nucleic acid sequences are members of the Bio::Sequence::NA class, and
amino acid sequences are members of the Bio::Sequence::AA class. Shared
methods are in the parent Bio::Sequence class.

As Bio::Sequence inherits from Ruby's String class, you can use String
class methods. For example, to get a subsequence, you can use not only
subseq(from, to) but also String#[].

Please note that Ruby's strings are 0-based, i.e. 
the first letter has index 0, for example:

    bioruby> s = 'abc'
    ==> "abc"
    bioruby> s[0].chr
    ==> "a"
    bioruby> s[0..1]
    ==> "ab"

So when using String methods, you should subtract 1 from the positions
conventionally used in biology. (The subseq method will throw an exception
if you specify a position smaller than or equal to 0 for either "from"
or "to".)

The window_search(window_size, step_size) method shows a typical Ruby
way of writing concise and clear code using 'closures'. Each sliding
window creates a subsequence which is supplied to the enclosed block
through a variable named +s+.

* Show the average percentage of GC content for 20 bases (stepping the
  default one base at a time):

    bioruby> seq = Bio::Sequence::NA.new("atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa")
    ==> "atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa"

    bioruby> a=[]; seq.window_search(20) { |s| a.push s.gc_percent }
    bioruby> a
    ==> [30, 35, 40, 40, 35, 35, 35, 30, 25, 30, 30, 30, 35, 35, 35, 35, 35, 40, 45, 45, 45, 45, 40, 35, 40, 40, 40, 40, 40, 35, 35, 35, 30, 30, 30]

Since the class of each subsequence is the same as that of the original
sequence (Bio::Sequence::NA or Bio::Sequence::AA or Bio::Sequence), you can
use all methods on the subsequence. For example,

* Show translation results for 15 bases, shifting a codon at a time

    bioruby> a = []
    bioruby> seq.window_search(15, 3) { | s | a.push s.translate }
    bioruby> a
    ==> ["MHAIK", "HAIKL", "AIKLI", "IKLIP", "KLIPI", "LIPIR", "IPIRS", "PIRSS", "IRSSR", "RSSRS", "SSRSS", "SRSSK", "RSSKK", "SSKKK"]

Finally, the window_search method returns the last leftover
subsequence. This allows for example

* Divide a genome sequence into sections of 10000bp and
  output FASTA formatted sequences (line width 60 chars). The 1000bp at the
  start and end of each subsequence overlap. 
At the 3' end of the sequence the leftover is also added: i = 1 textwidth=60 remainder = seq.window_search(10000, 9000) do |s| puts s.to_fasta("segment #{i}", textwidth) i += 1 end if remainder puts remainder.to_fasta("segment #{i}", textwidth) end If you don't want the overlapping window, set window size and stepping size to equal values. Other examples * Count the codon usage bioruby> codon_usage = Hash.new(0) bioruby> seq.window_search(3, 3) { |s| codon_usage[s] += 1 } bioruby> codon_usage ==> {"cat"=>1, "aaa"=>3, "cca"=>1, "att"=>2, "aga"=>1, "atc"=>1, "cta"=>1, "gca"=>1, "cga"=>1, "tca"=>3, "aag"=>1, "tcc"=>1, "atg"=>1} * Calculate molecular weight for each 10-aa peptide (or 10-nt nucleic acid) bioruby> a = [] bioruby> seq.window_search(10, 10) { |s| a.push s.molecular_weight } bioruby> a ==> [3096.2062, 3086.1962, 3056.1762, 3023.1262, 3073.2262] In most cases, sequences are read from files or retrieved from databases. For example: require 'bio' input_seq = ARGF.read # reads all files in arguments my_naseq = Bio::Sequence::NA.new(input_seq) my_aaseq = my_naseq.translate puts my_aaseq Save the program above as na2aa.rb. Prepare a nucleic acid sequence described below and save it as my_naseq.txt: gtggcgatctttccgaaagcgatgactggagcgaagaaccaaagcagtgacatttgtctg atgccgcacgtaggcctgataagacgcggacagcgtcgcatcaggcatcttgtgcaaatg tcggatgcggcgtga na2aa.rb translates a nucleic acid sequence to a protein sequence. For example, translates my_naseq.txt: % ruby na2aa.rb my_naseq.txt or use a pipe! % cat my_naseq.txt|ruby na2aa.rb Outputs VAIFPKAMTGAKNQSSDICLMPHVGLIRRGQRRIRHLVQMSDAA* You can also write this, a bit fancifully, as a one-liner script. % ruby -r bio -e 'p Bio::Sequence::NA.new($<.read).translate' my_naseq.txt In the next section we will retrieve data from databases instead of using raw sequence files. One generic example of the above can be found in ./sample/na2aa.rb. == Parsing GenBank data (Bio::GenBank class) We assume that you already have some GenBank data files. 
(If you don't, download some .seq files from ftp://ftp.ncbi.nih.gov/genbank/) As an example we will fetch the ID, definition and sequence of each entry from the GenBank format and convert it to FASTA. This is also an example script in the BioRuby distribution. A first attempt could be to use the Bio::GenBank class for reading in the data: #!/usr/bin/env ruby require 'bio' # Read all lines from STDIN split by the GenBank delimiter while entry = gets(Bio::GenBank::DELIMITER) gb = Bio::GenBank.new(entry) # creates GenBank object print ">#{gb.accession} " # Accession puts gb.definition # Definition puts gb.naseq # Nucleic acid sequence # (Bio::Sequence::NA object) end But that has the disadvantage the code is tied to GenBank input. A more generic method is to use Bio::FlatFile which allows you to use different input formats: #!/usr/bin/env ruby require 'bio' ff = Bio::FlatFile.new(Bio::GenBank, ARGF) ff.each_entry do |gb| definition = "#{gb.accession} #{gb.definition}" puts gb.naseq.to_fasta(definition, 60) end For example, in turn, reading FASTA format files: #!/usr/bin/env ruby require 'bio' ff = Bio::FlatFile.new(Bio::FastaFormat, ARGF) ff.each_entry do |f| puts "definition : " + f.definition puts "nalen : " + f.nalen.to_s puts "naseq : " + f.naseq end In the above two scripts, the first arguments of Bio::FlatFile.new are database classes of BioRuby. This is expanded on in a later section. 
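The scripts above pass a definition line and a line width of 60 to to_fasta.
As a rough illustration of the text this produces, here is a minimal
pure-Ruby sketch. It assumes nothing from BioRuby: format_fasta is a
hypothetical helper written for this note, not BioRuby's implementation.

```ruby
# Hypothetical helper (not part of BioRuby): builds one FASTA record as a
# String, with a ">" definition line followed by the sequence wrapped at
# +width+ characters per line.
def format_fasta(definition, sequence, width = 60)
  lines = sequence.scan(/.{1,#{width}}/)  # chunks of at most +width+ characters
  ">#{definition}\n" + lines.join("\n") + "\n"
end

# An 80-base sequence is wrapped into one line of 60 and one line of 20 bases.
puts format_fasta("X00001 example entry", "atgc" * 20, 60)
```

The real to_fasta method works on sequence objects, so in actual scripts the
calls shown above are the ones to use.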
Again another option is to use the Bio::DB.open class: #!/usr/bin/env ruby require 'bio' ff = Bio::GenBank.open("gbvrl1.seq") ff.each_entry do |gb| definition = "#{gb.accession} #{gb.definition}" puts gb.naseq.to_fasta(definition, 60) end Next, we are going to parse the GenBank 'features', which is normally very complicated: #!/usr/bin/env ruby require 'bio' ff = Bio::FlatFile.new(Bio::GenBank, ARGF) # iterates over each GenBank entry ff.each_entry do |gb| # shows accession and organism puts "# #{gb.accession} - #{gb.organism}" # iterates over each element in 'features' gb.features.each do |feature| position = feature.position hash = feature.assoc # put into Hash # skips the entry if "/translation=" is not found next unless hash['translation'] # collects gene name and so on and joins it into a string gene_info = [ hash['gene'], hash['product'], hash['note'], hash['function'] ].compact.join(', ') # shows nucleic acid sequence puts ">NA splicing('#{position}') : #{gene_info}" puts gb.naseq.splicing(position) # shows amino acid sequence translated from nucleic acid sequence puts ">AA translated by splicing('#{position}').translate" puts gb.naseq.splicing(position).translate # shows amino acid sequence in the database entry (/translation=) puts ">AA original translation" puts hash['translation'] end end * Note: In this example Feature#assoc method makes a Hash from a feature object. It is useful because you can get data from the hash by using qualifiers as keys. But there is a risk some information is lost when two or more qualifiers are the same. Therefore an Array is returned by Feature#feature. Bio::Sequence#splicing splices subsequences from nucleic acid sequences according to location information used in GenBank, EMBL and DDBJ. When the specified translation table is different from the default (universal), or when the first codon is not "atg" or the protein contains selenocysteine, the two amino acid sequences will differ. 
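To make the location syntax more concrete, the following is a heavily
simplified pure-Ruby sketch of what splicing by location means. It is not
how BioRuby implements Bio::Sequence#splicing: simple_splice and revcomp are
hypothetical helpers, and only plain "from..to" ranges, complement(from..to)
and a flat join(...) of those are assumed to be supported, with 1-based
inclusive positions.

```ruby
# Hypothetical helper: reverse complement of a lowercase DNA string.
def revcomp(seq)
  seq.reverse.tr("atgc", "tacg")
end

# Hypothetical, simplified splicer (not BioRuby's implementation).
# Positions are 1-based and inclusive, as in GenBank/EMBL/DDBJ locations.
def simple_splice(seq, location)
  # Strip an outer join(...) if present, then handle each part in order.
  inner = location[/\Ajoin\((.*)\)\z/, 1] || location
  inner.split(",").map do |part|
    if (m = part.match(/\Acomplement\((\d+)\.\.(\d+)\)\z/))
      revcomp(seq[(m[1].to_i - 1)..(m[2].to_i - 1)])
    elsif (m = part.match(/\A(\d+)\.\.(\d+)\z/))
      seq[(m[1].to_i - 1)..(m[2].to_i - 1)]
    else
      raise ArgumentError, "unsupported location: #{part}"
    end
  end.join
end

seq = "atgcatgcaaaa"
puts simple_splice(seq, "3..8")
puts simple_splice(seq, "join(1..3,complement(10..12))")
```

Real location strings can also contain fuzzy positions, nested operators and
references to other entries, which is why the Bio::Locations class exists.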
The Bio::Sequence#splicing method takes not only DDBJ/EMBL/GenBank
feature style location text but also a Bio::Locations object. For more
information about the location format and the Bio::Locations class, see
bio/location.rb.

* Splice according to a location string used in a GenBank entry

    naseq.splicing('join(2035..2050,complement(1775..1818),13..345)')

* Generate a Bio::Locations object and pass it to the splicing method

    locs = Bio::Locations.new('join((8298.8300)..10206,1..855)')
    naseq.splicing(locs)

You can also use this splicing method for amino acid sequences
(Bio::Sequence::AA objects).

* Splicing a peptide from a protein (e.g. signal peptide)

    aaseq.splicing('21..119')

=== More databases

Databases in BioRuby are essentially accessed in the same way as GenBank,
with classes like Bio::GenBank, Bio::KEGG::GENES. A full list can be found
in the ./lib/bio/db directory of the BioRuby source tree.

In many cases the Bio::DatabaseClass acts as a factory pattern
and recognises the database type automatically - returning a
parsed object. For example using the Bio::FlatFile class as described
above. The first argument of Bio::FlatFile.new is a database class
name in BioRuby (such as Bio::GenBank, Bio::KEGG::GENES and so on).

    ff = Bio::FlatFile.new(Bio::DatabaseClass, ARGF)

Isn't it wonderful that Bio::FlatFile automagically recognizes each
database class?

    #!/usr/bin/env ruby

    require 'bio'

    ff = Bio::FlatFile.auto(ARGF)
    ff.each_entry do |entry|
      p entry.entry_id          # identifier of the entry
      p entry.definition        # definition of the entry
      p entry.seq               # sequence data of the entry
    end

An example that can take any input, filter using a regular expression and
output to a FASTA file can be found in sample/any2fasta.rb. With this
technique it is possible to write a Unix type grep/sort pipe for sequence
information. One example using scripts in the BioRuby sample folder:

    fastagrep.rb '/At|Dm/' database.seq | fastasort.rb

greps the database for Arabidopsis and Drosophila entries and sorts
the output to FASTA.
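The grep/sort idea in the pipe above can be sketched in plain Ruby without
BioRuby. This is not the code of sample/fastagrep.rb or fastasort.rb:
fasta_records and fasta_grep are hypothetical helpers, and the three records
are made-up data for illustration.

```ruby
# Hypothetical helper: splits FASTA text into [definition, sequence] pairs.
def fasta_records(text)
  text.split(/^>/).reject(&:empty?).map do |chunk|
    header, *seq_lines = chunk.lines.map(&:chomp)
    [header, seq_lines.join]
  end
end

# Hypothetical helper: keeps the records whose definition matches +pattern+.
def fasta_grep(text, pattern)
  fasta_records(text).select { |definition, _seq| definition =~ pattern }
end

# Made-up example data
fasta = <<~FASTA
  >Dm0002 Drosophila example
  ttttaaaa
  >Hs0001 human example
  ggggcccc
  >At1g01010 Arabidopsis example
  atgcatgc
  aaaa
FASTA

# Select the At and Dm entries, sort them by definition, print as FASTA.
fasta_grep(fasta, /At|Dm/).sort_by(&:first).each do |definition, seq|
  puts ">#{definition}", seq
end
```

In a real script Bio::FlatFile.auto would do the parsing, so the same filter
works for any supported input format and not only for FASTA.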
Other methods to extract specific data from database objects can be different between databases, though some methods are common (see the guidelines for common methods in bio/db.rb). * entry_id --> gets ID of the entry * definition --> gets definition of the entry * reference --> gets references as Bio::Reference object * organism --> gets species * seq, naseq, aaseq --> returns sequence as corresponding sequence object Refer to the documents of each database to find the exact naming of the included methods. In general, BioRuby uses the following conventions: when a method name is plural, the method returns some object as an Array. For example, some classes have a "references" method which returns multiple Bio::Reference objects as an Array. And some classes have a "reference" method which returns a single Bio::Reference object. === Alignments (Bio::Alignment) The Bio::Alignment class in bio/alignment.rb is a container class like Ruby's Hash and Array classes and BioPerl's Bio::SimpleAlign. A very simple example is: bioruby> seqs = [ 'atgca', 'aagca', 'acgca', 'acgcg' ] bioruby> seqs = seqs.collect{ |x| Bio::Sequence::NA.new(x) } # creates alignment object bioruby> a = Bio::Alignment.new(seqs) bioruby> a.consensus ==> "a?gc?" # shows IUPAC consensus p a.consensus_iupac # ==> "ahgcr" # iterates over each seq a.each { |x| p x } # ==> # "atgca" # "aagca" # "acgca" # "acgcg" # iterates over each site a.each_site { |x| p x } # ==> # ["a", "a", "a", "a"] # ["t", "a", "c", "c"] # ["g", "g", "g", "g"] # ["c", "c", "c", "c"] # ["a", "a", "a", "g"] # doing alignment by using CLUSTAL W. # clustalw command must be installed. 
    factory = Bio::ClustalW.new
    a2 = a.do_align(factory)

Read a ClustalW or Muscle 'ALN' alignment file:

    bioruby> aln = Bio::ClustalW::Report.new(File.read('../test/data/clustalw/example1.aln'))
    bioruby> aln.header
    ==> "CLUSTAL 2.0.9 multiple sequence alignment"

Fetch a sequence:

    bioruby> seq = aln.get_sequence(1)
    bioruby> seq.definition
    ==> "gi|115023|sp|P10425|"

Get a partial sequence:

    bioruby> seq.to_s[60..120]
    ==> "LGYFNG-EAVPSNGLVLNTSKGLVLVDSSWDNKLTKELIEMVEKKFQKRVTDVIITHAHAD"

Show the full alignment residue match information for the sequences in the set:

    bioruby> aln.match_line[60..120]
    ==> "     .     **. .   ..   ::*:       . * : : .        .: .* * *"

Return a Bio::Alignment object:

    bioruby> aln.alignment.consensus[60..120]
    ==> "???????????SN?????????????D??????????L??????????????????H?H?D"

== Restriction Enzymes (Bio::RE)

BioRuby has extensive support for restriction enzymes (REs). It contains a
full library of commonly used REs (from REBASE) which can be used to cut
single stranded RNA or double stranded DNA into fragments. To list all
enzymes:

    rebase = Bio::RestrictionEnzyme.rebase
    rebase.each do |enzyme_name, info|
      p enzyme_name
    end

and to cut a sequence with an enzyme follow up with:

    res = seq.cut_with_enzyme('EcoRII', {:max_permutations => 0},
      {:view_ranges => true})
    if res.kind_of? Symbol
      # a Symbol signals an error instead of a list of fragments
      $stderr.puts "cut_with_enzyme failed: #{res}"
    else
      res.each do |frag|
        # positions of the fragment on the primary and complementary strands
        p [frag.p_left, frag.p_right, frag.c_left, frag.c_right]
      end
    end

== Sequence homology search by using the FASTA program (Bio::Fasta)

Let's start with a query.pep file which contains a sequence in FASTA
format. In this example we are going to execute a homology search
from a remote internet site or on your local machine. Note that you
can use the ssearch program instead of fasta when you run it on your
local machine.
=== using FASTA in local machine Install the fasta program on your machine (the command name looks like fasta34. FASTA can be downloaded from ftp://ftp.virginia.edu/pub/fasta/). First, you must prepare your FASTA-formatted database sequence file target.pep and FASTA-formatted query.pep. #!/usr/bin/env ruby require 'bio' # Creates FASTA factory object ("ssearch" instead of # "fasta34" can also work) factory = Bio::Fasta.local('fasta34', ARGV.pop) (EDITOR's NOTE: not consistent pop command) ff = Bio::FlatFile.new(Bio::FastaFormat, ARGF) # Iterates over each entry. the variable "entry" is a # Bio::FastaFormat object: ff.each do |entry| # shows definition line (begins with '>') to the standard error output $stderr.puts "Searching ... " + entry.definition # executes homology search. Returns Bio::Fasta::Report object. report = factory.query(entry) # Iterates over each hit report.each do |hit| # If E-value is smaller than 0.0001 if hit.evalue < 0.0001 # shows identifier of query and hit, E-value, start and # end positions of homologous region print "#{hit.query_id} : evalue #{hit.evalue}\t#{hit.target_id} at " p hit.lap_at end end end We named above script f_search.rb. You can execute it as follows: % ./f_search.rb query.pep target.pep > f_search.out In above script, the variable "factory" is a factory object for executing FASTA many times easily. Instead of using Fasta#query method, Bio::Sequence#fasta method can be used. seq = ">test seq\nYQVLEEIGRGSFGSVRKVIHIPTKKLLVRKDIKYGHMNSKE" seq.fasta(factory) When you want to add options to FASTA commands, you can set the third argument of the Bio::Fasta.local method. For example, the following sets ktup to 1 and gets a list of the top 10 hits: factory = Bio::Fasta.local('fasta34', 'target.pep', '-b 10') factory.ktup = 1 Bio::Fasta#query returns a Bio::Fasta::Report object. We can get almost all information described in FASTA report text with the Report object. 
For example, getting information for hits:

    report.each do |hit|
      puts hit.evalue           # E-value
      puts hit.sw               # Smith-Waterman score (*)
      puts hit.identity         # % identity
      puts hit.overlap          # length of overlapping region
      puts hit.query_id         # identifier of query sequence
      puts hit.query_def        # definition(comment line) of query sequence
      puts hit.query_len        # length of query sequence
      puts hit.query_seq        # homologous region of query sequence
      puts hit.target_id        # identifier of hit sequence
      puts hit.target_def       # definition(comment line) of hit sequence
      puts hit.target_len       # length of hit sequence
      puts hit.target_seq       # homologous region of hit sequence
      puts hit.query_start      # start position of homologous
                                # region in query sequence
      puts hit.query_end        # end position of homologous region
                                # in query sequence
      puts hit.target_start     # start position of homologous region
                                # in hit(target) sequence
      puts hit.target_end       # end position of homologous region
                                # in hit(target) sequence
      puts hit.lap_at           # array of above four numbers
    end

Most of the above methods are common to the Bio::Blast::Report described
below. Please refer to the documentation of the Bio::Fasta::Report class for
FASTA-specific details.

If you need the original output text of the FASTA program you can use the
"output" method of the factory object after the "query" method.

    report = factory.query(entry)
    puts factory.output

=== using FASTA from a remote internet site

* Note: Currently, only GenomeNet (fasta.genome.jp) is
  supported. Check the class documentation for updates.

For accessing a remote site the Bio::Fasta.remote method is used
instead of Bio::Fasta.local. When using a remote method, the
databases available may be limited, but, otherwise, you can do the
same things as with a local method.

Available databases in GenomeNet:

* Protein database
  * nr-aa, genes, vgenes.pep, swissprot, swissprot-upd, pir, prf, pdbstr

* Nucleic acid database
  * nr-nt, genbank-nonst, gbnonst-upd, dbest, dbgss, htgs, dbsts,
    embl-nonst, embnonst-upd, genes-nt, genome, vgenes.nuc

Select the databases you require. Next, choose the search program according
to the type of the query sequence and of the database.

* When query is an amino acid sequence
  * When protein database, program is "fasta".
  * When nucleic database, program is "tfasta".

* When query is a nucleic acid sequence
  * When nucleic database, program is "fasta".
  * (When protein database, the search would fail.)

For example, run:

    program = 'fasta'
    database = 'genes'

    factory = Bio::Fasta.remote(program, database)

and try out the same commands as with the local search shown earlier.

== Homology search by using BLAST (Bio::Blast class)

The BLAST interface is very similar to that of FASTA and both local and
remote execution are supported. Basically, replace Bio::Fasta in the above
examples with Bio::Blast!

For example the BLAST version of f_search.rb is:

    # create BLAST factory object
    factory = Bio::Blast.local('blastp', ARGV.pop)

For remote execution of BLAST in GenomeNet, Bio::Blast.remote is used.
The parameter "program" is different from FASTA - as you can expect:

* When query is an amino acid sequence
  * When protein database, program is "blastp".
  * When nucleic database, program is "tblastn".

* When query is a nucleic acid sequence
  * When protein database, program is "blastx"
  * When nucleic database, program is "blastn".
  * ("tblastx" for six-frame search.)

Bio::Blast uses the "-m 7" XML output of BLAST by default when either
XMLParser or REXML (both of them are XML parser libraries for Ruby -
of the two XMLParser is the faster) is installed on your computer. In
Ruby version 1.8.0 or later, REXML is bundled with Ruby's
distribution.

When no XML parser library is present, Bio::Blast uses the "-m 8" tab
delimited format. 
Available information is limited with the "-m 8" format so installing an XML parser is recommended.

Again, the methods in Bio::Fasta::Report and Bio::Blast::Report (and Bio::Fasta::Report::Hit and Bio::Blast::Report::Hit) are similar. There are some additional BLAST methods, for example, bit_score and midline.

    report.each do |hit|
      puts hit.bit_score
      puts hit.query_seq
      puts hit.midline
      puts hit.target_seq

      puts hit.evalue
      puts hit.identity
      puts hit.overlap
      puts hit.query_id
      puts hit.query_def
      puts hit.query_len
      puts hit.target_id
      puts hit.target_def
      puts hit.target_len
      puts hit.query_start
      puts hit.query_end
      puts hit.target_start
      puts hit.target_end
      puts hit.lap_at
    end

For simplicity and API compatibility, some information such as score is extracted from the first Hsp (High-scoring Segment Pair).

Check the documentation for Bio::Blast::Report to see what can be retrieved. For now suffice to say that Bio::Blast::Report has a hierarchical structure mirroring the general BLAST output stream:

  * In a Bio::Blast::Report object, @iterations is an array of Bio::Blast::Report::Iteration objects.
    * In a Bio::Blast::Report::Iteration object, @hits is an array of Bio::Blast::Report::Hit objects.
      * In a Bio::Blast::Report::Hit object, @hsps is an array of Bio::Blast::Report::Hsp objects.

See bio/appl/blast.rb and bio/appl/blast/*.rb for more information.

=== Parsing existing BLAST output files

When you already have BLAST output files and you want to parse them, you can directly create Bio::Blast::Report objects without the Bio::Blast factory object. For this purpose use Bio::Blast.reports, which supports the "-m 0" default and "-m 7" XML type output format.
  * For example:

      blast_version = nil; result = []
      Bio::Blast.reports(File.new("../test/data/blast/blastp-multi.m7")) do |report|
        blast_version = report.version
        report.iterations.each do |itr|
          itr.hits.each do |hit|
            result.push hit.target_id
          end
        end
      end
      blast_version
      # ==> "blastp 2.2.18 [Mar-02-2008]"
      result
      # ==> ["BAB38768", "BAB38768", "BAB38769", "BAB37741"]

  * Another example:

      require 'bio'
      Bio::Blast.reports(ARGF) do |report|
        puts "Hits for " + report.query_def + " against " + report.db
        report.each do |hit|
          print hit.target_id, "\t", hit.evalue, "\n" if hit.evalue < 0.001
        end
      end

Save the script as hits_under_0.001.rb. To process BLAST output files *.xml, you can run it with:

    % ruby hits_under_0.001.rb *.xml

Sometimes BLAST XML output may be wrong and cannot be parsed. Check whether blast is version 2.2.5 or later. See also blast --help.

Bio::Blast loads the full XML file into memory. If this causes a problem you can split the BLAST XML file into smaller chunks using XML-Twig. An example can be found in (()).

=== Add remote BLAST search sites

Note: this section is an advanced topic

Here is a more advanced application of the BLAST sequence homology search services. BioRuby currently only supports GenomeNet. If you want to add other sites, you must write the following:

  * the calling CGI (command-line options must be processed for the site).
  * make sure you get BLAST output text in a format supported by BioRuby (e.g. "-m 8", "-m 7" or default("-m 0")).

In addition, you must write a private class method in Bio::Blast named "exec_MYSITE" to get the query sequence and to pass the result to Bio::Blast::Report.new (or Bio::Blast::Default::Report.new):

    factory = Bio::Blast.remote(program, db, option, 'MYSITE')

When you have written the above routines, please send them to the BioRuby project, and they may be included in future releases.

== Generate a reference list using PubMed (Bio::PubMed)

Nowadays using NCBI E-Utils is recommended. Use Bio::PubMed.esearch and Bio::PubMed.efetch.

    #!/usr/bin/env ruby

    require 'bio'

    # NCBI announces that queries without email address will return error
    # after June 2010. When you modify the script, please enter your email
    # address instead of the staff's.
    Bio::NCBI.default_email = 'staff@bioruby.org'

    keywords = ARGV.join(' ')

    options = {
      'maxdate' => '2003/05/31',
      'retmax' => 1000,
    }

    entries = Bio::PubMed.esearch(keywords, options)

    Bio::PubMed.efetch(entries).each do |entry|
      medline = Bio::MEDLINE.new(entry)
      reference = medline.reference
      puts reference.bibtex
    end

The script works the same as pmsearch.rb. But, by using NCBI E-Utils, more options are available. For example, the publication dates to search and the maximum number of hits to return can be specified.

See the (()) for more details.

=== More about BibTeX

In this section, we explain the simple usage of TeX for the BibTeX format bibliography list collected by the above scripts.

For example, to save BibTeX format bibliography data to a file named genoinfo.bib:

    % ./pmfetch.rb 10592173 >> genoinfo.bib
    % ./pmsearch.rb genome bioinformatics >> genoinfo.bib

The BibTeX file can be used with TeX or LaTeX to add bibliography information to your journal article. For more information on using BibTeX see (()). A quick example:

Save this to hoge.tex:

    \documentclass{jarticle}
    \begin{document}
    \bibliographystyle{plain}
    foo bar KEGG database~\cite{PMID:10592173} baz hoge fuga.
    \bibliography{genoinfo}
    \end{document}

Then,

    % latex hoge
    % bibtex hoge # processes genoinfo.bib
    % latex hoge  # creates bibliography list
    % latex hoge  # inserts correct bibliography reference

Now, you get hoge.dvi and hoge.ps - the latter of which can be viewed with any PostScript viewer.

=== Bio::Reference#bibitem

When you don't want to create a bib file, you can use the Bio::Reference#bibitem method instead of Bio::Reference#bibtex.
In the above pmfetch.rb and pmsearch.rb scripts, change

    puts reference.bibtex

to

    puts reference.bibitem

Output documents should be bundled in \begin{thebibliography} and \end{thebibliography}. Save the following to hoge.tex

    \documentclass{jarticle}
    \begin{document}
    foo bar KEGG database~\cite{PMID:10592173} baz hoge fuga.

    \begin{thebibliography}{00}

    \bibitem{PMID:10592173}
    Kanehisa, M., Goto, S.
    KEGG: kyoto encyclopedia of genes and genomes.,
    {\em Nucleic Acids Res}, 28(1):27--30, 2000.

    \end{thebibliography}
    \end{document}

and run

    % latex hoge # creates bibliography list
    % latex hoge # inserts correct bibliography reference

= OBDA

OBDA (Open Bio Database Access) is a standardized method of sequence database access developed by the Open Bioinformatics Foundation. It was created during the BioHackathon by BioPerl, BioJava, BioPython, BioRuby and other projects' members (2002).

  * BioRegistry (Directory)
    * Mechanism to specify how and where to retrieve sequence data for each database.
  * BioFlat
    * Flatfile indexing by using binary tree or BDB(Berkeley DB).
  * BioFetch
    * Server-client model for getting entry from database via http.
  * BioSQL
    * Schemas to store sequence data to relational databases such as MySQL and PostgreSQL, and methods to retrieve entries from the database.

This tutorial only gives a quick overview of OBDA. Check out (()) for more extensive details.

== BioRegistry

BioRegistry allows for locating retrieval methods and database locations through configuration files. The priorities are

  * The file specified with method's parameter
  * ~/.bioinformatics/seqdatabase.ini
  * /etc/bioinformatics/seqdatabase.ini
  * http://www.open-bio.org/registry/seqdatabase.ini

Note that the last location refers to www.open-bio.org and is only used when no local configuration file is available.

In the current BioRuby implementation all local configuration files are read. For databases with the same name, the settings encountered first are used.
This means that if you don't like some settings of a database in the system's global configuration file (/etc/bioinformatics/seqdatabase.ini), you can easily override them by writing settings to ~/.bioinformatics/seqdatabase.ini.

The syntax of the configuration file is called a stanza format. For example

    [DatabaseName]
    protocol=ProtocolName
    location=ServerName

You can write a description like the above entry for every database.

The database name is a local label for yourself, so you can name it freely and it can differ from the name of the actual databases. In the actual specification of BioRegistry, where there are two or more settings for a database of the same name, it is proposed that connection to the database is tried sequentially in the order written in the configuration files. However, this has not (yet) been implemented in BioRuby.

In addition, for some protocols, you must set additional options other than locations (e.g. user name for MySQL). In the BioRegistry specification, the currently available protocols are:

  * index-flat
  * index-berkeleydb
  * biofetch
  * biosql
  * bsane-corba
  * xembl

In BioRuby, you can use index-flat, index-berkeleydb, biofetch and biosql. Note that the BioRegistry specification sometimes gets updated and BioRuby does not always follow quickly.

Here is an example. It creates a Bio::Registry object and reads the configuration files:

    reg = Bio::Registry.new

    # connects to the database "genbank"
    serv = reg.get_database('genbank')

    # gets entry of the ID
    entry = serv.get_by_id('AA2CG')

The variable "serv" is a server object corresponding to the settings written in the configuration files. The class of the object is one of Bio::SQL, Bio::Fetch, and so on. Note that Bio::Registry#get_database("name") returns nil if no database is found.

After that, you can use the get_by_id method and some specific methods. Please refer to the sections below for more information.
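
The "first setting wins" rule for databases of the same name can be illustrated with a small plain-Ruby sketch. It does not use Bio::Registry; parse_stanzas and first_match are hypothetical helpers, and the two configuration texts (including their host names) are made up for illustration.

```ruby
# Plain-Ruby sketch of stanza parsing and first-match lookup.
# parse_stanzas and first_match are hypothetical helpers, not Bio::Registry.
def parse_stanzas(text)
  databases = []
  current = nil
  text.each_line do |line|
    line = line.strip
    next if line.empty?
    if line =~ /\A\[(.+)\]\z/
      # A "[DatabaseName]" line starts a new stanza
      current = { 'name' => $1 }
      databases << current
    elsif current && line.include?('=')
      # "key=value" lines belong to the current stanza
      key, value = line.split('=', 2)
      current[key.strip] = value.strip
    end
  end
  databases
end

# Search the configuration texts in priority order; the first stanza
# with a matching name is returned, later ones are ignored.
def first_match(config_texts, name)
  config_texts.each do |text|
    hit = parse_stanzas(text).find { |db| db['name'] == name }
    return hit if hit
  end
  nil
end

# Made-up example contents for a user file and a system-wide file
user_ini   = "[genbank]\nprotocol=biofetch\nlocation=http://user.example/biofetch\n"
system_ini = "[genbank]\nprotocol=biosql\nlocation=db.example\n"

db = first_match([user_ini, system_ini], 'genbank')
puts db['protocol']   # biofetch
```

Because the user file is searched before the system file, its settings shadow the system-wide ones, which is the override behaviour described above.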
== BioFlat

BioFlat is a mechanism to create index files of flat files and to retrieve these entries fast. There are two index types. index-flat is a simple index performing binary search without using any external libraries of Ruby. index-berkeleydb uses Berkeley DB for indexing - but requires installing bdb on your computer, as well as the BDB Ruby package. To create the index itself, you can use the br_bioflat.rb command bundled with BioRuby.

    % br_bioflat.rb --makeindex database_name [--format data_format] filename...

The format can be omitted because BioRuby has autodetection. If that doesn't work, you can try specifying the data format as the name of a BioRuby database class.

Search and retrieve data from database:

    % br_bioflat.rb database_name identifier

For example, to create an index of GenBank files gbbct*.seq and get the entry from the database:

    % br_bioflat.rb --makeindex my_bctdb --format GenBank gbbct*.seq
    % br_bioflat.rb my_bctdb A16STM262

If you have Berkeley DB on your system and installed the bdb extension module of Ruby (see (()) ), you can create and search indexes with Berkeley DB - a very fast alternative that uses little computer memory. When creating the index, use the "--makeindex-bdb" option instead of "--makeindex".

    % br_bioflat.rb --makeindex-bdb database_name [--format data_format] filename...

== BioFetch

Note: this section is an advanced topic

BioFetch is a database retrieval mechanism via CGI. CGI Parameters, options and error codes are standardized. Client access via http is possible giving the database name, identifiers and format to retrieve entries.

The BioRuby project has a BioFetch server at bioruby.org. It uses GenomeNet's DBGET system as a backend. The source code of the server is in sample/ directory. Currently, there are only two BioFetch servers in the world: bioruby.org and EBI.

Here are some methods to retrieve entries from our BioFetch server.
(1) Using a web browser

    http://bioruby.org/cgi-bin/biofetch.rb

(2) Using the br_biofetch.rb command

    % br_biofetch.rb db_name entry_id

(3) Directly using Bio::Fetch in a script

    serv = Bio::Fetch.new(server_url)
    entry = serv.fetch(db_name, entry_id)

(4) Indirectly using Bio::Fetch via BioRegistry in a script

    reg = Bio::Registry.new
    serv = reg.get_database('genbank')
    entry = serv.get_by_id('AA2CG')

If you want to use (4), you have to include some settings in seqdatabase.ini. For example:

    [genbank]
    protocol=biofetch
    location=http://bioruby.org/cgi-bin/biofetch.rb
    biodbname=genbank

=== The combination of BioFetch, Bio::KEGG::GENES and Bio::AAindex1

Bioinformatics is often about gluing things together. Here is an example that gets the bacteriorhodopsin gene (VNG1467G) of the archaea Halobacterium from the KEGG GENES database and gets alpha-helix index data (BURA740101) from the AAindex (Amino acid indices and similarity matrices) database, and shows the helix score for each 15-aa length overlapping window.

    #!/usr/bin/env ruby

    require 'bio'

    entry = Bio::Fetch.query('hal', 'VNG1467G')
    aaseq = Bio::KEGG::GENES.new(entry).aaseq

    entry = Bio::Fetch.query('aax1', 'BURA740101')
    helix = Bio::AAindex1.new(entry).index

    position = 1
    win_size = 15

    aaseq.window_search(win_size) do |subseq|
      score = subseq.total(helix)
      puts [ position, score ].join("\t")
      position += 1
    end

The special method Bio::Fetch.query uses the preset BioFetch server at bioruby.org. (The server internally gets data from GenomeNet. Because the KEGG/GENES database and AAindex database are not available from other BioFetch servers, we used the bioruby.org server with the Bio::Fetch.query method.)

== BioSQL

BioSQL is a well known schema to store and retrieve biological sequences using an RDBMS like PostgreSQL or MySQL: note that SQLite is not supported. First of all, you must install a database engine or have access to a remote one. Then create the schema and populate it with the taxonomy.
You can follow the (()) to accomplish these steps. The next step is to install these gems:

  * ActiveRecord
  * CompositePrimaryKeys (Rails doesn't handle composite primary keys by default)
  * The layer to communicate with your preferred RDBMS (postgresql, mysql, jdbcmysql in case you are running JRuby)

You can find ActiveRecord's models in /bioruby/lib/bio/io/biosql

When you have your database up and running, you can connect to it like this:

    #!/usr/bin/env ruby

    require 'bio'

    connection = Bio::SQL.establish_connection({'development'=>{'hostname'=>"YourHostname",
    'database'=>"CoolBioSeqDB",
    'adapter'=>"jdbcmysql",
    'username'=>"YourUser",
    'password'=>"YouPassword"
          }
      },
    'development')

    #The first parameter is the hash containing the description of the configuration; similar to database.yml in Rails applications, you can declare different environments.
    #The second parameter is the environment to use: 'development', 'test', or 'production'.

    #To store a sequence into the database you simply need a biosequence object.
    biosql_database = Bio::SQL::Biodatabase.find(:first)
    ff = Bio::GenBank.open("gbvrl1.seq")

    ff.each_entry do |gb|
      Bio::SQL::Sequence.new(:biosequence=>gb.to_biosequence, :biodatabase=>biosql_database)
    end

    #You can list all the entries in every database
    Bio::SQL.list_entries

    #list databases:
    Bio::SQL.list_databases

    #retrieving a generic accession
    bioseq = Bio::SQL.fetch_accession("YouAccession")

    #If you use biosequence objects, you will find all their methods mapped to BioSQL sequences.
    #But you can also access the models directly:

    #get the raw sequence associated with your accession
    bioseq.entry.biosequence

    #get the length of your sequence; this is the explicit form of bioseq.length
    bioseq.entry.biosequence.length

    #convert the sequence into GenBank format
    bioseq.to_biosequence.output(:genbank)

BioSQL's (()) is not very intuitive for beginners, so spend some time on understanding it. In the end if you know a little bit of Ruby on Rails, everything will go smoothly.
You can find information on Annotation (()).

ToDo: add examples from George. I remember he did some cool post on BioSQL and Rails.

= PhyloXML

PhyloXML is an XML language for saving, analyzing and exchanging data of annotated phylogenetic trees. PhyloXML's parser in BioRuby is implemented in Bio::PhyloXML::Parser, and its writer in Bio::PhyloXML::Writer. More information can be found at (()).

Bio::PhyloXML has been split out from the BioRuby core and released as the bio-phyloxml gem. To use Bio::PhyloXML, install the bio-phyloxml gem.

    % gem install bio-phyloxml

The tutorial of Bio::PhyloXML is bundled in bio-phyloxml. (())

== The BioRuby example programs

Some sample programs are stored in the ./sample/ directory. For example, the na2aa.rb program (transforms a nucleic acid sequence into an amino acid sequence) can be run using:

    ./sample/na2aa.rb test/data/fasta/example1.txt

== Unit testing and doctests

BioRuby comes with an extensive testing framework with over 1300 tests and 2700 assertions. To run the unit tests:

    cd test
    ruby runner.rb

We have also started with doctest for Ruby. We are porting the examples in this tutorial to doctest - more info upcoming.

== Further reading

See the BioRuby in anger Wiki. A lot of BioRuby's documentation exists in the source code and unit tests. To really dive in you will need the latest source code tree. The embedded rdoc documentation for the BioRuby source code can be viewed online at (()).

== BioRuby Shell

The BioRuby shell implementation is located in ./lib/bio/shell. It is very interesting as it uses IRB (the Ruby interpreter) which is a powerful environment described in (()). IRB commands can be typed directly into the shell, e.g.

    bioruby!> IRB.conf[:PROMPT_MODE]
    ==!> :PROMPT_C

Additionally, you also may want to install the optional Ruby readline support - with Debian libreadline-ruby. To edit a previous line you may have to press line down (down arrow) first.
= Helpful tools

Apart from rdoc you may also want to use rtags - which allows jumping around source code by clicking on class and method names.

    cd bioruby/lib
    rtags -R --vi

For a tutorial see (())

= APPENDIX

== Biogem: Additional BioRuby plugins

Biogem is one of the exciting developments for Ruby in bioinformatics! Biogems add new functionality next to the BioRuby core project (BioRuby is a biogem itself). A biogem is simply installed with

    gem install bio                 # The core BioRuby gem
    gem install bio-core            # BioRuby + stable pure Ruby biogems
    gem install bio-core-ext        # bio-core + stable Ruby extensions

For information on these biogems, and the many others available, see (()) or (()).

== Ruby Ensembl API

The Ruby Ensembl API is a Ruby API to the Ensembl database. It is NOT currently included in the BioRuby archives. To install it, see (()) for more information.

=== Gene Ontology (GO) through the Ruby Ensembl API

Gene Ontologies can be fetched through the Ruby Ensembl API package:

    require 'ensembl'
    Ensembl::Core::DBConnection.connect('drosophila_melanogaster')
    infile = IO.readlines(ARGV.shift) # reading your comma-separated accession mapping file (one line per mapping)
    infile.each do |line|
      accs = line.split(",")          # Split the comma-sep.entries into an array
      drosophila_acc = accs.shift     # the first entry is the Drosophila acc
      mosq_acc = accs.shift           # the second entry is your Mosq. acc
      gene = Ensembl::Core::Gene.find_by_stable_id(drosophila_acc)
      print "#{mosq_acc}"
      gene.go_terms.each do |go|
         print ",#{go}"
      end
    end

Prints each mosq. accession/uniq identifier and the GO terms from the Drosophila homologues.

== Using BioPerl or BioPython from Ruby

A possible route is to opt for JRuby and Jython on the JAVA virtual machine (JVM).

At the moment there is no easy way of accessing BioPerl or BioPython directly from Ruby. A possibility is to create a Perl or Python server that gets accessed through XML/RPC or SOAP.
== Installing required external libraries

At this point for using BioRuby no additional libraries are needed. This may change, so keep an eye on the BioRuby website. Also when a package is missing BioRuby should show an informative message.

At this point installing third party Ruby packages can be a bit painful, as the gem standard for packages evolved late and some still force you to copy things by hand. Therefore carefully read the READMEs that come with each package.

== Troubleshooting

  * Error: in `require': no such file to load -- bio (LoadError)

Ruby is failing to find the BioRuby libraries - add their location to the RUBYLIB path, or pass it to the interpreter. For example:

    ruby -I$BIORUBYPATH/lib yourprogram.rb

== Modifying this page

IMPORTANT NOTICE: This page is maintained in the BioRuby source code repository. Please edit the file there otherwise changes may get lost. See (()) for repository and mailing list access.

=end

Tutorial.rd

BioRuby Tutorial

  • Copyright (C) 2001-2003 KATAYAMA Toshiaki <k .at. bioruby.org>
  • Copyright (C) 2005-2011 Pjotr Prins, Naohisa Goto and others

This document was last modified: 2011/10/14 Current editor: Michael O'Keefe <okeefm (at) rpi (dot) edu>

The latest version resides in the GIT source code repository: ./doc/Tutorial.rd.

Introduction

This is a tutorial for using Bioruby. A basic knowledge of Ruby is required. If you want to know more about the programming language, we recommend the latest Ruby book Programming Ruby by Dave Thomas and Andy Hunt - the first edition can be read online here.

For BioRuby you need to install Ruby and the BioRuby package on your computer.

You can check whether Ruby is installed on your computer and what version it has with the

% ruby -v

command. You should see something like:

ruby 1.9.2p290 (2011-07-09 revision 32553) [i686-linux]

If you see no such thing you'll have to install Ruby using your installation manager. For more information see the Ruby website.

With Ruby installed, download and install BioRuby using the links on the BioRuby website. The recommended installation is via RubyGems:

gem install bio

See also the Bioruby wiki.

A lot of BioRuby's documentation exists in the source code and unit tests. To really dive in you will need the latest source code tree. The embedded rdoc documentation can be viewed online at bioruby's rdoc. But first, let's get started!

Trying Bioruby

Bioruby comes with its own shell. After unpacking the sources run one of the following commands:

bioruby

or, from the source tree

cd bioruby
ruby -I lib bin/bioruby

and you should see a prompt

bioruby>

Now test the following:

bioruby> require 'bio'
bioruby> seq = Bio::Sequence::NA.new("atgcatgcaaaa")
==> "atgcatgcaaaa"

bioruby> seq.complement
==> "ttttgcatgcat"

See the BioRuby shell section below for more tweaking. If you have trouble running the examples, also check the section below on troubleshooting. You can also post a question to the mailing list. BioRuby developers usually try to help.

Working with nucleic / amino acid sequences (Bio::Sequence class)

The Bio::Sequence class allows the usual sequence transformations and translations. In the example below the DNA sequence "atgcatgcaaaa" is converted into the complementary strand and spliced into a subsequence; next, the nucleic acid composition is calculated and the sequence is translated into the amino acid sequence, the molecular weight calculated, and so on. When translating into amino acid sequences, the frame can be specified and optionally the codon table selected (as defined in codontable.rb).

bioruby> seq = Bio::Sequence::NA.new("atgcatgcaaaa")
==> "atgcatgcaaaa"

# complementary sequence (Bio::Sequence::NA object)
bioruby> seq.complement
==> "ttttgcatgcat"

bioruby> seq.subseq(3,8) # gets subsequence of positions 3 to 8 (starting from 1)
==> "gcatgc"
bioruby> seq.gc_percent 
==> 33
bioruby> seq.composition 
==> {"a"=>6, "c"=>2, "g"=>2, "t"=>2}
bioruby> seq.translate 
==> "MHAK"
bioruby> seq.translate(2)        # translate from frame 2
==> "CMQ"
bioruby> seq.translate(1,11)     # codon table 11
==> "MHAK"
bioruby> seq.translate.codes
==> ["Met", "His", "Ala", "Lys"]
bioruby> seq.translate.names
==> ["methionine", "histidine", "alanine", "lysine"]
bioruby>  seq.translate.composition
==> {"K"=>1, "A"=>1, "M"=>1, "H"=>1}
bioruby> seq.translate.molecular_weight
==> 485.605
bioruby> seq.complement.translate
==> "FCMH"
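
To make two of the results above less magical, here is a plain-Ruby sketch that reproduces the complement and GC percentage for the same sequence using only String methods. It does not use BioRuby; reverse_complement and gc_percent are hypothetical helpers written for illustration, and the integer rounding is an assumption chosen to match the value shown in the session.

```ruby
# Plain-Ruby sketch; these helpers are hypothetical, not BioRuby methods.

# Reverse the strand and swap each base for its pairing partner.
def reverse_complement(dna)
  dna.reverse.tr('atgc', 'tacg')
end

# Percentage of g/c among the counted bases (integer division).
def gc_percent(dna)
  gc = dna.count('gc')
  at = dna.count('atu')
  return 0 if (gc + at).zero?
  100 * gc / (gc + at)
end

seq = 'atgcatgcaaaa'
puts reverse_complement(seq)   # ttttgcatgcat
puts gc_percent(seq)           # 33
```

In real scripts the Bio::Sequence::NA methods shown in the session are the better choice, since they also handle ambiguity codes and RNA.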

Get a random sequence with the same NA count:

bioruby> counts = {'a'=>seq.count('a'),'c'=>seq.count('c'),'g'=>seq.count('g'),'t'=>seq.count('t')}
==> {"a"=>6, "c"=>2, "g"=>2, "t"=>2}
bioruby!> randomseq = Bio::Sequence::NA.randomize(counts) 
==!> "aaacatgaagtc"

bioruby!> print counts
a6c2g2t2  
bioruby!> p counts
{"a"=>6, "c"=>2, "g"=>2, "t"=>2}

The p, print and puts methods are standard Ruby ways of outputting to the screen. If you want to know more about standard Ruby commands you can use the 'ri' command on the command line (or the help command in Windows). For example

% ri puts
% ri p
% ri File.open

Nucleic acid sequences are members of the Bio::Sequence::NA class, and amino acid sequences are members of the Bio::Sequence::AA class. Shared methods are in the parent Bio::Sequence class.

As Bio::Sequence inherits Ruby's String class, you can use String class methods. For example, to get a subsequence, you can not only use subseq(from, to) but also String#[].

Please note that Ruby strings are 0-based, i.e. the first letter has index 0. For example:

bioruby> s = 'abc'
==> "abc"
bioruby> s[0].chr
==> "a"
bioruby> s[0..1]
==> "ab"

So when using String methods, you should subtract 1 from the positions conventionally used in biology. (The subseq method raises an exception if either "from" or "to" is smaller than or equal to 0.)
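
The off-by-one conversion can be sketched in plain Ruby. The helper biological_slice is hypothetical (it is not part of BioRuby); it only shows how 1-based inclusive positions map onto a 0-based String range.

```ruby
# Plain-Ruby sketch; biological_slice is a hypothetical helper, not BioRuby.
# Takes 1-based, inclusive positions as used in biology.
def biological_slice(str, from, to)
  raise ArgumentError, 'positions must be positive' unless from > 0 && to > 0
  str[(from - 1)..(to - 1)]
end

s = 'atgcatgcaaaa'
puts biological_slice(s, 3, 8)   # gcatgc
puts s[2..7]                     # gcatgc (the same range, written 0-based)
```

Positions 3 to 8 in biological numbering therefore correspond to the String range 2..7.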

The window_search(window_size, step_size) method shows a typical Ruby way of writing concise and clear code using 'closures'. Each sliding window creates a subsequence which is supplied to the enclosed block through a variable named s.

  • Show average percentage of GC content for 20 bases (stepping the default one base at a time):

    bioruby> seq = Bio::Sequence::NA.new("atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa")
    ==> "atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa"
    
    bioruby> a=[]; seq.window_search(20) { |s| a.push s.gc_percent } 
    bioruby> a
    ==> [30, 35, 40, 40, 35, 35, 35, 30, 25, 30, 30, 30, 35, 35, 35, 35, 35, 40, 45, 45, 45, 45, 40, 35, 40, 40, 40, 40, 40, 35, 35, 35, 30, 30, 30]

Since the class of each subsequence is the same as that of the original sequence (Bio::Sequence::NA or Bio::Sequence::AA or Bio::Sequence), you can use all methods on the subsequence. For example,

  • Shows translation results for 15 bases shifting a codon at a time

    bioruby> a = []
    bioruby> seq.window_search(15, 3) { | s | a.push s.translate }
    bioruby> a
    ==> ["MHAIK", "HAIKL", "AIKLI", "IKLIP", "KLIPI", "LIPIR", "IPIRS", "PIRSS", "IRSSR", "RSSRS", "SSRSS", "SRSSK", "RSSKK", "SSKKK"]

Finally, the window_search method returns the last leftover subsequence. This allows for example

  • Divide a genome sequence into sections of 10000bp and output FASTA formatted sequences (line width 60 chars). The 1000bp at the start and end of each subsequence overlap with the neighbouring sections. At the 3' end of the sequence the leftover is also added:

    i = 1
    textwidth=60
    remainder = seq.window_search(10000, 9000) do |s|
      puts s.to_fasta("segment #{i}", textwidth)
      i += 1
    end
    if remainder
      puts remainder.to_fasta("segment #{i}", textwidth) 
    end

If you don't want the overlapping window, set window size and stepping size to equal values.
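
The sliding-window behaviour described above (yield each window, then return whatever is left after the last window) can be sketched in plain Ruby. The helper each_window is hypothetical and is only meant to mirror the description in this tutorial; it is not BioRuby's implementation.

```ruby
# Plain-Ruby sketch; each_window is a hypothetical helper, not BioRuby.
# Yields every window of window_size characters, moving step_size at a
# time, and returns the part of the string after the last full window.
def each_window(seq, window_size, step_size = 1)
  last_start = 0
  0.step(seq.length - window_size, step_size) do |start|
    yield seq[start, window_size]
    last_start = start
  end
  seq[(last_start + window_size)..-1]
end

windows = []
leftover = each_window('abcdefghijk', 4, 3) { |w| windows << w }
puts windows.join(' ')   # abcd defg ghij
puts leftover            # k
```

With a window of 4 and a step of 3, neighbouring windows share one character, just as the 10000/9000 example above shares 1000bp between segments.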

Other examples

  • Count the codon usage

    bioruby> codon_usage = Hash.new(0)
    bioruby> seq.window_search(3, 3) { |s| codon_usage[s] += 1 }
    bioruby> codon_usage
    ==> {"cat"=>1, "aaa"=>3, "cca"=>1, "att"=>2, "aga"=>1, "atc"=>1, "cta"=>1, "gca"=>1, "cga"=>1, "tca"=>3, "aag"=>1, "tcc"=>1, "atg"=>1}
  • Calculate molecular weight for each 10-aa peptide (or 10-nt nucleic acid)

    bioruby> a = []
    bioruby> seq.window_search(10, 10) { |s| a.push s.molecular_weight }
    bioruby> a
    ==> [3096.2062, 3086.1962, 3056.1762, 3023.1262, 3073.2262]
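
For comparison, the codon count can also be sketched with plain Ruby String methods. This is only an illustration of the idea (non-overlapping windows of three letters tallied in a Hash); it does not use BioRuby, and the short sequence is the one from the beginning of this tutorial.

```ruby
# Plain-Ruby sketch of codon counting; not BioRuby code.
seq = 'atgcatgcaaaa'

codon_usage = Hash.new(0)                              # unseen codons start at 0
seq.scan(/.{3}/) { |codon| codon_usage[codon] += 1 }   # non-overlapping triplets

codon_usage.each { |codon, count| puts "#{codon}\t#{count}" }
```

The sequence splits into atg, cat, gca and aaa, so each codon is counted once.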

In most cases, sequences are read from files or retrieved from databases. For example:

require 'bio'

input_seq = ARGF.read       # reads all files in arguments

my_naseq = Bio::Sequence::NA.new(input_seq)
my_aaseq = my_naseq.translate

puts my_aaseq

Save the program above as na2aa.rb. Prepare a nucleic acid sequence described below and save it as my_naseq.txt:

gtggcgatctttccgaaagcgatgactggagcgaagaaccaaagcagtgacatttgtctg
atgccgcacgtaggcctgataagacgcggacagcgtcgcatcaggcatcttgtgcaaatg
tcggatgcggcgtga

na2aa.rb translates a nucleic acid sequence to a protein sequence. For example, to translate my_naseq.txt:

% ruby na2aa.rb my_naseq.txt

or use a pipe!

% cat my_naseq.txt|ruby na2aa.rb

Outputs

VAIFPKAMTGAKNQSSDICLMPHVGLIRRGQRRIRHLVQMSDAA*

You can also write this, a bit fancifully, as a one-liner script.

% ruby -r bio -e 'p Bio::Sequence::NA.new($<.read).translate' my_naseq.txt

In the next section we will retrieve data from databases instead of using raw sequence files. One generic example of the above can be found in ./sample/na2aa.rb.

Parsing GenBank data (Bio::GenBank class)

We assume that you already have some GenBank data files. (If you don't, download some .seq files from ftp://ftp.ncbi.nih.gov/genbank/)

As an example we will fetch the ID, definition and sequence of each entry from the GenBank format and convert it to FASTA. This is also an example script in the BioRuby distribution.

A first attempt could be to use the Bio::GenBank class for reading in the data:

#!/usr/bin/env ruby

require 'bio'

# Read all lines from STDIN split by the GenBank delimiter
while entry = gets(Bio::GenBank::DELIMITER)
  gb = Bio::GenBank.new(entry)      # creates GenBank object

  print ">#{gb.accession} "         # Accession
  puts gb.definition                # Definition
  puts gb.naseq                     # Nucleic acid sequence 
                                    # (Bio::Sequence::NA object)
end

But that has the disadvantage that the code is tied to GenBank input. A more generic method is to use Bio::FlatFile which allows you to use different input formats:

#!/usr/bin/env ruby

require 'bio'

ff = Bio::FlatFile.new(Bio::GenBank, ARGF)
ff.each_entry do |gb|
  definition = "#{gb.accession} #{gb.definition}"
  puts gb.naseq.to_fasta(definition, 60)
end
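
The second argument of to_fasta in the script above is the line width. As a rough illustration of what such folding looks like, here is a plain-Ruby sketch; fasta_record is a hypothetical helper and not BioRuby's to_fasta, and the accession and sequence are made up.

```ruby
# Plain-Ruby sketch; fasta_record is a hypothetical helper, not to_fasta.
# Builds a header line followed by the sequence folded to a fixed width.
def fasta_record(definition, sequence, width = 60)
  lines = sequence.scan(/.{1,#{width}}/)
  ([">#{definition}"] + lines).join("\n") + "\n"
end

# Made-up entry: 160 bases fold into lines of 60, 60 and 40 characters
puts fasta_record('X00001 example entry', 'atgc' * 40, 60)
```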

For example, in turn, reading FASTA format files:

#!/usr/bin/env ruby

require 'bio'

ff = Bio::FlatFile.new(Bio::FastaFormat, ARGF)
ff.each_entry do |f|
  puts "definition : " + f.definition
  puts "nalen      : " + f.nalen.to_s
  puts "naseq      : " + f.naseq
end

In the above two scripts, the first arguments of Bio::FlatFile.new are database classes of BioRuby. This is expanded on in a later section.

Yet another option is to use the Bio::DB.open method:

#!/usr/bin/env ruby

require 'bio'

ff = Bio::GenBank.open("gbvrl1.seq")
ff.each_entry do |gb|
  definition = "#{gb.accession} #{gb.definition}"
  puts gb.naseq.to_fasta(definition, 60)
end

Next, we are going to parse the GenBank 'features', which is normally very complicated:

#!/usr/bin/env ruby

require 'bio'

ff = Bio::FlatFile.new(Bio::GenBank, ARGF)

# iterates over each GenBank entry
ff.each_entry do |gb|

  # shows accession and organism
  puts "# #{gb.accession} - #{gb.organism}"

  # iterates over each element in 'features'
  gb.features.each do |feature|
    position = feature.position
    hash = feature.assoc            # put into Hash

    # skips the entry if "/translation=" is not found
    next unless hash['translation']

    # collects gene name and so on and joins it into a string
    gene_info = [
      hash['gene'], hash['product'], hash['note'], hash['function']
    ].compact.join(', ')

    # shows nucleic acid sequence
    puts ">NA splicing('#{position}') : #{gene_info}"
    puts gb.naseq.splicing(position)

    # shows amino acid sequence translated from nucleic acid sequence
    puts ">AA translated by splicing('#{position}').translate"
    puts gb.naseq.splicing(position).translate

    # shows amino acid sequence in the database entry (/translation=)
    puts ">AA original translation"
    puts hash['translation']
  end
end
  • Note: In this example the Feature#assoc method makes a Hash from a feature object. It is useful because you can get data from the hash by using qualifiers as keys. But there is a risk that some information is lost when two or more qualifiers have the same name. This is why Feature#qualifiers returns an Array.
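
The information loss mentioned in the note can be seen with plain Ruby data structures. The sketch below does not use Bio::Feature; the qualifier names and values are made up, and it only shows why a flat Hash keeps one value per key while an Array (or a Hash of Arrays) keeps them all.

```ruby
# Plain-Ruby sketch; the qualifier pairs below are made-up example data.
qualifiers = [
  ['gene', 'abcD'],
  ['note', 'first note'],
  ['note', 'second note']
]

# A flat hash can hold only one value per key, so one note is lost.
flat = {}
qualifiers.each { |key, value| flat[key] = value }

# Collecting values into arrays keeps every qualifier.
grouped = Hash.new { |hash, key| hash[key] = [] }
qualifiers.each { |key, value| grouped[key] << value }

puts flat['note']             # second note
puts grouped['note'].length   # 2
```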

Bio::Sequence#splicing splices subsequences from nucleic acid sequences according to location information used in GenBank, EMBL and DDBJ.

When the specified translation table is different from the default (universal), or when the first codon is not "atg" or the protein contains selenocysteine, the two amino acid sequences will differ.

The Bio::Sequence#splicing method takes not only DDBJ/EMBL/GenBank feature style location text but also a Bio::Locations object. For more information about the location format and the Bio::Locations class, see bio/location.rb.

  • Splice according to location string used in a GenBank entry

    naseq.splicing('join(2035..2050,complement(1775..1818),13..345)')
  • Generate a Bio::Locations object and pass it to the splicing method

    locs = Bio::Locations.new('join((8298.8300)..10206,1..855)')
    naseq.splicing(locs)

You can also use this splicing method for amino acid sequences (Bio::Sequence::AA objects).

  • Splicing peptide from a protein (e.g. signal peptide)

    aaseq.splicing('21..119')
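
To make the location syntax concrete, here is a deliberately simplified plain-Ruby sketch of splicing. It is not BioRuby's implementation: it understands only "a..b", "complement(a..b)" and a flat "join(...)" of those, and the names simple_splice and reverse_complement are hypothetical.

```ruby
# Simplified illustration only; use Bio::Sequence#splicing for real data.
COMPLEMENT = { 'a' => 't', 't' => 'a', 'g' => 'c', 'c' => 'g', 'n' => 'n' }.freeze

def reverse_complement(seq)
  seq.reverse.each_char.map { |base| COMPLEMENT.fetch(base, 'n') }.join
end

def simple_splice(seq, location)
  # Strip an enclosing join(...) if present.
  inner = location[/\Ajoin\((.*)\)\z/, 1] || location
  # Each comma-separated part is either "a..b" or "complement(a..b)".
  inner.split(',').map do |part|
    from, to = part.scan(/\d+/).map(&:to_i)
    sub = seq[(from - 1)..(to - 1)]      # positions are 1-based and inclusive
    part.start_with?('complement') ? reverse_complement(sub) : sub
  end.join
end

dna = 'aaaccgggtt'
puts simple_splice(dna, 'join(1..3,complement(7..10))')   # prints "aaaaacc"
```

Bases 1..3 give "aaa"; bases 7..10 are "ggtt", whose reverse complement is "aacc".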

More databases

Databases in BioRuby are essentially accessed like GenBank, with classes such as Bio::GenBank and Bio::KEGG::GENES. A full list can be found in the ./lib/bio/db directory of the BioRuby source tree.

In many cases the Bio::DatabaseClass acts as a factory and returns a parsed object, for example when it is used with the Bio::FlatFile class as described above. The first argument of Bio::FlatFile.new is the database class name in BioRuby (such as Bio::GenBank, Bio::KEGG::GENES and so on).

ff = Bio::FlatFile.new(Bio::DatabaseClass, ARGF)

Isn't it wonderful that Bio::FlatFile.auto automagically recognizes each database class?

#!/usr/bin/env ruby

require 'bio'

ff = Bio::FlatFile.auto(ARGF)
ff.each_entry do |entry|
  p entry.entry_id          # identifier of the entry
  p entry.definition        # definition of the entry
  p entry.seq               # sequence data of the entry
end

An example that can take any input, filter using a regular expression and output to a FASTA file can be found in sample/any2fasta.rb. With this technique it is possible to write a Unix-style grep/sort pipe for sequence information. One example using scripts in the BioRuby sample folder:

fastagrep.rb '/At|Dm/' database.seq | fastasort.rb

greps the database for Arabidopsis and Drosophila entries and sorts the output in FASTA format.

Other methods to extract specific data from database objects can be different between databases, though some methods are common (see the guidelines for common methods in bio/db.rb).

  • entry_id --> gets ID of the entry
  • definition --> gets definition of the entry
  • reference --> gets references as Bio::Reference object
  • organism --> gets species
  • seq, naseq, aaseq --> returns sequence as corresponding sequence object

Refer to the documents of each database to find the exact naming of the included methods.

In general, BioRuby uses the following conventions: when a method name is plural, the method returns some object as an Array. For example, some classes have a "references" method which returns multiple Bio::Reference objects as an Array. And some classes have a "reference" method which returns a single Bio::Reference object.

Alignments (Bio::Alignment)

The Bio::Alignment class in bio/alignment.rb is a container class like Ruby's Hash and Array classes and BioPerl's Bio::SimpleAlign. A very simple example is:

bioruby> seqs = [ 'atgca', 'aagca', 'acgca', 'acgcg' ]
bioruby> seqs = seqs.collect{ |x| Bio::Sequence::NA.new(x) }
# creates alignment object
bioruby> a = Bio::Alignment.new(seqs)
bioruby> a.consensus 
==> "a?gc?"
# shows IUPAC consensus
p a.consensus_iupac       # ==> "ahgcr"

# iterates over each seq
a.each { |x| p x }
  # ==>
  #    "atgca"
  #    "aagca"
  #    "acgca"
  #    "acgcg"
# iterates over each site
a.each_site { |x| p x }
  # ==>
  #    ["a", "a", "a", "a"]
  #    ["t", "a", "c", "c"]
  #    ["g", "g", "g", "g"]
  #    ["c", "c", "c", "c"]
  #    ["a", "a", "a", "g"]

# doing alignment by using CLUSTAL W.
# clustalw command must be installed.
factory = Bio::ClustalW.new
a2 = a.do_align(factory)
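
The "a?gc?" consensus shown above can be reproduced by hand. This is a minimal plain-Ruby sketch, assuming sequences of equal length and a strict rule where a site is kept only if every sequence agrees; strict_consensus is a hypothetical helper, not a Bio::Alignment method:

```ruby
# Plain-Ruby illustration of a strict consensus over aligned sequences.
def strict_consensus(seqs, unknown = '?')
  columns = seqs.map(&:chars).transpose      # one array per alignment site
  columns.map { |site| site.uniq.size == 1 ? site.first : unknown }.join
end

seqs = %w[atgca aagca acgca acgcg]
puts strict_consensus(seqs)    # prints "a?gc?"
```

Sites 2 and 5 contain more than one base, so they are reported as "?".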

Read a ClustalW or Muscle 'ALN' alignment file:

bioruby> aln = Bio::ClustalW::Report.new(File.read('../test/data/clustalw/example1.aln'))
bioruby> aln.header
==> "CLUSTAL 2.0.9 multiple sequence alignment"

Fetch a sequence:

bioruby> seq = aln.get_sequence(1)
bioruby> seq.definition
==> "gi|115023|sp|P10425|"

Get a partial sequence:

bioruby> seq.to_s[60..120]
==> "LGYFNG-EAVPSNGLVLNTSKGLVLVDSSWDNKLTKELIEMVEKKFQKRVTDVIITHAHAD"

Show the full alignment residue match information for the sequences in the set:

bioruby> aln.match_line[60..120]
==> "     .     **. .   ..   ::*:       . * : : .        .: .* * *"

Return a Bio::Alignment object:

bioruby> aln.alignment.consensus[60..120]
==> "???????????SN?????????????D??????????L??????????????????H?H?D"

Restriction Enzymes (Bio::RE)

BioRuby has extensive support for restriction enzymes (REs). It contains a full library of commonly used REs (from REBASE) which can be used to cut single stranded RNA or double stranded DNA into fragments. To list all enzymes:

rebase = Bio::RestrictionEnzyme.rebase
rebase.each do |enzyme_name, info|
  p enzyme_name
end

and to cut a sequence with an enzyme follow up with:

res = seq.cut_with_enzyme('EcoRII', {:max_permutations => 0}, 
  {:view_ranges => true})
if res.kind_of? Symbol
  # a Symbol is returned instead of fragments when an error occurs
  $stderr.puts "cut failed: #{res}"
else
  res.each do |frag|
    # left and right positions of the fragment on the
    # primary (p_) and complementary (c_) strands
    p [frag.p_left, frag.p_right, frag.c_left, frag.c_right]
  end
end
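
To illustrate what cutting at a recognition site means, here is a deliberately simplified plain-Ruby sketch. It treats a single strand only and uses EcoRI (recognition site GAATTC, cut after the first G) instead of the EcoRII of the example above. The helper name cut_single_strand is hypothetical; Bio::RestrictionEnzyme handles both strands and competing enzymes and should be used for real work.

```ruby
# Single-strand illustration only; not how Bio::RestrictionEnzyme works internally.
def cut_single_strand(seq, site, cut_offset)
  fragments = []
  start = 0
  pos = 0
  while (hit = seq.index(site, pos))
    cut = hit + cut_offset           # cut position inside the recognition site
    fragments << seq[start...cut]
    start = cut
    pos = hit + 1                    # continue searching after this hit
  end
  fragments << seq[start..-1]        # remainder after the last cut
  fragments
end

p cut_single_strand('ttgaattcaagaattcgg', 'gaattc', 1)
# prints ["ttg", "aattcaag", "aattcgg"]
```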

Sequence homology search by using the FASTA program (Bio::Fasta)

Let's start with a query.pep file which contains a sequence in FASTA format. In this example we are going to execute a homology search from a remote internet site or on your local machine. Note that you can use the ssearch program instead of fasta when you run it on your local machine.

Using FASTA on a local machine

Install the fasta program on your machine (the command name looks like fasta34). FASTA can be downloaded from ftp://ftp.virginia.edu/pub/fasta/.

First, you must prepare your FASTA-formatted database sequence file target.pep and FASTA-formatted query.pep.

#!/usr/bin/env ruby

require 'bio'

# Creates FASTA factory object ("ssearch" instead of 
# "fasta34" can also work).
# ARGV.pop removes the last command-line argument (the target database),
# so that ARGF below reads only the remaining query file(s).
factory = Bio::Fasta.local('fasta34', ARGV.pop)

ff = Bio::FlatFile.new(Bio::FastaFormat, ARGF)

# Iterates over each entry. The variable "entry" is a 
# Bio::FastaFormat object:
ff.each do |entry|
  # shows definition line (begins with '>') to the standard error output
  $stderr.puts "Searching ... " + entry.definition

  # executes homology search. Returns Bio::Fasta::Report object.
  report = factory.query(entry)

  # Iterates over each hit
  report.each do |hit|
    # If E-value is smaller than 0.0001
    if hit.evalue < 0.0001
      # shows identifier of query and hit, E-value, start and 
      # end positions of homologous region 
      print "#{hit.query_id} : evalue #{hit.evalue}\t#{hit.target_id} at "
      p hit.lap_at
    end
  end
end

We named the above script f_search.rb. You can execute it as follows:

% ./f_search.rb query.pep target.pep > f_search.out

In the above script, the variable "factory" is a factory object for executing FASTA many times easily. Instead of the Fasta#query method, the Bio::Sequence#fasta method can be used.

seq = ">test seq\nYQVLEEIGRGSFGSVRKVIHIPTKKLLVRKDIKYGHMNSKE"
seq.fasta(factory)

When you want to add options to FASTA commands, you can set the third argument of the Bio::Fasta.local method. For example, the following sets ktup to 1 and gets a list of the top 10 hits:

factory = Bio::Fasta.local('fasta34', 'target.pep', '-b 10')
factory.ktup = 1

Bio::Fasta#query returns a Bio::Fasta::Report object. We can get almost all information described in FASTA report text with the Report object. For example, getting information for hits:

report.each do |hit|
  puts hit.evalue           # E-value
  puts hit.sw               # Smith-Waterman score (*)
  puts hit.identity         # % identity
  puts hit.overlap          # length of overlapping region
  puts hit.query_id         # identifier of query sequence
  puts hit.query_def        # definition(comment line) of query sequence
  puts hit.query_len        # length of query sequence
  puts hit.query_seq        # sequence of homologous region
  puts hit.target_id        # identifier of hit sequence
  puts hit.target_def       # definition(comment line) of hit sequence
  puts hit.target_len       # length of hit sequence
  puts hit.target_seq       # sequence of homologous region in hit
  puts hit.query_start      # start position of homologous 
                            # region in query sequence
  puts hit.query_end        # end position of homologous region 
                            # in query sequence
  puts hit.target_start     # start position of homologous region 
                            # in hit(target) sequence
  puts hit.target_end       # end position of homologous region 
                            # in hit(target) sequence
  puts hit.lap_at           # array of above four numbers
end

Most of the above methods are common to the Bio::Blast::Report described below. Please refer to the documentation of the Bio::Fasta::Report class for FASTA-specific details.

If you need the original output text of the FASTA program, you can use the "output" method of the factory object after the "query" method.

report = factory.query(entry)
puts factory.output

Using FASTA from a remote internet site

  • Note: Currently, only GenomeNet (fasta.genome.jp) is supported. Check the class documentation for updates.

For accessing a remote site the Bio::Fasta.remote method is used instead of Bio::Fasta.local. When using a remote method, the databases available may be limited, but, otherwise, you can do the same things as with a local method.

Available databases in GenomeNet:

  • Protein database
    • nr-aa, genes, vgenes.pep, swissprot, swissprot-upd, pir, prf, pdbstr
  • Nucleic acid database
    • nr-nt, genbank-nonst, gbnonst-upd, dbest, dbgss, htgs, dbsts, embl-nonst, embnonst-upd, genes-nt, genome, vgenes.nuc

Select the databases you require. Next, choose the search program according to the type of query sequence and database.

  • When query is an amino acid sequence
    • When protein database, program is "fasta".
    • When nucleic database, program is "tfasta".
  • When query is a nucleic acid sequence
    • When nucleic database, program is "fasta".
    • (When protein database, the search would fail.)

For example, run:

program = 'fasta'
database = 'genes'

factory = Bio::Fasta.remote(program, database)

and try out the same commands as with the local search shown earlier.

Homology search by using BLAST (Bio::Blast class)

The BLAST interface is very similar to that of FASTA, and both local and remote execution are supported. Basically, replace Bio::Fasta with Bio::Blast in the above examples!

For example the BLAST version of f_search.rb is:

# create BLAST factory object
factory = Bio::Blast.local('blastp', ARGV.pop)

For remote execution of BLAST in GenomeNet, Bio::Blast.remote is used. The parameter "program" is different from FASTA - as you can expect:

  • When query is an amino acid sequence
    • When protein database, program is "blastp".
    • When nucleic database, program is "tblastn".
  • When query is a nucleic acid sequence
    • When protein database, program is "blastx"
    • When nucleic database, program is "blastn".
    • ("tblastx" for six-frame search.)
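
The list above amounts to a small lookup table. A minimal sketch, with the table transcribed from the list and a hypothetical helper name (blast_program_for is not part of BioRuby):

```ruby
# [query type, database type] => BLAST program, as listed above.
BLAST_PROGRAM = {
  [:protein, :protein]       => 'blastp',
  [:protein, :nucleotide]    => 'tblastn',
  [:nucleotide, :protein]    => 'blastx',
  [:nucleotide, :nucleotide] => 'blastn'
}.freeze

# Raises KeyError for an unknown combination.
def blast_program_for(query_type, database_type)
  BLAST_PROGRAM.fetch([query_type, database_type])
end

puts blast_program_for(:protein, :nucleotide)   # prints "tblastn"
```

The returned string could then be passed as the "program" argument of Bio::Blast.remote.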

Bio::Blast uses the "-m 7" XML output of BLAST by default when either XMLParser or REXML (both of them are XML parser libraries for Ruby - of the two, XMLParser is the faster) is installed on your computer. In Ruby version 1.8.0 or later, REXML is bundled with Ruby's distribution.

When no XML parser library is present, Bio::Blast uses the "-m 8" tab-delimited format. Available information is limited with the "-m 8" format, so installing an XML parser is recommended.
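
For orientation, the "-m 8" output is one line per hit with 12 tab-separated columns. The sketch below parses such text in plain Ruby without BioRuby. The field names are chosen to resemble the hit methods used in this tutorial, parse_m8 is a hypothetical helper, and the two sample lines are made up:

```ruby
# The 12 columns of "-m 8" output, in order.
M8_FIELDS = %i[query_id target_id identity align_len mismatches gap_opens
               query_start query_end target_start target_end evalue bit_score].freeze

def parse_m8(text)
  lines = text.each_line.map(&:chomp).reject { |l| l.empty? || l.start_with?('#') }
  lines.map do |line|
    row = M8_FIELDS.zip(line.split("\t")).to_h
    row[:evalue] = row[:evalue].to_f          # convert numeric fields we filter on
    row[:bit_score] = row[:bit_score].to_f
    row
  end
end

# Invented sample data, for illustration only.
sample = "q1\tBAB38768\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-110\t395\n" \
         "q1\tBAB37741\t31.0\t100\t60\t4\t10\t105\t20\t118\t0.5\t30.1\n"

parse_m8(sample).each do |hit|
  puts "#{hit[:query_id]}\t#{hit[:target_id]}\t#{hit[:evalue]}" if hit[:evalue] < 0.0001
end
```

Only the first sample hit passes the E-value filter. Note that fields such as the alignment text and midline are simply absent from this format.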

Again, the methods in Bio::Fasta::Report and Bio::Blast::Report (and Bio::Fasta::Report::Hit and Bio::Blast::Report::Hit) are similar. There are some additional BLAST methods, for example, bit_score and midline.

report.each do |hit|
  puts hit.bit_score       
  puts hit.query_seq       
  puts hit.midline         
  puts hit.target_seq      

  puts hit.evalue          
  puts hit.identity        
  puts hit.overlap         
  puts hit.query_id        
  puts hit.query_def       
  puts hit.query_len       
  puts hit.target_id       
  puts hit.target_def      
  puts hit.target_len      
  puts hit.query_start     
  puts hit.query_end       
  puts hit.target_start    
  puts hit.target_end      
  puts hit.lap_at          
end

For simplicity and API compatibility, some information such as score is extracted from the first Hsp (High-scoring Segment Pair).

Check the documentation for Bio::Blast::Report to see what can be retrieved. For now suffice to say that Bio::Blast::Report has a hierarchical structure mirroring the general BLAST output stream:

  • In a Bio::Blast::Report object, @iterations is an array of Bio::Blast::Report::Iteration objects.
    • In a Bio::Blast::Report::Iteration object, @hits is an array of Bio::Blast::Report::Hit objects.
      • In a Bio::Blast::Report::Hit object, @hsps is an array of Bio::Blast::Report::Hsp objects.
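
The nesting can be pictured with a few stand-in Structs. This is only a sketch of the shape of the data: the Toy* names are hypothetical and are not the BioRuby classes, and the values are made up. It also shows the idea of taking hit-level values from the first Hsp.

```ruby
# Stand-in Structs that mimic only the nesting report > iteration > hit > hsp.
ToyHsp       = Struct.new(:evalue, :bit_score)
ToyHit       = Struct.new(:target_id, :hsps)
ToyIteration = Struct.new(:hits)
ToyReport    = Struct.new(:iterations)

report = ToyReport.new([
  ToyIteration.new([
    ToyHit.new('target_A', [ToyHsp.new(1e-30, 120.0), ToyHsp.new(1e-5, 45.0)]),
    ToyHit.new('target_B', [ToyHsp.new(0.2, 28.0)])
  ])
])

report.iterations.each do |itr|
  itr.hits.each do |hit|
    first = hit.hsps.first     # hit-level values are taken from the first Hsp
    puts "#{hit.target_id}\t#{first.evalue}\t#{first.bit_score}"
  end
end
```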

See bio/appl/blast.rb and bio/appl/blast/*.rb for more information.

Parsing existing BLAST output files

When you already have BLAST output files and you want to parse them, you can directly create Bio::Blast::Report objects without the Bio::Blast factory object. For this purpose use Bio::Blast.reports, which supports the "-m 0" default and "-m 7" XML type output format.

  • For example:

    blast_version = nil; result = []
    Bio::Blast.reports(File.new("../test/data/blast/blastp-multi.m7")) do |report|
      blast_version = report.version
      report.iterations.each do |itr|
        itr.hits.each do |hit|
          result.push hit.target_id
        end
      end
    end
    blast_version
    # ==> "blastp 2.2.18 [Mar-02-2008]"
    result
    # ==> ["BAB38768", "BAB38768", "BAB38769", "BAB37741"]
  • Another example:

    require 'bio'
    Bio::Blast.reports(ARGF) do |report| 
      puts "Hits for " + report.query_def + " against " + report.db
      report.each do |hit|
        print hit.target_id, "\t", hit.evalue, "\n" if hit.evalue < 0.001
      end
    end

Save the script as hits_under_0.001.rb. To process the BLAST output files *.xml, run it with:

% ruby hits_under_0.001.rb *.xml

Sometimes BLAST XML output may be wrong and can not be parsed. Check whether blast is version 2.2.5 or later. See also blast --help.

Bio::Blast loads the full XML file into memory. If this causes a problem you can split the BLAST XML file into smaller chunks using XML-Twig. An example can be found in Biotools.

Add remote BLAST search sites

Note: this section is an advanced topic

Here is a more advanced application of the BLAST sequence homology search services. BioRuby currently only supports GenomeNet. If you want to add other sites, you must write the following:

  • the code calling the CGI (command-line options must be processed for the site).
  • code making sure you get the BLAST output text in a format supported by BioRuby (e.g. "-m 8", "-m 7" or default("-m 0")).

In addition, you must write a private class method in Bio::Blast named "exec_MYSITE" to get the query sequence and to pass the result to Bio::Blast::Report.new (or Bio::Blast::Default::Report.new):

factory = Bio::Blast.remote(program, db, option, 'MYSITE')

Once you have written the above routines, please send them to the BioRuby project, and they may be included in future releases.

Generate a reference list using PubMed (Bio::PubMed)

Nowadays using NCBI E-Utils is recommended. Use Bio::PubMed.esearch and Bio::PubMed.efetch.

#!/usr/bin/env ruby

require 'bio'

# NCBI announces that queries without email address will return error
# after June 2010. When you modify the script, please enter your email
# address instead of the staff's.
Bio::NCBI.default_email = 'staff@bioruby.org'

keywords = ARGV.join(' ')

options = {
  'maxdate' => '2003/05/31',
  'retmax' => 1000,
}

entries = Bio::PubMed.esearch(keywords, options)

Bio::PubMed.efetch(entries).each do |entry|
  medline = Bio::MEDLINE.new(entry)
  reference = medline.reference
  puts reference.bibtex
end

The script works the same as pmsearch.rb. But, by using NCBI E-Utils, more options are available. For example, the publication dates to search and the maximum number of hits to return can be specified.

See the help page of E-Utils for more details.

More about BibTeX

In this section, we explain the simple usage of TeX for the BibTeX format bibliography list collected by the above scripts. For example, to save BibTeX format bibliography data to a file named genoinfo.bib, run:

% ./pmfetch.rb 10592173 >> genoinfo.bib
% ./pmsearch.rb genome bioinformatics >> genoinfo.bib

The BibTeX file can be used with TeX or LaTeX to add bibliography information to your journal article. For more information on using BibTeX see the BibTeX HowTo site. A quick example:

Save this to hoge.tex:

\documentclass{jarticle}
\begin{document}
\bibliographystyle{plain}
foo bar KEGG database~\cite{PMID:10592173} baz hoge fuga.
\bibliography{genoinfo}
\end{document}

Then,

% latex hoge
% bibtex hoge # processes genoinfo.bib
% latex hoge  # creates bibliography list
% latex hoge  # inserts correct bibliography reference

Now, you get hoge.dvi and hoge.ps - the latter of which can be viewed with any PostScript viewer.

Bio::Reference#bibitem

When you don't want to create a bib file, you can use Bio::Reference#bibitem method instead of Bio::Reference#bibtex. In the above pmfetch.rb and pmsearch.rb scripts, change

puts reference.bibtex

to

puts reference.bibitem

Output documents should be bundled in \begin{thebibliography} and \end{thebibliography}. Save the following to hoge.tex

\documentclass{jarticle}
\begin{document}
foo bar KEGG database~\cite{PMID:10592173} baz hoge fuga.

\begin{thebibliography}{00}

\bibitem{PMID:10592173}
Kanehisa, M., Goto, S.
KEGG: kyoto encyclopedia of genes and genomes.,
{\em Nucleic Acids Res}, 28(1):27--30, 2000.

\end{thebibliography}
\end{document}

and run

% latex hoge   # creates bibliography list
% latex hoge   # inserts correct bibliography reference

OBDA

OBDA (Open Bio Database Access) is a standardized method of sequence database access developed by the Open Bioinformatics Foundation. It was created during the BioHackathon by BioPerl, BioJava, BioPython, BioRuby and other projects' members (2002).

  • BioRegistry (Directory)
    • Mechanism to specify how and where to retrieve sequence data for each database.
  • BioFlat
    • Flatfile indexing by using binary tree or BDB(Berkeley DB).
  • BioFetch
    • Server-client model for getting entry from database via http.
  • BioSQL
    • Schemas to store sequence data to relational databases such as MySQL and PostgreSQL, and methods to retrieve entries from the database.

This tutorial only gives a quick overview of OBDA. Check out the OBDA site for more extensive details.

BioRegistry

BioRegistry allows for locating retrieval methods and database locations through configuration files. The priorities are

  • The file specified with method's parameter
  • ~/.bioinformatics/seqdatabase.ini
  • /etc/bioinformatics/seqdatabase.ini
  • http://www.open-bio.org/registry/seqdatabase.ini

Note that the last location refers to www.open-bio.org and is only used when no local configuration file is available.

In the current BioRuby implementation all local configuration files are read. For databases with the same name, the settings encountered first are used. This means that if you don't like some settings of a database in the system's global configuration file (/etc/bioinformatics/seqdatabase.ini), you can easily override them by writing settings to ~/.bioinformatics/seqdatabase.ini.

The syntax of the configuration file is called a stanza format. For example

[DatabaseName]
protocol=ProtocolName
location=ServerName

You can write a description like the above entry for every database.
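
To show how little is needed to read this format, here is a minimal plain-Ruby sketch of a stanza parser. It is not the code used by Bio::Registry; parse_stanzas is a hypothetical helper, and it assumes the rule that the first stanza seen for a database name wins:

```ruby
# Hypothetical minimal parser for the stanza format.
def parse_stanzas(text)
  databases = {}
  current = nil
  text.each_line do |raw|
    line = raw.strip
    next if line.empty?
    if (m = line.match(/\A\[(.+)\]\z/))
      # A "[name]" line opens a stanza; a repeated name is ignored (first one wins).
      name = m[1]
      current = databases.key?(name) ? nil : (databases[name] = {})
    elsif current && line.include?('=')
      key, value = line.split('=', 2).map(&:strip)
      current[key] = value
    end
  end
  databases
end

config = <<~INI
  [genbank]
  protocol=biofetch
  location=http://bioruby.org/cgi-bin/biofetch.rb
  biodbname=genbank
INI

p parse_stanzas(config)['genbank']['protocol']   # prints "biofetch"
```

Concatenating the contents of several configuration files in priority order before parsing would give the "first encountered" behaviour.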

The database name is a local label for yourself, so you can name it freely and it can differ from the name of the actual databases. In the actual specification of BioRegistry where there are two or more settings for a database of the same name, it is proposed that connection to the database is tried sequentially with the order written in configuration files. However, this has not (yet) been implemented in BioRuby.

In addition, for some protocols, you must set additional options other than locations (e.g. user name for MySQL). In the BioRegistry specification, the currently available protocols are:

  • index-flat
  • index-berkeleydb
  • biofetch
  • biosql
  • bsane-corba
  • xembl

In BioRuby, you can use index-flat, index-berkeleydb, biofetch and biosql. Note that the BioRegistry specification sometimes gets updated and BioRuby does not always follow quickly.

Here is an example. It creates a Bio::Registry object and reads the configuration files:

reg = Bio::Registry.new

# connects to the database "genbank"
serv = reg.get_database('genbank')

# gets entry of the ID
entry = serv.get_by_id('AA2CG')

The variable "serv" is a server object corresponding to the settings written in the configuration files. The class of the object is one of Bio::SQL, Bio::Fetch, and so on. Note that Bio::Registry#get_database("name") returns nil if no database is found.

After that, you can use the get_by_id method and some specific methods. Please refer to the sections below for more information.

BioFlat

BioFlat is a mechanism to create index files of flat files and to retrieve these entries fast. There are two index types. index-flat is a simple index performing binary search without using any external libraries of Ruby. index-berkeleydb uses Berkeley DB for indexing - but requires installing bdb on your computer, as well as the BDB Ruby package. To create the index itself, you can use br_bioflat.rb command bundled with BioRuby.

% br_bioflat.rb --makeindex database_name [--format data_format] filename...

The format can be omitted because BioRuby has autodetection. If that doesn't work, you can try specifying the data format as the name of a BioRuby database class.

Search and retrieve data from database:

% br_bioflat.rb database_name identifier

For example, to create an index of GenBank files gbbct*.seq and get the entry from the database:

% br_bioflat.rb --makeindex my_bctdb --format GenBank gbbct*.seq
% br_bioflat.rb my_bctdb A16STM262

If you have Berkeley DB on your system and installed the bdb extension module of Ruby (see the BDB project page), you can create and search indexes with Berkeley DB - a very fast alternative that uses little computer memory. When creating the index, use the "--makeindex-bdb" option instead of "--makeindex".

% br_bioflat.rb --makeindex-bdb database_name [--format data_format] filename...
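
The idea behind the simple index-flat type (a sorted index searched by binary search) can be sketched in plain Ruby. This is an illustration of the principle only, not the index format written by br_bioflat.rb. The helper names are hypothetical and the sketch assumes GenBank-like entries that start with a LOCUS line and end with "//":

```ruby
require 'stringio'

# Builds a sorted list of [id, byte offset, byte length] for each entry.
def build_index(io)
  index = []
  offset = io.pos
  entry = String.new
  io.each_line do |line|
    entry << line
    next unless line.start_with?('//')       # "//" terminates an entry
    id = entry[/^LOCUS\s+(\S+)/, 1]
    index << [id, offset, entry.bytesize] if id
    offset = io.pos
    entry = String.new
  end
  index.sort_by(&:first)
end

# Looks up an id by binary search, then reads just that entry.
def fetch_entry(io, index, id)
  hit = index.bsearch { |key, _, _| id <=> key }
  return nil unless hit
  io.pos = hit[1]
  io.read(hit[2])
end

flat = StringIO.new(<<~DATA)
  LOCUS       B0002 toy entry
  ORIGIN
  //
  LOCUS       A0001 toy entry
  ORIGIN
  //
DATA

index = build_index(flat)
puts fetch_entry(flat, index, 'A0001')
```

Only the index needs to be searched; the flat file itself is read just at the stored offset.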

BioFetch

Note: this section is an advanced topic

BioFetch is a database retrieval mechanism via CGI. CGI Parameters, options and error codes are standardized. Client access via http is possible giving the database name, identifiers and format to retrieve entries.

The BioRuby project has a BioFetch server at bioruby.org. It uses GenomeNet's DBGET system as a backend. The source code of the server is in the sample/ directory. Currently, there are only two BioFetch servers in the world: bioruby.org and EBI.

Here are some methods to retrieve entries from our BioFetch server.

  1. Using a web browser

    http://bioruby.org/cgi-bin/biofetch.rb
  2. Using the br_biofetch.rb command

    % br_biofetch.rb db_name entry_id
  3. Directly using Bio::Fetch in a script

    serv = Bio::Fetch.new(server_url)
    entry = serv.fetch(db_name, entry_id)
  4. Indirectly using Bio::Fetch via BioRegistry in script

    reg = Bio::Registry.new
    serv = reg.get_database('genbank')
    entry = serv.get_by_id('AA2CG')

If you want to use (4), you have to include some settings in seqdatabase.ini. For example:

[genbank]
protocol=biofetch
location=http://bioruby.org/cgi-bin/biofetch.rb
biodbname=genbank

The combination of BioFetch, Bio::KEGG::GENES and Bio::AAindex1

Bioinformatics is often about gluing things together. Here is an example that gets the bacteriorhodopsin gene (VNG1467G) of the archaea Halobacterium from KEGG GENES database and gets alpha-helix index data (BURA740101) from the AAindex (Amino acid indices and similarity matrices) database, and shows the helix score for each 15-aa length overlapping window.

#!/usr/bin/env ruby

require 'bio'

entry = Bio::Fetch.query('hal', 'VNG1467G')
aaseq = Bio::KEGG::GENES.new(entry).aaseq

entry = Bio::Fetch.query('aax1', 'BURA740101')
helix = Bio::AAindex1.new(entry).index

position = 1
win_size = 15

aaseq.window_search(win_size) do |subseq|
  score = subseq.total(helix)
  puts [ position, score ].join("\t")
  position += 1
end

The special method Bio::Fetch.query uses the preset BioFetch server at bioruby.org. (The server internally gets data from GenomeNet. Because the KEGG/GENES database and AAindex database are not available from other BioFetch servers, we used the bioruby.org server with Bio::Fetch.query method.)
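
What the window_search and total calls compute can be shown without any network access. The following plain-Ruby sketch uses a made-up score table in place of a real AAindex entry; window_scores and TOY_INDEX are hypothetical names:

```ruby
# Toy per-residue scores; real values would come from an AAindex entry.
TOY_INDEX = { 'A' => 1.0, 'L' => 0.5, 'G' => -1.0 }.freeze

# Returns [position, summed score] for each overlapping window (1-based positions).
def window_scores(seq, win_size, index)
  (0..seq.length - win_size).map do |start|
    window = seq[start, win_size]
    score = window.each_char.sum { |aa| index.fetch(aa, 0.0) }
    [start + 1, score]
  end
end

window_scores('AALGGA', 3, TOY_INDEX).each do |position, score|
  puts [position, score].join("\t")
end
```

The six-residue toy sequence yields four windows (AAL, ALG, LGG, GGA) with scores 2.5, 0.5, -1.5 and -1.0. Residues missing from the table count as 0.0 in this sketch.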

BioSQL

BioSQL is a well-known schema to store and retrieve biological sequences using an RDBMS like PostgreSQL or MySQL; note that SQLite is not supported. First of all, you must install a database engine or have access to a remote one. Then create the schema and populate it with the taxonomy. You can follow the Official Guide to accomplish these steps. The next step is to install these gems:

  • ActiveRecord
  • CompositePrimaryKeys (Rails doesn't handle composite primary keys by default)
  • The layer to communicate with your preferred RDBMS (postgresql, mysql, jdbcmysql in case you are running JRuby)

You can find ActiveRecord's models in /bioruby/lib/bio/io/biosql

When you have your database up and running, you can connect to it like this:

#!/usr/bin/env ruby

require 'bio'

connection = Bio::SQL.establish_connection(
  {
    'development' => {
      'hostname' => "YourHostname",
      'database' => "CoolBioSeqDB",
      'adapter'  => "jdbcmysql",
      'username' => "YourUser",
      'password' => "YourPassword"
    }
  },
  'development'
)

#The first parameter is the hash containing the description of the configuration; similar to database.yml in Rails applications, you can declare different environments. 
#The second parameter is the environment to use: 'development', 'test', or 'production'.

#To store a sequence into the database you simply need a biosequence object.
biosql_database = Bio::SQL::Biodatabase.find(:first)
ff = Bio::GenBank.open("gbvrl1.seq")

ff.each_entry do |gb|
  Bio::SQL::Sequence.new(:biosequence=>gb.to_biosequence, :biodatabase=>biosql_database)
end

#You can list all the entries in every database 
Bio::SQL.list_entries

#list databases:
Bio::SQL.list_databases

#retrieving a generic accession
bioseq = Bio::SQL.fetch_accession("YourAccession")

#If you use biosequence objects, you will find all their methods mapped to BioSQL sequences. 
#But you can also access the models directly:

#get the raw sequence associated with your accession
bioseq.entry.biosequence 

#get the length of your sequence; this is the explicit form of bioseq.length
bioseq.entry.biosequence.length

#convert the sequence into GenBank format
bioseq.to_biosequence.output(:genbank)

BioSQL's schema is not very intuitive for beginners, so spend some time on understanding it. In the end, if you know a little bit of Ruby on Rails, everything will go smoothly. You can find information on Annotation here.

PhyloXML

PhyloXML is an XML language for saving, analyzing and exchanging data of annotated phylogenetic trees. PhyloXML's parser in BioRuby is implemented in Bio::PhyloXML::Parser, and its writer in Bio::PhyloXML::Writer. More information can be found at www.phyloxml.org.

Bio::PhyloXML has been split out from the BioRuby core and released as the bio-phyloxml gem. To use Bio::PhyloXML, install the bio-phyloxml gem.

% gem install bio-phyloxml

The tutorial of Bio::PhyloXML is bundled in bio-phyloxml. <URL:https://github.com/bioruby/bioruby-phyloxml/blob/master/doc/Tutorial.rd>

The BioRuby example programs

Some sample programs are stored in the ./sample/ directory. For example, the na2aa.rb program (transforms a nucleic acid sequence into an amino acid sequence) can be run using:

./sample/na2aa.rb test/data/fasta/example1.txt 

Unit testing and doctests

BioRuby comes with an extensive testing framework with over 1300 tests and 2700 assertions. To run the unit tests:

cd test
ruby runner.rb

We have also started with doctest for Ruby. We are porting the examples in this tutorial to doctest - more info upcoming.

Further reading

See the BioRuby in anger Wiki. A lot of BioRuby's documentation exists in the source code and unit tests. To really dive in you will need the latest source code tree. The embedded rdoc documentation for the BioRuby source code can be viewed online at <URL:http://bioruby.org/rdoc/>.

BioRuby Shell

The BioRuby shell implementation is located in ./lib/bio/shell. It is very interesting as it uses IRB (the interactive Ruby interpreter), which is a powerful environment described in Programming Ruby's IRB chapter. IRB commands can be typed directly into the shell, e.g.

bioruby> IRB.conf[:PROMPT_MODE]
==> :PROMPT_C

Additionally, you may also want to install the optional Ruby readline support (on Debian, the libreadline-ruby package). To edit a previous line you may have to press line down (down arrow) first.

Helpful tools

Apart from rdoc you may also want to use rtags - which allows jumping around source code by clicking on class and method names.

cd bioruby/lib
rtags -R --vi

For a tutorial see here

APPENDIX

Biogem: Additional BioRuby plugins

Biogem is one of the exciting developments for Ruby in bioinformatics! Biogems add new functionality next to the BioRuby core project (BioRuby is a biogem itself). A biogem is simply installed with

gem install bio                 # The core BioRuby gem
gem install bio-core            # BioRuby + stable pure Ruby biogems
gem install bio-core-ext        # bio-core + stable Ruby extensions

For information on these biogems, and the many others available, see Biogems.info or gems.bioruby.org.

Ruby Ensembl API

The Ruby Ensembl API is a Ruby API to the Ensembl database. It is NOT currently included in the BioRuby archives. To install it, see the Ruby-Ensembl Github for more information.

Gene Ontology (GO) through the Ruby Ensembl API

Gene Ontologies can be fetched through the Ruby Ensembl API package:

require 'ensembl'
Ensembl::Core::DBConnection.connect('drosophila_melanogaster')
infile = IO.readlines(ARGV.shift) # reading your comma-separated accession mapping file (one line per mapping)
infile.each do |line|
  accs = line.chomp.split(",")    # Split the comma-sep. entries into an array
  drosophila_acc = accs.shift     # the first entry is the Drosophila acc
  mosq_acc = accs.shift           # the second entry is your Mosq. acc
  gene = Ensembl::Core::Gene.find_by_stable_id(drosophila_acc)
  print "#{mosq_acc}"
  gene.go_terms.each do |go|
     print ",#{go}"
  end
  print "\n"                      # end the output line for this mapping
end

Prints each mosq. accession/uniq identifier and the GO terms from the Drosophila homologues.

Using BioPerl or BioPython from Ruby

A possible route is to opt for JRuby and Jython on the Java virtual machine (JVM).

At the moment there is no easy way of accessing BioPerl or BioPython directly from Ruby. A possibility is to create a Perl or Python server that gets accessed through XML/RPC or SOAP.

Installing required external libraries

At this point, no additional libraries are needed to use BioRuby.

This may change, so keep an eye on the BioRuby website. Also, when a package is missing, BioRuby should show an informative message.

At this point, installing third-party Ruby packages can be a bit painful, as the gem standard for packages evolved late and some packages still force you to copy things by hand. Therefore, carefully read the READMEs that come with each package.

Troubleshooting

  • Error: in `require': no such file to load -- bio (LoadError)

Ruby is failing to find the BioRuby libraries. Add their location to the RUBYLIB path, or pass it to the interpreter. For example:

ruby -I$BIORUBYPATH/lib yourprogram.rb
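
To see which case applies, a script can rescue the LoadError itself and print a hint. The following is a minimal sketch; the flag name bio_loaded and the message text are illustrative assumptions, not part of BioRuby.

```ruby
# Try to load BioRuby and report a hint instead of aborting with a backtrace.
# bio_loaded is a hypothetical flag used only in this sketch.
begin
  require 'bio'
  bio_loaded = true
rescue LoadError => e
  bio_loaded = false
  warn "BioRuby not found (#{e.message}); check RUBYLIB or use ruby -I<path>/lib"
end

puts "bio loaded: #{bio_loaded}"
```

The script runs whether or not the library is installed, which makes it usable as a quick installation check.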

Modifying this page

IMPORTANT NOTICE: This page is maintained in the BioRuby source code repository. Please edit the file there; otherwise changes may get lost. See BioRuby Developer Information for repository and mailing list access.

Tutorial.rd.ja
Copyright (C) 2001-2003, 2005, 2006 Toshiaki Katayama <k@bioruby.org>
Copyright (C) 2005, 2006 Naohisa Goto <ng@bioruby.org>

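How to use BioRuby

BioRuby is an open-source bioinformatics library for Ruby, a full-featured object-oriented scripting language developed in Japan.

Ruby has become widely used thanks to the powerful text processing it inherited from Perl, its simple and readable syntax, and its clean object-oriented features. For more about Ruby, see the website <URL:http://www.ruby-lang.org/> or commercially available books.

Getting started

To use BioRuby, you need to install Ruby and BioRuby.

Installing Ruby

Ruby is normally preinstalled on Mac OS X and recent UNIX systems. For Windows, a one-click installer and ActiveScriptRuby are available. If Ruby is not installed yet, install it by following the installation guide for your platform.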

To check which version of Ruby is installed on your computer, enter the command

% ruby -v

The version is then displayed, for example:

ruby 1.8.2 (2004-12-25) [powerpc-darwin7.7.0]

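Version 1.8.5 or later is recommended.

For the classes and methods built into Ruby, see the Ruby reference manual.

To read help on the command line, the ri command bundled with Ruby and the Japanese-language refe command are convenient.

Installing RubyGems

Download the latest version from the RubyGems page.

Unpack and install it.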

% tar zxvf rubygems-x.x.x.tar.gz
% cd rubygems-x.x.x
% ruby setup.rb

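Installing BioRuby

To install BioRuby, get the latest version from <URL:http://bioruby.org/archive/> and proceed as follows (※1). Please also read the bundled README file. Compared with BioPerl, whose installation can take a whole day if you are not used to it, installing BioRuby should finish quickly.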

% wget http://bioruby.org/archive/bioruby-x.x.x.tar.gz
% tar zxvf bioruby-x.x.x.tar.gz
% cd bioruby-x.x.x
% su
# ruby setup.rb

If RubyGems is available,

% gem install bio

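is all you need to install it. After that, as described in the README file, it is a good idea to copy the file

bioruby-x.x.x/etc/bioinformatics/seqdatabase.ini

into ~/.bioinformatics in your home directory. If you installed with RubyGems, the file should be under a directory such as

/usr/local/lib/ruby/gems/1.8/gems/bio-x.x.x/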

% mkdir ~/.bioinformatics
% cp bioruby-x.x.x/etc/bioinformatics/seqdatabase.ini ~/.bioinformatics

Emacs users may also want to install misc/ruby-mode.el, which is bundled with the Ruby source.

% mkdir -p ~/lib/lisp/ruby
% cp ruby-x.x.x/misc/ruby-mode.el ~/lib/lisp/ruby

Then add the following settings to ~/.emacs.

; subdirs settings
(let ((default-directory "~/lib/lisp"))
  (normal-top-level-add-subdirs-to-load-path))

; ruby-mode settings
(autoload 'ruby-mode "ruby-mode" "Mode for editing ruby source files")
(add-to-list 'auto-mode-alist '("\\.rb$" . ruby-mode))
(add-to-list 'interpreter-mode-alist '("ruby" . ruby-mode))

BioRuby shell

Since BioRuby version 0.7, simple operations can be performed with the bioruby command, which is installed together with BioRuby. The bioruby command is built on irb, the interactive shell bundled with Ruby, so anything Ruby and BioRuby can do is available in it.

% bioruby project1

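A directory with the name given as the argument is created, and the analysis is carried out inside it. In the example above, a directory named project1 is created, along with the following subdirectories and files.

data/           place for the user's analysis files
plugin/         place for additional plugins, as needed
session/        place where settings, objects, history, etc. are saved
session/config  file storing the user's settings
session/history file storing the history of commands entered by the user
session/object  file storing persisted objects

Of these, the data directory may be freely modified by the user. The session/history file lets you check when and what operations were performed.

From the second time on, you can start the shell the same way as the first time,

% bioruby project1

or you can move into the created directory and start it without arguments:

% cd project1
% bioruby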

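In addition, there are script files created by the script command and Rails configuration files created by the web command; these are described later where needed.

The BioRuby shell loads several convenient libraries by default. For example, where the readline library is available, the Tab key should complete method and variable names. open-uri, pp, yaml, and others are also loaded from the start.

Creating nucleotide and amino acid sequences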

getseq(str)

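The getseq command (※2) creates a nucleotide or amino acid sequence from a string. Whether the sequence is nucleotide or amino acid is determined automatically by whether the ATGC content is 90% or more. Here the resulting nucleotide sequence is assigned to a variable named dna.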

bioruby> dna = getseq("atgcatgcaaaa")

To check the contents of a variable, use Ruby's puts method.

bioruby> puts dna
atgcatgcaaaa
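
The 90% rule described above can be sketched in plain Ruby. This is only an illustration of the idea, not BioRuby's implementation: the method name guess_moltype, the symbols :na and :aa, the inclusion of u, and the handling of a content of exactly 90% are all assumptions.

```ruby
# Guess whether a string looks like a nucleotide or an amino acid sequence.
# Hypothetical helper: returns :na when more than `threshold` of the
# characters are a, t, g, c, or u, and :aa otherwise.
def guess_moltype(str, threshold = 0.9)
  return :aa if str.empty?
  bases = str.count('ATGCUatgcu')   # number of characters that are plain bases
  bases.to_f / str.length > threshold ? :na : :aa
end

p guess_moltype("atgcatgcaaaa")          # all characters are bases
p guess_moltype("MENYQKVEKIGEGTYGVVYK")  # mostly non-base letters
```

A sequence that is mostly ambiguity codes, such as "atgcrywskmbvhdn", falls below the threshold and would be guessed as amino acid, which is why an explicit conversion is sometimes needed.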

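If you pass a file name as the argument, the sequence is read from a local file. Major sequence formats such as GenBank, EMBL, UniProt, and FASTA are detected automatically (detection is based on the contents of the entry, not on the file name or extension). The following reads a UniProt-format entry from a file. With this method, only the first entry is read when the file contains several.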

bioruby> cdc2 = getseq("p04551.sp")
bioruby> puts cdc2
MENYQKVEKIGEGTYGVVYKARHKLSGRIVAMKKIRLEDESEGVPSTAIREISLLKEVNDENNRSN...(omitted)

If you know the database name and the entry name, the sequence can be fetched automatically over the Internet.

bioruby> psaB = getseq("genbank:AB044425")
bioruby> puts psaB
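actgaccctgttcatattcgtcctattgctcacgcgatttgggatccgcactttggccaaccagca...(omitted)

Which database and which method are used to fetch an entry can be specified per database in the OBDA configuration file ~/.bioinformatics/seqdatabase.ini, which is shared with BioPerl and other projects (described later). Fetching sequences through the EMBOSS seqret command is also supported, so entries can be obtained using EMBOSS USA notation as well. See the EMBOSS manual and set up ~/.embossrc appropriately.

Whichever method is used, the sequence returned by the getseq command is an object of the generic sequence class Bio::Sequence (※3).

To check whether a sequence was judged to be a nucleotide or an amino acid sequence, use the moltype method:

bioruby> p cdc2.moltype
Bio::Sequence::AA

bioruby> p psaB.moltype
Bio::Sequence::NA

If the automatic detection is wrong, you can force a conversion with the na and aa methods. Note that these methods modify the original object in place.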

bioruby> dna.aa
bioruby> p dna.moltype
Bio::Sequence::AA

bioruby> dna.na
bioruby> p dna.moltype
Bio::Sequence::NA

Alternatively, you can force a conversion with the to_naseq and to_aaseq methods.

bioruby> pep = dna.to_aaseq

The objects returned by to_naseq and to_aaseq are instances of Bio::Sequence::NA (for DNA sequences) and Bio::Sequence::AA (for amino acid sequences), respectively. To find out which class a sequence belongs to, use Ruby's class method:

bioruby> p pep.class
Bio::Sequence::AA

To obtain an object of either Bio::Sequence::NA or Bio::Sequence::AA without forcing a conversion, use the seq method (※4).

bioruby> pep2 = cdc2.seq
bioruby> p pep2.class
Bio::Sequence::AA

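In addition, for methods such as complement and translate, described below, the result is a Bio::Sequence::NA object when the method is expected to return a nucleotide sequence and a Bio::Sequence::AA object when it is expected to return an amino acid sequence.

The nucleotide and amino acid sequence classes inherit from String, Ruby's string class. Bio::Sequence objects are also designed to behave, on the surface, like String objects. As a result, every operation available on Ruby strings can be used: length to get the length, + to concatenate, * to repeat, and so on. This is one of the powerful aspects of object orientation.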

bioruby> puts dna.length
12

bioruby> puts dna + dna
atgcatgcaaaaatgcatgcaaaa

bioruby> puts dna * 5
atgcatgcaaaaatgcatgcaaaaatgcatgcaaaaatgcatgcaaaaatgcatgcaaaa
complement

To obtain the complementary strand of a nucleotide sequence, call its complement method.

bioruby> puts dna.complement
ttttgcatgcat
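
The output above is the reverse complement of the input. As a rough plain-Ruby sketch of the same idea (the method name reverse_complement is hypothetical, and only lowercase a, t, g, c are handled):

```ruby
# Reverse the string, then swap each base with its partner (a<->t, g<->c).
def reverse_complement(dna)
  dna.reverse.tr('atgc', 'tacg')
end

puts reverse_complement("atgcatgcaaaa")   # prints ttttgcatgcat
```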
translate

To translate a nucleotide sequence into an amino acid sequence, use the translate method. Here the translated amino acid sequence is assigned to a variable named pep.

bioruby> pep = dna.translate
bioruby> puts pep
MHAK

To translate in a different frame, pass the frame number:

bioruby> puts dna.translate(2)
CMQ
bioruby> puts dna.translate(3)
ACK
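
How the frame argument shifts the reading window can be illustrated with a small plain-Ruby sketch. The table SKETCH_CODONS is a deliberately tiny, hypothetical subset of the standard codon table (just enough for this example), and translate_frame is not a BioRuby method.

```ruby
# Partial codon table covering only the codons that occur in the example.
SKETCH_CODONS = {
  "atg" => "M", "cat" => "H", "gca" => "A",
  "aaa" => "K", "tgc" => "C", "caa" => "Q"
}.freeze

# Translate starting at base `frame` (1, 2, or 3); incomplete trailing
# codons are dropped and codons missing from the table become "X".
def translate_frame(dna, frame = 1)
  dna[(frame - 1)..-1].scan(/.{3}/).map { |codon| SKETCH_CODONS.fetch(codon, "X") }.join
end

puts translate_frame("atgcatgcaaaa")      # prints MHAK
puts translate_frame("atgcatgcaaaa", 2)   # prints CMQ
puts translate_frame("atgcatgcaaaa", 3)   # prints ACK
```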

molecular_weight

The molecular weight is obtained with the molecular_weight method.

bioruby> puts dna.molecular_weight
3718.66444

bioruby> puts pep.molecular_weight
485.605
seqstat(seq)

The seqstat command displays the composition and other information all at once.

bioruby> seqstat(dna)

* * * Sequence statistics * * *

5'->3' sequence   : atgcatgcaaaa
3'->5' sequence   : ttttgcatgcat
Translation   1   : MHAK
Translation   2   : CMQ
Translation   3   : ACK
Translation  -1   : FCMH
Translation  -2   : FAC
Translation  -3   : LHA
Length            : 12 bp
GC percent        : 33 %
Composition       : a -  6 ( 50.00 %)
                    c -  2 ( 16.67 %)
                    g -  2 ( 16.67 %)
                    t -  2 ( 16.67 %)
Codon usage       :

 *---------------------------------------------*
 |       |              2nd              |     |
 |  1st  |-------------------------------| 3rd |
 |       |  U    |  C    |  A    |  G    |     |
 |-------+-------+-------+-------+-------+-----|
 | U   U |F  0.0%|S  0.0%|Y  0.0%|C  0.0%|  u  |
 | U   U |F  0.0%|S  0.0%|Y  0.0%|C  0.0%|  c  |
 | U   U |L  0.0%|S  0.0%|*  0.0%|*  0.0%|  a  |
 |  UUU  |L  0.0%|S  0.0%|*  0.0%|W  0.0%|  g  |
 |-------+-------+-------+-------+-------+-----|
 |  CCCC |L  0.0%|P  0.0%|H 25.0%|R  0.0%|  u  |
 | C     |L  0.0%|P  0.0%|H  0.0%|R  0.0%|  c  |
 | C     |L  0.0%|P  0.0%|Q  0.0%|R  0.0%|  a  |
 |  CCCC |L  0.0%|P  0.0%|Q  0.0%|R  0.0%|  g  |
 |-------+-------+-------+-------+-------+-----|
 |   A   |I  0.0%|T  0.0%|N  0.0%|S  0.0%|  u  |
 |  A A  |I  0.0%|T  0.0%|N  0.0%|S  0.0%|  c  |
 | AAAAA |I  0.0%|T  0.0%|K 25.0%|R  0.0%|  a  |
 | A   A |M 25.0%|T  0.0%|K  0.0%|R  0.0%|  g  |
 |-------+-------+-------+-------+-------+-----|
 |  GGGG |V  0.0%|A  0.0%|D  0.0%|G  0.0%|  u  |
 | G     |V  0.0%|A  0.0%|D  0.0%|G  0.0%|  c  |
 | G GGG |V  0.0%|A 25.0%|E  0.0%|G  0.0%|  a  |
 |  GG G |V  0.0%|A  0.0%|E  0.0%|G  0.0%|  g  |
 *---------------------------------------------*

Molecular weight  : 3718.66444
Protein weight    : 485.605
//

For an amino acid sequence, the output looks like this:

bioruby> seqstat(pep)

* * * Sequence statistics * * *

N->C sequence     : MHAK
Length            : 4 aa
Composition       : A Ala - 1 ( 25.00 %) alanine
                    H His - 1 ( 25.00 %) histidine
                    K Lys - 1 ( 25.00 %) lysine
                    M Met - 1 ( 25.00 %) methionine
Protein weight    : 485.605
//
composition

The composition shown by seqstat can be obtained with the composition method. The result is returned as a Hash rather than a string, so to simply display it, use p instead of puts.

bioruby> p dna.composition
{"a"=>6, "c"=>2, "g"=>2, "t"=>2}
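
The idea behind such a composition hash is a simple character count. A minimal plain-Ruby sketch (count_composition is a hypothetical name, and key order may differ from the output above):

```ruby
# Count how often each character occurs in the sequence.
def count_composition(seq)
  counts = Hash.new(0)            # missing keys start at 0
  seq.each_char { |ch| counts[ch] += 1 }
  counts
end

p count_composition("atgcatgcaaaa")
```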

Other methods for nucleotide and amino acid sequences

Many other operations are available on nucleotide and amino acid sequences.

subseq(from, to)

To extract a subsequence, use the subseq method.

bioruby> puts dna.subseq(1, 3)
atg

Strings in Ruby and many other programming languages count the first character as position 0, but the subseq method counts from 1.

bioruby> puts dna[0, 3]
atg

Use subseq or the slice method str[] of Ruby's String class, whichever suits the situation.
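
The relationship between the two numbering schemes can be written down in a few lines. This is a sketch with a hypothetical helper name, subseq_1based; raising ArgumentError for positions below 1 is an assumption of the sketch.

```ruby
# Extract positions from..to, counting the first character as 1.
def subseq_1based(str, from, to)
  raise ArgumentError, "positions start at 1" if from < 1 || to < 1
  str[(from - 1)..(to - 1)]       # shift both ends to 0-based indices
end

dna = "atgcatgcaaaa"
puts subseq_1based(dna, 1, 3)     # prints atg
puts dna[0, 3]                    # prints atg (0-based start, length 3)
```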

window_search(len, step)

The window_search method makes it easy to iterate over successive subsequences of a long sequence. To process a DNA sequence codon by codon, extract 3 characters at a time while shifting by 3 characters:

bioruby> dna.window_search(3, 3) do |codon|
bioruby+   puts "#{codon}\t#{codon.translate}"
bioruby+ end
atg     M
cat     H
gca     A
aaa     K

To cut a genome sequence into 11000 bp pieces that overlap by 1000 bp at the ends and format them as FASTA:

bioruby> seq.window_search(11000, 10000) do |subseq|
bioruby+   puts subseq.to_fasta
bioruby+ end

The leftover sequence at the 3' end, shorter than the final 10000 bp, is returned as the method's return value, so capture and display it separately if needed.

bioruby> i = 1
bioruby> remainder = seq.window_search(11000, 10000) do |subseq|
bioruby+   puts subseq.to_fasta("segment #{i*10000}", 60)
bioruby+   i += 1
bioruby+ end
bioruby> puts remainder.to_fasta("segment #{i*10000}", 60)
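
The sliding-window idea, including the leftover returned at the end, can be sketched in plain Ruby as follows. sliding_windows is a hypothetical stand-in written for this illustration; BioRuby's own window_search may differ in edge cases, for instance when the sequence is shorter than the window.

```ruby
# Yield each window of `window_size` characters, moving `step_size`
# characters at a time, and return whatever follows the last window.
def sliding_windows(str, window_size, step_size = 1)
  last_start = nil
  0.step(str.length - window_size, step_size) do |i|
    yield str[i, window_size]
    last_start = i
  end
  return str.dup if last_start.nil?       # no complete window fit
  str[(last_start + window_size)..-1]
end

rest = sliding_windows("abcdefghijklmn", 5, 4) { |w| puts w }
puts "remainder: #{rest}"
```

With a 14-character string, a window of 5 and a step of 4, the windows start at positions 0, 4, and 8, so adjacent windows overlap by one character and the final "n" is returned as the remainder.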
splicing(position)

Extraction from a nucleotide sequence using a position string, as in GenBank, is done with the splicing method.

bioruby> puts dna
atgcatgcaaaa
bioruby> puts dna.splicing("join(1..3,7..9)")
atggca
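
For the simple case shown above, the effect of a join location can be reproduced in plain Ruby. This sketch (splice_join is a hypothetical name) only understands forward from..to ranges with 1-based positions; complement(), fuzzy positions, and the other parts of the location syntax are not handled.

```ruby
# Concatenate the 1-based ranges listed in a location string
# such as "join(1..3,7..9)".
def splice_join(seq, location)
  location.scan(/(\d+)\.\.(\d+)/).map { |from, to|
    seq[(from.to_i - 1)..(to.to_i - 1)]
  }.join
end

puts splice_join("atgcatgcaaaa", "join(1..3,7..9)")   # prints atggca
```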
randomize

The randomize method generates a random sequence that preserves the composition of the original.

bioruby> puts dna.randomize
agcaatagatac
to_re

The to_re method converts a nucleotide sequence containing ambiguous base codes into a regular expression whose pattern consists only of atgc.

bioruby> ambiguous = getseq("atgcyatgcatgcatgc")

bioruby> p ambiguous.to_re
/atgc[tc]atgcatgcatgc/

bioruby> puts ambiguous.to_re
(?-mix:atgc[tc]atgcatgcatgc)

The seq method treats a sequence as amino acid when its ATGC content is 90% or less, so a sequence containing many ambiguous bases must be explicitly converted to a Bio::Sequence::NA object with the to_naseq method.

bioruby> s = getseq("atgcrywskmbvhdn").to_naseq
bioruby> p s.to_re
/atgc[ag][tc][at][gc][tg][ac][tgc][agc][atc][atg][atgc]/

bioruby> puts s.to_re
(?-mix:atgc[ag][tc][at][gc][tg][ac][tgc][agc][atc][atg][atgc])
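
The conversion amounts to replacing each ambiguity code with a character class. A plain-Ruby sketch under that assumption (SKETCH_AMBIGUITY and ambiguous_to_re are names made up for this example; the table follows the bracketed classes shown in the output above):

```ruby
# Ambiguity codes and the bases each one stands for.
SKETCH_AMBIGUITY = {
  "r" => "[ag]",  "y" => "[tc]",  "w" => "[at]",  "s" => "[gc]",
  "k" => "[tg]",  "m" => "[ac]",  "b" => "[tgc]", "v" => "[agc]",
  "h" => "[atc]", "d" => "[atg]", "n" => "[atgc]"
}.freeze

# Build a Regexp in which every ambiguity code becomes a character class.
def ambiguous_to_re(seq)
  pattern = seq.downcase.each_char.map { |ch|
    SKETCH_AMBIGUITY.fetch(ch, Regexp.escape(ch))
  }.join
  Regexp.new(pattern)
end

re = ambiguous_to_re("atgcyatgcatgcatgc")
p re
p "ccatgctatgcatgcatgcc" =~ re   # index of the first match, or nil
```

Such a pattern is useful for locating a degenerate motif inside a longer sequence.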
names

Rarely needed, but this method converts a sequence into base names or amino acid names.

bioruby> p dna.names
["adenine", "thymine", "guanine", "cytosine", "adenine", "thymine",
"guanine", "cytosine", "adenine", "adenine", "adenine", "adenine"]

bioruby> p pep.names
["methionine", "histidine", "alanine", "lysine"]
codes

Similar to names, this method converts an amino acid sequence into three-letter codes.

bioruby> p pep.codes
["Met", "His", "Ala", "Lys"]
gc_percent

The GC content of a nucleotide sequence is obtained with the gc_percent method.

bioruby> p dna.gc_percent
33
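
The value 33 is consistent with an integer percentage: 4 of the 12 bases are g or c. A plain-Ruby sketch of that calculation (gc_percent_of is a hypothetical name; truncating to an integer and ignoring characters other than a, t, u, g, c are assumptions chosen to match the output above):

```ruby
# Integer percentage of g and c among the a/t/u/g/c characters.
def gc_percent_of(seq)
  s = seq.downcase
  gc = s.count('gc')
  at = s.count('atu')
  return 0 if gc + at == 0        # avoid dividing by zero
  100 * gc / (gc + at)            # integer division truncates
end

p gc_percent_of("atgcatgcaaaa")   # prints 33
```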
to_fasta

To convert to FASTA format, use the to_fasta method.

bioruby> puts dna.to_fasta("dna sequence")
>dna sequence
atgcatgcaaaa
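
The FASTA layout itself is simple enough to produce by hand. The following plain-Ruby sketch (format_fasta is a hypothetical name, not the BioRuby method) writes a header line and optionally folds the sequence at a given width:

```ruby
# Build a FASTA record: ">" plus header, then the sequence,
# folded into lines of at most `width` characters when width is given.
def format_fasta(seq, header, width = nil)
  body = width ? seq.scan(/.{1,#{width}}/).join("\n") : seq
  ">#{header}\n#{body}\n"
end

print format_fasta("atgcatgcaaaa", "dna sequence")
print format_fasta("atgcatgcaaaa", "dna sequence (folded)", 5)
```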

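Working with nucleotide and amino acid codes and codon tables

This section introduces the aminoacids, nucleicacids, codontables, and codontable commands for obtaining amino acid, nucleotide, and codon tables.

aminoacids

The aminoacids command lists the amino acids.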

bioruby> aminoacids
?       Pyl     pyrrolysine
A       Ala     alanine
B       Asx     asparagine/aspartic acid
C       Cys     cysteine
D       Asp     aspartic acid
E       Glu     glutamic acid
F       Phe     phenylalanine
G       Gly     glycine
H       His     histidine
I       Ile     isoleucine
K       Lys     lysine
L       Leu     leucine
M       Met     methionine
N       Asn     asparagine
P       Pro     proline
Q       Gln     glutamine
R       Arg     arginine
S       Ser     serine
T       Thr     threonine
U       Sec     selenocysteine
V       Val     valine
W       Trp     tryptophan
Y       Tyr     tyrosine
Z       Glx     glutamine/glutamic acid

The return value is a hash that maps each short notation to the corresponding longer one.

bioruby> aa = aminoacids
bioruby> puts aa["G"]
Gly
bioruby> puts aa["Gly"]
glycine
nucleicacids

The nucleicacids command lists the nucleotides.

bioruby> nucleicacids
a       a       Adenine
t       t       Thymine
g       g       Guanine
c       c       Cytosine
u       u       Uracil
r       [ag]    puRine
y       [tc]    pYrimidine
w       [at]    Weak
s       [gc]    Strong
k       [tg]    Keto
m       [ac]    aroMatic
b       [tgc]   not A
v       [agc]   not T
h       [atc]   not G
d       [atg]   not C
n       [atgc]  

The return value is a hash that maps each one-letter code to the bases it stands for.

bioruby> na = nucleicacids
bioruby> puts na["r"]
[ag]
codontables

The codontables command lists the codon tables.

bioruby> codontables
1       Standard (Eukaryote)
2       Vertebrate Mitochondrial
3       Yeast Mitochondorial
4       Mold, Protozoan, Coelenterate Mitochondrial and Mycoplasma/Spiroplasma
5       Invertebrate Mitochondrial
6       Ciliate Macronuclear and Dasycladacean
9       Echinoderm Mitochondrial
10      Euplotid Nuclear
11      Bacteria
12      Alternative Yeast Nuclear
13      Ascidian Mitochondrial
14      Flatworm Mitochondrial
15      Blepharisma Macronuclear
16      Chlorophycean Mitochondrial
21      Trematode Mitochondrial
22      Scenedesmus obliquus mitochondrial
23      Thraustochytrium Mitochondrial

The return value is a hash that maps table numbers to names.

bioruby> ct = codontables
bioruby> puts ct[3]
Yeast Mitochondorial
codontable(num)

The codon table itself is displayed with the codontable command.

bioruby> codontable(11)

 = Codon table 11 : Bacteria

   hydrophilic: H K R (basic), S T Y Q N S (polar), D E (acidic)
   hydrophobic: F L I M V P A C W G (nonpolar)

 *---------------------------------------------*
 |       |              2nd              |     |
 |  1st  |-------------------------------| 3rd |
 |       |  U    |  C    |  A    |  G    |     |
 |-------+-------+-------+-------+-------+-----|
 | U   U | Phe F | Ser S | Tyr Y | Cys C |  u  |
 | U   U | Phe F | Ser S | Tyr Y | Cys C |  c  |
 | U   U | Leu L | Ser S | STOP  | STOP  |  a  |
 |  UUU  | Leu L | Ser S | STOP  | Trp W |  g  |
 |-------+-------+-------+-------+-------+-----|
 |  CCCC | Leu L | Pro P | His H | Arg R |  u  |
 | C     | Leu L | Pro P | His H | Arg R |  c  |
 | C     | Leu L | Pro P | Gln Q | Arg R |  a  |
 |  CCCC | Leu L | Pro P | Gln Q | Arg R |  g  |
 |-------+-------+-------+-------+-------+-----|
 |   A   | Ile I | Thr T | Asn N | Ser S |  u  |
 |  A A  | Ile I | Thr T | Asn N | Ser S |  c  |
 | AAAAA | Ile I | Thr T | Lys K | Arg R |  a  |
 | A   A | Met M | Thr T | Lys K | Arg R |  g  |
 |-------+-------+-------+-------+-------+-----|
 |  GGGG | Val V | Ala A | Asp D | Gly G |  u  |
 | G     | Val V | Ala A | Asp D | Gly G |  c  |
 | G GGG | Val V | Ala A | Glu E | Gly G |  a  |
 |  GG G | Val V | Ala A | Glu E | Gly G |  g  |
 *---------------------------------------------*

The return value is a Bio::CodonTable object, which can not only convert between codons and amino acids but also provide data such as the following.

bioruby> ct = codontable(2)
bioruby> p ct["atg"]
"M"
definition

Description of the codon table

bioruby> puts ct.definition
Vertebrate Mitochondrial
start

List of start codons

bioruby> p ct.start
["att", "atc", "ata", "atg", "gtg"]
stop

List of stop codons

bioruby> p ct.stop
["taa", "tag", "aga", "agg"]
revtrans

Look up the codons that encode an amino acid

bioruby> p ct.revtrans("V")
["gtc", "gtg", "gtt", "gta"]
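
Reverse translation is a lookup in the opposite direction: from an amino acid back to every codon that maps to it. A plain-Ruby sketch over a tiny, hypothetical excerpt of a codon table (SKETCH_TABLE and codons_for are illustrative names only):

```ruby
# Excerpt of a codon table: codon => one-letter amino acid.
SKETCH_TABLE = {
  "gtt" => "V", "gtc" => "V", "gta" => "V", "gtg" => "V",
  "atg" => "M", "aaa" => "K", "aag" => "K"
}.freeze

# Return every codon in the table that encodes the given amino acid.
def codons_for(table, amino_acid)
  table.select { |_codon, aa| aa == amino_acid }.keys
end

p codons_for(SKETCH_TABLE, "V")
p codons_for(SKETCH_TABLE, "K")
```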

Flat file entries

This section shows how to work with database entries and with flat files themselves. Among the GenBank files, gbphg.seq, which contains the phage entries, is small, so it is used as the example here.

% wget ftp://ftp.hgc.jp/pub/mirror/ncbi/genbank/gbphg.seq.gz
% gunzip gbphg.seq.gz
getent(str)

The getseq command fetched a sequence; to fetch the whole entry rather than just the sequence, use the getent command (※2). Like getseq, getent can use OBDA, EMBOSS, NCBI, EBI, and TogoWS databases (※5). For configuration, see the description of the getseq command.

bioruby> entry = getent("genbank:AB044425")
bioruby> puts entry
LOCUS       AB044425                1494 bp    DNA     linear   PLN 28-APR-2001
DEFINITION  Volvox carteri f. kawasakiensis chloroplast psaB gene for
            photosystem I P700 chlorophyll a apoprotein A2,
            strain:NIES-732.
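(omitted)

The getent command accepts a string of the form db:entry_id, an EMBOSS USA, a file, or an IO, and returns the string for one database entry. It supports many kinds of database entries, not only sequence databases.

flatparse(str)

To parse a fetched entry and extract the data you want, use the flatparse command.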

bioruby> entry = getent("gbphg.seq")
bioruby> gb = flatparse(entry)
bioruby> puts gb.entry_id
AB000833
bioruby> puts gb.definition
Bacteriophage Mu DNA for ORF1, sheath protein gpL, ORF2, ORF3, complete cds.
bioruby> puts gb.naseq
acggtcagacgtttggcccgaccaccgggatgaggctgacgcaggtcagaaatctttgtgacgacaaccgtatcaat
(omitted)
getobj(str)

The getobj command (※2) is equivalent to fetching an entry as a string with getent and converting it into a parsed object with flatparse. It accepts the same arguments as getent. In short, use getseq to get a sequence, getent to get an entry, and getobj to get a parsed object.

bioruby> gb = getobj("gbphg.seq")
bioruby> puts gb.entry_id
AB000833
flatfile(file)

Because the getent command handles only one entry, use the flatfile command to open a local file and process each entry in turn.

bioruby> flatfile("gbphg.seq") do |entry|
bioruby+   # do something on entry
bioruby+ end

If no block is given, the first entry in the file is returned.

bioruby> entry = flatfile("gbphg.seq")
bioruby> gb = flatparse(entry)
bioruby> puts gb.entry_id
flatauto(file)

To process each entry in turn in parsed form, as flatparse would produce, use the flatauto command instead of flatfile.

bioruby> flatauto("gbphg.seq") do |entry|
bioruby+   print entry.entry_id
bioruby+   puts  entry.definition
bioruby+ end

As with flatfile, if no block is given, the first entry in the file is fetched and returned as a parsed object.

bioruby> gb = flatauto("gbphg.seq")
bioruby> puts gb.entry_id

Indexing flat files

Similar to dbiflat in EMBOSS, BioFlat is an indexing mechanism shared by BioRuby, BioPerl, and related projects. Once an index has been built, entries can be retrieved quickly and easily. This makes it simple to build your own private database.

flatindex(db_name, *source_file_list)

The following builds an index under the database name mydb for the entries in the GenBank phage sequence file gbphg.seq.

bioruby> flatindex("mydb", "gbphg.seq")
Creating BioFlat index (.bioruby/bioflat/mydb) ... done
flatsearch(db_name, entry_id)

To retrieve an entry from the mydb database you created, use the flatsearch command.

bioruby> entry = flatsearch("mydb", "AB004561")
bioruby> puts entry
LOCUS       AB004561                2878 bp    DNA     linear   PHG 20-MAY-1998
DEFINITION  Bacteriophage phiU gene for integrase, complete cds, integration
            site.
ACCESSION   AB004561
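(omitted)

Converting sequences from various databases to FASTA format and saving them

FASTA is a standard format for sequence data. The first line, which begins with the ">" symbol, describes the sequence, and the sequence itself follows from the second line on. Whitespace within the sequence is ignored.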

>entry_id definition ...
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT

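The first word of the description line is often the sequence ID, but NCBI's BLAST databases use a more elaborate structure.

BioRuby's database entry classes provide common methods for the entry ID, the sequence, and the definition.

  • entry_id - returns the entry ID
  • definition - returns the definition line
  • seq - returns the sequence

With these common methods, it is easy to write a program that converts any sequence database entry to FASTA format.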

entry.seq.to_fasta("#{entry.entry_id} #{entry.definition}", 60)

Furthermore, BioRuby can detect the input database format automatically, so for GenBank, UniProt, and many other major sequence databases, giving the file name is enough to convert to FASTA format.

flatfasta(fasta_file, *source_file_list)

This command generates the specified FASTA file from a list of input database file names. Here several GenBank files are converted to FASTA format and saved in a file named myfasta.fa.

bioruby> flatfasta("myfasta.fa", "gbphg.seq", "gbvrl1.seq", "gbvrl2.seq")
Saving fasta file (myfasta.fa) ... 
  converting -- gbphg.seq
  converting -- gbvrl1.seq
  converting -- gbvrl2.seq
done

Generating scripts

You can also save a sequence of operations as a script.

bioruby> script
-- 8< -- 8< -- 8< --  Script  -- 8< -- 8< -- 8< --
bioruby> seq = getseq("gbphg.seq")
bioruby> p seq
bioruby> p seq.translate
bioruby> script
-- >8 -- >8 -- >8 --  Script  -- >8 -- >8 -- >8 --
Saving script (script.rb) ... done

The generated script.rb looks like this:

#!/usr/bin/env bioruby

seq = getseq("gbphg.seq")
p seq
p seq.translate

This script can be run with the bioruby command.

% bioruby script.rb

Simple shell features

cd(dir)

Changes the current directory.

bioruby> cd "/tmp"
"/tmp"

To return to your home directory, run cd with no argument.

bioruby> cd
"/home/k"
pwd

Displays the current directory.

bioruby> pwd
"/home/k"
dir

Lists the files in the current directory.

bioruby> dir
   UGO  Date                                 Byte  File
------  ----------------------------  -----------  ------------
 40700  Tue Dec 06 07:07:35 JST 2005         1768  "Desktop"
 40755  Tue Nov 29 16:55:20 JST 2005         2176  "bin"
100644  Sat Oct 15 03:01:00 JST 2005     42599518  "gbphg.seq"
(omitted)

bioruby> dir "gbphg.seq"
   UGO  Date                                 Byte  File
------  ----------------------------  -----------  ------------
100644  Sat Oct 15 03:01:00 JST 2005     42599518  "gbphg.seq"
head(file, lines = 10)

Displays the first 10 lines of a text file or object.

bioruby> head "gbphg.seq"
GBPHG.SEQ            Genetic Sequence Data Bank
                          October 15 2005

                NCBI-GenBank Flat File Release 150.0

                          Phage Sequences         

    2713 loci,    16892737 bases, from     2713 reported sequences

You can also specify the number of lines to display.

bioruby> head "gbphg.seq", 2
GBPHG.SEQ            Genetic Sequence Data Bank
                          October 15 2005

You can also view the beginning of a variable that holds text.

bioruby> entry = getent("gbphg.seq")
bioruby> head entry, 2
GBPHG.SEQ            Genetic Sequence Data Bank
                          October 15 2005
disp(obj)

Displays the contents of a text file or object in a pager. The pager used here can be changed with the pager command (described later).

bioruby> disp "gbphg.seq"
bioruby> disp entry
bioruby> disp [1, 2, 3] * 4

Variables

ls

Lists the variables (objects) created during the session.

bioruby> ls
["entry", "seq"]

bioruby> a = 123
bioruby> ls
["a", "entry", "seq"]
rm(symbol)

Deletes a variable.

bioruby> rm "a"

bioruby> ls
["entry", "seq"]
savefile(filename, object)

Saves the contents of a variable to a text file.

bioruby> savefile "testfile.txt", entry
Saving data (testfile.txt) ... done

bioruby> disp "testfile.txt"

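Settings

As a persistence mechanism, when the BioRuby shell exits, the history, objects, and personal settings are saved in the session directory and loaded automatically at the next startup.

config

Displays the BioRuby shell's settings.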

bioruby> config
message = "...BioRuby in the shell..."
marshal = [4, 8]
color   = false
pager   = nil
echo    = false

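Toggles echo display. When it is on, evaluated values are printed to the screen even without puts or p. The irb command has it on by default, but because the bioruby command often deals with very long strings such as long sequences and entries, it is off by default.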

bioruby> config :echo
Echo on
  ==> nil

bioruby> config :echo
Echo off

Toggles color display where possible, for example for codon tables. When color display is on, the prompt is colored as well, so you can tell which mode is active.

bioruby> config :color
bioruby> codontable
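(in color)

The setting toggles each time the command is run.

bioruby> config :color
bioruby> codontable
(no color)

Changes the splash message shown when the BioRuby shell starts to a different string. It can be useful to note which analysis project the directory is for.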

bioruby> config :message, "Kumamushi genome project"

K u m a m u s h i   g e n o m e   p r o j e c t

  Version : BioRuby 0.8.0 / Ruby 1.8.4

To restore the default string, run it with no argument.

bioruby> config :message

Toggles whether the splash message shown at BioRuby shell startup is animated. This setting, too, toggles each time the command is run.

bioruby> config :splash
Splash on
pager(command)

Switches the pager actually used by the disp command.

bioruby> pager "lv"
Pager is set to 'lv'

bioruby> pager "less -S"
Pager is set to 'less -S'

To disable the pager, run it with no argument.

bioruby> pager
Pager is set to 'off'

When the pager is off, running it with no argument uses the value of the PAGER environment variable.

bioruby> pager
Pager is set to 'less'

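Gene ASCII art

doublehelix(sequence)

As a bonus feature, a DNA sequence can be displayed as ASCII art. Let's display an arbitrary nucleotide sequence, here dna, as something resembling a double helix.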

bioruby> dna = getseq("atgc" * 10).randomize
bioruby> doublehelix dna
     ta
    t--a
   a---t
  a----t
 a----t
t---a
g--c
 cg
 gc
a--t
g---c
 c----g
  c----g
(omitted)

Gene music

midifile(midifile, sequence)

As another bonus feature, a DNA sequence can be converted to a MIDI file. Generate midifile.mid from an arbitrary nucleotide sequence seq and play it in a MIDI player.

bioruby> midifile("midifile.mid", seq)
Saving MIDI file (midifile.mid) ... done

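This concludes the description of the BioRuby shell. The rest of this document describes the BioRuby library itself.

Processing nucleotide and amino acid sequences (Bio::Sequence class)

The Bio::Sequence class supports a variety of operations on sequences. As a simple example, let's take the short nucleotide sequence atgcatgcaaaa and convert it to its complement, extract a subsequence, compute the base composition, translate it to amino acids, and calculate the molecular weight. When translating, you can specify the frame (the base at which translation starts) and choose which of the codon tables defined in codontable.rb to use, as needed (for the codon table numbers, see <URL:http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi>).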

#!/usr/bin/env ruby

require 'bio'

seq = Bio::Sequence::NA.new("atgcatgcaaaa")

puts seq                            # original sequence
puts seq.complement                 # complementary sequence (Bio::Sequence::NA)
puts seq.subseq(3,8)                # from base 3 to base 8

p seq.gc_percent                    # GC content (Integer)
p seq.composition                   # base composition (Hash)

puts seq.translate                  # translated sequence (Bio::Sequence::AA)
puts seq.translate(2)               # translate starting from the 2nd base (default is 1)
puts seq.translate(1,9)             # use codon table 9

p seq.translate.codes               # amino acids as three-letter codes (Array)
p seq.translate.names               # amino acids as names (Array)
p seq.translate.composition         # amino acid composition (Hash)
p seq.translate.molecular_weight    # molecular weight (Float)

puts seq.complement.translate       # translation of the complementary sequence

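print, puts, and p are standard Ruby methods for displaying content on the screen. Compared with the basic print, puts appends a newline automatically, and p displays objects other than strings and numbers in a human-readable form, so choose whichever fits. In addition, the pp method, which becomes available after

require 'pp'

produces output that is easier to read than that of p.

Nucleotide sequences are objects of the Bio::Sequence::NA class, and amino acid sequences are objects of the Bio::Sequence::AA class. Both inherit from Bio::Sequence, so most methods are shared.

Moreover, because Bio::Sequence::NA and Bio::Sequence::AA inherit from Ruby's String class, the methods of String are available too. For example, to extract a subsequence you can use String's [] method as well as the subseq(from,to) method of Bio::Sequence.

Note that Ruby strings count the first character as position 0. For example,

puts seq.subseq(1, 3)
puts seq[0, 3]

both display atg, the first three characters of seq.

Thus, when using String methods, you must subtract 1 from the positions usual in biology, where the first character is number 1 (the subseq method does this internally; it also raises an exception if either from or to is 0 or less).

Trying the steps so far in the BioRuby shell gives the following.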

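# The next line may also be written as seq = getseq("atgcatgcaaaa")
bioruby> seq = Bio::Sequence::NA.new("atgcatgcaaaa")
# Display the created sequence
bioruby> puts seq
atgcatgcaaaa
# Display the complementary sequence
bioruby> puts seq.complement
ttttgcatgcat
# Display a subsequence (bases 3 through 8)
bioruby> puts seq.subseq(3,8)
gcatgc
# Display the GC% of the sequence
bioruby> p seq.gc_percent
33
# Display the composition of the sequence
bioruby> p seq.composition
{"a"=>6, "c"=>2, "g"=>2, "t"=>2}
# Translate to an amino acid sequence
bioruby> puts seq.translate
MHAK
# Translate starting from the 2nd base
bioruby> puts seq.translate(2)
CMQ
# Translate using codon table 9
bioruby> puts seq.translate(1,9)
MHAN
# Display the translated amino acid sequence as three-letter codes
bioruby> p seq.translate.codes
["Met", "His", "Ala", "Lys"]
# Display the translated amino acid sequence as amino acid names
bioruby> p seq.translate.names
["methionine", "histidine", "alanine", "lysine"]
# Display the composition of the translated amino acid sequence
bioruby> p seq.translate.composition
{"K"=>1, "A"=>1, "M"=>1, "H"=>1}
# Display the molecular weight of the translated amino acid sequence
bioruby> p seq.translate.molecular_weight
485.605
# Translate the complementary sequence
bioruby> puts seq.complement.translate
FCMH
# Subsequence (bases 1 through 3)
bioruby> puts seq.subseq(1, 3)
atg
# Subsequence (bases 1 through 3), using a 0-based String index
bioruby> puts seq[0, 3]
atg

The window_search(window_size, step_size) method lets you process each subsequence while sliding a window along the sequence. Thanks to blocks, one of Ruby's distinctive features, the per-window processing can be written concisely and clearly. In the following example, the block is executed repeatedly with each subsequence assigned in turn to the variable subseq.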

  • Compute and display the average GC% for every 100 bases (sliding 1 base at a time)

    seq.window_search(100) do |subseq|
      puts subseq.gc_percent
    end

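The subsequence received in the block is also an object of the same class as the original, Bio::Sequence::NA or Bio::Sequence::AA, so all methods of the sequence class can be called on it.

The step width can be given as the second argument, which makes the following possible:

  • Translate 15 bases into a 5-residue peptide while sliding one codon at a time, and display it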

    seq.window_search(15, 3) do |subseq|
      puts subseq.translate
    end

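Furthermore, the subsequence at the right end that is shorter than the step width is returned as the method's return value, which makes the following possible:

  • Cut a genome sequence into 10000 bp pieces and format them as FASTA, overlapping the ends by 1000 bp, and separately receive and display the 3' end that is shorter than 10000 bp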

    i = 1
    remainder = seq.window_search(10000, 9000) do |subseq|
      puts subseq.to_fasta("segment #{i}", 60)
      i += 1
    end
    puts remainder.to_fasta("segment #{i}", 60)

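Tasks like this are fairly easy to do.

Making the window width equal to the step width gives a non-overlapping window search, which suggests applications such as:

  • Count codon frequencies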

    codon_usage = Hash.new(0)
    seq.window_search(3, 3) do |subseq|
      codon_usage[subseq] += 1
    end
  • Compute the molecular weight of every 10 residues

    seq.window_search(10, 10) do |subseq|
      puts subseq.molecular_weight
    end

These are just a few possible applications.

In practice, a Bio::Sequence::NA object is usually created from a string read from a file or obtained from a database. For example,
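
For comparison, non-overlapping codon counting can also be written with nothing but Ruby's String#scan. This is a plain-Ruby sketch, not BioRuby code; count_codons is a hypothetical name, and trailing bases that do not fill a codon are ignored.

```ruby
# Count each complete, non-overlapping 3-base codon in the sequence.
def count_codons(seq)
  usage = Hash.new(0)
  seq.scan(/.{3}/) { |codon| usage[codon] += 1 }
  usage
end

p count_codons("atgatgaaat")
```

Here the window width and the step are both 3, which is the same non-overlapping situation described above.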

#!/usr/bin/env ruby

require 'bio'

input_seq = ARGF.read       # read all lines of the files given as arguments

my_naseq = Bio::Sequence::NA.new(input_seq)
my_aaseq = my_naseq.translate

puts my_aaseq

Save this program as na2aa.rb and take the following nucleotide sequence

gtggcgatctttccgaaagcgatgactggagcgaagaaccaaagcagtgacatttgtctg
atgccgcacgtaggcctgataagacgcggacagcgtcgcatcaggcatcttgtgcaaatg
tcggatgcggcgtga

written to a file named my_naseq.txt; reading and translating it gives

% ./na2aa.rb my_naseq.txt
VAIFPKAMTGAKNQSSDICLMPHVGLIRRGQRRIRHLVQMSDAA*

Incidentally, an example this small can be shortened to a one-liner.

% ruby -r bio -e 'p Bio::Sequence::NA.new($<.read).translate' my_naseq.txt

However, creating a file every time is tedious, so next let's get the necessary information from a database.

Parsing GenBank (Bio::GenBank class)

Prepare a file in GenBank format (if you do not have one at hand, download a .seq file from ftp://ftp.ncbi.nih.gov/genbank/).

% wget ftp://ftp.hgc.jp/pub/mirror/ncbi/genbank/gbphg.seq.gz
% gunzip gbphg.seq.gz

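First, let's extract the ID, description, and sequence from each entry and convert them to FASTA format.

Bio::GenBank::DELIMITER is a constant defined in the GenBank class; it saves you from having to remember the entry delimiter, which differs between databases (for GenBank it is //).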

#!/usr/bin/env ruby

require 'bio'

while entry = gets(Bio::GenBank::DELIMITER)
  gb = Bio::GenBank.new(entry)      # GenBank object

  print ">#{gb.accession} "         # ACCESSION number
  puts gb.definition                # DEFINITION line
  puts gb.naseq                     # nucleotide sequence (Sequence::NA object)
end
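
The loop above relies on the fact that Ruby's gets accepts a record separator. The mechanism can be tried without BioRuby or a real GenBank file; in this sketch the delimiter value "\n//\n" and the two toy records are assumptions made for illustration.

```ruby
require 'stringio'

# Assumed entry delimiter: a line consisting only of "//".
SKETCH_DELIMITER = "\n//\n"

# Two made-up records standing in for a flat file.
io = StringIO.new("LOCUS A1\nORIGIN\n//\nLOCUS B2\nORIGIN\n//\n")

entries = []
while (entry = io.gets(SKETCH_DELIMITER))   # read up to and including the delimiter
  entries << entry
end

entries.each do |e|
  puts e[/^LOCUS\s+(\S+)/, 1]               # print the name on the LOCUS line
end
```

Each call to gets returns one whole record, delimiter included, which is why the parser can be handed one entry at a time.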

However, this style depends on the data structure of the GenBank file. With Bio::FlatFile, a class that handles data input from files, you can write the following without worrying about delimiters.

#!/usr/bin/env ruby

require 'bio'

ff = Bio::FlatFile.new(Bio::GenBank, ARGF)
ff.each_entry do |gb|
  definition = "#{gb.accession} #{gb.definition}"
  puts gb.naseq.to_fasta(definition, 60)
end

Even when reading data in a different format, such as a FASTA file,

#!/usr/bin/env ruby

require 'bio'

ff = Bio::FlatFile.new(Bio::FastaFormat, ARGF)
ff.each_entry do |f|
  puts "definition : " + f.definition
  puts "nalen      : " + f.nalen.to_s
  puts "naseq      : " + f.naseq
end

the code can be written in much the same way.

The open method of each Bio::DB class can do the same thing. For example,

#!/usr/bin/env ruby

require 'bio'

ff = Bio::GenBank.open("gbvrl1.seq")
ff.each_entry do |gb|
  definition = "#{gb.accession} #{gb.definition}"
  puts gb.naseq.to_fasta(definition, 60)    
end

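This style is also possible, although it is not used much.

Next, let's parse GenBank's complex FEATURES and extract the information we need. First, extract and display the amino acid sequence only when a /translation="amino acid sequence" qualifier is present.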

#!/usr/bin/env ruby

require 'bio'

ff = Bio::FlatFile.new(Bio::GenBank, ARGF)

# For each GenBank entry
ff.each_entry do |gb|

  # Process the FEATURES elements one by one
  gb.features.each do |feature|

    # Convert all qualifiers of the feature into a hash
    hash = feature.to_hash

    # Only when the qualifiers include translation
    if hash['translation']
      # Display the entry's accession number and the translated sequence
      puts ">#{gb.accession}"
      puts hash['translation']
    end
  end
end

Next, let's splice the entry's nucleotide sequence using the information in the feature's position, translate it, and display it alongside the sequence given in /translation= to compare the two.

#!/usr/bin/env ruby

require 'bio'

ff = Bio::FlatFile.new(Bio::GenBank, ARGF)

# For each GenBank entry
ff.each_entry do |gb|

  # Print the ACCESSION number and the organism name
  puts "### #{gb.accession} - #{gb.organism}"

  # Process the FEATURES elements one by one
  gb.features.each do |feature|

    # Get the position of the Feature (join ... and so on)
    position = feature.position

    # Convert all Qualifiers of the Feature into a hash
    hash = feature.to_hash

    # Skip the Feature if it has no /translation=
    next unless hash['translation']

    # Collect the gene name and related information from Qualifiers such as /gene= and /product=
    gene_info = [
      hash['gene'], hash['product'], hash['note'], hash['function']
    ].compact.join(', ')
    puts "## #{gene_info}"

    # Nucleotide sequence (spliced according to position)
    puts ">NA splicing('#{position}')"
    puts gb.naseq.splicing(position)

    # Amino acid sequence (translated from the spliced nucleotide sequence)
    puts ">AA translated by splicing('#{position}').translate"
    puts gb.naseq.splicing(position).translate

    # Amino acid sequence (as written in /translation=)
    puts ">AA original translation"
    puts hash['translation']
  end
end

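If the codon table in use differs from the default (universal) one, if the first codon is not "atg", if the sequence contains selenocysteine, or if BioRuby has a bug, the two amino acid sequences printed by the example above will differ.

The Bio::Sequence#splicing method used in this example is a powerful method that cuts subsequences out of a nucleotide sequence, based on the Location notation used in the GenBank, EMBL, and DDBJ formats.

Besides a Location string from GenBank and similar formats, splicing also accepts a BioRuby Bio::Locations object, though the familiar Location string is usually easier to read. For details of the Location string format and of Bio::Locations, see bio/location.rb in BioRuby.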

  • An example Location string, as used in a Feature of a GenBank-format entry

    naseq.splicing('join(2035..2050,complement(1775..1818),13..345)')
  • Converting the string to a Locations object first and passing that also works

    locs = Bio::Locations.new('join((8298.8300)..10206,1..855)')
    naseq.splicing(locs)

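Incidentally, the splicing method can also be used on amino acid sequences (Bio::Sequence::AA) to extract subsequences.

  • Cutting a subsequence out of an amino acid sequence (a signal peptide, for example)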

    aaseq.splicing('21..119')
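
To make the Location notation more concrete, here is a minimal plain-Ruby sketch of how such a string could be applied to a sequence. It is not BioRuby's implementation: the helper names (reverse_complement, cut_segment, simple_splicing) are hypothetical, and only "a..b", "complement(a..b)", and a flat "join(...)" of those are handled.

```ruby
# Minimal sketch of Location-style splicing on a plain lower-case DNA String.
# Positions are 1-based and inclusive, as in GenBank.

# Reverse-complement a lower-case DNA string.
def reverse_complement(seq)
  seq.reverse.tr('acgt', 'tgca')
end

# Cut one segment ("13..345" or "complement(13..345)") out of seq.
def cut_segment(seq, segment)
  if (m = segment.match(/\Acomplement\((\d+)\.\.(\d+)\)\z/))
    reverse_complement(seq[(m[1].to_i - 1)..(m[2].to_i - 1)])
  elsif (m = segment.match(/\A(\d+)\.\.(\d+)\z/))
    seq[(m[1].to_i - 1)..(m[2].to_i - 1)]
  else
    raise ArgumentError, "unsupported location: #{segment}"
  end
end

# Apply a location string to seq and concatenate the pieces in order.
def simple_splicing(seq, location)
  inner = location[/\Ajoin\((.*)\)\z/, 1] || location
  # Pick out "complement(...)" items first, then plain comma-separated items.
  segments = inner.scan(/complement\([^)]*\)|[^,]+/)
  segments.map { |segment| cut_segment(seq, segment.strip) }.join
end

seq = 'atgcatgcaaaa'
puts simple_splicing(seq, '1..4')                          # atgc
puts simple_splicing(seq, 'join(1..4,complement(9..12))')  # atgctttt
```

Real Location strings have many more forms (fuzzy positions, nested operators, references to other entries), which is why the tutorial recommends leaving the parsing to Bio::Locations.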

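Databases other than GenBank

In BioRuby, databases other than GenBank are handled in basically the same way: pass the string for one database entry to the class for that database, and the parsed result comes back as an object.

To take entries one at a time from a database flat file and get the parsed objects, use Bio::FlatFile, which appeared earlier. The argument to Bio::FlatFile.new is the name of the BioRuby class for the database (Bio::GenBank, Bio::KEGG::GENES, and so on).

ff = Bio::FlatFile.new(Bio::DatabaseClassName, ARGF)

Better still, the FlatFile class can detect the database format automatically, so the simplest approach is

ff = Bio::FlatFile.auto(ARGF)

A complete script using this call looks as follows.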

#!/usr/bin/env ruby

require 'bio'

ff = Bio::FlatFile.auto(ARGF)

ff.each_entry do |entry|
  p entry.entry_id          # ID of the entry
  p entry.definition        # description of the entry
  p entry.seq               # for sequence databases
end

ff.close

To make sure an opened database is never left unclosed, it is better to use a Ruby block, as follows.

#!/usr/bin/env ruby

require 'bio'

Bio::FlatFile.auto(ARGF) do |ff|
  ff.each_entry do |entry|
    p entry.entry_id          # ID of the entry
    p entry.definition        # description of the entry
    p entry.seq               # for sequence databases
  end
end

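The methods for extracting each part of an entry from the parsed object differ from database to database. For common items, BioRuby tries to provide a shared set of methods, such as

  • entry_id method → returns the ID of the entry
  • definition method → returns the definition line of the entry
  • reference method → returns a reference object
  • organism method → returns the organism name
  • seq, naseq, and aaseq methods → return the corresponding sequence object

but not every class implements all of them (see bio/db.rb for the guidelines behind this common interface). Details also vary between database parsers, so follow the documentation of each one.

As a rule, a method with a plural name returns its objects as an array. For example, a class with a references method returns multiple Bio::Reference objects in an Array, whereas another class may only have the singular reference method, which returns a single Bio::Reference object.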

Parsing PDB (the Bio::PDB class)

Bio::PDB is a class for reading the PDB format. The PDB database is distributed in three formats, PDB, mmCIF, and XML (PDBML); of these, BioRuby supports the PDB format.

For the specification of the PDB format, see the Protein Data Bank Contents Guide.

Reading PDB data

If one PDB entry is stored in a file named 1bl8.pdb, Ruby's own file-reading facilities can load it:

entry = File.read("1bl8.pdb")

This assigns the contents of the entry, as a string, to the variable entry. To parse the entry, write

pdb = Bio::PDB.new(entry)

The entry is now a Bio::PDB object, from which any of its data can be retrieved.

Bio::FlatFile can also auto-detect the PDB format, but files that contain more than one entry are not supported at present. To read just one entry with Bio::FlatFile, write

pdb = Bio::FlatFile.auto("1bl8.pdb") { |ff| ff.next_entry }

Either way, the variable pdb ends up with the same result.

Hierarchy of objects

Every PDB entry is assigned an ID of four alphanumeric characters. Use the entry_id method to get the ID from a Bio::PDB object.

p pdb.entry_id   # => "1BL8"

Summary information about the entry is available through the corresponding methods as well.

p pdb.definition # => "POTASSIUM CHANNEL (KCSA) FROM STREPTOMYCES LIVIDANS"
p pdb.keywords   # => ["POTASSIUM CHANNEL", "INTEGRAL MEMBRANE PROTEIN"]

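Information such as the depositors, the literature, and the experimental method can also be retrieved (with the authors, jrnl, and method methods, respectively).

In PDB data, one line basically forms one record. A mechanism called continuation exists for storing data that does not fit on one line across several lines, but the basic rule is one record per line.

The first six characters of each line give the name of the kind of data on that line (the record name). BioRuby basically provides one class per record type: Bio::PDB::Record::HEADER for HEADER records, Bio::PDB::Record::TITLE for TITLE records, and so on. REMARK and JRNL records, however, each come in several formats, so several classes are provided for them.

The simplest way to access each record is the record method.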

pdb.record("HELIX")

A call like this returns every HELIX record in the PDB entry as an array of Bio::PDB::Record::HELIX objects.
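
The "record name in the first six characters" rule can be illustrated without BioRuby. The following is only a sketch in plain Ruby: group_by_record is a hypothetical helper, and the three sample lines are a made-up fragment rather than a real entry.

```ruby
# Sketch: group the lines of a PDB-format text by record name (columns 1-6).
def group_by_record(entry)
  entry.each_line.map(&:chomp).reject(&:empty?).group_by { |line| line[0, 6].strip }
end

# Made-up fragment for illustration only; not a valid PDB entry.
sample = <<~PDB
  HEADER    EXAMPLE FRAGMENT
  HELIX    1   1 ALA A   23  GLY A   30  1
  HELIX    2   2 LEU A   40  VAL A   48  1
PDB

records = group_by_record(sample)
puts records.keys.inspect      # ["HEADER", "HELIX"]
puts records['HELIX'].size     # 2
```

Bio::PDB goes further than this grouping: it turns each line into an object of the matching Record class, so that individual fields can be read through methods.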

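With this in mind, the following sections look at how to work with the data structures for three-dimensional structure, which is the main content of a PDB entry.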

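Atoms: the Bio::PDB::Record::ATOM and Bio::PDB::Record::HETATM classes

A PDB entry contains the three-dimensional structure of proteins, nucleic acids (DNA, RNA), and other molecules; concretely, it contains the 3D coordinates of their atoms.

The coordinates of the atoms of proteins and nucleic acids are stored in ATOM records. The corresponding class is Bio::PDB::Record::ATOM.

The coordinates of atoms that belong to neither proteins nor nucleic acids are stored in HETATM records. The corresponding class is Bio::PDB::Record::HETATM.

The HETATM class inherits from the ATOM class, so the methods of ATOM and HETATM are used in exactly the same way.

Amino acid residues (or bases): the Bio::PDB::Residue class

Bio::PDB::Residue groups atoms by single amino acid or single base. A Bio::PDB::Residue object contains one or more Bio::PDB::Record::ATOM objects.

Compounds: the Bio::PDB::Heterogen class

The atoms of molecules other than proteins and nucleic acids are basically grouped, molecule by molecule, into Bio::PDB::Heterogen. A Bio::PDB::Heterogen object contains one or more Bio::PDB::Record::HETATM objects.

Chains: the Bio::PDB::Chain class

Bio::PDB::Chain is a data structure that holds one protein or nucleic acid, made up of multiple Bio::PDB::Residue objects, and one or more other molecules, made up of multiple Bio::PDB::Heterogen objects.

In most cases a chain holds only one of the two kinds: either a protein or nucleic acid (Bio::PDB::Residue) or other molecules (Bio::PDB::Heterogen). PDB entries that contain only a single Chain sometimes seem to hold both.

Each Chain has a one-character alphanumeric ID (for PDB entries that contain only a single Chain, the ID may be a space character).

Models: Bio::PDB::Model

A Bio::PDB::Model is a collection of one or more Bio::PDB::Chain objects. An X-ray crystal structure usually has just one Model, whereas an NMR structure may have several. When several Models exist, each one is given a serial number.

Finally, a collection of one or more Models makes up the Bio::PDB object.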

Methods for accessing atoms

Bio::PDB#each_atom is an iterator that visits every ATOM one at a time, in order.

pdb.each_atom do |atom|
  p atom.xyz
end

The each_atom method is also available on Model, Chain, and Residue objects, where it iterates over all ATOMs inside that Model, Chain, or Residue.

Bio::PDB#atoms returns all ATOMs as an array.

p pdb.atoms.size        # => 2820, i.e. the entry contains 2820 ATOMs

Like each_atom, the atoms method can also be used on Model, Chain, and Residue objects.

pdb.chains.each do |chain|
  p chain.atoms.size    # => prints the number of ATOMs in each Chain
end

Bio::PDB#each_hetatm is an iterator that visits every HETATM one at a time, in order.

pdb.each_hetatm do |hetatm|
  p hetatm.xyz
end

Bio::PDB#hetatms returns all HETATMs as an array.

p pdb.hetatms.size

Like atoms, these methods can also be used on Model, Chain, and Heterogen objects.

Using the Bio::PDB::Record::ATOM and Bio::PDB::Record::HETATM classes

ATOM holds the atoms that make up proteins and nucleic acids (DNA, RNA), and HETATM holds all other atoms. Because HETATM inherits from the ATOM class, the methods of the two classes are used in exactly the same way.

p atom.serial       # serial number
p atom.name         # atom name
p atom.altLoc       # Alternate location indicator
p atom.resName      # name of the amino acid, base, or compound
p atom.chainID      # ID of the Chain
p atom.resSeq       # sequence number of the residue
p atom.iCode        # Code for insertion of residues
p atom.x            # X coordinate
p atom.y            # Y coordinate
p atom.z            # Z coordinate
p atom.occupancy    # Occupancy
p atom.tempFactor   # Temperature factor
p atom.segID        # Segment identifier
p atom.element      # Element symbol
p atom.charge       # Charge on the atom

As a rule, these method names follow the names used in the Protein Data Bank Contents Guide. This is why the method names use the CamelCase style, as in resName and resSeq. For the meaning of the data each method returns, consult the specification.
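
Behind these accessors, an ATOM or HETATM line is a fixed-column record. The following plain-Ruby sketch shows the idea; it is not the Bio::PDB parser, parse_atom_line is a hypothetical helper, the column positions are the ones given in the PDB format description, and the coordinate values are invented.

```ruby
# Sketch: read the fields of one ATOM/HETATM line by fixed column position.
def parse_atom_line(line)
  {
    serial:     line[6, 5].to_i,        # columns  7-11
    name:       line[12, 4].strip,      # columns 13-16
    altLoc:     line[16, 1],            # column  17
    resName:    line[17, 3].strip,      # columns 18-20
    chainID:    line[21, 1],            # column  22
    resSeq:     line[22, 4].to_i,       # columns 23-26
    iCode:      line[26, 1],            # column  27
    x:          line[30, 8].to_f,       # columns 31-38
    y:          line[38, 8].to_f,       # columns 39-46
    z:          line[46, 8].to_f,       # columns 47-54
    occupancy:  line[54, 6].to_f,       # columns 55-60
    tempFactor: line[60, 6].to_f,       # columns 61-66
    element:    line[76, 2].to_s.strip, # columns 77-78
    charge:     line[78, 2].to_s.strip  # columns 79-80
  }
end

# Build a sample line with the same layout so that the columns line up.
layout = '%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f' +
         (' ' * 10) + '%2s%2s'
line = format(layout, 'ATOM', 1, ' N', ' ', 'ALA', 'A', 23, ' ',
              65.191, 22.037, 48.576, 1.0, 66.95, 'N', '')

atom = parse_atom_line(line)
puts atom[:resName]   # ALA
puts atom[:resSeq]    # 23
puts atom[:x]         # 65.191
```

Keeping the hash keys identical to the field names of the specification mirrors the naming choice described above for the BioRuby methods.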

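Several other convenience methods are provided. The xyz method returns the coordinates as a three-dimensional vector. It returns an object of the Bio::PDB::Coordinate class, which inherits from Ruby's Vector class and is specialized for 3D vectors. (Note: creating a class that inherits from Vector does not seem to be recommended, so in the future the method may be changed to return a plain Vector object.)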

p atom.xyz

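Because the result is a vector, you can add and subtract vectors, take inner products, and so on.

# Distance between two atoms
p (atom1.xyz - atom2.xyz).r  # r returns the norm (absolute value) of a vector

# Inner product
p atom1.xyz.inner_product(atom2.xyz)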
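
For reference, the arithmetic behind these two calls can be written out in plain Ruby. This is only a sketch with coordinates held in ordinary Arrays; the helper names and the coordinate values are hypothetical.

```ruby
# Sketch: Euclidean distance and inner product of two 3D coordinates.
def distance(a, b)
  Math.sqrt(a.zip(b).sum { |p, q| (p - q)**2 })
end

def inner_product(a, b)
  a.zip(b).sum { |p, q| p * q }
end

atom1_xyz = [1.0, 2.0, 3.0]   # hypothetical coordinates
atom2_xyz = [4.0, 6.0, 3.0]

puts distance(atom1_xyz, atom2_xyz)       # 5.0
puts inner_product(atom1_xyz, atom2_xyz)  # 25.0
```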

In addition, the ter, sigatm, and anisou methods return the TER, SIGATM, and ANISOU records that correspond to the atom.

Methods for accessing amino acid residues (Residue)

Bio::PDB#each_residue is an iterator that visits every Residue in order. The each_residue method is also available on Model and Chain objects, where it iterates over all Residues contained in that Model or Chain.

pdb.each_residue do |residue|
  p residue.resName
end

Bio::PDB#residues returns all Residues as an array. Like each_residue, it can also be used on Model and Chain objects.

p pdb.residues.size

Methods for accessing compounds (Heterogen)

Bio::PDB#each_heterogen is an iterator that visits every Heterogen in order, and Bio::PDB#heterogens returns all Heterogens as an array.

pdb.each_heterogen do |heterogen|
  p heterogen.resName
end

p pdb.heterogens.size

As with the Residue methods, these methods can also be used on Model and Chain objects.

Methods for accessing Chain and Model

Likewise, Bio::PDB#each_chain is an iterator that visits every Chain in order, and Bio::PDB#chains returns all Chains as an array. These methods can also be used on Model objects.

Bio::PDB#each_model is an iterator that visits every Model in order, and Bio::PDB#models returns all Models as an array.

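Reading PDB Chemical Component Dictionary data

The Bio::PDB::ChemicalComponent class is a parser for the PDB Chemical Component Dictionary (formerly called the HET Group Dictionary).

For details of the PDB Chemical Component Dictionary, refer to its documentation on the PDB website.

The data can be downloaded from the PDB website.

This class parses one entry, which starts with RESIDUE and ends with a blank line (only the PDB format is supported).

File format auto-detection through Bio::FlatFile is supported. The class itself has no facility for looking up a compound by ID. Index creation with br_bioflat.rb is supported, so use that if you need such lookups.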

Bio::FlatFile.auto("het_dictionary.txt") do |ff|
  ff.each do |het|
    p het.entry_id  # ID
    p het.hetnam    # HETNAM record (name of the compound)
    p het.hetsyn    # HETSYN record (array of synonyms of the compound)
    p het.formul    # FORMUL record (chemical formula of the compound)
    p het.conect    # CONECT record
  end
end

The last method, conect, returns the bonds of the compound as a Hash. For example, the entry for ethanol looks like this:

RESIDUE   EOH      9
CONECT      C1     4 C2   O   1H1  2H1
CONECT      C2     4 C1  1H2  2H2  3H2
CONECT      O      2 C1   HO
CONECT     1H1     1 C1
CONECT     2H1     1 C1
CONECT     1H2     1 C2
CONECT     2H2     1 C2
CONECT     3H2     1 C2
CONECT      HO     1 O
END
HET    EOH              9
HETNAM     EOH ETHANOL
FORMUL      EOH    C2 H6 O1

Calling the conect method on this entry gives:

{ "C1"  => [ "C2", "O", "1H1", "2H1" ], 
  "C2"  => [ "C1", "1H2", "2H2", "3H2" ], 
  "O"   => [ "C1", "HO" ], 
  "1H1" => [ "C1" ], 
  "1H2" => [ "C2" ], 
  "2H1" => [ "C1" ], 
  "2H2" => [ "C2" ], 
  "3H2" => [ "C2" ], 
  "HO"  => [ "O" ] }

The return value is a Hash that maps each atom name to the names of the atoms bonded to it.
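
How such a table can be derived from the CONECT lines is sketched below in plain Ruby. This is not the Bio::PDB::ChemicalComponent implementation: conect_table is a hypothetical helper, and splitting on whitespace is a simplifying assumption that happens to work for the ethanol entry.

```ruby
# Sketch: build the bond table from the CONECT lines of one dictionary entry.
def conect_table(entry)
  table = {}
  entry.each_line do |line|
    fields = line.split
    next unless fields.first == 'CONECT'
    # After the record name: atom name, number of bonds, bonded atom names.
    atom, _count, *neighbors = fields.drop(1)
    table[atom] = neighbors
  end
  table
end

ethanol = <<~ENTRY
  RESIDUE   EOH      9
  CONECT      C1     4 C2   O   1H1  2H1
  CONECT      C2     4 C1  1H2  2H2  3H2
  CONECT      O      2 C1   HO
  CONECT     1H1     1 C1
  CONECT     2H1     1 C1
  CONECT     1H2     1 C2
  CONECT     2H2     1 C2
  CONECT     3H2     1 C2
  CONECT      HO     1 O
  END
  HET    EOH              9
  HETNAM     EOH ETHANOL
  FORMUL      EOH    C2 H6 O1
ENTRY

bonds = conect_table(ethanol)
p bonds['C1']   # ["C2", "O", "1H1", "2H1"]
p bonds['HO']   # ["O"]
```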

Trying the steps so far in the BioRuby shell looks like this.

# Fetch the PDB entry 1bl8 over the network
bioruby> ent_1bl8 = getent("pdb:1bl8")
# Check the contents of the entry
bioruby> head ent_1bl8
# Save the entry to a file
bioruby> savefile("1bl8.pdb", ent_1bl8)
# Check the contents of the saved file
bioruby> disp "data/1bl8.pdb"
# Parse the PDB entry
bioruby> pdb_1bl8 = flatparse(ent_1bl8)
# Show the entry ID of the PDB entry
bioruby> pdb_1bl8.entry_id
# Instead of getent("pdb:1bl8") followed by flatparse, this also works
bioruby> obj_1bl8 = getobj("pdb:1bl8")
bioruby> obj_1bl8.entry_id
# Show the residue name of each HETEROGEN
bioruby> pdb_1bl8.each_heterogen { |heterogen| p heterogen.resName }

# Fetch the PDB Chemical Component Dictionary
bioruby> het_dic = open("http://deposit.pdb.org/het_dictionary.txt").read
# Check the size of the fetched file in bytes
bioruby> het_dic.size
# Save the fetched file
bioruby> savefile("data/het_dictionary.txt", het_dic)
# Check the contents of the file
bioruby> disp "data/het_dictionary.txt"
# Index the file for searching, creating a database named het_dic
bioruby> flatindex("het_dic", "data/het_dictionary.txt")
# Search for the ethanol entry, whose ID is EOH
bioruby> ethanol = flatsearch("het_dic", "EOH")
# Parse the retrieved entry
bioruby> osake = flatparse(ethanol)
# Show the table of bonds between atoms
bioruby> osake.conect

Alignments (the Bio::Alignment class)

The Bio::Alignment class is a container for storing sequence alignments. It supports operations similar to those of Ruby's Hash and Array, and it resembles BioPerl's Bio::SimpleAlign. A simple usage example follows.

require 'bio'

seqs = [ 'atgca', 'aagca', 'acgca', 'acgcg' ]
seqs = seqs.collect{ |x| Bio::Sequence::NA.new(x) }

# Create an alignment object
a = Bio::Alignment.new(seqs)

# Print the consensus sequence
p a.consensus             # ==> "a?gc?"

# Print the consensus sequence using the IUPAC ambiguity codes
p a.consensus_iupac       # ==> "ahgcr"

# Iterate over each sequence
a.each { |x| p x }
  # ==>
  #    "atgca"
  #    "aagca"
  #    "acgca"
  #    "acgcg"

# Iterate over each site
a.each_site { |x| p x }
  # ==>
  #    ["a", "a", "a", "a"]
  #    ["t", "a", "c", "c"]
  #    ["g", "g", "g", "g"]
  #    ["c", "c", "c", "c"]
  #    ["a", "a", "a", "g"]

# Run an alignment with Clustal W.
# The 'clustalw' command must be installed on the system.
factory = Bio::ClustalW.new
a2 = a.do_align(factory)
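
To show where the two consensus strings in the example come from, here is a small plain-Ruby sketch that works site by site. It is not the Bio::Alignment implementation (which supports thresholds and gaps); the helper names are hypothetical, and all sequences are assumed to have equal length and contain only a, c, g, and t.

```ruby
# IUPAC ambiguity codes, keyed by the sorted set of bases seen at a site.
def iupac_table
  { 'a' => 'a', 'c' => 'c', 'g' => 'g', 't' => 't',
    'ag' => 'r', 'ct' => 'y', 'ac' => 'm', 'gt' => 'k', 'cg' => 's', 'at' => 'w',
    'acg' => 'v', 'act' => 'h', 'agt' => 'd', 'cgt' => 'b', 'acgt' => 'n' }
end

# Columns of the alignment: one Array of characters per site.
def sites(seqs)
  seqs.map(&:chars).transpose
end

# Strict consensus: the shared base if all sequences agree, '?' otherwise.
def strict_consensus(seqs)
  sites(seqs).map { |col| col.uniq.size == 1 ? col.first : '?' }.join
end

# IUPAC consensus: the ambiguity code covering every base seen at the site.
def iupac_consensus(seqs)
  table = iupac_table
  sites(seqs).map { |col| table.fetch(col.uniq.sort.join, '?') }.join
end

seqs = %w[atgca aagca acgca acgcg]
puts strict_consensus(seqs)   # a?gc?
puts iupac_consensus(seqs)    # ahgcr
```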

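Homology search with FASTA (the Bio::Fasta class)

This section shows how to run a FASTA homology search for a FASTA-format sequence file, query.pep, either on your own machine (local) or on a server on the Internet (remote). For local searches, SSEARCH and similar programs can be used in the same way.

Local search

Make sure that FASTA is installed. The example below assumes that the command is named fasta34 and is installed in a directory on the PATH.

Prepare target.pep, a FASTA-format database file to search against, and query.pep, a FASTA-format file containing several query sequences.

In this example, a FASTA search is run for each query sequence, and only hits with an evalue below 0.0001 are printed.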

#!/usr/bin/env ruby

require 'bio'

# Create the factory object that runs FASTA (ssearch etc. also work)
factory = Bio::Fasta.local('fasta34', ARGV.pop)

# Read the flat file as a series of FastaFormat objects
ff = Bio::FlatFile.new(Bio::FastaFormat, ARGF)

# For each FastaFormat entry
ff.each do |entry|
  # Print the comment line (the one starting with '>') to standard error as a progress indicator
  $stderr.puts "Searching ... " + entry.definition

  # Run the FASTA homology search; the result is a Fasta::Report object
  report = factory.query(entry)

  # For each hit
  report.each do |hit|
    # If the evalue is below 0.0001
    if hit.evalue < 0.0001
      # Print the evalue, the names, and the overlapping region
      print "#{hit.query_id} : evalue #{hit.evalue}\t#{hit.target_id} at "
      p hit.lap_at
    end
  end
end

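Here, factory is an execution environment created in advance so that FASTA can be run repeatedly.

If the script above is saved as search.rb, run it with the file names of the query sequences and the database sequences as arguments: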

% ruby search.rb query.pep target.pep > search.out

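To pass options to the FASTA command, give them as the third argument, written as FASTA command-line options. The ktup value is the one exception: it must be set through a method. For example, to set ktup to 1 and get the top 10 hits, the options look like this: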

factory = Bio::Fasta.local('fasta34', 'target.pep', '-b 10')
factory.ktup = 1

Bio::Fasta#query and related methods return a Bio::Fasta::Report object. Through its methods, nearly everything in the FASTA output can be retrieved from this Report object. The main per-hit information, such as scores, is available through methods like these:

report.each do |hit|
  puts hit.evalue           # E-value
  puts hit.sw               # Smith-Waterman score (*)
  puts hit.identity         # % identity
  puts hit.overlap          # length of the overlapping region
  puts hit.query_id         # ID of the query sequence
  puts hit.query_def        # comment of the query sequence
  puts hit.query_len        # length of the query sequence
  puts hit.query_seq        # query sequence
  puts hit.target_id        # ID of the hit sequence
  puts hit.target_def       # comment of the hit sequence
  puts hit.target_len       # length of the hit sequence
  puts hit.target_seq       # hit sequence
  puts hit.query_start      # start residue of the homologous region in the query
  puts hit.query_end        # end residue of the homologous region in the query
  puts hit.target_start     # start residue of the homologous region in the target
  puts hit.target_end       # end residue of the homologous region in the target
  puts hit.lap_at           # array of the four positions above
end

Many of these methods are shared with the Bio::Blast::Report class described later. For other methods, or for methods that return FASTA-specific values, see the documentation of the Bio::Fasta::Report class.

If you need the raw output of the fasta command, as it was before parsing, call the output method of the factory object after running query:

report = factory.query(entry)
puts factory.output

Note that output must be called after the query method has been run.

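Remote search

For now, only searches on GenomeNet (fasta.genome.jp) are supported. For a remote search the set of available target databases is fixed, but in every other respect Bio::Fasta.remote can be used in the same way as Bio::Fasta.local.

Target databases available on GenomeNet:

  • Amino acid sequence databases
    • nr-aa, genes, vgenes.pep, swissprot, swissprot-upd, pir, prf, pdbstr
  • Nucleotide sequence databases
    • nr-nt, genbank-nonst, gbnonst-upd, dbest, dbgss, htgs, dbsts, embl-nonst, embnonst-upd, genes-nt, genome, vgenes.nuc

First, choose the database to search from this list. The program is then determined by the type of the query sequence and the type of the database.

  • When the query is an amino acid sequence
    • If the target is an amino acid sequence database, program is 'fasta'
    • If the target is a nucleotide sequence database, program is 'tfasta'
  • When the query is a nucleotide sequence
    • If the target is a nucleotide sequence database, program is 'fasta'
    • (If the target is an amino acid sequence database, the search may not be possible.)

Once the combination of program and database is decided, create a factory: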

program = 'fasta'
database = 'genes'

factory = Bio::Fasta.remote(program, database)

Then run the search with factory.query and related methods, just as in the local case.

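Homology search with BLAST (the Bio::Blast class)

BLAST searches are likewise supported both locally and on GenomeNet (blast.genome.jp). The API is kept as close to Bio::Fasta as possible, so in many cases it is enough to replace Bio::Fasta with Bio::Blast in the examples above.

For example, the earlier search.rb runs in the same way after changing just the factory line to the following:

# Create the factory object that runs BLAST
factory = Bio::Blast.local('blastp', ARGV.pop)

No other change to the script is needed.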

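Similarly, use Bio::Blast.remote to run BLAST on GenomeNet. In this case the values given for program differ from those for FASTA.

  • When the query is an amino acid sequence
    • If the target is an amino acid sequence database, program is 'blastp'
    • If the target is a nucleotide sequence database, program is 'tblastn'
  • When the query is a nucleotide sequence
    • If the target is an amino acid sequence database, program is 'blastx'
    • If the target is a nucleotide sequence database, program is 'blastn'
    • (To apply six-frame translation to both the query and the database, use 'tblastx')

Specify the program according to this list.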
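
The selection rule in this list can be written as a small lookup table. The sketch below is plain Ruby; blast_program and the :protein / :nucleotide symbols are hypothetical names chosen for illustration, not part of the BioRuby API.

```ruby
# Sketch: choose the BLAST program from the query type and the database type.
def blast_program(query_type, db_type)
  table = {
    [:protein, :protein]       => 'blastp',
    [:protein, :nucleotide]    => 'tblastn',
    [:nucleotide, :protein]    => 'blastx',
    [:nucleotide, :nucleotide] => 'blastn'
  }
  table.fetch([query_type, db_type]) do
    raise ArgumentError, "no program for #{query_type} vs #{db_type}"
  end
end

puts blast_program(:protein, :nucleotide)   # tblastn
puts blast_program(:nucleotide, :protein)   # blastx
```

The returned string is what would be passed as the program argument of Bio::Blast.remote or Bio::Blast.local.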

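Incidentally, the XML output format selected by BLAST's "-m 7" option carries richer information, so Bio::Blast uses XML output whenever XMLParser or REXML, both XML libraries for Ruby, is available. When both are available, XMLParser is preferred because it is faster. REXML has been bundled with Ruby itself since Ruby 1.8.0. If no XML library is installed, Bio::Blast falls back to the tab-delimited output format of "-m 8". The data available in that format is limited, however, so using the XML output of "-m 7" is recommended.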
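
For comparison, the tab-delimited "-m 8" output carries twelve columns per hit: query ID, subject ID, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value, and bit score. The following plain-Ruby sketch reads one such line; it is not the BioRuby parser, parse_tabular_hit is a hypothetical helper, and the sample line is invented.

```ruby
# Sketch: parse one line of tab-delimited BLAST output into a Hash.
def parse_tabular_hit(line)
  f = line.chomp.split("\t")
  {
    query_id:     f[0],
    target_id:    f[1],
    identity:     f[2].to_f,
    overlap:      f[3].to_i,
    mismatches:   f[4].to_i,
    gap_opens:    f[5].to_i,
    query_start:  f[6].to_i,
    query_end:    f[7].to_i,
    target_start: f[8].to_i,
    target_end:   f[9].to_i,
    evalue:       f[10].to_f,
    bit_score:    f[11].to_f
  }
end

# Invented sample line for illustration.
sample = "query1\tsbjct9\t98.50\t200\t3\t0\t1\t200\t11\t210\t1e-105\t380\n"

hit = parse_tabular_hit(sample)
puts hit[:target_id] if hit[:evalue] < 0.001   # sbjct9
```

Alignment strings, the midline, and the search statistics are not present in this format, which is the limitation the tutorial refers to.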

As already seen, the Hit objects of Bio::Fasta::Report and Bio::Blast::Report share a number of methods. BLAST-specific methods that are likely to be used often include bit_score and midline.

report.each do |hit|
  puts hit.bit_score        # bit score (*)
  puts hit.query_seq        # query sequence
  puts hit.midline          # midline string of the alignment (*)
  puts hit.target_seq       # hit sequence

  puts hit.evalue           # E-value
  puts hit.identity         # % identity
  puts hit.overlap          # length of the overlapping region
  puts hit.query_id         # ID of the query sequence
  puts hit.query_def        # comment of the query sequence
  puts hit.query_len        # length of the query sequence
  puts hit.target_id        # ID of the hit sequence
  puts hit.target_def       # comment of the hit sequence
  puts hit.target_len       # length of the hit sequence
  puts hit.query_start      # start residue of the homologous region in the query
  puts hit.query_end        # end residue of the homologous region in the query
  puts hit.target_start     # start residue of the homologous region in the target
  puts hit.target_end       # end residue of the homologous region in the target
  puts hit.lap_at           # array of the four positions above
end

For API compatibility with FASTA and for convenience, a Hit returns the values of its first Hsp (High-scoring segment pair) for the score and several other items.

A Bio::Blast::Report object has a hierarchical data structure that directly reflects the structure of the BLAST output. Concretely:

  • @iterations of a Bio::Blast::Report object holds
    • an Array of Bio::Blast::Report::Iteration objects, and @hits of each Bio::Blast::Report::Iteration object holds
      • an Array of Bio::Blast::Report::Hit objects, and @hsps of each Bio::Blast::Report::Hit object holds
        • an Array of Bio::Blast::Report::Hsp objects

Each level of this hierarchy has methods for retrieving the values it holds. For details of these methods, or if you need values such as the statistics of the BLAST run, see the documentation and test code in bio/appl/blast/*.rb.
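
The nesting can be pictured with plain Ruby Hashes and Arrays. The sketch below only mimics the shape of the hierarchy; it is not the Bio::Blast::Report API, and the keys, helper names, and values are all hypothetical.

```ruby
# Sketch: Report -> iterations -> hits -> hsps, as nested Hashes and Arrays.
report = {
  iterations: [
    { hits: [
      { target_id: 'hypothetical_1',
        hsps: [{ evalue: 1e-50, bit_score: 200.0 },
               { evalue: 1e-5,  bit_score: 50.0 }] },
      { target_id: 'hypothetical_2',
        hsps: [{ evalue: 0.5, bit_score: 20.0 }] }
    ] }
  ]
}

# Walk the hierarchy from top to bottom, yielding every Hsp with its Hit.
def each_hsp(report)
  report[:iterations].each do |iteration|
    iteration[:hits].each do |hit|
      hit[:hsps].each { |hsp| yield hit, hsp }
    end
  end
end

# Hit-level shortcut: report the value of the first Hsp.
def hit_evalue(hit)
  hit[:hsps].first[:evalue]
end

each_hsp(report) { |hit, hsp| puts "#{hit[:target_id]}\t#{hsp[:evalue]}" }
puts hit_evalue(report[:iterations].first[:hits].first)
```

The hit_evalue helper corresponds to the convention described above, where a Hit answers with the values of its first Hsp.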

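Parsing existing BLAST output files

When the results of a BLAST run have already been saved to a file and you want to analyze them, what you need is a Bio::Blast::Report object (without creating a Bio::Blast object). The Bio::Blast.reports method does this. It supports the default output format ("-m 0") and the XML output format of the "-m 7" option.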

#!/usr/bin/env ruby

require 'bio'

# Parse the BLAST output in order, yielding Bio::Blast::Report objects
Bio::Blast.reports(ARGF) do |report|
  puts "Hits for " + report.query_def + " against " + report.db
  report.each do |hit|
    print hit.target_id, "\t", hit.evalue, "\n" if hit.evalue < 0.001
  end
end

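Save this as a script, hits_under_0.001.rb for example, and run it as

% ./hits_under_0.001.rb *.xml

to process the BLAST result files given as arguments (*.xml) one after another.

The XML produced can differ depending on the Blast version, the OS, and so on, and the XML parser sometimes fails to handle it. If that happens, install Blast 2.2.5 or later, or try a different combination of options such as -D and -m.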

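Adding a remote search site

Note: this section is for advanced users. Where possible, it is better to use a web service based on SOAP or a similar technology.

Blast searches are offered by many sites, NCBI among them, but for now BioRuby supports only GenomeNet. For any other site, as long as it is possible to

  • call its CGI (handling the command-line options as that site requires), and
  • obtain the blast output in a format for which BioRuby has a parser, such as -m 8,

all that is needed is to define a method that receives a query and passes the search result to Bio::Blast::Report.new. Concretely, register this method as a private method of Bio::Blast under a name of the form exec_sitename, and it can then be called by giving the site name as the fourth argument:

factory = Bio::Blast.remote(program, db, option, 'sitename')

Once it is finished, send it to the BioRuby project and we will be glad to include it.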

Building a reference list from PubMed (the Bio::PubMed class)

The next example searches PubMed, NCBI's literature database, and builds a list of references.

#!/usr/bin/env ruby

require 'bio'

ARGV.each do |id|
  entry = Bio::PubMed.query(id)     # class method that fetches the PubMed entry
  medline = Bio::MEDLINE.new(entry) # Bio::MEDLINE object
  reference = medline.reference     # Bio::Reference object
  puts reference.bibtex             # print in BibTeX format
end

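Save this script under any name you like, pmfetch.rb for example, and run it with the PubMed IDs (PMIDs) of the papers you want to cite as arguments:

% ./pmfetch.rb 11024183 10592278 10592173

It should access NCBI, parse the MEDLINE format, and print the entries converted to BibTeX format.

Searching by keyword is also possible.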

#!/usr/bin/env ruby

require 'bio'

# Join the keywords given on the command line into one string
keywords = ARGV.join(' ')

# Search PubMed by keyword
entries = Bio::PubMed.search(keywords)

entries.each do |entry|
  medline = Bio::MEDLINE.new(entry) # Bio::MEDLINE object
  reference = medline.reference     # Bio::Reference object
  puts reference.bibtex             # print in BibTeX format
end

Save this script under any name you like, pmsearch.rb for example, and run it with the keywords to search for as arguments:

% ./pmsearch.rb genome bioinformatics

It searches PubMed by keyword and prints the list of matching papers in BibTeX format.

NCBI now recommends its E-Utils web application, so from here on it is better to use the Bio::PubMed.esearch and Bio::PubMed.efetch methods.

#!/usr/bin/env ruby

require 'bio'

keywords = ARGV.join(' ')

options = {
  'maxdate' => '2003/05/31',
  'retmax' => 1000,
}

entries = Bio::PubMed.esearch(keywords, options)

Bio::PubMed.efetch(entries).each do |entry|
  medline = Bio::MEDLINE.new(entry)
  reference = medline.reference
  puts reference.bibtex
end

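This script works in almost the same way as pmsearch.rb above. In addition, by making use of NCBI E-Utils, it can restrict the search by date, set the maximum number of hits, and so on, which makes it more capable. For the arguments accepted as options, see the E-Utils help page.

Incidentally, the examples here convert to BibTeX format with the bibtex method, but the bibitem method can also be used, as described later, and the formats of several journals are supported through methods such as nature and nar (although text decoration such as bold and italics is not available).

Notes on using BibTeX

Here is a brief summary of how to use the BibTeX-format list collected in the examples above with TeX. Collect the references you are likely to cite in a file, genoinfo.bib, with commands such as

% ./pmfetch.rb 10592173 >> genoinfo.bib
% ./pmsearch.rb genome bioinformatics >> genoinfo.bib

With the references saved in genoinfo.bib, write a file hoge.tex like the following: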

\documentclass{jarticle}
\begin{document}
\bibliographystyle{plain}
Blah blah, the KEGG database~\cite{PMID:10592173} is such and such.
\bibliography{genoinfo}
\end{document}

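Then process hoge.tex as follows:

% platex hoge
% bibtex hoge   # → processes genoinfo.bib
% platex hoge   # → builds the reference list
% platex hoge   # → resolves the citation numbers

This produces hoge.dvi.

Notes on using bibitem

If you would rather not create a separate .bib file for the references, use the output of the Reference#bibitem method. In pmfetch.rb or pmsearch.rb above, change the line

puts reference.bibtex

to

puts reference.bibitem

or something similar, and place the output in a TeX file like this: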

\documentclass{jarticle}
\begin{document}
Blah blah, the KEGG database~\cite{PMID:10592173} is such and such.

\begin{thebibliography}{00}

\bibitem{PMID:10592173}
Kanehisa, M., Goto, S.
KEGG: kyoto encyclopedia of genes and genomes.,
{\em Nucleic Acids Res}, 28(1):27--30, 2000.

\end{thebibliography}
\end{document}

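The bibitem output goes between \begin{thebibliography} and \end{thebibliography}. If this file is hoge.tex, then

% platex hoge   # → builds the reference list
% platex hoge   # → resolves the citation numbers

running platex twice completes the document.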

OBDA

OBDA (Open Bio Database Access) is a common method for accessing sequence databases, established by the Open Bioinformatics Foundation. It was drawn up at the BioHackathon meetings held in Arizona and Cape Town in January and February 2002, with members of the BioPerl, BioJava, BioPython, BioRuby, and other projects taking part.

  • BioRegistry (Directory)
    • A mechanism for specifying, per database, where and how to fetch sequences
  • BioFlat
    • Index creation for flat files, using either a binary tree or BDB
  • BioFetch
    • A server and client for retrieving entries from databases over HTTP
  • BioSQL
    • A schema for storing sequence data in relational databases such as MySQL and PostgreSQL, and methods for retrieving entries

For details, see <URL:http://obda.open-bio.org/>. The individual specifications are kept in the CVS repository at cvs.open-bio.org, and can also be browsed at <URL:http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/obda-specs/?cvsroot=obf-common>.

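BioRegistry

BioRegistry is a mechanism that uses configuration files to specify how entries of each database are retrieved, so that data can be obtained with almost no awareness of which method is actually in use. The configuration files are searched in the following order of priority:

  • The file specified (as a method parameter)
  • ~/.bioinformatics/seqdatabase.ini
  • /etc/bioinformatics/seqdatabase.ini
  • http://www.open-bio.org/registry/seqdatabase.ini

The last one, the configuration at open-bio.org, is consulted only when no local configuration file is found.

The current BioRuby implementation reads all local configuration files, and when the same name is configured more than once, only the first configuration found is used. This makes it possible, for example, to override in ~/.bioinformatics/ just those settings that you personally want to change among the ones the system administrator placed in /etc/bioinformatics/. A sample seqdatabase.ini file is included in the bioruby source, so please refer to it.

The contents of a configuration file are written in what is called the stanza format.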

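[database_name]
protocol=protocol_name
location=server_name

An entry like this is written for each database. The database name is a label for your own use, so choose something easy to recognize; it apparently does not have to match the real name of the database. The specification proposes that when several databases have the same name, connections are tried in the order in which they are written, but BioRuby does not support this for now.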
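
The stanza format and the "first configuration found wins" rule can be sketched in plain Ruby as follows. This is not the Bio::Registry implementation: parse_stanzas and merge_registries are hypothetical helpers, and the host name db.example.org is a placeholder.

```ruby
# Parse stanza-format text into { "dbname" => { "key" => "value", ... } }.
# If a name appears twice in one text, only the first stanza is kept.
def parse_stanzas(text)
  databases = {}
  current = nil
  text.each_line do |raw|
    line = raw.strip
    next if line.empty?
    if (m = line.match(/\A\[(.+)\]\z/))
      name = m[1]
      current = databases.key?(name) ? nil : (databases[name] = {})
    elsif current && line.include?('=')
      key, value = line.split('=', 2)
      current[key.strip] = value.strip
    end
  end
  databases
end

# Merge configurations given in priority order; the first definition wins.
def merge_registries(*configs)
  configs.each_with_object({}) do |config, merged|
    config.each { |name, settings| merged[name] ||= settings }
  end
end

# Stands in for ~/.bioinformatics/seqdatabase.ini
user_ini = <<~INI
  [genbank]
  protocol=biofetch
  location=http://bioruby.org/cgi-bin/biofetch.rb
  biodbname=genbank
INI

# Stands in for /etc/bioinformatics/seqdatabase.ini
system_ini = <<~INI
  [genbank]
  protocol=biosql
  location=db.example.org

  [embl]
  protocol=biofetch
  location=http://bioruby.org/cgi-bin/biofetch.rb
INI

registry = merge_registries(parse_stanzas(user_ini), parse_stanzas(system_ini))
puts registry['genbank']['protocol']   # biofetch
puts registry.keys.inspect             # ["genbank", "embl"]
```

Passing the per-user configuration first is what lets a personal setting override the system-wide one while the remaining system entries stay available.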

Depending on the protocol, additional options besides location (a MySQL user name, for example) must also be written. The protocols currently defined in the specification are the following.

  • index-flat
  • index-berkeleydb
  • biofetch
  • biosql
  • bsane-corba
  • xembl

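For now, only index-flat, index-berkeleydb, biofetch, and biosql can be used from BioRuby. Note also that the specifications of BioRegistry and of the individual protocols change from time to time, and BioRuby may not have kept up with them.

To use BioRegistry, first create a Bio::Registry object. Doing so reads the configuration files.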

reg = Bio::Registry.new

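# Connect to the server using the database name written in the configuration file
serv = reg.get_database('genbank')

# Retrieve an entry by ID
entry = serv.get_by_id('AA2CG')

Here, serv is a server object that corresponds to the protocol specified in the [genbank] section of the configuration file; it should be an instance of Bio::SQL, Bio::Fetch, or a similar class (or nil if the database name was not found).

From there, call get_by_id, the entry retrieval method common to OBDA, or methods specific to each kind of server object; see the descriptions of BioFetch and BioSQL below.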

BioFlat

BioFlat is a mechanism that builds an index over flat files so that entries can be retrieved quickly. There are two kinds of index: index-flat, which does not depend on any Ruby extension library, and index-berkeleydb, which uses Berkeley DB (bdb). Using index-berkeleydb requires installing BDB, a Ruby extension library, separately. To create an index, use the br_bioflat.rb command that ships with the bioruby package:

% br_bioflat.rb --makeindex database_name [--format class_name] file_name

BioRuby can detect the data format automatically, so the --format option can be omitted; if detection should fail, specify the name of the BioRuby class for the database. To search, run

% br_bioflat.rb database_name entry_ID

As a concrete case, creating an index for the GenBank gbbct*.seq files and searching it looks like this:

% br_bioflat.rb --makeindex my_bctdb --format GenBank gbbct*.seq
% br_bioflat.rb my_bctdb A16STM262

If the bdb extension module for Ruby is installed (see http://raa.ruby-lang.org/project/bdb/ for details), the index can be created with Berkeley DB. In that case, specify "--makeindex-bdb" instead of "--makeindex":

% br_bioflat.rb --makeindex-bdb database_name [--format class_name] file_name

BioFetch

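BioFetch is a specification for retrieving database entries from a server through CGI; it defines the names of the CGI options the server accepts, the error codes, and so on. The client uses HTTP to specify the database, the ID, the format, and so on, and retrieves the entry.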
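
As a rough illustration of what such a request looks like, the plain-Ruby sketch below assembles a URL from a database name, an ID, and a format. The parameter names db, id, format, and style are assumed here to be the ones a BioFetch server accepts, and the use of '&' as the separator is simply what URI.encode_www_form produces; biofetch_url is a hypothetical helper, not part of Bio::Fetch.

```ruby
require 'uri'

# Sketch: build a BioFetch-style request URL (parameter names are assumptions).
def biofetch_url(server_url, db, id, format: 'default', style: 'raw')
  query = URI.encode_www_form(db: db, id: id, format: format, style: style)
  "#{server_url}?#{query}"
end

puts biofetch_url('http://bioruby.org/cgi-bin/biofetch.rb', 'genbank', 'AA2CG')
```

In practice the Bio::Fetch class takes care of building the request and reading the response.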

The BioRuby project has implemented a BioFetch server that uses GenomeNet's DBGET system as its backend, and runs it at bioruby.org. The source code of this server is in the sample/ directory of BioRuby. At present there are only two BioFetch servers: this one at bioruby.org and the one at EBI.

There are several ways to retrieve entries with BioFetch.

  1. Searching from a web browser (open the following page)

    http://bioruby.org/cgi-bin/biofetch.rb
  2. Using the br_biofetch.rb command that ships with BioRuby

    % br_biofetch.rb db_name entry_id
  3. Using the Bio::Fetch class directly from a script

    serv = Bio::Fetch.new(server_url)
    entry = serv.fetch(db_name, entry_id)
  4. Using the Bio::Fetch class indirectly, through BioRegistry, from a script

    reg = Bio::Registry.new
    serv = reg.get_database('genbank')
    entry = serv.get_by_id('AA2CG')

To use method (4), seqdatabase.ini needs an entry such as

[genbank]
protocol=biofetch
location=http://bioruby.org/cgi-bin/biofetch.rb
biodbname=genbank

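An entry like the one above must be in place beforehand.

An example combining BioFetch with Bio::KEGG::GENES and Bio::AAindex1

The following program uses BioFetch to get the bacteriorhodopsin gene (VNG1467G) of the archaeon Halobacterium from the KEGG GENES database, gets an alpha-helix index (BURA740101) from AAindex, the amino acid index database, in the same way, and runs a window search with a window width of 15 residues.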

#!/usr/bin/env ruby

require 'bio'

entry = Bio::Fetch.query('hal', 'VNG1467G')
aaseq = Bio::KEGG::GENES.new(entry).aaseq

entry = Bio::Fetch.query('aax1', 'BURA740101')
helix = Bio::AAindex1.new(entry).index

position = 1
win_size = 15

aaseq.window_search(win_size) do |subseq|
  score = subseq.total(helix)
  puts [ position, score ].join("\t")
  position += 1
end

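The class method Bio::Fetch.query used here is a dedicated shortcut that implicitly uses the BioFetch server at bioruby.org. (Internally, this server gets its data from GenomeNet. The query method is used here deliberately, partly because entries of the hal KEGG/GENES database and the aax1 AAindex database cannot be retrieved from other BioFetch servers.)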
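
The window search itself is simple enough to sketch in plain Ruby. The version below is an illustration rather than the Bio::Sequence implementation: window_scores is a hypothetical helper, and toy_index holds made-up values in place of a real AAindex entry.

```ruby
# Sketch: slide a window along the sequence and sum the index values inside it.
def window_scores(seq, win_size, index)
  seq.each_char.each_cons(win_size).map do |window|
    window.sum { |residue| index.fetch(residue, 0.0) }
  end
end

toy_index = { 'A' => 1.0, 'L' => 0.5, 'G' => -1.0 }   # made-up values

scores = window_scores('AALGG', 3, toy_index)
scores.each_with_index do |score, i|
  puts [i + 1, score].join("\t")   # 1-based window start position and score
end
# Prints:
# 1	2.5
# 2	0.5
# 3	-1.5
```

A sequence of length n yields n - win_size + 1 windows, which is why the position counter in the script above advances by one per block call.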

BioSQL

to be written...

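Using the BioRuby sample programs

The BioRuby package contains a number of sample programs under the sample/ directory. Some of them are old, and there are far from enough of them, so contributions of practical and interesting samples are welcome.

to be written...

Further information

Another tutorial-style document is BioRuby in Anger, which is available on the BioRuby Wiki.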

Footnotes

  • (※1) In BioRuby 1.2.1 and earlier, use install.rb instead of setup.rb. Installation also takes the following three steps.

    % ruby install.rb config
    % ruby install.rb setup
    # ruby install.rb install
  • (※2) In BioRuby 1.0.0 and earlier, use the seq, ent, and obj commands instead of the getseq, getent, and getobj commands.
  • (※3) In BioRuby 0.7.1 and earlier, the object is an instance of either the Bio::Sequence::NA class or the Bio::Sequence::AA class. Which class a sequence belongs to can be checked with Ruby's class method:

    bioruby> p cdc2.class
    Bio::Sequence::AA
    
    bioruby> p psaB.class
    Bio::Sequence::NA

    If the automatic detection is wrong, the to_naseq and to_aaseq methods force the conversion.

  • (※4) Depending on the kind of data read, the seq method may return an object of the Bio::Sequence::Generic class, which is for sequences that are neither nucleotide nor amino acid, or of the String class.
  • (※5) NCBI, EBI, and TogoWS became available from the getseq, getent, and getobj commands without special configuration in BioRuby 1.3.0 and later.
bio-2.0.3/doc/ChangeLog-before-1.3.10000644000175000017500000037566114141516614016055 0ustar nileshnilesh2009-09-02 Naohisa Goto * BioRuby 1.3.1 is released. 2009-09-02 Naohisa Goto * lib/bio/version.rb Preparation for bioruby-1.3.1 release. (commit 3d86bc6d519c4c3319e5a1b2ca36f8f5177f127f) 2009-08-31 Naohisa Goto * lib/bio/sequence/compat.rb Document bug fix: Bio::Sequence::(NA|AA|Generic)#to_fasta are currently not deprecated. (commit 0e0f888a73a60c0f0a7b103019aeb82c8f063c4e) 2009-08-28 Naohisa Goto * lib/bio/appl/sim4/report.rb Bug fix: parse error when unaligned regions exist. Thanks to Tomoaki NISHIYAMA who reports the bug ([BioRuby] SIM4 parser). * test/unit/bio/appl/sim4/test_report.rb, test/data/sim4/complement-A4.sim4 To confirm the bug fix, tests are added with new test data. (commit 02d531e36ecf789f232cf3e05f85391b60279f00) 2009-08-27 Naohisa Goto * lib/bio/appl/sim4/report.rb Bug fix: parse errpr when the alignment of an intron is splitted into two lines. Thanks to Tomoaki NISHIYAMA who sent the patch ([BioRuby] SIM4 parser). (commit 137ec4c3099236c89ac4a0157d0c77ba13d1875c) 2009-08-27 Naohisa Goto * lib/bio/appl/sim4/report.rb Ruby 1.9 support: String#each_line instead of String#each (commit b65f176f3be74c21a8bb8fc2a6f204fb8ab08fd6) 2009-08-27 Naohisa Goto * test/unit/bio/appl/sim4/test_report.rb, test/data/sim4/simple-A4.sim4, test/data/sim4/simple2-A4.sim4 Newly added unit tests for Bio::Sim4::Report with test data. The test data is based on the data provided by Tomoaki NISHIYAMA ([BioRuby] SIM4 parser), and most of the sequence data is replaced by random sequence. (commit 0f53916dd728b871f02d1caf0c5105a2e1c58bc4) 2009-08-18 Naohisa Goto * COPYING, COPYING.ja, GPL, LGPL, LEGAL License files are added. COPYING, COPYING.ja, GPL, LGPL are taken from Ruby's svn repository. LEGAL is written for BioRuby. (commit c65531331e840562ac7342f1896f7e2a3aac6c88) * README.rdoc Added descriptions about license to refer COPYING and LEGAL. 
(commit d88015a2e3b2c5f7c2a931261819b908084d0179) * COPYING Modified COPYING for BioRuby, following Matz's recommendation in [ruby-list:46293]. (commit 2c30e7342e33c878bd7132a302974364c54caad9) 2009-05-06 Naohisa Goto * lib/bio/appl/fasta.rb, lib/bio/appl/fasta/format10.rb Restored Bio::Fasta.parser for keeping compatibility, and added forgotten require. (commit 97b9284109c9a4431b92eab208509e1df6069b4b) 2009-05-02 Naohisa Goto * lib/bio/appl/fasta.rb * Bug fix: Bio::Fasta::Report should be autoloaded. * Removed useless method Bio::Fasta::Report.parser because only the "format10" parser is available for a long time and dynamic require is a potential security hole. * Removed "require" lines in Bio::Fasta#parse_result. (commit 3d3edc44127f4fd97abcc17a859e36623facdc7c) 2009-05-02 Naohisa Goto * lib/bio/appl/fasta/format10.rb Bug Fix: stack overflow problem, and added support for multiple query sequences. * Bug fix: stack overflow problem. Thanks to Fredrik Johansson who reports the bug ([BioRuby] Made a change in format10.rb). * Changed to set @entry_overrun when a report containing multiple query sequences' results is given. * New methods Bio::Fasta::Report#query_def and query_len. * To support reading a search result with multiple query sequences by using Bio::FlatFile, a flatfile splitter class Bio::Fasta::Report::FastaFormat10Splitter is newly added. 
(commit e57349594427ad1a51979c9d4e0c3efcffd160c2) 2009-04-27 Naohisa Goto * test/unit/bio/test_feature.rb, test/unit/bio/test_reference.rb class name conflict of NullStderr (commit 1607b60d905eb8cb5ca289e357cbb2cbb7a118ff) * test/unit/bio/appl/test_blast.rb Bug fix: method redefined: TestBlast#test_self_local (commit 9caa4c9d94126b3568c439878876062c84afbdec) * test/unit/bio/appl/hmmer/test_report.rb Bug fix: method name conflict: TestHMMERReportClassMethods#test_reports_ary (commit cc3e1b85cf885736a7b1293c7e0951e099cd7e6b) * test/unit/bio/appl/bl2seq/test_report.rb * Bug fix: method redefined: TestBl2seqReport#test_undefed_methods. To fix the bug, the second "test_undefed_methods" is renamed to "test_undefed_methods_for_iteration". * Assertions are changed in the first "test_undefed_methods". * Fixed typo. (commit 7e1a550de3dffde3fd8808803e44f35072e4d40b) 2009-04-27 Naohisa Goto * lib/bio/util/restriction_enzyme/range/sequence_range/calculated_cuts.rb Bug fix: attribute "strands_for_display" is disabled because the method definition with the same name overwrites the attribute definition. (commit af07e2784faacc51366ddfab5bedd45841734f53) * lib/bio/db/embl/embl.rb Bug fix: removed duplicated alias (commit 65c360f39580322b5eee64b7c2d8274ff7b8dfff) * lib/bio/appl/pts1.rb Bug fix: removed unused attribute "function" in Bio::PTS1 because method definition with the same name appeared later wipes out the attribute definition. (commit 81cbe9da55217d186e6dc9c1bfb56a39fba73590) 2009-04-27 Naohisa Goto * lib/bio/appl/blast/format0.rb Bug fix: forgotten "if false #dummy" in the attribute "query_to". (commit 7c2e8d0d11baf8cb9e25207ba5b27d4e9d756054) * lib/bio/appl/blast.rb Bug fix: suppressing warning messages when $VERBOSE=true. * To suppress warining message "lib/bio/appl/blast.rb:402: warning: useless use of :: in void context", a dummy variable is added. * The attribute "server" is changed to attr_reader because "server=" is defined later. 
(commit f9404276d2ddcf15966cab74c419733ccd748af2) 2009-04-27 Naohisa Goto * test/unit/bio/test_sequence.rb Fixed test name overwriting another test name in TestSequence. Fixed by Andrew Grimm at git://github.com/agrimm/bioruby.git on Thu Feb 19 22:30:26 2009 +1100. (commit a6c39a719b284a43fe8c67edc1f2826d2941647f) 2009-04-26 Naohisa Goto * test/unit/bio/appl/gcg/test_msf.rb, test/data/gcg/pileup-aa.msf Newly added unit tests for Bio::GCG::Msf with test data (commit a1819cd3b772300ef5bea2ebb63376e5b9fc64da) 2009-04-23 Naohisa Goto * lib/bio/appl/gcg/msf.rb * Bug fix: incorrect parsing of GCG clustalw+ results. * Small refactoring of code (commit 2eae8f722aa888c85d54aa958eb117d49ce42f8b) 2009-04-21 Naohisa Goto * lib/bio/appl/gcg/msf.rb * Bug fix: Bio::GCG::Msf fails parsing when two dots appear at the end of a line. Thanks to Fredrik Johansson who reported the bug and sent the patch ([BioRuby] Parsing MSF alignment file). * Bug fix: misspelling of "ALIGNMENT". (commit 44ca52443e0249f54c43f92d08cf083cdd12c692) 2009-04-21 Naohisa Goto * lib/bio/io/pubmed.rb Bug fix: Bio::PubMed#efetch should return an array of string objects. Thanks to Masahide Kikkawa and Fredrik Johansson who reported the bug (in "[BioRuby] Bio::PubMed.efetch, bug?" and "[BioRuby] PubMed.efetch error"). (commit a48a9a35b87dead069fe328ba7086977304af995) * test/functional/bio/io/test_pubmed.rb Newly added functional test for Bio::PubMed. (commit bf5ba6d4503f3ddb0ca31673882f5b396a932bbe) 2009-04-21 Naohisa Goto * lib/bio/io/ncbirest.rb * Bug fix: Bio::NCBI::REST#esearch ignores hash["retstart"]. * In Bio::NCBI::REST#esearch, the priority of limit and hash["retmax"] is clarified: limit is used unless it is nil. In addition, the default value of limit is changed to nil. If both limit and hash["retmax"] are nil, the default value 100 is used. * Bio::NCBI::REST::NCBI_INTERVAL is changed to 1.
(commit fc0339fe8a42cd00199cfdc938590ae9626551bc) 2009-03-19 Naohisa Goto * lib/bio/io/ncbirest.rb Bug fix: Bio::PubMed.efetch/esearch ignores retmax after refactoring of Bio::PubMed. efetch/esearch methods in Bio::NCBI::REST are also affected. Thanks to Craig Knox who reports the bug ([BioRuby] efetch/esearch broken). Bug fix by Toshiaki Katayama. (commit 51c3223e033b2992a7bd95da282f88164406ff92) 2009-03-19 Naohisa Goto * doc/Tutorial.rd GO example using Ensembl API is moved to Appendix. (commit d677c3d7cbd2f4ff6193255e0e30366ecd0aa421) fixed RD text formatting issues (commit 642577ae70647f8bd0ae3bcc8ddc118cecc886c7) * doc/Tutorial.rd.html doc/Tutorial.rd.html is regenerated (commit dd878d3ecd83ad5e61a21bbf90d27d1c89d5f12d) 2009-03-18 Naohisa Goto * doc/Tutorial.rd Reverted a Blast example code because it aims to tell usage of blocks to explore a Blast report, and getting the "result" is only a side effect and not the main purpose. (commit db172eb1e5f1cbc17317bff8043cc07bf6597073) 2009-03-18 Pjotr Prins * doc/Tutorial.rd Updated Tutorial. (commit b3363ee94cfb86540a7d286ccac608b74737b30d) Updated tutorial with links and gene ontology example by Marc Hoeppner. (commit 27a5019ca7a41211055550f9731672aa71a3a4b3) Fixed doctests in documentation. (commit 9a21a1750a9584152fae669be132af89086e7d5f) Added working BLAST example. (commit 45c27f109f069db3b6208fd59cc2b683a5bca5a9) Added BLAST example. All doctests work again in Tutorial.rd. (commit 05edf3092d0322b8f2775e60448700024d8cb343) Slightly improved remarks. Tutorial.rd runs its doctests. (commit 541a4cf0d9d0d3904f1570e1258a847a22f9238b) reference to github.com/pjotrp/bioruby-support (commit ec5dfb1544e32034457b0dd36a9dc50fef6c0fbe) Added info on how to split large BLAST XML files. (commit 6c9a80cde4be6c4c3d02b77c44dfa8bfbf0a41ff) Updated Tutorial (commit 07267e2d9c5b774bb0f41b795f6be1f24ff175ba) 2009-03-17 Raoul Jean Pierre Bonnal * lib/bio/db/biosql/sequence.rb Fixed: taxonomy, do not report node_rank of type "class". 
GenBanks Tests I/O passed. (commit ba5400eaf6de0f38341825cb0fbc24ca1d99eeba) 2009-03-17 Raoul Jean Pierre Bonnal * test/unit/bio/db/biosql/tc_biosql.rb Removed last "\n" from reference GenBank string (commit 2de87ceef220056a502c5a9a3457abdf1d93fab0) 2009-03-17 Raoul Jean Pierre Bonnal * lib/bio/db/biosql/sequence.rb Fix: reference deletion from bioentry deletion, when reference is a leaf (no more bioentries connected to it) (commit 6f3195a023cab8ee64eb3e3bb9c491534cd80603) 2009-03-17 Raoul Jean Pierre Bonnal * lib/bio/io/biosql/ar-biosql.rb Added: relation between bioentry and references, also through references by bioentry_reference. This is useful to accomplish complete bioentry's delete. (commit d9e5876231d451c9ab1a2e75702f9fe70b1509b8) 2009-03-17 Raoul Jean Pierre Bonnal * test/unit/bio/db/biosql/tc_biosql.rb Fixed: title test (commit 45e2d5e21bc1f93240827dee2e46ac02d24cf696) 2009-03-17 Raoul Jean Pierre Bonnal * lib/bio/db/genbank/common.rb Fix: Delete added dot at the end of TITLE. (commit 2a29c9e7fd41da9d6bf065b3d6dbd473e4d03bbe) 2009-03-17 Raoul Jean Pierre Bonnal * lib/bio/db/biosql/sequence.rb Add: bioentry_qualifier_value recognizes if it's handling data (reader) and formats it according to the GenBank/EMBL format, e.g. 26-SEP-2006. (commit 05ba3f1647d4cc71747ada95c9bb7f2a5a44b518) 2009-03-17 Raoul Jean Pierre Bonnal * lib/bio/db/biosql/biosql_to_biosequence.rb Fixed: date_modified, the code is moved to Bio::SQL::Sequence#bioentry_qualifier_anchor#method_reader. It's more an exercise of style than good programming. date_modifier reader should be a separate method.
(commit c9a980877c9222e05aa0d9163ba51aa2c77a7146) 2009-03-17 Raoul Jean Pierre Bonnal * test/unit/bio/db/biosql/tc_biosql.rb, test/unit/bio/db/biosql/test_biosql.rb, test/unit/bio/db/biosql/ts_suite_biosql.rb Add: BioSQL's TestSuite, alpha stage (commit be1839b3bf3008fe234e8f89d85302caef83398f) 2009-03-17 Raoul Jean Pierre Bonnal * lib/bio/db/biosql/biosql_to_biosequence.rb Fix: date_modifier biosequence adapter (commit a7c1c717e1684fd9117fc2d096e8d6e7c647b62d) 2009-03-17 Raoul Jean Pierre Bonnal * test/unit/bio/db/biosql/test_biosql.rb Added preliminary tests using connection with jdbcmysql. Tests are focused on input/output coherence. (commit 0ada9f8b4bb8553bf076caca76bc76a4d6791c6b) 2009-03-17 Raoul Jean Pierre Bonnal * lib/bio/db/biosql/biosql_to_biosequence.rb Fixed: GI:xxxx reference on VERSION's line using biosql/to_biosequence.output(:genbank) (commit 35e1dce1a75ed967ec707457ed3655ce927f83c3) 2009-03-17 Raoul Jean Pierre Bonnal * lib/bio/db/biosql/sequence.rb added other_seqids as alias of identifier, for the adapter. Export problem of GI in output(:genbank) from biosql/biosequence. (commit 7f69ea73dcd28e76743bd5213c3719cf7d9d44a0) 2009-03-17 Raoul Jean Pierre Bonnal * lib/bio/db/biosql/biosql_to_biosequence.rb (Changed comments only) Added TODOs as comments: to_biosequence.output(:genbank) some major and minor problems. 1) Major. GI: is not exported IN(VERSION X64011.1 GI:44010), OUT(VERSION X64011.1) 1.1) Db storage is ok, GI is saved into identifier of bioentry 2) Moderate. date wrong format IN(26-SEP-2006), OUT(2006-09-26) 3) Minor. Organism in output as more terms. 4) Minor.
Title has a dot at the end, input was without ref for GI in genbank are functions ncbi_gi_number/other_seqids (commit 03662955a45e1c3d5d32150b423a92d40c0c33c7) 2009-03-17 Raoul Jean Pierre Bonnal * lib/bio/io/sql.rb get and first converted from DataMapper to ActiveRecord (commit 78b37c61bbb0a16bbee6c3dd16bff7c292e77695) 2009-03-17 Raoul Jean Pierre Bonnal * lib/bio/db/biosql/sequence.rb converted syntax of first function from DataMapper to ActiveRecord (commit 822a35794b958906e5d4bfb6d5b9d74efb360ea7) converted .get! method in find with conditions (commit 1f3012ba93a9c462e8b1daa762372a55534db29c) 2009-03-17 Raoul Jean Pierre Bonnal * lib/bio/io/biosql/biosql.rb establish_connection rewrite and update Class.first call with ActiveRecord syntax. Coming from DataMapper. (commit 66fb6ff597a2ebf2f2dc1ebe7e505fbcc46c993c) 2009-03-17 Raoul Jean Pierre Bonnal * lib/bio/db/biosql/sequence.rb Version developed with DataMapper, need to be tested with ActiveRecord -current ORM-. (commit 7bf5d24364fce8f3a466697e479af5f28c672265) 2009-03-17 Raoul Jean Pierre Bonnal * lib/bio/io/biosql/config/database.yml Configured development database with jdbcmysql adapter. (commit 7e143b1d0451bce6865e560febc5c57048210416) 2009-03-17 Raoul Jean Pierre Bonnal * lib/bio/io/biosql/ar-biosql.rb Newly added lib/bio/io/biosql/ar-biosql.rb: In one file definition of all BioSQL's ActiveRecords classes. (commit 87f7bc6ac844583adc07e409c9fac7fa1f275d2b) class Bioentry: added has_many obejct_bioentry_path and subject_bioentry_path. (commit e969eae59d0de098e094ea21007c34371bab3bdd) class BioentryRelationship: added relation to Term class. (commit f77b28045f0631391c6f4ad4e9eed15d296bec95) class Biosequence: changed to composite primary keys, :bioentry_id, :version. (commit c6683346e4c13d8969bb859e882698b90d0828f1) class SeqfeatureQualifierValue: find function deleted, wrong here. 
(commit 89f64af363d0b204e50ea71924909724d56bccc4) * lib/bio/io/sql.rb Separated connection (see lib/bio/io/biosql.rb) from definition of public methods. (commit 24b9e6473ce36e3151c560ea26c3b95105656ef4) 2009-03-17 Raoul Jean Pierre Bonnal * lib/bio/io/biosql To integrate BioSQL ActiveRecords classes to one file, as the first step, following 28 files listed below are deleted. In the later commit, they will be integrated into one file, lib/bio/io/biosql/ar-biosql.rb. (commit 0ea9f08b36e10e50c855d4346194849e8e7a263b) * lib/bio/io/biosql/biodatabase.rb * lib/bio/io/biosql/bioentry.rb * lib/bio/io/biosql/bioentry_dbxref.rb * lib/bio/io/biosql/bioentry_path.rb * lib/bio/io/biosql/bioentry_qualifier_value.rb * lib/bio/io/biosql/bioentry_reference.rb * lib/bio/io/biosql/bioentry_relationship.rb * lib/bio/io/biosql/biosequence.rb * lib/bio/io/biosql/comment.rb * lib/bio/io/biosql/dbxref.rb * lib/bio/io/biosql/dbxref_qualifier_value.rb * lib/bio/io/biosql/location.rb * lib/bio/io/biosql/location_qualifier_value.rb * lib/bio/io/biosql/ontology.rb * lib/bio/io/biosql/reference.rb * lib/bio/io/biosql/seqfeature.rb * lib/bio/io/biosql/seqfeature_dbxref.rb * lib/bio/io/biosql/seqfeature_path.rb * lib/bio/io/biosql/seqfeature_qualifier_value.rb * lib/bio/io/biosql/seqfeature_relationship.rb * lib/bio/io/biosql/taxon.rb * lib/bio/io/biosql/taxon_name.rb * lib/bio/io/biosql/term.rb * lib/bio/io/biosql/term_dbxref.rb * lib/bio/io/biosql/term_path.rb * lib/bio/io/biosql/term_relationship.rb * lib/bio/io/biosql/term_relationship_term.rb * lib/bio/io/biosql/term_synonym.rb 2009-03-17 Naohisa Goto * Rakefile Rake::Task#execute now needs to take an argument. Currently, nil is given. 2009-02-20 Naohisa Goto * BioRuby 1.3.0 is released. 2009-02-19 Naohisa Goto * lib/bio/version.rb Preparation for bioruby-1.3.0 release. (commit fd7fc9f78bc5f4d9a10b3c0d457d9781c9ec2e49) * bioruby.gemspec.erb Fixed a logic to determine whether in git repository, and file lists are changed to be sorted. 
(commit ede0c0d7aeab078b6183c4e0e7c74faec32739f7) 2009-02-18 Naohisa Goto * README.rdoc Added list of document files bundled in the BioRuby distribution. (commit 92748f848e4708766e44c22b2f02ac662491971f) 2009-02-10 Naohisa Goto * KNOWN_ISSUES.rdoc Added details about the text mode issue on mswin32/mingw32/bccwin32 and about non-UNIX/Windows systems. (commit 342a167a23d3b078bd77b3f16f0ceb1aa071df66) 2009-02-09 Naohisa Goto * test/unit/bio/db/test_gff.rb Test bug fix: test_gff.rb failed in some environments (e.g. Windows) because the default formatting rule of Float#to_s depends on the libc implementation. (commit f39bf88ed6a41bd328372ee7de7a23902235f833) 2009-02-06 Naohisa Goto * lib/bio/db/gff.rb, test/unit/bio/db/test_gff.rb * Bug fix: Bio::GFF::GFF3::Record#id and #id= should be changed to follow the previous incompatible change of @attributes. Thanks to Tomoaki NISHIYAMA who reported the bug ([BioRuby] GFF3 status (possible bug?)). * Unit tests are added. (commit 5258d88ef98a12fd7829eb86aa8664a18a672a43) (commit c0c7708b3e91b0d2f2d0d50a4a0ba36928057cc8) 2009-02-05 Naohisa Goto * Rakefile New task "tutorial2html" to generate html from doc/Tutorial.rd and doc/Tutorial.rd.ja. (commit 8d66fae59477f01f12b2fa3509ea34c371102725) * doc/Tutorial.rd.html, doc/Tutorial.rd.ja.html Automatically generated tutorial html from RD formatted documents. (commit 90c4a23eea08b06dd758aaa0a53bea789602d252) * doc/bioruby.css Newly added stylesheet for the tutorial html files. The bioruby.css has been used at http://bioruby.org/ and has been maintained by Toshiaki Katayama. (commit b69dc243787525de065bdf2e6b7da68d6079ab91) * test/runner.rb Added workaround for test-unit-2.0.x. (commit 475ac6a6b38e8df30de3d9bf4c7e810759ab023d) 2009-02-04 Naohisa Goto * lib/bio/appl/blast/format0.rb Bug fix: a null line can be inserted after query name lines.
(commit bea9ce35b4177f407575ed0752c36bba8a50f502) 2009-02-03 Naohisa Goto * Tutorial.rd.ja * Document bug: BioRuby shell commands seq, ent, obj were renamed to getseq, getent, getobj, respectively. Thanks to Hiroyuki Mishima who reported the issue ([BioRuby-ja]). * Changes of returned value of getseq are also reflected to the document. * Recommended Ruby version and installation procedure are also changed. (commit 916e96ca549db71a550e7a5d3bd49a3149614313) * doc/Changes-0.7.rd Documentation forgotten in 1.1.0: rename of BioRuby shell commands. (commit 64113314caac3453b4cc3b80ece9b5fb5841e069) 2009-01-30 Naohisa Goto * lib/bio/appl/blast/format0.rb Bug fix: incorrect parsing of hit sequence's whole length. (commit 98e6f57630b2c3394a9403f58e76b102346c56ef) Bug fix: Whole length of a hit sequence is mistakenly parsed when it contains ",". WU-BLAST parser is also affected in addition to NCBI BLAST parser. * lib/bio/db/lasergene.rb, lib/bio/db/soft.rb, lib/bio/util/color_scheme.rb, lib/bio/util/contingency_table.rb, lib/bio/util/restriction_enzyme.rb Removed ":nodoc:" in "module Bio" which prevents RDoc of the Bio module. (commit 458db79b467d40ed02db0d085218f611e7dd5e04) 2009-01-29 Naohisa Goto * doc/Changes-1.3.rdoc Added documents about Bio::TogoWS and Bio::BIORUBY_VERSION. * lib/bio/shell/plugin/entry.rb getent (BioRuby shell command) is changed to use EBI Dbfetch or TogoWS in addition to NCBI or KEGG API. (commit 0e172590f60dd5a5f27a24ecd230037a7909224c) * lib/bio/shell/plugin/togows.rb, lib/bio/shell.rb Added new shell plugin providing accesses to TogoWS REST services. (commit 03f6720b90e90703c23536a11b3f12c8155550ff) * lib/bio.rb Added autoload of Bio::TogoWS. (commit f8605e1234164a7aa7f236b4e96a4299229753d7) * test/functional/bio/io/test_togows.rb, test/unit/bio/io/test_togows.rb Newly added functional and unit tests for Bio::TogoWS::REST.
(commit f04152b80d07f44f146fa3fa0729facede865aac) * lib/bio/io/togows.rb New class Bio::TogoWS::REST, a REST client for the TogoWS web service (http://togows.dbcls.jp/site/en/rest.html). (commit 652d2534163675182b9ce30cbb1dd5efff45cd60) * bin/br_pmfetch.rb Changed to use Bio::BIORUBY_VERSION_ID instead of CVS version ID. (commit f69d538ffa9ded00eb68dd306e65505d03b6c656) * lib/bio/shell/core.rb Changed to use BIORUBY_VERSION_ID. (commit 4ce11656a205e85cae64eca27cef7cd94eb80930) * bioruby.gemspec.erb Gem version is now determined from lib/bio/version.rb or BIORUBY_GEM_VERSION environment variable. (commit 1811e845e60bc2847ea5717ef936bad93f9f2c87) * Rakefile * Changed to use lib/bio/version.rb. * Environment variable BIORUBY_EXTRAVERSION is renamed to BIORUBY_EXTRA_VERSION. * Added dependency on lib/bio/version.rb to bioruby.gemspec. (commit fb27eaa584cda1bb4cb75e10085996503361c98a) * lib/bio.rb, lib/bio/version.rb Bio::BIORUBY_VERSION is split into lib/bio/version.rb. (commit 9779398c3fa0e9405a875b754a5243e0d6922c32) * New file lib/bio/version.rb contains BioRuby version information. * New constants: Bio::BIORUBY_EXTRA_VERSION stores extra version string (e.g. "-pre1") and Bio::BIORUBY_VERSION_ID stores BioRuby version string (e.g. "1.3.0-pre1"). * Bio::BIORUBY_VERSION is changed to be frozen. Above two constants also store frozen values. 2009-01-26 Naohisa Goto * KNOWN_ISSUES.rdoc Newly added KNOWN_ISSUES.rdoc that describes known issues and bugs in current BioRuby. (commit 06b10262be0bf797a3b133e4697e9b0955408944) (commit a65ad8b42613e46b0b4bb0650d6301da0dcc88c9) * lib/bio/shell/plugin/ncbirest.rb, lib/bio/shell.rb New shell plugin lib/bio/shell/plugin/ncbirest.rb, providing "efetch", "einfo", "esearch", and "esearch_count" methods. They act the same as those defined in Bio::NCBI::REST, except that "efetch" fetches entries with pre-defined databases depending on arguments. 
(commit c482e1864aa0dbca3727b1059d4fe3d0aefb3917) (commit 3360b8905fdbcd4ca050470fdb2f02a7387e8bb9) * lib/bio/shell/plugin/entry.rb Shell commands "getent" and "getseq" are changed to use "efetch" method when "gb" or some variant is specified as the database. (commit c482e1864aa0dbca3727b1059d4fe3d0aefb3917) (commit 3360b8905fdbcd4ca050470fdb2f02a7387e8bb9) * bioruby.gemspec.erb, bioruby.gemspec * Changed version to 1.2.9.9501. * Changed to use "git ls-files" instead of "git-ls-files", and changed not to redirect to /dev/null. * Special treatment of bioruby.gemspec is removed. * ChangeLog is included to RDoc. * Set RDoc title to "BioRuby API documentation". * Set "--line-numbers" and "--inline-source" to rdoc_options. (commit f014685090c38eeb64219603f2c7e90574849431) * added KNOWN_ISSUES.rdoc to files for no-git environment. (commit 06b10262be0bf797a3b133e4697e9b0955408944) * Ruby 1.9 support: command execution with shell can raise an error. (commit 3179de32f1dc746c8de975917b1718a523800d69) * bioruby.gemspec is generated from bioruby.gemspec.erb. (commit 4e1cd3bfb8207b357d5b71cc0fc8366f06491130) (commit 06b10262be0bf797a3b133e4697e9b0955408944) 2009-01-21 Naohisa Goto * ChangeLog Added recent changes and fixed typo for recent changes. 2009-01-20 Naohisa Goto * ChangeLog, doc/Changes-1.3.rdoc Added ChangeLog and doc/Changes-1.3.rdoc for recent changes. (commit be2254ddea152fddf51a2476eeb20d804b1e3123) * bioruby.gemspec Added bioruby.gemspec created from bioruby.gemspec.erb. (commit 4c54597eaf09107c34ad06bc5f5f9cead77a0198) * lib/bio/appl/blast/wublast.rb Bug fix: parsing of exit code failed when ignoring fatal errors (commit 44ed958acebe4324a9a48e7292c4f0ad5c0fb685) * Bug fix: could not get exit code in WU-BLAST results executed with a command line option "-nonnegok", "-novalidctxok", or "-shortqueryok". * New methods Bio::Blast::WU::Report#exit_code_message and #notes. * Rakefile Added package tasks and changed to use ERB instead of eruby. 
(commit 7b081c173d3b1cbc46034297ea802a4e06f85b2f) * bioruby.gemspec.erb Use git-ls-files command to obtain list of files when available. (commit 5d5cb24fdd56601bc43ee78facc255ca484245c0) 2009-01-17 Naohisa Goto * Rakefile Simple Rakefile for dynamic generation of bioruby.gemspec (commit d5161d164f3520db25bed9aececb962428b9d6bc) * bioruby.gemspec.erb bioruby.gemspec is renamed to bioruby.gemspec.erb with modification. (commit bef311668e4a3be30965ce94d41e7bde4a4e17f9) To prevent the error "Insecure operation - glob" in GitHub, bioruby.gemspec is renamed to bioruby.gemspec.erb, and modified to generate the file list by using eruby. 2009-01-15 Naohisa Goto * doc/Changes-1.3.rdoc Changes-1.3.rd is renamed to Changes-1.3.rdoc with format conversion, and fixed typo. (commit 1aef599650d14362ed233dcc9a7db8d3c1db1777) Added details about newly added classes etc. (commit eda9fd0abbb8e430810468d777d0b585e33c25d8) 2009-01-13 Naohisa Goto * bioruby.gemspec Changed version to 1.2.9001, set has_rdoc = true and rdoc options. (commit 1f63d3d5389dd3b0316e9f312b56e62371caa253) * Gem version number changed to 1.2.9.9001 for testing gem. * Changed to has_rdoc = true. * README.rdoc and README_DEV.rdoc are now included to gem's rdoc, and README.rdoc is set to the main page. * *.yaml is now excluded from rdoc. 2009-01-13 Jan Aerts * bioruby.gemspec Renamed gemspec.rb to bioruby.gemspec because so github builds the gem automatically (commit 561ae16d20f73dcd6fc3d47c41c97c32f9aadb1a) (committer: Naohisa Goto) (original commit date: Wed Jun 25 11:01:03 2008 +0100) Edited gemspec because github returned an error while building gem. 
(commit f0d91e07550872c2f0d5835e496af1add7759d42) (committer: Naohisa Goto) (original commit date: Wed Jun 25 11:03:04 2008 +0100) 2009-01-13 Naohisa Goto * README.rdoc Changed format from RD to RDoc with some additional URLs (commit cb8781d701f22cbaf16575bb237a9e0cbf8cd407) Clarified copyright of README.rdoc and BioRuby (commit acd9e6d6e6046281c6c9c03cff1021449b8e780f) Updated descriptions about RubyGems, and added Ruby 1.9 partial support (commit ff63658b255988bf0e7a9f5a2d1523d5104fe588) 2009-01-09 Naohisa Goto * test/runner.rb Ruby 1.9.1 support: using alternatives if no Test::Unit::AutoRunner (commit 5df2a9dc0642d4f1e9a4398d6af908780d622a6e) 2009-01-05 Naohisa Goto * lib/bio/db/fantom.rb Bug fix: incomplete cgi parameter escaping, and suppressing warnings. (commit 754d8815255a0f0db20df9dd74f9f146605d430e) * Bug fix: incomplete cgi parameter escaping for ID string in Bio::FANTOM.get_by_id (and Bio::FANTOM.query which internally calls the get_by_id method). * Warning message "Net::HTTP v1.1 style assignment found" when $VERBOSE=true is suppressed. * Removed obsolete "rescue LoadError" when require 'rexml/document'. * lib/bio/io/fetch.rb Bug fix: possible incomplete form key/value escaping. (commit ecaf2c66261e4ce19ab35f73e305468e1da412ed) * Bug fix: possible incomplete form key/value escaping * Refactoring: changed to use private methods _get and _get_single to access remote site. * lib/bio/io/pubmed.rb Bug fix: possible incomplete escaping of parameters, and suppressing warnings (commit 93daccabb1a82bb20e92798c1810182dfb836ba7) * Bug fix: possible incomplete string escaping of REST parameters in Bio::PubMed#query and #pmfetch. * Warning message "Net::HTTP v1.1 style assignment found" when $VERBOSE=true is suppressed. * Removed obsolete "unless defined?(CGI)". * lib/bio/command.rb, test/unit/bio/test_command.rb Bug fix: incomplete escaping in Bio::Command.make_cgi_params etc.
(commit 17c8f947e5d94012921f9252f71460e9d8f593e3) * Bug fix: in Bio::Command.make_cgi_params and make_cgi_params_key_value, string escaping of form keys and values is incomplete. * Warning message "useless use of :: in void context" is suppressed when running test/unit/bio/test_command.rb with $VERBOSE=true. * Unit tests are added. * lib/bio/appl/, lib/bio/io/ (9 files) Suppress warning message "Net::HTTP v1.1 style assignment found" when $VERBOSE = true. (commit a2985eb1f3aed383f1b1b391f2184317c7fd21c7) 2009-01-02 Naohisa Goto * README.rdoc Changing optional requirements, recommended Ruby version, and setup.rb credit. (commit a5462ab4bd403d2d833e5d6db26ae98ca763513c) 2008-12-30 Naohisa Goto * README.rdoc Fixed grammar and spelling in README.rdoc, indicated by Andrew Grimm at git://github.com/agrimm/bioruby.git on Sun Sep 21 19:59:03 2008 +1000. (commit 446918037bff392b9c6bc6828720c585733a8f4b) 2008-12-30 Naohisa Goto * lib/bio.rb Changed BIORUBY_VERSION to 1.3.0, which will be the next BioRuby release version number. (commit b000b1c4a5a136ab287b517b8b8c66e54f99a8a8) * doc/Changes-1.3.rd Added documents about changed points for 1.3.0 release. (commit 028e323e784eb60b18f941cce1e3752abff1433c) * lib/bio/appl/blast/format8.rb Ruby 1.9 support: String#each_line instead of String#each (commit 1bc59708137fd46911d5892e4712cc49c71fa031) * lib/bio/io/flatfile/splitter.rb Checks for undefined constants are added for running without "require 'bio'" in unit tests. (commit 311176d4d390e5948348f623ff3632454136a03f) * lib/bio/appl/blast.rb, lib/bio/appl/blast/report.rb, test/unit/bio/appl/test_blast.rb Support for default (-m 0) and tabular (-m 8) formats in Bio::Blast.reports. * Added support for default (-m 0) and tabular (-m 8) formats in Bio::Blast.reports method. For this purpose, Bio::Blast::Report_tab is added to read tabular format by using Bio::FlatFile. * Unit tests are added.
2008-12-26 Naohisa Goto * lib/bio/appl/paml/codeml/rates.rb Ruby 1.9 support: String#each_line instead of String#each (commit 1789a3975c4c82d3b45f545893be8f2a7bf47a01) 2008-12-26 Naohisa Goto * lib/bio/command.rb, lib/bio/appl/fasta.rb, lib/bio/appl/blast/genomenet.rb Refactoring and following the change of the remote site fasta.genome.jp. (commit 671092dff67890fc48dd7ff2f606c4cedc2eb02c) * New method Bio::Command.http_post_form. * Bio::Blast::Remote::GenomeNet#exec_genomenet and Bio::Fasta#exec_genomenet are changed to use the new method. * Changed a regexp. in Bio::Fasta#exec_genomenet is changed following the change of the remote GenomeNet (fasta.genome.jp). 2008-12-24 Naohisa Goto * lib/bio/location.rb, test/unit/bio/test_location.rb New method Bio::Locations#to_s with bug fix, etc. (commit 115b09456881e1d03730d0b9e7a61a65abf6a1fe) * New method Bio::Locations#to_s is added. * New attributes Bio::Locations#operator and Bio::Location#carat. * Changed not to substitute from "order(...)" or "group(...)" to "join(...)". * Bug fix: Bio::Locations.new(str) changes the argument string when the string contains whitespaces. * Unit tests for Bio::Locations#to_s are added. 2008-12-20 Naohisa Goto * test/functional/bio/appl/test_pts1.rb, test/unit/bio/appl/test_pts1.rb Moved part of test_pts1.rb using network from test/unit to test/functional. (commit 933ff3e7d615fe6521934f137519ea84b3b517f2) 2008-12-18 Naohisa Goto * test/unit/bio/io/test_soapwsdl.rb Ruby 1.9 support: following the change of Object#instance_methods (commit 008cf5f43786f6143f74889e0ec53d1c8a452aa2) Note that SOAP/WSDL library is no longer bundled with Ruby 1.9, and tests in test_soapwsdl.rb may fail. * test/unit/bio/io/test_ddbjxml.rb Ruby 1.9 support: following the change of Module::constants (commit ed1ad96e7ed9d6c7d67e5413a22ba935a3b36efa) * lib/bio/util/restriction_enzyme/single_strand.rb Ruby 1.9 support: changed Array#to_s to join, Symbol#to_i to __id__, etc. 
(commit a29debb8c03244c1ce61317d6df0a2c5d066de3d) * Ruby 1.9 support: in pattern method, changed to use Array#join instead of Array#to_s. * Ruby 1.9 support: in self.once method, changed to use Object#__id__ instead of Symbol#to_i. * self.once is changed to be a private class method. 2008-12-18 Naohisa Goto * lib/bio/db/rebase.rb Ruby 1.9 support: changed not to use String#each, etc. (commit 47ba6e9fcf864f5881211e766f2e47b60dde178a) * Ruby 1.9 support: In parse_enzymes, parse_references, and parse_suppliers methods, String#each is changed to each_line. * Changed to use require instead of autoload, to reduce support cost. 2008-12-16 Moses Hohman * lib/bio/db/medline.rb, test/unit/bio/db/test_medline.rb fix medline parsing of author last names that are all caps (commit 5f37d566fc2efa4878efbd19e83f909a58c4cb00) 2008-12-15 Mitsuteru Nakao * lib/bio/db/kegg/glycan.rb Bug fix in Bio::KEGG::GLYCAN#mass. Thanks to a reporter. (commit cb8f1acc4caebf1f04d4a6c141dd4477fcb5394b) (committer: Naohisa Goto) 2008-12-15 Naohisa Goto * lib/bio/pathway.rb, test/unit/bio/test_pathway.rb Fixed pending bugs described in unit test, and Ruby 1.9 support (commit 97b3cd4cf78eff8aede16369298aaacf1c319b68) * Pending bugs described in test/unit/bio/test_pathway.rb are fixed. Fixed a bug in subgraph: does not include nodes w/o edges. A bug in cliquishness depending on the subgraph bug is also fixed. * Bio::Pathway#cliquishness is changed to calculate cliquishness (clustering coefficient) for not only undirected graphs but also directed graphs. Note that pending proposed specification changes previously written in test_pathway.rb (raises error for directed graphs, and return 1 for a node that has only one neighbor node) are rejected. * Ruby 1.9 support: To avoid dependency to the order of objects in Hash#each (and each_keys, etc.), Bio::Pathway#index is used to specify preferences of nodes in a graph. Affected methods are: to_matrix, dump_matrix, dump_list, depth_first_search. 
* Bug fix in the libpath magic in test/unit/bio/test_pathway.rb. 2008-12-09 Naohisa Goto * lib/bio/db/newick.rb, lib/bio/tree.rb Ruby 1.9 support: suppressing "warning: shadowing outer local variable". (commit 6fe31f0a42a87631bdee3796cff65afb053b2add) 2008-12-05 Naohisa Goto * test/unit/bio/io/test_fastacmd.rb Ruby 1.9 support: changed to use respond_to?, etc. (commit 5d6c92c752c00f07ed856fd209c8078ef9fdf57a) * Following the change of Module#methods in Ruby 1.9, changed to use respond_to?(). * The test path '/tmp/test' is replaced with '/dev/null' * lib/bio/db/gff.rb Ruby 1.9 support: changes following the change of String#[] (commit c25cc506bffcf1f2397ac2210153cfbfbbcb4942) * lib/bio/reference.rb Ruby 1.9 support: using enumerator instead of String#collect (commit ea99242570fc8b2e2a869db84b7daaa7737f23e0) * test/unit/bio/test_location.rb Test bug fix: wrong number in libpath magic (commit aa45101246bc42f78a21ee110bc58e59f532e24a) * test/unit/bio/db/test_nexus.rb Test bug fix: missing libpath magic (commit d54eed426461f3a3148953fda1f7b428e74051c6) Thanks to Anthony Underwood who reports the bug in his Github repository. * test/unit/bio/db/pdb/test_pdb.rb Test bug fix: wrong number in libpath magic (commit b53d703a8dd72608ab5ea03457c2828470069f2f) 2008-12-04 Naohisa Goto * test/unit/bio/db/embl/test_embl_to_bioseq.rb Test bug fix: typing error (found by using Ruby 1.9) (commit fa52f99406ddd42221be354346f67245b3572510) * test/unit/bio/db/embl/test_common.rb Ruby 1.9 support: following the change of Module#instance_methods (commit d18fa7c1c3660cf04ec2a8a42d543a20a77cee2c) In Ruby 1.9, Module#instance_methods returns Array containing Symbol objects instead of String. To support both 1.8 and 1.9, "to_s" is added to every affected test method. 
* lib/bio/appl/tmhmm/report.rb Ruby 1.9 support: using enumerator if the entry is a string (commit 36968122b64b722e230e3e1b52d78221c0b60884) * lib/bio/appl/pts1.rb Ruby 1.9 support: String#each to each_line and Array#to_s to join('') (commit c4c251d5e94167512a0b8a38073a09b72994c08f) * test/unit/bio/appl/test_fasta.rb Ruby 1.9 support: changed to use Array#join instead of Array#to_s (commit bf8823014488166c6e1227dd26bdca344c9f07b7) * lib/bio/appl/blast.rb Ruby 1.9 support: String#each is changed to String#each_line (commit 3e177b9aecf6b54a5112fd81fc02386d18fc14b9) * lib/bio/appl/hmmer/report.rb Ruby 1.9 support: String#each is changed to String#each_line (commit 63bdb3a098bc447e7bd272b3be8f809b4b56d451) * lib/bio/appl/genscan/report.rb Ruby 1.9 support: String#each is changed to String#each_line (commit 082250786756de2b4171b3a00e0c4faaa816fc8f) * test/functional/bio/io/test_ensembl.rb Using jul2008.archive.ensembl.org for workaround of test failure. (commit 1d286f222cdc51cf1323d57c1c79e6943d574829) Due to the renewal of Ensembl web site, lib/bio/io/ensembl.rb does not work for the latest Ensembl. For a workaround of the failure of tests in test/functional/bio/io/test_ensembl.rb, tests for Ensembl#exportview are changed using Ensembl archive (http://jul2008.archive.ensembl.org/). 2008-12-03 Naohisa Goto * sample/demo_sequence.rb sample/demo_sequence.rb, example of sequence manipulation. (commit b7f52b47dbcc7d32f4eb7377d2b1510eb1991fd5) The content of this file is moved from previous version of lib/bio/sequence.rb (inside the "if __FILE__ == $0"). 2008-12-02 Naohisa Goto * lib/bio/appl/paml/baseml.rb, etc. (17 files) Support for baseml and yn00 (still under construction), and incompatible changes to Bio::PAML::Codeml. (commit d2571013409661b4d7be8c5c9db14dbe9a9daaaf) * Security fix: To prevent possible shell command injection, changed to use Bio::Command.query_command instead of %x. 
* Bug fix with incompatible changes: Using Tempfile.new.path as default values are removed because this can cause unexpected file loss during garbage collection. * Change of method/file names: The term "config file" is changed to "control file" because the term "config file" is never used in PAML documents. The term "options" is changed to "parameters" because the "options" have been used for command-line arguments in other wrappers (e.g. Bio::Blast, Bio::ClustalW). The term "parameters" is also used in BioPerl's Bio::Tools::Run::Phylo::PAML. * Bio::PAML::Codeml.create_config_file, create_control_file, Bio::PAML::Codeml#options, and #options= are now deprecated. They will be removed in the future. * New class Bio::PAML::Common, basic wrapper common to PAML programs. Bio::PAML::Codeml is changed to inherit the Common class. * New classes Bio::PAML::Baseml and Bio::PAML::Yn00, wrappers for baseml and yn00. * New classes Bio::PAML::Common::Report, Bio::PAML::Baseml::Report and Bio::PAML::Yn00::Report, but still under construction. * New methods Bio::PAML::Codeml#query(alignment, tree), etc. * test/data/paml/codeml/dummy_binary is removed because the default of Bio::PAML::Codeml.new is changed to use "codeml" command in PATH. * test/data/paml/codeml/config.missing_tree.txt is removed because treefile can be optional parameter depending on runmode. test/data/paml/codeml/config.missing_align.txt is also removed because test is changed to use normal control file parameters. * lib/bio/command.rb, test/functional/bio/test_command.rb Improvement of Bio::Command.query_command, call_command, etc. (commit e68ee45589f8063e5a648ab235d6c8bbc2c6e5ff) * Improvement of Bio::Command.query_command, call_command, query_command_popen, query_command_fork, call_command_popen, and call_command_fork: they can get an option :chdir => "path", specifying working directory of the child process. * New method Bio::Command.mktmpdir backported from Ruby 1.9.0. 
* New method Bio::Command.remove_entry_secure that simply calls FileUtils.remove_entry_secure or prints warning messages. * Tests are added in test/functional/bio/test_command.rb. * Ruby 1.9 followup: FuncTestCommandQuery#test_query_command_open3 failed in Ruby 1.9 due to the change of Array#to_s. 2008-11-19 Naohisa Goto * test/data/paml/codeml/ Removed some files in test/data/paml/codeml/ because of a potential copyright problem: they are completely identical to those distributed in PAML 4.1. (commit 086b83d3e54f69d2b9e71af3f9647518768353b0) 2008-10-21 Naohisa Goto * lib/bio/sequence/compat.rb Bug fix: TypeError is raised in Bio::Sequence#to_s before Sequence#seq is called. (commit ea8e068a5b7f670ce62bc0d3d4b21639e3ca2714) Thanks to Anthony Underwood who reported the bug and sent the patch. 2008-10-19 Naohisa Goto * setup.rb, README.rdoc install.rb is replaced by new setup.rb. (commit 9def7df5b81340c49534ff0bb932de62402a1c8d) * install.rb is replaced by the latest version of setup.rb taken from the original author's svn repository (svn r2637, newer than version 3.4.1, latest release version. $ svn co http://i.loveruby.net/svn/public/setup/trunk setup). * README.rdoc is modified to follow the rename of install.rb to setup.rb. 2008-10-18 Toshiaki Katayama * lib/bio/io/ncbirest.rb * New methods: Bio::NCBI::REST#einfo, #esearch_count, etc. * New classes: Bio::NCBI::REST::ESearch, Bio::NCBI::REST::EFetch. (commit 637f97deefd6cc113ef18fe18ab628eb619f3dc1) (committer: Naohisa Goto) 2008-10-14 Naohisa Goto * lib/bio/sequence/common.rb, test/unit/bio/sequence/test_common.rb, test/unit/bio/sequence/test_compat.rb, test/unit/bio/sequence/test_na.rb Bug fix: Bio::Sequence::Common#randomize severely biased. (commit 02de70cbf036b41a50d770954f3b16ba2beca880) * Bug fix: Bio::Sequence::Common#randomize was severely biased. To fix the bug, it is changed to use the Fisher-Yates shuffle, as suggested by Anders Jacobsen.
([BioRuby] Biased Bio::Sequence randomize()) * The module method Bio::Sequence::Common.randomize is removed because it is not used anymore. * Unit tests for Bio::Sequence::Common#randomize are added. * To avoid possible test class name conflicts, class/module names are changed in test_na.rb, test_compat.rb, and test_common.rb. 2008-10-14 Raoul Jean Pierre Bonnal * lib/bio/io/sql.rb Changed the demonstration code in the "if __FILE__ == $0". (commit 9942105920182c809564554bb0d1dba33fe4caab) * lib/bio/db/biosql/sequence.rb Fix: typing error (commit 67fbbb93adaa8b4b91de3703a235bc75eaef842a) 2008-10-14 Naohisa Goto * lib/bio/db/biosql/sequence.rb, lib/bio/io/sql.rb Merging patches by Raoul in commit 496561a70784d3a1a82bf3117b2d267c7625afac which are ignored when rebasing, probably because of manually editing during merge. (commit c699253d53510c0e76188a72004651a4635088b3) 2008-10-10 Raoul Jean Pierre Bonnal * lib/bio/db/biosql/sequence.rb Fix: check on nil objects (to_biosql) (commit f701e9a71f524ee4373c94ee1bd345e87f16f6ce) BugFix: ex. /focus="true" in output was /focus="t", qualifier.value.to_s fix the bug (commit f6e1530f3372c87031b551e5c76e24f264891e64) * lib/bio/io/biosql/seqfeature.rb BugFix: seqfeature_qualifier_value returned ordered only by rank (commit fb74009393eeca6743f78b7b45cb66858c41d733) * lib/bio/io/biosql/bioentry.rb BugFix: seqfeatures returned ordered by rank (commit 25a249d87d23bd9cb4e671053019675836fcd38c) * lib/bio/db/biosql/sequence.rb Fixed to suppress warnings: Bio::Features is obsoleted. (commit 198a1e893dd4515d61276c9cce8905f02130e721) * lib/bio/db/biosql/biosql_to_biosequence.rb Removed alias comment. (commit c037ec565987634b354ff6d77dbbe7c9d83a9e7c) * lib/bio/db/biosql/sequence.rb Implemented Entry's comments and reference's comments. Fixed species common name. 
(commit bd3b24ea53ebd9b0ec9dd9f15c27091fe6143e28) * lib/bio/io/biosql/bioentry.rb Cleaned, deleted pk and seq reference (commit 14bcf90334ec3c3f1c1784977b329ae641e9e106) * lib/bio/io/biosql/comment.rb Cleaned code. (commit 54976693350ab0512cecf946999c2868b9e88007) * lib/bio/db/biosql/biosql_to_biosequence.rb Added comments, comment adapter. (commit 5394ecea34778c9f571eb35cfc16e3b1a6cb6d1b) 2008-10-09 Raoul Jean Pierre Bonnal * lib/bio/io/sql.rb Changed the demonstration code in the "if __FILE__ == $0". (commit efb61d7c21d229e882c6706838c284404343fa9c) * lib/bio/db/biosql/sequence.rb Added support for reference. ToDo: handling comments. (commit 29211059ee04214d7879f900ec563c0708d8c9d6) * lib/bio/io/biosql/bioentry_reference.rb Fix: composite primary keys :bioentry_id, :reference_id, :rank (commit eba61ba670c591f58866b37ababc4acac0cc7883) * lib/bio/io/biosql/dbxref.rb Removed explicit pk and seq (commit e149f94484469fb3dfd881b45b14be7093b67e0d) 2008-10-09 Naohisa Goto * test/functional/bio/test_command.rb, test/data/command/echoarg2.bat Bug fix: tests in FuncTestCommandCall failed on mswin32, and URL changed. (commit 921292f1188d85994742ce4aa156b39d6e720aad) * Bug fix: tests in FuncTestCommandCall failed on mswin32. To fix the test bug, a batch file test/data/command/echoarg2.bat is newly added. This file is only used on mswin32 or bccwin32. * URL for test to fetch a web page is changed to http://bioruby.open-bio.org/. 2008-10-07 Naohisa Goto * test/unit/bio/appl/paml/test_codeml.rb Bug fix: error on mswin32 in test_expected_options_set_in_config_file. (commit 16b8f321c653502ef801d801383a019bc45f67de) Bug fix: On mswin32, test_expected_options_set_in_config_file in Bio::TestCodemlConfigGeneration failed with the error "Errno::EACCES: Permission denied" because it attempted to remove a temporary file that had been opened but not explicitly closed, and, on Windows, an open file is automatically locked and protected from being removed.
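The close-before-remove pattern behind the mswin32 fix above can be sketched with the standard Tempfile library. This is only an illustration, not the code from test_codeml.rb: the helper name write_then_remove, the 'control' prefix, and the sample text are hypothetical.

```ruby
require 'tempfile'

# Hypothetical helper for illustration; not part of BioRuby or its tests.
def write_then_remove(text)
  tmp = Tempfile.new('control')   # the 'control' prefix is arbitrary
  path = tmp.path                 # remember the path before unlinking
  begin
    tmp.print(text)
  ensure
    tmp.close                     # release the handle before removal
  end
  tmp.unlink                      # removal after close also works where open files are locked
  path
end

removed_path = write_then_remove("seqfile = example.aln\n")
puts File.exist?(removed_path)
```

Closing inside an ensure clause means the handle is released even if the write raises, so the later removal does not depend on garbage collection timing.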
* lib/bio/command.rb, test/functional/bio/test_command.rb, test/unit/bio/test_command.rb Bio::Command improved, and added functional tests. (commit bb618cdfbfb56c40249aff81b6ef84742465851c) * In Bio::Command.call_command_* and Bio::Command.query_command_*, when giving a command-line array with size 1, the command might be passed to the shell. To prevent this, changed to call a new method Bio::Command#safe_command_line_array internally. * Added test/functional/bio/test_command.rb, containing tests that call external commands and access external web sites. 2008-10-06 Naohisa Goto * lib/bio/db/biosql/sequence.rb Bio::Sequence::SQL::Sequence#seq is changed to return a Bio::Sequence::Generic object, to avoid creating a nested Bio::Sequence object in #to_biosequence and because Bio::FastaFormat#seq also returns a Bio::Sequence::Generic object. (commit 8fb944c964ab5e1ca8905e6c4ce8e68479952935) 2008-10-03 Raoul Jean Pierre Bonnal * lib/bio/io/biosql/taxon.rb Added has_one :taxon_genbank_common_name, :class_name => "TaxonName", :conditions => "name_class = 'genbank common name'" (commit dc7a18b17cad8e603e0d3c20a5a80bc2a6f0899c) * lib/bio/db/biosql/sequence.rb Fix taxon identification by splitting scientific name and genbank common name. Fix organism/source's name composed by scientific name and genbank common name. (commit 5d6abcc0dcd05d7083622360489a5f4c361e0cc7) * lib/bio/io/sql.rb Working on tests about format import/export. (commit d28a343e4bab3cc0c04ac65dce677cfee0f81a46) * lib/bio/io/biosql/term.rb Fix foreign keys (commit c19c8766c7c0bec7561727abf2ef1bdf47d4e032) * lib/bio/io/biosql/seqfeature_qualifier_value.rb added composite primary keys :seqfeature_id, :term_id, :rank (commit cdd6a3bfc1ab748acb0c0d9161ebeb3dc7a76544) * lib/bio/io/biosql/ontology.rb class cleaned. (commit 81eb2c246d01790db72f0b08929bec5d862c959e) * lib/bio/io/biosql/biodatabase.rb class cleaned.
(commit 4aede5c5fee92c2f8cdf151a3e038025b6c7fd74) * lib/bio/db/biosql/sequence.rb to_biosequence: removed not adapter comments. (commit 591fda23464c7b7031db09a8ca85deca320a5c87) Removed main garbage comments. (commit c46d7a2b4e188a0592d5b49def17b9e6fd598268) feature= Fix creation of Ontology and Term. (commit 95fe6d1a65e94da502529e597b137d12c3fe2fc2) * lib/bio/db/biosql/biosql_to_biosequence.rb :seq cleaned. (commit d6f719693286b74c1a0ea8a42c09a12f775b74dc) 2008-10-01 Naohisa Goto * test/functional/bio/io/test_ensembl.rb Bug fix: 3 failures occurred in test_ensembl.rb because of recent changes in Ensembl database (the gene ENSG00000206158 used as an example in this file was removed from the Ensembl database). To fix this, the example gene is changed to ENSG00000172146 (OR1A1, olfactory receptor 1A1). (commit e20c86d2cd7d4fd1723762e8a5acc3bc311a5c1b) * lib/bio/db/embl/sptr.rb, test/unit/bio/db/embl/test_sptr.rb Ruby 1.9 support: in Bio::SPTR, avoid using String#each and Array#to_s. (commit 5ff56653cd7cc2520c2c04acbc9ce2bf2a0fae9a) * In Bio::SPTR#gn_uniprot_parser, String#each (which is removed in Ruby 1.9) is changed to each_line. * In Bio::SPTR#cc and cc_* (private) methods, Array#to_s (whose behavior is changed in Ruby 1.9) is changed to join(''). * Unit test for Bio::SPTR#dr method is added and changed. 2008-09-30 Naohisa Goto * lib/bio/db/embl/sptr.rb, test/unit/bio/db/embl/test_sptr.rb Bug fix in Bio::SPTR#dr: raised error when asked to return a DR key that didn't exist in the uniprot entry. Thanks to Ben Woodcroft who reported the bug and sent a patch. ([BioRuby] Bio::SPTR bug and fix). (commit 3147683c0b41e3f9418e26b481bf8b3e9ce63b8c) * lib/bio.rb Added autoload of Bio::NCBI::REST, and BIORUBY_VERSION incremented. (commit d6a37b0fcf1fb2f6e134dcdb8e29e79ec2a8fea7) * Added autoload of Bio::NCBI::REST. * Added comments for autoloading Bio::Sequence and Bio::Blast.
* BIORUBY_VERSION is temporarily incremented to 1.2.2, though this version number will not be used in the upcoming release. The upcoming release will probably use a larger version number. 2008-09-25 Raoul Jean Pierre Bonnal * lib/bio/db/biosql/sequence.rb Updated with adapter. Problem saving big sequences. (commit 82d87fbaf70f9a46c40dded0b2db510a40964e62) * lib/bio/io/biosql/* (25 files) AR: explicit class and foreign_key reference. (commit 70327998186c2f943addb5d46b4bda8007ed5444) 2008-09-24 Naohisa Goto * lib/bio/db/gff.rb, test/unit/bio/db/test_gff.rb Bug fix and incompatible changes in GFF2 and GFF3 attributes. (commit 7b174bb842d9dcf9fd7f4b59e8f3b13ebc0ff3d4) * Bug fix: GFF2 attributes parser misunderstood semicolons. * Incompatible change in Bio::GFF::GFF2::Record#attributes and Bio::GFF::GFF3::Record#attributes. Now, instead of Hash, the method is changed to return a nested Array, containing [ tag, value ] pairs, in order to support multiple tags with the same name. If you want to get a Hash, use Record#attributes_to_hash method, though some tag-value pairs sharing a tag name may not be included. * Bio::GFF::Record#attribute still returns a Hash for compatibility. * New methods for getting, setting and manipulating attributes: Bio::GFF::GFF2::Record#attribute, #get_attribute, #get_attributes, #set_attribute, #replace_attributes, #add_attribute, #delete_attribute, #delete_attributes, and #sort_attributes_by_tag! (These are also added to Bio::GFF::GFF3::Record). It is recommended to use these methods instead of directly manipulating the array returned by Record#attributes. * Incompatible change in GFF2 attributes parser: the priority of '"' (double quote) is greater than ';' (semicolon). Special treatment of '\;' in GFF2 is now removed. Unlike GFF2, in Bio::GFF, the '\;' can still be used for backward compatibility. * Incompatible changes in attribute values in Bio::GFF::GFF2. Now, GFF2 attribute values are automatically unescaped.
In addition, if a value of an attribute consists of two or more tokens delimited by spaces, an object of the new class Bio::GFF::GFF2::Record::Value is returned instead of String. The new class Bio::GFF::GFF2::Record::Value aims to store a parsed value of an attribute. If you really want to get the unparsed string, Value#to_s can be used. * Incompatible changes about data type in GFF2 columns: Bio::GFF::GFF2::Record#start, #end, and #frame return Integer or nil, and #score returns Float or nil. * Incompatible changes about the metadata in GFF2. The "##gff-version" line is parsed and the version string is stored to Bio::GFF::GFF2#gff_version. Other metadata lines are stored in an array obtained with a new method Bio::GFF::GFF2#metadata. Each metadata line is parsed to a Bio::GFF::GFF2::MetaData object. * Bio::GFF::Record#comments is renamed to #comment, and #comments= is renamed to #comment=, because they only allow a single String (or nil) and the plural form "comments" may be confusing. The "comments" and "comments=" methods can still be used, but warning messages will be shown when using them in GFF2::Record and GFF3::Record objects. * New methods Bio::GFF::GFF2#to_s, Bio::GFF::GFF2::Record#to_s. * New methods Bio::GFF::GFF2::Record#comment_only? (also added in Bio::GFF::GFF3::Record). * Unit tests are added and modified. 2008-09-18 Naohisa Goto * lib/bio/appl/blast/rpsblast.rb, lib/bio/appl/blast/format0.rb, lib/bio/io/flatfile/autodetection.rb, test/unit/bio/appl/blast/test_rpsblast.rb, test/data/rpsblast/misc.rpsblast Improved support for RPS-BLAST results from multi-fasta query sequences. (commit 11f1787cf93c046c06d4a33a554210d56866274e) * By using Bio::FlatFile (e.g. Bio::FlatFile.open), an rpsblast result generated from multiple query sequences is automatically split into multiple Bio::Blast::RPSBlast::Report objects corresponding to query sequences. For the purpose, new flatfile splitter class Bio::Blast::RPSBlast::RPSBlastSplitter is added.
* File format autodetection for RPS-BLAST default report is added. * Bug fix: Bio::Blast::RPSBlast::Report#program returns incorrect value. To fix the bug, regular expression in Bio::Blast::Default::Report#format0_parse_header (private method) is changed. * Unit tests are added for Bio::Blast::RPSBlast. 2008-09-17 Naohisa Goto * lib/bio/io/flatfile/buffer.rb, test/unit/bio/io/flatfile/test_buffer.rb Bug fix in Bio::FlatFile::BufferedInputStream#gets. (commit e15012e2a94d05308d139cb010749a1829d5c57f) * Bug fix: Bio::FlatFile::BufferedInputStream#gets('') might not work correctly. Now, BufferedInputStream#gets is refactored. Note that when rs = '' (paragraph mode), the behavior may still differ from that of IO#gets(''). * Test methods are added to test_buffer.rb. 2008-09-16 Naohisa Goto * lib/bio/appl/blast/wublast.rb Bug fix: parse error or infinite loop for WU-BLAST reports. (commit 07d1554c945400f9202d7b856055743e11860752) * Bug fix in Bio::Blast::WU::Report: fixed parse errors (errors, infinite loop, and wrong results could be generated) when parsing WU-BLAST reports generated by recent version of WU-BLAST. * New methods Bio::Blast::WU::Report#query_record_number, #exit_code, and #fatal_errors. 2008-09-03 Naohisa Goto * lib/bio/appl/blat/report.rb Bug fix: headers were parsed incorrectly with warning. (commit 3ff940988b76bdff75679cdf0af4c836f76fa3a1) * lib/bio/io/flatfile/splitter.rb To suppress warning messages "warning: private attribute?", private attributes are explicitly specified by using "private". (commit 1440b766202a2b66ac7386b9b46928834a9c9873) 2008-09-01 Michael Barton * lib/bio/appl/paml/codeml/report.rb Added code to pull estimated tree from codeml report. (commit 64cc5ef6f2d949cc9193b08dfc3fde6b221950d7) 2008-09-01 Naohisa Goto * test/unit/bio/db/embl/test_embl_rel89.rb Changed test class name because of name conflict of Bio::TestEMBL. 
(commit 536cdf903a3c3908c117efd554d33117d91452f4) * test/unit/bio/util/restriction_enzyme/ To prevent possible test class name conflicts about restriction enzyme. (commit 0fe1e7d3ed02185632f4a34d8efe1f21f755b289) * Tests about restriction enzyme are moved under a new module Bio::TestRestrictionEnzyme to prevent possible name conflict. * Conflicted test class names are changed. 2008-08-31 Naohisa Goto * test/unit/bio/db/test_prosite.rb Fixed failed test due to the change of hash algorithm in Ruby 1.8.7. (Probably also affected in Ruby 1.9.0). (commit e86f8d757c45805389e154f06ccde5a3d9e8a557) 2008-08-29 Naohisa Goto * lib/bio/appl/blast.rb Bio::Blast.reports is changed to support new BLAST XML format. (commit 02cc0695b85f18e8254aefed78a912812fc896d6) * Bio::Blast.reports is changed to support new BLAST XML format. * Removed unused require. 2008-08-28 Naohisa Goto * lib/bio/appl/blast/report.rb, lib/bio/appl/blast/rexml.rb, lib/bio/appl/blast/xmlparser.rb, test/unit/bio/appl/blast/test_report.rb Support for BLAST XML format with multiple queries after blastall 2.2.14. (commit de7897b5690279aae14d9bded5e682458bc61f9c) * BLAST XML format with multiple query sequences generated by blastall 2.2.14 or later is now supported. * New methods Bio::Blast::Report#reports, stores Bio::Blast::Report objects corresponding to the multiple query sequences. * New methods Bio::Blast::Report::Iteration#query_id, query_def, and query_len, which are available only for the new format. * New class Bio::Blast::Report::BlastXmlSplitter, flatfile splitter for Bio::FlatFile system. * Bug fix: Bio::Blast::Report#expect returned incorrect value. * Fixed typo and added tests in test/unit/bio/appl/blast/test_report.rb. * Some RDoc documents are added/modified. 2008-08-19 Michael Barton * lib/bio/appl/paml/codeml/rates.rb Updated regex for rates parser to include columns that have a '*' character. * test/unit/bio/appl/paml/codeml/test_rates.rb Updated testing for new rates file with * characters. 
* test/data/paml/codeml/rates Added rates file that includes positions with * characters. 2008-08-18 Naohisa Goto * test/unit/bio/io/test_ddbjxml.rb Changed a failed test, and added a test for Bio::DDBJ::XML::RequestManager. 2008-08-16 Michael Barton * lib/bio/appl/paml/, test/unit/bio/appl/paml/, test/data/paml/ Wrapper and parser for PAML Codeml program is added (merged from git://github.com/michaelbarton/bioruby). After merging, some changes were made by Naohisa Goto. See git log for details. 2008-08-15 Naohisa Goto * lib/bio/appl/blast.rb, lib/bio/appl/blast/genomenet.rb "-m 0" (BLAST's default) format support is improved, and fixed wrong example in the RDoc of Bio::Blast#query. * Added support for "-m 0" (BLAST's default) format to the Bio::Blast factory. For the purpose, Bio::Blast#parse_result (private method) is changed. * Added support for "-m 0" (default) format to the GenomeNet BLAST factory (in Bio::Blast::Remote::GenomeNet). * Bug fix: wrong example in the RDoc in Bio::Blast#query is changed. * Bio::Blast#set_option (private method) is changed to determine format correctly. * lib/bio/appl/blast/ddbj.rb, lib/bio/io/ddbjxml.rb Changed always using REST version of RequestManager, and changed to raise error when busy. * In Bio::Blast::Remote::DDBJ, changed always to use REST version for RequestManager, because of suppressing warning messages. * In Bio::DDBJ::XML::RequestManager, module REST_RequestManager is changed to class REST. * In Bio::Blast::Remote::DDBJ#exec_ddbj, changed to raise RuntimeError when "The search and analysis service by WWW is very busy now" message is returned from the server (which implies invalid options or queries may be given). 2008-08-14 Naohisa Goto * lib/bio/appl/blast.rb, lib/bio/appl/blast/genomenet.rb, lib/bio/appl/blast/remote.rb Bio::Blast#exec_genomenet is moved to genomenet.rb, with bug fix. * Bio::Blast#exec_genomenet is moved to lib/bio/appl/blast/genomenet.rb. 
* Incompatible change: Bio::Blast#exec_* is changed to return String. Parsing the string is now processed in query method. * New module Bio::Blast::Remote, to store remote BLAST factories. * New module Bio::Blast::Remote::GenomeNet (and Genomenet for lazy including), to store exec_genomenet and other methods. In the future, it might be a standalone class (or something else). * New module methods Bio::Blast::Remote::GenomeNet.databases, nucleotide_databases, protein_databases, and database_description, to provide information of available databases. * Bug fix: remote BLAST on GenomeNet with long query sequences fails because of the change of the behavior of the remote site. * Incompatible change: Bio::Blast#options= can change program, db, format, matrix, and filter instance variables. * Bio::Blast#format= is added. * Bio::Blast.local changed to accept 4th argument: full path to the blastall command. * lib/bio/appl/blast/ddbj.rb, lib/bio/io/ddbjxml.rb, lib/bio/appl/blast/genomenet.rb, lib/bio/appl/blast/remote.rb, lib/bio/appl/blast.rb New module Bio::Blast::Remote::DDBJ, remote BLAST on DDBJ. * New module Bio::Blast::Remote::DDBJ, remote BLAST routine using DDBJ Web API for Biology (WABI). Now, Bio::Blast.new(program, db, options, 'ddbj') works. * New class Bio::DDBJ::XML::RequestManager. In this class, workaround for Ruby 1.8.5's bundled SOAP4R is made. * Some common codes are moved from Bio::Blast::Remote::GenomeNet::Information to Bio::Blast::Remote::Information. * lib/bio/io/ddbjxml.rb Changed to use DDBJ REST interface for a workaround instead of editing WSDL. (commit a64c8da5df5076c5f55b54b7f134d22a2e8d281c) 2008-08-09 Naohisa Goto * lib/bio/appl/blast.rb * Bug fix: Bio::Blast raises TypeError without "-m" option, reported by Natapol Pornputapong. * New class Bio::Blast::NCBIOptions to treat command-line options for blastall (and for other NCBI tools, e.g. formatdb). 
* Changed not to overwrite @filter, @matrix or @format unless '-F', '-M', or '-m' option is given, respectively. 2008-07-30 BioHackathon2008 participants from BioRuby project * Branch 'biohackathon2008' is merged. See doc/Changes-1.3.rd for incompatible changes. * lib/bio/sequence.rb, lib/bio/sequence/ * lib/bio/db/embl/ * lib/bio/db/genbank/ * lib/bio/db/fasta.rb, lib/bio/db/fasta/ A new method #to_biosequence is added to Bio::EMBL, Bio::GenBank and Bio::FastaFormat. Bio::FastaFormat#to_seq is now an alias of the #to_biosequence method. Bio::Sequence#output is added to output formatted text. Supported formats are: EMBL, GenBank, Fasta, or raw. Written by Naohisa Goto and Jan Aerts. * lib/bio/db/biosql/ * lib/bio/io/sql.rb, lib/bio/io/biosql/ New BioSQL implementation by Raoul Jean Pierre Bonnal. * lib/bio/reference.rb * lib/bio/feature.rb Bio::References and Bio::Features are obsoleted. For more information, see doc/Changes-1.3.rd. * (Many changes are not listed here. See git log for details.) 2008-07-30 Naohisa Goto * lib/bio/db/gff.rb, test/unit/bio/db/test_gff.rb Branch 'test-gff3' in git://github.com/ngoto/bioruby is merged. Fixed gff3 attribute bug, and many improvements are added. See doc/Changes-1.3.rd for incompatible changes. Thanks to Ben Woodcroft who reported the bug and contributed code. 2008-07-29 Naohisa Goto * lib/bio/appl/blast/format0.rb Bug fix: fixed ScanError when bit score is in exponential notation such as 1.234e+5. Regular expressions for numerics including exponential notations are changed to get correct values. 2008-07-18 Naohisa Goto * lib/bio/appl/hmmer.rb Bug fix: ArgumentError caused by misspelling of a variable name. 2008-06-23 Jan Aerts * README.rdoc * README_DEV.rdoc * gemspec.rb Renamed README files so that RDoc gets parsed on the github website. (commit 34b7693f74de2358759e955d8ce36cfe15e64b54) Edited README.rdoc and README_DEV.rdoc to reflect move from CVS to git.
(commit a61b16163d3ca74f3f7c8d8e8f03f5f8c68dee60) 2008-06-13 Naohisa Goto * lib/bio/reference.rb * test/unit/bio/test_reference.rb * New method Bio::Reference#pubmed_url added (renamed the url method in CVS revision 1.25). * Bio::Reference#endnote is changed not to overwrite url if url is already given by user. * Improvement of Bio::Reference#bibtex method. (Idea to improve bibtex method is originally made by Pjotr Prins.) * test/unit/bio/util/restriction_enzyme/double_stranded/test_aligned_strands.rb "require 'bio/sequence'" is needed to run the tests in this file. (commit 735e3563b723645afa65f0e4213a7c92152f68ec) 2008-05-19 Pjotr Prins * sample/fastasort.rb Simple example for sorting a flatfile (commit 677ac7c0707860f0478e75f72f23faa05b29dc6d) * doc/Tutorial.rd * sample/fastagrep.rb * sample/fastasort.rb Piping FASTA files (examples and doc) (commit ecd5e04477246dcf6cac84a6fbd21fb59efa3cf0) 2008-05-14 Naohisa Goto * lib/bio/appl/blast/format0.rb Bug fix: Possibly because of the output format changes of PHI-BLAST, Bio::Blast::Default::Report::Iteration#eff_space (and the shortcut method in the Report class) failed for PHI-BLAST (blastpgp) results, and Iteration#pattern and #pattern_positions (and the shortcut methods in the Report class) returned incorrect values. 2008-05-12 Naohisa Goto * lib/bio/appl/blast/xmlparser.rb, lib/bio/appl/blast/rexml.rb Bug fix: unit test sometime fails due to improper treatment of some Blast parameters and difference between rexml and xmlparser. To fix the bug, types of some parameters may be changed, e.g. Bio::Blast::Report#expect is changed to return Float or nil. * lib/bio/appl/blast/format0.rb Bug fix: Bio::Blast::Default::Report#eff_space returns wrong value ("Effective length of database"). It should return the value of "Effective search space". * test/unit/bio/appl/blast/test_xmlparser.rb Bug fix: tests in test/unit/bio/appl/blast/test_report.rb were ignored because of conflicts of the names of test classes. 
Class name in test_xmlparser.rb is changed to fix the bug. 2008-04-23 Naohisa Goto * lib/bio/db/embl/common.rb Bug fix: Bio::EMBL#references failed to parse journal name, volume, issue, pages, and year. In addition, it might fail to parse the PubMed ID. (commit c715f51729b115309a78cf29fdce7fef992da875) 2008-04-18 Naohisa Goto * lib/bio/db/embl/sptr.rb Bug fix: Bio::SPTR#references raises NoMethodError since lib/bio/db/embl/sptr.rb CVS version 1.34. (commit 1b3e484e19c9c547cecfe53858a646b525685e0d) 2008-04-15 Naohisa Goto * lib/bio/appl/blast/rpsblast.rb Newly added RPS-Blast default (-m 0) output parser. 2008-04-01 Naohisa Goto * lib/bio/appl/blast/format0.rb Fixed a bug: Failed to parse database name in some cases. Thanks to Tomoaki Nishiyama who reported the bug and sent patches ([BioRuby-ja] BLAST format0 parser fails header parsing output of specific databases). * lib/bio/db/pdb/chain.rb, lib/bio/db/pdb/pdb.rb Fixed bugs: Bio::PDB::Chain#aaseq failed for nucleotide chain; Failed to parse chains for some entries (e.g. 1B2M). Thanks to Semin Lee who reported the bugs and sent patches ([BioRuby] Bio::PDB parsing problem (1B2M)). 2008-02-19 Toshiaki Katayama * lib/bio/io/ncbirest.rb * lib/bio/io/pubmed.rb NCBI E-Utilities (REST) functionality is separated to ncbirest.rb and pubmed.rb is changed to utilize the Bio::NCBI::REST class for esearch and efetch. You can now search and retrieve any database in any format that NCBI supports by E-Utilities through the Bio::NCBI::REST interface (currently, only esearch and efetch methods are implemented).
(commit 0677bb69044cf6cfba453420bc1bbeb422f691c1) (commit f60e9f8153efacff0c97d12fb5c0830ebeb02edd) (commit 6e4670ab5e67ca596788f4c26a95a9687d36ce84) 2008-02-13 Pjotr Prins * doc/Tutorial.rd (commit d7ee01d86d6982f6b8aa19eba9adac95bebb08e8) 2008-02-12 Naohisa Goto * lib/bio/appl/blast/format0.rb Fixed bugs: Failed to parse query length for long query (>= 10000 letters) as comma is inserted for digit separator by blastall; Failed to parse e-value for some BLASTX results. Thanks to Shuji Shigenobu who reported the bugs and sent patches. 2008-02-11 Pjotr Prins * doc/Tutorial.rd Expanding on the Tutorial (bdc1d14f497909041fa761f659a74d98702a335a) Minor adjustments to Tutorial (72b5f4f0667a3a0c44ca31b0ab8228381e37919c) 2008-02-06 Pjotr Prins * sample/na2aa.rb Simple example to translate any NA to AA fasta (commit 433f974219cf04342935c1760464af24a5696c49) 2008-02-05 Pjotr Prins * sample/gb2fasta.rb Fixed broken require in gb2fasta example (commit b55daed0d6cff2e45155be01ef2a946925c972cf) 2008-02-05 Pjotr Prins * doc/Tutorial.rd Minor tweak to Tutorial.rd (commit 75416d780f99de24498a47fd22703d74f9a22329) 2008-02-03 Pjotr Prins * doc/Tutorial.rd More doctests in Tutorial.rd (commit 39d182bb67977956c0f22631ac596d65ccce74ff) 2008-02-02 Pjotr Prins * doc/Tutorial.rd Tabs in the Tutorial broke the rd parser - the Wiki will be fixed now. (commit 49078a5dea4f16f44add1882c60bf75df67ea19b) Updating tutorial. (commit f2f2005c3964f37e2d65afef0d52e63950d6bcb7) (commit d2b05581953712d0ac67ba0de1aa43853ed4e27f) 2008-02-02 Toshiaki Katayama * lib/bio/shell/rails/vendor/plugins/ The 'generators' directory is moved under the 'bioruby' subdirectory so that 'bioruby --rails' command can work with Rails 2.x series in addition to the Rails 1.2.x series. 2008-01-30 Mitsuteru Nakao * lib/bio/appl/blast.rb Fixed the bug at building the blastall command line options ('-m 0'). 
(commit 61443d177847825505103488573186dfc4e7568e) 2008-01-10 Naohisa Goto * lib/bio/appl/emboss.rb Added a method Bio::EMBOSS.run(program, arguments...) and Bio::EMBOSS.new is obsoleted. (commit fa04d97b073aefe05edc34a84498ba0a57ff98d2) 2008-01-10 Toshiaki Katayama * lib/bio/io/hinv.rb Bio::Hinv to access the H-invitational DB (http://h-invitational.jp/) web service in REST mode is added. 2007-12-30 Toshiaki Katayama * BioRuby 1.2.1 released This version is not Ruby 1.9 (released a few days ago) compliant yet. 2007-12-28 Naohisa Goto * lib/bio/appl/blast/report/format0.rb Fixed parse error when composition-based statistics were enabled. In addition, Bio::Blast::Default::Report#references and Bio::Blast::Default::Report::HSP#stat_method methods are added. In NCBI BLAST 2.2.17, composition-based statistics for blastp and tblastn are changed to be enabled by default. * lib/bio/appl/blast/report/wublast.rb Changed to follow the above changes in format0.rb. * lib/bio/sequence/common.rb Ruby 1.9 compliant: in window_search method, a local variable name outside the iterator loop is changed not to be shadowed by the iterator variable. * lib/bio/db/pdb/pdb.rb Ruby 1.9 compliant: changed to avoid "RuntimeError: implicit argument passing of super from method defined by define_method() is not supported. Specify all arguments explicitly." error. Ruby 1.9 compliant: Bio::PDB::Record.get_record_class and Bio::PDB::Record.create_definition_hash (Note: they should only be internally used by PDB parser and users should not call them) are changed to follow the change of Module#constants which returns an array of Symbol instead of String. 2007-12-26 Naohisa Goto * lib/bio/alignment.rb Ruby 1.9 compliant: in EnumerableExtension#each_window and OriginalAlignment#index methods, local variable names outside the iterator loops are changed not to be shadowed by iterator variables.
Warning messages for uninitialized instance variables of @gap_regexp, @gap_char, @missing_char, and @seqclass are suppressed. * test/unit/bio/test_alignment.rb Ruby 1.9 compliant: The last comma in Array.[] is no longer allowed. (For example, class A < Array; end; A[ 1, 2, 3, ] raises syntax error in Ruby 1.9.) 2007-12-21 Toshiaki Katayama * lib/bio/db/medline.rb Added doi and pii methods to extract DOI and PII number from AID field 2007-12-18 Naohisa Goto * lib/bio/db/pdb/pdb.rb Bio::PDB#inspect is added to prevent memory exhaustion. ([BioRuby] Parse big PDB use up all memory) * lib/bio/db/pdb/model.rb Bio::PDB::Model#inspect is added. * lib/bio/db/pdb/chain.rb Bio::PDB::Chain#inspect is added. * lib/bio/db/pdb/residue.rb Bio::PDB::Residue#inspect is added. This also affects Bio::PDB::Heterogen#inspect. 2007-12-15 Toshiaki Katayama * BioRuby 1.2.0 released * BioRuby shell is improved * file save functionality is fixed * deprecated require_gem is changed to gem to suppress warnings * deprecated end_form_tag is rewritten to suppress warnings * images for Rails shell are separated to the bioruby directory * spinner is shown during the evaluation * background image in the textarea is removed for the visibility * Bio::Blast is fixed to parse -m 8 formatted result correctly * Bio::PubMed is rewritten to enhance its functionality * e.g. 'rettype' => 'count' and 'retmode' => 'xml' are available * Bio::FlatFile is improved to accept recent MEDLINE format * Bio::KEGG::COMPOUND is enhanced to utilize REMARK field * Bio::KEGG::API is fixed to skip filter when the value is Fixnum * A number of minor bug fixes 2007-12-12 Naohisa Goto * lib/bio/db/newick.rb: Changed to be compliant with the Gary Olsen's Interpretation of the "Newick's 8:45" Tree Format Standard. * test/unit/bio/db/test_newick.rb More tests are added. * lib/bio/io/flatfile/indexer.rb Fixed a misspelling in Bio::FlatFileIndex.formatstring2class.
2007-11-28  Toshiaki Katayama

	* lib/bio/io/pubmed.rb:

	  Fixed the search and query methods (but use of esearch and efetch
	  is strongly recommended).

	  The efetch method is enhanced to accept any PubMed search options
	  as a hash (to retrieve in XML format, etc.).

	  Changed to wait 3 seconds between accesses by default to comply
	  with the NCBI terms (make no more than one request every 3
	  seconds).

	  All Bio::PubMed.* class methods are changed to instance methods
	  (the class-method interface remains for backward compatibility).

2007-07-19  Toshiaki Katayama

	* BioRuby 1.1.0 released

2007-07-17  Toshiaki Katayama

	* lib/bio/io/das.rb

	  Fixed the mapmaster method to return the correct value (the
	  mapmaster's URL). This bug was reported and fixed by Dave Thorne.

2007-07-16  Naohisa Goto

	* lib/bio/mafft/report.rb

	  For generic multi-fasta formatted sequence alignment,
	  Bio::Alignment::MultiFastaFormat is newly added based on the
	  Bio::MAFFT::Report class, and Bio::MAFFT::Report is changed to
	  inherit the new class. Tests are added in
	  test/unit/bio/appl/mafft/test_report.rb.

	* lib/bio/alignment.rb

	  New modules and classes Bio::Alignment::FactoryTemplate::* are
	  added. They are used by the following three new classes.

	* lib/bio/appl/muscle.rb
	* lib/bio/appl/probcons.rb
	* lib/bio/appl/tcoffee.rb

	  New classes Bio::Muscle, Bio::Probcons, and Bio::Tcoffee are added
	  for the MUSCLE, ProbCons, and T-Coffee multiple alignment programs.
	  Contributed by Jeffrey Blakeslee and colleagues.

	* lib/bio/appl/clustalw.rb
	* lib/bio/appl/mafft.rb

	  Interfaces of Bio::ClustalW and Bio::MAFFT are added/modified to
	  follow Bio::Alignment::FactoryTemplate (but not yet changed to use
	  it).

2007-07-09  Toshiaki Katayama

	* BioRuby shell on Rails has new CSS theme

	  Completely new design for BioRuby shell on Rails translated from
	  the 'DibdoltRed' theme on www.openwebdesign.org which is created by
	  Darjan Panic and Brian Green as a public domain work!
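The 3-second wait introduced in lib/bio/io/pubmed.rb (2007-11-28) is a simple rate limit. A minimal sketch of such a limiter is given below; RequestThrottle and its clock and sleeper parameters are hypothetical names chosen for this illustration and are not part of Bio::PubMed.

```ruby
# Hypothetical helper; not Bio::PubMed's actual implementation.
class RequestThrottle
  def initialize(min_interval = 3.0,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(seconds) { sleep(seconds) })
    @min_interval = min_interval
    @clock = clock
    @sleeper = sleeper
    @last = nil
  end

  # Runs the block, sleeping first if the previous call started less
  # than min_interval seconds ago.
  def call
    if @last
      wait = @min_interval - (@clock.call - @last)
      @sleeper.call(wait) if wait > 0
    end
    @last = @clock.call
    yield
  end
end

throttle = RequestThrottle.new(3.0)
# throttle.call { fetch_something }   # fetch_something is a placeholder
```

Injecting the clock and the sleeper keeps the class testable without real delays; in normal use the defaults are enough.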
2007-07-09 Toshiaki Katayama * lib/bio/db/kegg/taxonomy.rb Newly added KEGG taxonomy file parser which treats taxonomic tree structure of the KEGG organisms. The file is available at ftp://ftp.genome.jp/pub/kegg/genes/taxonomy and is a replacement of the previously used keggtab file (obsoleted). * lib/bio/db/kegg/keggtab.rb Bio::KEGG::Keggtab is obsoleted as the file is no longer provided. Use Bio::KEGG::Taxonomy (lib/bio/db/kegg/taxonomy.rb) instead. * lib/bio/shell/plugin/soap.rb Newly added web service plugins for BioRuby shell which supports NCBI SOAP, EBI SOAP and DDBJ XML in addition to the KEGG API. 2007-07-09 Naohisa Goto * lib/bio/db/pdb/pdb.rb Pdb_LString.new is changed not to raise error for nil. Fixed a bug when below records does not exist in a PDB entry: REMARK (remark), JRNL (jrnl), HELIX (helix), TURN (turn), SHEET (sheet), SSBOND (ssbond), SEQRES (seqres), DBREF (dbref), KEYWDS (keywords), AUTHOR (authors), HEADER (entry_id, accession, classification), TITLE (definition), and REVDAT (version) records (methods). Incompatible change: Bio::PDB#record is changed to return an empty array for nonexistent record. (reported by Mikael Borg) 2007-07-09 Naohisa Goto * lib/bio/io/flatfile.rb Bio::FlatFile.foreach is added (which is suggested by IO.foreach). 2007-06-28 Toshiaki Katayama * lib/bio/shell/setup.rb, core.rb Changed not to use Dir.chdir by caching full path of the save directory at a start up time, so that user can freely change the work directory without affecting object/history saving functionality. Bio::Shell.cache[:savedir] stores the session saving directory (session means shell/session/{config,history,object} files), Bio::Shell.cache[:workdir] stores the working directory at a start up time (can be same directory with the :savedir) and both are converted and stored as full path allowing user to use Dir.chdir in the shell session). 
	  If the --rails (-r) option is given, the 'bioruby' command will
	  run in the Rails server mode, and the server will start in the
	  :savedir.

	  (A) IRB mode

	  1. run in the current directory and the session will be saved
	     in the ~/.bioruby directory

	     % bioruby

	  2. run in the current directory and the session will be saved
	     in the foo/bar directory

	     % bioruby foo/bar

	  3. run in the current directory and the session will be saved
	     in the /tmp/foo/bar directory

	     % bioruby /tmp/foo/bar

	  (B) Rails mode

	  4. run in the ~/.bioruby directory and the session will also be
	     saved in the ~/.bioruby directory

	     % bioruby -r

	  5. run in the foo/bar directory and the session will also be
	     saved in the foo/bar directory

	     % bioruby -r foo/bar

	  6. run in the /tmp/foo/bar directory and the session will also
	     be saved in the /tmp/foo/bar directory

	     % bioruby -r /tmp/foo/bar

	  (C) Script mode

	  7. run in the current directory using the session saved
	     in the ~/.bioruby directory

	     % bioruby ~/.bioruby/shell/script.rb

	  8. run in the current directory using the session saved
	     in the foo/bar directory

	     % bioruby foo/bar/shell/script.rb

	  9. run in the current directory using the session saved
	     in the /tmp/foo/bar directory

	     % bioruby /tmp/foo/bar/shell/script.rb

2007-06-21  Toshiaki Katayama

	* lib/bio/shell/setup.rb

	  If no directory is specified to the bioruby command, use the
	  ~/.bioruby directory as the default save directory instead of the
	  current directory, as suggested by Jun Sese.

	  Users can use the 'bioruby' command without being bothered by
	  directories and files previously created by the 'bioruby' command
	  in the current directory even when not needed.

2007-05-19  Toshiaki Katayama

	* lib/bio/appl/fasta.rb

	  Fixed a bug where exec_local failed to run when @ktup is nil.
	  This problem was reported and fixed by Fredrik Johansson.

	* lib/bio/db/gff.rb

	  The parser_attributes method in the GFF3 class is modified to use
	  the '=' char as a separator instead of ' ' (which is used in the
	  GFF2 spec).
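The nine invocation cases listed for the 2007-06-28 change to lib/bio/shell/setup.rb amount to a small decision table for the save directory and the working directory. The sketch below restates that table; resolve_dirs and its cwd and home parameters are hypothetical, Unix-style paths are assumed, and the real logic in setup.rb may differ.

```ruby
# Hypothetical helper illustrating the (A) IRB, (B) Rails and (C) Script
# cases; not the code in lib/bio/shell/setup.rb.
def resolve_dirs(args, cwd:, home:)
  rails = args.include?("-r") || args.include?("--rails")
  target = args.reject { |a| a.start_with?("-") }.first
  default_save = File.join(home, ".bioruby")

  if target && target.end_with?(".rb")
    # Script mode: the script sits at <savedir>/shell/script.rb and is
    # run in the current directory.
    savedir = File.expand_path(File.join(File.dirname(target), ".."), cwd)
    workdir = cwd
  else
    savedir = target ? File.expand_path(target, cwd) : default_save
    # Rails mode starts the server inside the save directory.
    workdir = rails ? savedir : cwd
  end
  { savedir: savedir, workdir: workdir, rails: rails }
end

p resolve_dirs(["-r", "foo/bar"], cwd: "/work", home: "/home/user")
```

Both directories are returned as full paths, which is what lets the shell session call Dir.chdir later without losing track of where to save.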
2007-04-06 Toshiaki Katayama * COPYING, COPYING.LIB are removed BioRuby is now distributed under the same terms as Ruby. On behalf of the BioRuby developer, I have asked all authors of the BioRuby code to change BioRuby's license from LGPL to Ruby's. And we have finished to change license of all modules in the BioRuby library. This means that Ruby user can freely use BioRuby library without being annoyed by licensing issues. * lib/bio/db/kegg/ko.rb is renamed to lib/bio/db/kegg/ortholog.rb KEGG KO database is renamed to KEGG ORTHOLOG database, thus we follow the change. Bio::KEGG::KO is renamed to Bio::KEGG::ORTHOLOG. Bio::KEGG::ORTHOLOG#genes, dblinks methods are rewrited to use lines_fetch method. * lib/bio/data/aa.rb to_re method is changed that the generated regexp to include ambiguous amino acid itself - replacement of amino acid X should include X itself. 2007-04-05 Trevor Wennblom * License headers are completely rewrited to Ruby's. 2007-04-02 Naohisa Goto * lib/bio/appl/mafft.rb Incompatible change: Bio::MAFFT#output is changed to return a string of multi-fasta formmatted text. To get an array of Bio::FastaFormat objects (as of 1.0 or before), please use report.data instead. 2007-03-29 Toshiaki Katayama * lib/bio/db/kegg/cell.rb Obsoleted as the KEGG CELL database is not publically available any more. 2007-03-28 Toshiaki Katayama * lib/bio/shell/rails/.../bioruby_controller.rb BioRuby shell on Rails access is changed to be limited only from the localhost for security reason (if local_request?). * lib/bio/command.rb The post_form method is modified to accept URL as a string and extended to accept params as array of string array of hash array of array or string in addition to hash (also can be ommited if not needed - defaults to nil). Keys and parameters of params are forced to use to_s for sure. * lib/bio/io/ensembl.rb Re-designed to allows user to use Bio::Ensembl.new without creating inherited sub class. 
Changed to use Bio::Command.post_form * lib/bio/das.rb Changed to use Bio::Command * lib/bio/shell/plugin/das.rb Newly added BioDAS client plugin for BioRuby shell. das.list_sequences das.dna das.features 2007-03-15 Toshiaki Katayama * lib/bio/shell/irb.rb Changed to load Rails environment when bioruby shell is invoked in the Rails project directory. This means that user can use 'bioruby' command as a better './script/console' which has persistent objects and shared history. 2007-03-08 Toshiaki Katayama * lib/bio/db/kegg/drug.rb Newly added KEGG DRUG database parser. * lib/bio/db/kegg/glycan.rb Bio::KEGG::GLYCAN#bindings method is removed. Bio::KEGG::GLYCAN#comment, remarks methods are added. Bio::KEGG::GLYCAN#orthologs and dblinks methods are changed to use lines_fetch method. * lib/bio/kegg/compound.rb Bio::KEGG::COMPOUND#glycans method is added Bio::KEGG::COMPOUND#names method is changed to return an array of stripped strings. * lib/bio/db/kegg/genes.rb Bio::KEGG::GENES#orthologs method is added. 2007-03-27 Naohisa Goto * lib/bio/command.rb Bio::Command.call_command_fork and query_command_fork methods are changed to handle all Ruby exceptions in the child process. * lib/bio/io/flatfile.rb UniProt format autodetection was changed to follow the change of UniProtKB release 9.0 of 31-Oct-2006. 2007-02-12 Naohisa Goto * lib/bio/io/flatfile.rb Exception class UnknownDataFormatError is added. It will be raised before reading data from IO when data format hasn't been specified due to failure of file format autodetection. 2007-02-12 Toshiaki Katayama * lib/bio/io/flatfile.rb Added support for KEGG EGENES. 2007-02-02 Trevor Wennblom * lib/bio/util/restriction_enzyme* Bio::RestrictionEnzyme stabilized. 2007-02-02 Trevor Wennblom * lib/bio/db/lasergene.rb Bio::Lasergene Interface for DNAStar Lasergene sequence file format 2007-02-02 Trevor Wennblom * lib/bio/db/soft.rb Bio::SOFT for reading SOFT formatted NCBI GEO files. 
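The 2007-03-28 note on lib/bio/command.rb says post_form accepts params as a hash, an array of strings, an array of hashes, an array of arrays, or a string, and may be omitted. One way such inputs could be normalized into a form-encoded body is sketched below. encode_form_params is a hypothetical helper written for this illustration, and the exact encoding rules of Bio::Command.post_form are not claimed here.

```ruby
require 'uri'

# Hypothetical normalizer; not the body of Bio::Command.post_form.
def encode_form_params(params = nil)
  return params if params.is_a?(String)   # already encoded by the caller

  pairs =
    case params
    when nil  then []
    when Hash then params.to_a
    when Array
      params.flat_map do |item|
        case item
        when Hash  then item.to_a                    # [{ "a" => 1 }, ...]
        when Array then [item]                       # [["a", 1], ...]
        else            [item.to_s.split("=", 2)]    # ["a=1", ...]
        end
      end
    else
      raise ArgumentError, "unsupported params: #{params.class}"
    end

  # Keys and values are forced through to_s before encoding.
  pairs.map { |k, v|
    "#{URI.encode_www_form_component(k.to_s)}=#{URI.encode_www_form_component(v.to_s)}"
  }.join("&")
end

puts encode_form_params("db" => "pubmed", "term" => "a b")
```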
2007-01-16 Toshiaki Katayama * BioRuby shell on Rails new features and fixes New features: * Input [#] is linked to action for filling textarea from history * [methods] is separated into columns for readability Fixes and improvements: * HIDE_VARIABLES is moved from helper to controller to avoid warning "already initialized constant - HIDE_VARIABLES" repeated on reload. *
is renamed to "log_#" with number for future extension. *
are inserted in the
2007-01-15 Toshiaki Katayama * lib/bio/db.rb lines_fetch method (internally used various bio/db/*.rb modules) is rewrited to concatenate indented sub field. * lib/bio/db/kegg/compound.rb Bio::KEGG::COMPOUND#comment method which returns contents of the COMMENT line is added * lib/bio/db/kegg/enzyme.rb Bio::KEGG::ENZYME#entry_id is changed to return EC number only. Previous version of entry_id method is renamed to entry method which returns a "EC x.x.x.x Enzyme" style string. Bio::KEGG::ENZYME#obsolete? method is added which returns boolean value (true or false) according to the ENTRY line contains a string 'Obsolete' or not. Bio::KEGG::ENZYME#all_reac, iubmb_reactions, kegg_reactions methods are added to support newly added ALL_REAC field. Bio::KEGG::ENZYME#inhibitors and orthologs methods are added. Bio::KEGG::ENZYME#substrates, products, inhibitors, cofactors, pathways, orthologs, diseases, motifs methods are rewrited to utilizes new lines_fetch method in db.rb to process continuous sub field. * lib/bio/db/kegg/genome.rb Bio::KEGG::GENOME#scaffolds, gc, genomemap methods are obsoleted. Bio::KEGG::GENOME#distance, data_source, original_db methods are added. 2006-12-24 Toshiaki Katayama * bin/bioruby, lib/bio/shell/, lib/bio/shell/rails/ (lib/bio/shell/rails/vendor/plugins/generators/) Web functionallity of the BioRuby shell is completely rewrited to utilize generator of the Ruby on Rails. This means we don't need to have a copy of the rails installation in our code base any more. The shell now run in threads so user does not need to run 2 processes as before (drb and webrick). Most importantly, the shell is extended to have textarea to input any code and the evaluated result is returned with AJAX having various neat visual effects. * lib/bio.rb Extended to have Bio.command where command can be any BioRuby shell methods. ex. 
puts Bio.getseq("atgc" * 10).randomize.translate * lib/bio/shell/plugin/entry.rb, seq.rb seq, ent, obj commands are renamed to getseq, getent, getobj respectively. This getseq is also changed to return Bio::Sequence with @moltype = Bio::Sequence::NA object instead of Bio::Sequence::NA object. * lib/bio/db/kegg/kgml.rb Some method names are changed to avoid confusion: * entry_type is renamed to category () * map is renamed to pathway () 2006-12-19 Christian Zmasek * lib/bio/db/nexus.rb Bio::Nexus is newly developed during the Phyloinformatics hackathon. 2006-12-16 Toshiaki Katayama * lib/bio/io/sql.rb Updated to follow recent BioSQL schema contributed by Raoul Jean Pierre Bonnal. 2006-12-15 Mitsuteru Nakao * lib/bio/appl/iprscan/report.rb Bio::Iprscan::Report for InterProScan output is newly added. 2006-12-15 Naohisa Goto * lib/bio/appl/mafft/report.rb Bio::MAFFT::Report#initialize is changed to get a string of multi-fasta formmatted text instead of Array. 2006-12-14 Naohisa Goto * lib/bio/appl/phylip/alignment.rb Phylip format multiple sequence alignment parser class Bio::Phylip::PhylipFormat is newly added. * lib/bio/appl/phylip/distance_matrix.rb Bio::Phylip::DistanceMatrix, a parser for phylip distance matrix (generated by dnadist/protdist/restdist programs) is newly added. * lib/bio/appl/gcg/msf.rb, lib/bio/appl/gcg/seq.rb Bio::GCG::Msf in lib/bio/appl/gcg/msf.rb for GCG MSF multiple sequence alignment format parser, and Bio::GCG::Seq in lib/bio/appl/gcg/seq.rb for GCG sequence format parser are newly added. * lib/bio/alignment.rb Output of Phylip interleaved/non-interleaved format (.phy), Molphy alignment format (.mol), and GCG MSF format (.msf) are supported. Bio::Alignment::ClustalWFormatter is removed and methods in the module are renamed and moved to Bio::Alignment::Output. * lib/bio/appl/clustalw.rb, lib/bio/appl/mafft.rb, lib/bio/appl/sim4.rb Changed to use Bio::Command instead of Open3.popen3. 
2006-12-13 Naohisa Goto * lib/bio/tree.rb, lib/bio/db/newick.rb Bio::PhylogeneticTree is renamed to Bio::Tree, and lib/bio/phylogenetictree.rb is renamed to lib/bio/tree.rb. NHX (New Hampshire eXtended) parser/writer support are added. 2006-12-13 Toshiaki Katayama * doc/Desing.rd.ja, doc/TODO.rd.ja, doc/BioRuby.rd.ja are obsoletd. 2006-10-05 Naohisa Goto * lib/bio/db/newick.rb Bio::Newick for Newick standard phylogenetic tree parser is newly added (contributed by Daniel Amelang). * lib/bio/phylogenetictree.rb Bio::PhylogeneticTree for phylogenetic tree data structure is newly added. 2006-09-19 Toshiaki Katayama * lib/bio/io/soapwsdl.rb * lib/bio/io/ebisoap.rb * lib/bio/io/ncbisoap.rb Newly added web service modules. * lib/bio/db/kegg/kgml.rb Accessor for the attribute is added. * lib/bio/shell/plugin/codon.rb Support for Pyrrolysine and Selenocysteine are added in the BioRuby shell. * lib/bio/sshell/plugin/seq.rb sixtrans, skip, step methods are added in the BioRuby shell. bioruby> seqtrans(seq) bioruby> seq.step(window_size) {|subseq| # do something on subseq } bioruby> seq.skip(window_sizep, step_size) {|subseq| # do something on subseq } 2006-07-26 Toshiaki Katayama * lib/bio/data/aa.rb Amino acids J (Xle: I/L), O (Pyl: pyrrolysine) and X (unknown) are added (now we have consumed 26 alphabets!). * lib/bio/io/fastacmd.rb Fixed that new version of fastacmd (in BLAST package) changed the option from '-D T' to '-D 1', contributed by the author of this module Shuji Shigenobu. * lib/bio/shell/plugin/psort.rb Newly added BioRuby shell plugin for PSORT * lib/bio/shell/plugin/blast.rb Newly added BioRuby shell plugin for BLAST search against KEGG GENES * lib/bio/db/prosite.rb PROSITE#re instance method is added to translate PATTERN of the entry to Regexp using PROSITE.pa2re class method. 
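The PROSITE#re note above describes translating a PATTERN line into a Regexp. The sketch below shows how the documented PROSITE pattern syntax maps onto regular expression syntax. prosite_pattern_to_regexp is a hypothetical helper and is not claimed to match PROSITE.pa2re in every detail.

```ruby
# Simplified sketch of PROSITE pattern -> Regexp conversion.
def prosite_pattern_to_regexp(pattern)
  s = pattern.gsub(/\s/, "")                 # drop whitespace
  s = s.sub(/\.\z/, "")                      # a trailing period ends the pattern
  s = s.sub(/\A</, "^").sub(/>\z/, "$")      # N-terminal / C-terminal anchors
  s = s.gsub(/\{(\w+)\}/) { "[^#{$1}]" }     # {P}   -> [^P]   (excluded residues)
  s = s.gsub(/\(([\d,]+)\)/) { "{#{$1}}" }   # (n,m) -> {n,m}  (repetition)
  s = s.tr("x", ".")                         # x matches any residue
  s = s.delete("-")                          # '-' only separates elements
  Regexp.new(s, Regexp::IGNORECASE)
end

re = prosite_pattern_to_regexp("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
puts re.source
```

The exclusion step runs before the repetition step so that the braces produced for repetitions are not mistaken for exclusion sets.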
* lib/bio/db/kegg/genes.rb Bio::KEGG::GENES#keggclass method is renamed to pathway Bio::KEGG::GENES#splinks method is removed Bio::KEGG::GENES#motifs method is added these reflect changes made in the original KEGG GENES database. Bio::KEGG::GENES#locations method is added to return Bio::Locations Bio::KEGG::GENES#codon_usage is renamed cu_list (returns as Array) Bio::KEGG::GENES#cu is renamed to codon_usage (returns as Hash) Bio::KEGG::GENES#aalen, nalen methods are changed to return the number written in the entry (use seq.length to obtain calculated number as before). * lib/bio/db/kegg/kgml.rb Names of some accessors have been changed (including bug fixes) and instance variable @dom is obsoleted. Here's a list of incompatible attribute names with KGML tags by this change: :id -> :entry_id :type -> :entry_type names() :name -> :label :type -> :shape :entry1 -> :node1 :entry2 -> :node2 :type -> :rel edge() :name -> :entry_id :type -> :direction * lib/bio/io/das.rb Bug fixed that the value of segment.stop was overwritten by segment.orientation. 2006-07-14 Naohisa Goto * lib/bio/command.rb Bio::Command::Tools and Bio::Command::NetTools are combined and re-constructed into a new Bio::Command module. lib/bio/appl/blast.rb, lib/bio/appl/fasta.rb, lib/bio/appl/emboss.rb, lib/bio/appl/psort.rb, lib/bio/appl/hmmer.rb, lib/bio/db/fantom.rb, lib/bio/io/fastacmd.rb, lib/bio/io/fetch.rb, lib/bio/io/keggapi.rb, lib/bio/io/pubmed.rb, and lib/bio/io/registry.rb are changed to use the new Bio::Command instead of old Bio::Command or Net::HTTP. 2006-06-29 Naohisa Goto * lib/bio/appl/blat/report.rb Bio::BLAT::Report::Hit#milli_bad, #percent_identity, #protein?, #score, and #psl_version methods/attributes are newly added, and psl files without headers are supported (discussed in bioruby-ja ML). 2006-06-27 Naohisa Goto * lib/bio/sequence/na.rb Bio::Sequence::NA#gc_content, #at_content, #gc_skew, #at_skew are newly added. 
Bio::Sequence::NA#gc_percent are changed not to raise ZeroDivisionError and returns 0 when given sequence is empty. * lib/bio/db/pdb/pdb.rb Bio::PDB::ATOM#name, #resName, #iCode, #chaarge, #segID, and #element are changed to strip whitespaces when initializing. Bio::PDB::HETATM is also subject to the above changes. (suggested by Mikael Borg) 2006-06-12 Naohisa Goto * lib/bio/io/flatfile.rb Bug fix: Bio::FlatFile.open(klass, filename) didn't work. 2006-05-30 Toshiaki Katayama * lib/bio/io/soapwsdl.rb Generic list_methods method which extracts web service methods defined in the WSDL file is added. 2006-05-02 Mitsuteru Nakao * lib/bio/appl/pts1.rb Bio::PTS1 first commit. 2006-04-30 Naohisa Goto * lib/bio/appl/blast/format0.rb Bug fix: parse error for hits whose database sequence names contain 'Score', and subsequent hits after them would lost (reported by Tomoaki NISHIYAMA). 2006-04-14 Mitsuteru Nakao * lib/bio/io/ensembl.rb Bio::Ensembl first commit. It is a client class for Ensembl Genome Browser. 2006-03-22 Naohisa Goto * lib/bio/io/flatfile.rb Bug fix: Bio::FlatFile raises error for pipes, ARGF, etc. The bug also affects bio/appl/mafft.rb, bio/appl/clustalw.rb, bio/appl/blast.rb, bio/io/fastacmd.rb, and so on. Bio::FlatFile#entry_start_pos and #entry_ended_pos are changed to be enabled only when Bio::FlatFile#entry_pos_flag is true. 2006-02-27 Toshiaki Katayama * BioRuby 1.0.0 released 2006-02-10 Toshiaki Katayama * BioRuby shell is changed to use session/ directory under the current or specified directory to store the session information instead of ./.bioruby directory. 2006-02-05 Toshiaki Katayama * License to be changed to Ruby's (not yet completed). 2006-02-01 Trevor Wennblom * Bio::RestrictionEnzyme first commit for comments. 
* See lib/bio/util/restriction_enzyme.rb and test/unit/bio/util/restriction_enzyme 2006-01-28 Toshiaki Katayama * lib/bio/appl/emboss.rb EMBOSS USA format is now accepted via seqret/entret commands and also utilized in the BioRuby shell (lib/bio/shell.rb, plugin/entry.rb, plugin/emboss.rb). * lib/bio/io/brdb.rb is removed - unused Bio::BRDB (BioRuby DB) 2006-01-23 Toshiaki Katayama * lib/bio/sequence.rb Bio::Sequence is refactored to be a container class for any sequence annotations. Functionality is separared into several files under the lib/bio/sequence/ direcotry as common.rb, compat.rb, aa.rb, na.rb, format.rb 2006-01-20 Toshiaki Katayama * BioRuby 0.7.1 is released. 2006-01-12 Toshiaki Katayama * lib/bio/db.ra: fixed a bug of the tag_cut method introduced in 0.7.0 (reported by Alex Gutteridge) 2006-01-04 Naohisa Goto * Bio::PDB is refactored. See doc/Changes-0.7 for more details. 2005-12-19 Toshiaki Katayama * BioRuby 0.7.0 is released. See doc/Changes-0.7.rd file for major and incompatible changes. 2005-12-19 Naohisa Goto * lib/bio/db/pdb.rb, lib/bio/db/pdb/pdb.rb, lib/bio/db/pdb/*.rb * Many changes have been made. * Bio::PDB::FieldDef is removed and Bio::PDB::Record is completely changed. Now, Record is changed from hash to Struct, and method_missing is no longer used. * In the "MODEL" record, model_serial is changed to serial. * In any records, record_type is changed to record_name. * In most records contains real numbers, changed to return float values instead of strings. * Pdb_AChar, Pdb_Atom, Pdb_Character, Pdb_Continuation, Pdb_Date, Pdb_IDcode, Pdb_Integer, Pdb_LString, Pdb_List, Pdb_Real, Pdb_Residue_name, Pdb_SList, Pdb_Specification_list, Pdb_String, Pdb_StringRJ and Pdb_SymOP are moved under Bio::PDB::DataType. * There are more and more changes to be written... * lib/bio/db/pdb/atom.rb * Bio::PDB::Atom is removed. Instead, please use Bio::PDB::Record::ATOM and Bio::PDB::Record::HETATM. 
2005-12-02 Naohisa Goto * lib/bio/alignment.rb * Old Bio::Alignment class is renamed to Bio::Alignment::OriginalAlignment. Now, new Bio::Alignment is a module. However, you don't mind so much because most of the class methods previously existed are defined to delegate to the new Bio::Alignment::OriginalAlignment class, for keeping backward compatibility. * New classes and modules are introduced. Please refer RDoc. * each_site and some methods changed to return Bio::Alignment::Site, which inherits Array (previously returned Array). * consensus_iupac now returns only standard bases 'a', 'c', 'g', 't', 'm', 'r', 'w', 's', 'y', 'k', 'v', 'h', 'd', 'b', 'n', or nil (in SiteMethods#consensus_iupac) or '?' (or missing_char, in EnumerableExtension#consensus_iupac). Note that consensus_iupac now does not return u and invalid letters not defined in IUPAC standard even if all bases are equal. * There are more and more changes to be written... 2005-11-05 Toshiaki Katayama * lib/bio/sequence.rb Bio::Sequence.auto(str) method is added which auto detect the molecular type of the string and then returns the Bio::Sequence::NA or Bio::Sequence::AA object. Bio::Sequence#blast and Bio::Sequence#fasta methods are removed. * lib/bio/shell/plugin/codon.rb Newly added plugin to treat codon table. ColoredCodonTable is ported from the codontable.rb 2005-11-01 Toshiaki Katayama * bin/bioruby, lib/bio/shell/ All methods are changed to private methods to avoid adding them in top level binding, which caused many unexpected behaviors, as adviced by Koichi Sasada. The MIDI plugin is now able to select musical scales. 2005-10-23 Toshiaki Katayama * lib/bio/util/color_scheme Newly contributed Bio::ColorScheme * lib/bio/db/kegg/kgml.rb Newly added KEGG KGML parser. 2005-10-05 Toshiaki Katayama * lib/bio/shell/plugin/midi.rb Sequcne to MIDI plugin is contributed by Natsuhiro Ichinose 2005-09-25 Toshiaki Katayama * README.DEV Newly added guideline document for the contributors. 
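The consensus_iupac note in the 2005-12-02 entry lists the codes that can be returned for an alignment column. The mapping from a set of observed bases to one IUPAC code can be sketched as a lookup table. consensus_iupac_for is a hypothetical helper, not the code in lib/bio/alignment.rb. Two choices here are assumptions of this sketch only: 'u' is treated as 't', and any column containing a gap or another symbol yields nil.

```ruby
# Hypothetical lookup from a sorted set of bases to an IUPAC code.
IUPAC_BY_BASES = {
  "a" => "a", "c" => "c", "g" => "g", "t" => "t",
  "ac" => "m", "ag" => "r", "at" => "w",
  "cg" => "s", "ct" => "y", "gt" => "k",
  "acg" => "v", "act" => "h", "agt" => "d", "cgt" => "b",
  "acgt" => "n"
}.freeze

def consensus_iupac_for(column)
  # Assumption of this sketch: fold case and treat u as t.
  bases = column.map { |b| b.to_s.downcase.tr("u", "t") }
  # Anything other than a single a/c/g/t symbol makes the column undefined.
  return nil unless bases.all? { |b| b.size == 1 && "acgt".include?(b) }
  IUPAC_BY_BASES[bases.uniq.sort.join]
end

p consensus_iupac_for(%w[a g a])
```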
	* README

	  Updated and added instructions on RubyGems.

2005-09-23  Toshiaki Katayama

	* bin/bioruby, lib/bio/shell.rb, lib/bio/shell/core.rb,
	  lib/bio/shell/session.rb, lib/bio/shell/plugin/seq.rb,
	  lib/bio/shell/flatfile.rb, lib/bio/shell/obda.rb

	  Newly added BioRuby shell, the command line user interface.
	  Try the 'bioruby' command in your terminal.

	* doc/Changes-0.7.rd

	  Newly added document describing incompatible and important
	  changes between the BioRuby 0.6 and 0.7 versions.

	* lib/bio/sequence.rb

	  Bio::Sequence.guess, Bio::Sequence#guess methods are added which
	  guess the sequence type by the following formula (default value
	  for the threshold is 0.9).

	                 number of ATGC
	    --------------------------------------- > threshold
	      number of other chars - number of N

2005-09-10  Naohisa Goto

	* lib/bio.rb, lib/bio/appl/blast.rb, lib/bio/appl/blast/format0.rb,
	  lib/bio/appl/blast/report.rb, lib/bio/appl/clustalw.rb,
	  lib/bio/appl/fasta.rb, lib/bio/appl/fasta/format10.rb,
	  lib/bio/appl/hmmer.rb, lib/bio/appl/hmmer/report.rb,
	  lib/bio/appl/mafft.rb, lib/bio/appl/psort.rb,
	  lib/bio/appl/psort/report.rb, lib/bio/appl/sim4.rb,
	  lib/bio/db/genbank/ddbj.rb, lib/bio/io/flatfile/bdb.rb,
	  lib/bio/io/flatfile/index.rb, lib/bio/io/flatfile/indexer.rb

	  Fixed autoload problem.

	* lib/bio/appl/blast.rb, lib/bio/appl/blast/report.rb

	  Bio::Blast.reports method was moved from
	  lib/bio/appl/blast/report.rb to lib/bio/appl/blast.rb for
	  autoload.

2005-08-31  Toshiaki Katayama

	* BioRuby 0.6.4 is released.

	* doc/KEGG_API.rd

	  Newly added English version of the KEGG API manual.

	* lib/bio/aa.rb

	  The 'one2name' method introduced in 0.6.3 is fixed, and 'one' and
	  'three' methods are added as aliases for the 'to_1' and 'to_3'
	  methods.

2005-08-31  Naohisa Goto

	* Removed unused file lib/bio/appl/factory.rb (the functionality
	  had been integrated into lib/bio/command.rb).

	* doc/Tutorial.rd

	  Newly added an English translation of the Japanese tutorial.

2005-08-16  Naohisa Goto

	* lib/bio/command.rb

	  Newly added Bio::Command::Tools module.
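The ratio test given for Bio::Sequence.guess in the 2005-09-23 entry can be written out as a few lines of Ruby. This is a sketch under one reading of the formula: the denominator is taken to be the sequence length minus the number of N characters. guess_sequence_type and its :na, :aa and :unknown return values are hypothetical and are not the library's interface.

```ruby
# Sketch of the ATGC ratio test with the default threshold of 0.9.
def guess_sequence_type(str, threshold = 0.9)
  s = str.to_s.upcase
  bases = s.count("ATGC")            # characters that look like nucleotides
  total = s.length - s.count("N")    # assumption: N is excluded from the total
  return :unknown if total <= 0      # empty or all-N input cannot be judged
  bases.to_f / total > threshold ? :na : :aa
end

p guess_sequence_type("atgcatgcnnatgc")
p guess_sequence_type("MKVLAAGIVGLLLAQ")
```

Only the characters named in the formula are counted, so an RNA sequence containing U would need a wider character set than this sketch uses.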
Bio::Command::Tools is a collection of useful methods for execution of external commands. * lib/bio/appl/blast.rb, lib/bio/appl/fasta.rb, lib/bio/appl/hmmer.rb, lib/bio/io/fastacmd.rb For security reason, shell special characters are escaped. * lib/bio/appl/blast.rb, lib/bio/appl/fasta.rb, lib/bio/appl/hmmer.rb Options are stored with an array (@options). #options and #opions= methods are added. * lib/bio/appl/blast.rb, lib/bio/appl/fasta.rb Bio::Blast.remote and Bio::Fasta.remote is fixed to work with the recent change of the GenomeNet. 2005-08-11 Toshiaki Katayama * Sequence#to_re method to have compatibility with 0.6.2 for RNA * Fixed Bio::Fastacmd#fetch to work * Bio::Fastacmd and Bio::Bl2seq classes (introduced in 0.6.3) are renamed to Bio::Blast::Fastacmd, Bio::Blast::Bl2seq respectively. 2005-08-09 Toshiaki Katayama * BioRuby 0.6.3 is released. This version would be the final release to support Ruby 1.6 series (as long as no serious bug is found:). * lib/bio/util/sirna.rb: Newly added method for desing of siRNA, contributed by Itoshi Nikaido. The lib/bio/util/ directory if reserved for bioinfomatics algorithms implemented by pure Ruby. * lib/bio/io/fastacmd.rb: Newly added wrapper for NCBI fastacmd program, contributed by Shinji Shigenobu. * lib/bio/appl/hmmer/report.rb: Bug fixed by Masashi Fujita when the position of sequence rarely becomes '-' instead of digits. 2005-08-08 Mitsuteru Nakao * lib/bio/db/embl/sptr.rb: Added Bio::SPTR#protein_name and Bio::SPTR#synoyms methods. contributed by Luca Pireddu. Changed Bio::SPTR#gn, Bio::SPTR#gene_name and Bio::SPTR#gene_names methods. contributed by Luca Pireddu. 2005-08-08 Naohisa Goto * lib/bio/appl/bl2seq/report.rb: Newly added bl2seq (BLAST 2 sequences) output parser. 
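The 2005-08-16 entry says shell special characters are escaped for security when external programs are run. Ruby's standard library offers Shellwords.escape for this purpose; the sketch below uses it to build a command line. build_command_line is a hypothetical helper, and the escaping rules implemented in lib/bio/command.rb may differ from those of Shellwords.

```ruby
require 'shellwords'

# Hypothetical helper; Bio::Command's own escaping may differ.
def build_command_line(program, *args)
  ([program] + args).map { |a| Shellwords.escape(a.to_s) }.join(" ")
end

# A database name containing spaces and a ';' stays a single argument.
puts build_command_line("blastall", "-p", "blastp", "-d", "my db; rm -rf /")
```

Keeping the options in an array, as the same entry describes for @options, also allows passing them to Kernel#system as separate arguments, which avoids the shell altogether.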
* lib/bio/appl/blast/format0.rb: Added `self.class::` before F0dbstat.new for bl2seq/report.rb 2005-08-07 Toshiaki Katayama * lib/bio/sequence.rb, lib/bio/data/na.rb, lib/bio/data/aa.rb: Bio::NucleicAcid, Bio::AminoAcid classes are refactored to have Data module, and this module is included and extended to make all methods as both of instance methods and class methods. Bio::Sequence::NA and AA classes are rewrited (molecular_weight, to_re methods) to use Bio::NucleicAcid. Bio::Sequence::NA#molecular_weight method is fixed to subtract two hydrogens per each base. * lib/bio/db/medline.rb: publication_type (pt) method is added. 2005-08-07 Naohisa Goto * lib/bio/db/genbank/common.rb: Avoid NoMethodError (private method `chomp` called for nil:NilClass) when parsing features of ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/ Salmonella_typhimurium_LT2/AE006468.gbk 2005-07-11 Toshiaki Katayama * bin/br_pmfetch.rb: Added sort by page option (--sort page) * lib/io/higet.rb: Newly added Bio::HGC::HiGet class for HiGet SOAP service. 2005-06-28 Toshiaki Katayama * gemspec.rb: newly added RubyGems spec file. 2005-06-21 Naohisa Goto * lib/bio/appl/blast/report.rb: Newly added support for reading BLAST -m 7 result files through Bio::FlatFile by adding DELIMITER = "\n" to Bio::Blast::Report class. (Note that tab-delimited format (-m 8 and -m 9) are not yet supported by Bio::FlatFile) * lib/bio/io/flatfile.rb: Added file format autodetection of BLAST XML format. 2005-06-20 Naohisa Goto * lib/bio/appl/blast/format0.rb: added 'to_s' to store original entry 2005-04-04 Mitsuteru Nakao * lib/bio/db/go.rb: Newly added Bio::GO::External2go class for parsing external2go file. 2005-03-10 Naohisa Goto * lib/bio/io/flatfile.rb: Added file format autodetection of Spidey (Bio::Spidey::Report). 2005-03-10 Naohisa Goto * lib/bio/io/flatfile.rb: Added file format autodetection for Bio::KEGG::KO, Bio::KEGG::GLYCAN, Bio::KEGG::REACTION, Bio::Blat::Report and Bio::Sim4::Report. 
In order to distinguish Bio::KEGG::REACTION and Bio::KEGG::COMPOUND, autodetection regexp. of Bio::KEGG::COMPOUND were modified. 2005-02-09 KATAYAMA Toshiaki * lib/bio/db/kegg/genes.rb: Added cu method which returns codon usage in Hash for the convenience (codon_usage method returns in Array or Fixnum). 2004-12-13 KATAYAMA Toshiaki * BioRuby 0.6.2 released. * test/all_tests.rb: Unit tests for some classes are newly incorporated by Moses Hohman. You can try it by 'ruby install.rb test' * lib/bio/appl/spidey/report.rb: Newly added Spidey result parser class. * lib/bio/appl/blat/report.rb: Newly added BLAT result parser class. * fixes and improvements: * lib/bio/appl/blast/blast/format0.rb * minor fix for the Blast default format parser * lib/bio/alignment.rb * Alignment class * lib/bio/db/prosite.rb * bug reported by Rolv Seehuus is fixed * some methods are added 2004-10-25 KATAYAMA Toshiaki * lib/bio/db/{compound.rb,reaction.rb,glycan.rb}: Newly added parser for KEGG REACTION and KEGG GLYCAN database entries, fix for KEGG COMPOUND parser to support the new format. 2004-10-09 GOTO Naohisa * lib/bio/appl/sim4.rb Newly added sim4 wrapper class. This is test version, specs would be changed frequently. * lib/bio/appl/sim4/report.rb Newly added sim4 result parser class. 2004-08-25 KATAYAMA Toshiaki * BioRuby 0.6.1 released. * fix for the packaging miss of 0.6.0 * bin/*.rb are renamed to bin/br_*.rb (similar to the BioPerl's convention: bp_*.pl) 2004-08-24 KATAYAMA Toshiaki * BioRuby 0.6.0 released. * many fixes for Ruby 1.8 * updated for genome.ad.jp -> genome.jp transition * lib/bio/db/pdb.rb Newly added parser for PDB contributed by Alex Gutteridge (EBI). * lib/bio/data/codontable.rb Bio::CodonTable is rewrited to be a class instead of static variable. Now it can hold table definition, start codons, stop codons and added methods to detect start/stop codons and reverse translation. Also includes sample code to show codon table in ANSI colored ascii art, have fun. 
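The Bio::CodonTable rewrite described above turns the table into an object that holds the codon definitions together with start and stop codons, and adds reverse translation. A miniature version of that idea is sketched below. MiniCodonTable is hypothetical, lists only a handful of codons from the standard genetic code, and does not reproduce Bio::CodonTable's actual interface.

```ruby
# Hypothetical miniature codon table object.
class MiniCodonTable
  def initialize(table, starts, stops)
    @table = table
    @starts = starts
    @stops = stops
  end

  # Amino acid for a codon, or nil if the codon is not in the table.
  def [](codon)
    @table[codon.downcase]
  end

  def start_codon?(codon)
    @starts.include?(codon.downcase)
  end

  def stop_codon?(codon)
    @stops.include?(codon.downcase)
  end

  # Reverse translation: every codon that encodes the given amino acid.
  def revtrans(aa)
    @table.select { |_, v| v == aa.upcase }.keys.sort
  end
end

# A small subset of the standard code, for illustration only.
STANDARD_SUBSET = MiniCodonTable.new(
  { "atg" => "M", "tgg" => "W", "ttt" => "F", "ttc" => "F",
    "taa" => "*", "tag" => "*", "tga" => "*" },
  %w[atg],
  %w[taa tag tga]
)

p STANDARD_SUBSET.revtrans("F")
```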
* lib/bio/sequence.rb Bio::Sequence::NA#translate is rewrited to accept an user defined codon table as a Bio::CodonTable object and any character can be specified for the unknown codon. This method runs about 30% faster than ever before. Bio::Sequence::AA#to_re method is added for the symmetry. Bio::Seq will be changed to hold generic rich sequence features. This means Bio::Seq is no longer an alias of Bio::Sequence but is a sequence object model, something like contents of a GenBank entry, common in BioPerl, BioJava etc. * lib/bio/io/soapwsdl.rb Newly added common interface for SOAP/WSDL in BioRuby used by keggapi.rb, ddbjxml.rb. * lib/bio/io/keggapi.rb Completely rewrited to support KEGG API v3.0 * lib/bio/io/esoap.rb Newly added client library for Entrez Utilities SOAP interface. * lib/bio/db/genbank, lib/bio/db/embl Refactored to use common.rb as a common module. * bin/pmfetch.rb Newly added command to search PubMed. * bin/biofetch.rb, flatfile.rb, biogetseq.rb Renamed to have .rb suffix. * sample/biofetch.rb Rewrited to use KEGG API instead of DBGET 2003-10-13 KATAYAMA Toshiaki * BioRuby 0.5.3 released. Fixed bugs in Blast XML parsers: xmlparser.rb is fixed not to omit the string after ' and " in sequence definitions, rexml.rb is fixed not to raise NoMethodError as "undefined method `each_element_with_text' for nil:NilClass". 2003-10-07 GOTO Naohisa * lib/bio/db/nbrf.rb Newly added NBRF/PIR flatfile sequence format class. 2003-09-30 GOTO Naohisa * lib/bio/db/pdb.rb Newly added PDB database flatfile format class. This is pre-alpha version, specs shall be changed frequently. 2003-08-22 KATAYAMA Toshiaki * BioRuby 0.5.2 released. Fixed to be loaded in Ruby 1.8.0 without warnings. * doc/KEGG_API.rd.ja Newly added a Japanese document on the KEGG API. 2003-08-12 GOTO Naohisa * lib/bio/appl/blast/format0.rb Newly added NCBI BLAST default (-m 0) output parser, which may be 5-10x faster than BioPerl's parser. This is alpha version, specs may be frequently changed. 
PHI-BLAST support is still incomplete. Ruby 1.8 recommended. In ruby 1.6, you need strscan. * lib/bio/appl/blast/wublast.rb Newly added WU-BLAST default output parser. This is an alpha version; specs may change frequently. Support for parameters and statistics is still incomplete. Ruby 1.8 recommended. In ruby 1.6, you need strscan. 2003-07-25 GOTO Naohisa * lib/bio/alignment.rb: Newly added multiple sequence alignment class. * lib/bio/appl/alignfactory.rb: Newly added template class for multiple alignment software. * lib/bio/appl/clustalw.rb: Newly added CLUSTAL W wrapper. * lib/bio/appl/clustalw/report.rb: Newly added CLUSTAL W result data (*.aln file) parser. * lib/bio/appl/mafft.rb, lib/bio/appl/mafft/report.rb: Newly added MAFFT wrapper and report parser. (MAFFT is a multiple sequence alignment program based on FFT.) 2003-07-16 KATAYAMA Toshiaki * BioRuby version 0.5.1 released. * lib/bio/sequence.rb: some methods (using 'rna?' internally) that were temporarily unusable due to the changes in 0.5.0 are fixed. * lib/bio/io/flatfile.rb: autodetection failure of the fasta entry without sequence is fixed. FlatFile.auto method is added. * lib/bio/db.rb: subtag2array fixed. DB.open now accepts IO/ARGF. * lib/bio/db/embl.rb: references method is added. 2003-06-25 KATAYAMA Toshiaki * BioRuby version 0.5.0 released. * lib/bio/appl/blast/report.rb: Refactored from xmlparser.rb, rexml.rb, and format8.rb files. Formats are auto detected and parsers are automatically selected by checking whether XMLParser or REXML is installed. You can simply call Bio::Blast::Report.new(blastoutput) or you can choose the parser/format explicitly with Bio::Blast::Report.xmlparser(format7blastoutput) Bio::Blast::Report.rexml(format7blastoutput) Bio::Blast::Report.tab(format8blastoutput) You can also use the newly added class method reports for multiple xml blast output.
Bio::Blast.reports(output) # output can be IO or String * lib/bio/appl/fasta/report.rb: Refactored from format10.rb, format6.rb and sample/* files. * lib/bio/appl/hmmer/report.rb: Bug fix and clean up. * bin/biogetseq: Newly added OBDA (BioRegistry) entry retrieval command. * etc/bioinformatics/seqdatabase.ini, lib/bio/io/registry.rb: Updated for new OBDA spec (Singapore version). Including config file versioning and changes in tag names, support for OBDA_SEARCH_PATH environment variable. * lib/bio/io/keggapi.rb: Newly added KEGG API client library. * lib/bio/io/ddbjxml.rb: Newly added DDBJ XML client library (test needed). * lib/bio/io/das.rb: Newly added BioDAS client library. * lib/bio/db/gff.rb: Newly added GFF format parser/store library. * lib/bio/appl/tmhmm/report.rb: Newly added TMHMM report parser. * lib/bio/appl/targetp/report.rb: Newly added TargetP report parser. * lib/bio/appl/sosui/report.rb: Newly added SOSUI report parser. * lib/bio/appl/psort/report.rb: Newly added PSORT report parser. * lib/bio/appl/genscan/report.rb: Newly added GENSCAN report parser. * lib/bio/db/prosite.rb: bug fix in ps2re method. * lib/bio/db/fantom.rb: Newly added FANTOM database parser (XML). * lib/bio/db/go.rb: Newly added GO parser. * lib/bio/feature.rb: 'each' method now accepts an argument to select a specific feature. * lib/bio/db/fasta.rb: definition=, data= to change comment line. * lib/bio/db/genbank.rb: References and features now accept a block. 'acc_version' method is added to return the Accession.Version string. 'accession' method now returns Accession part of the acc_version. 'version' method now returns Version part of the acc_version as an integer. * lib/bio/db/keggtab.rb: Rewritten for bug fix and clean up (note: some methods renamed!)
* gsub('abrev', 'abbrev') in method names * db_path_by_keggorg is changed to db_path_by_abbrev * @bio_root is changed to @bioroot (ENV['BIOROOT'] overrides) * Bio::KEGG::DBname is changed to Bio::KEGG::Keggtab::DB * @database is added (a hash with its key db_abbrev) * database, name, path methods added with its argument db_abbrev * lib/bio/io/flatfile.rb: Enumerable mix-in is included. * lib/bio/io/flatfile/indexer.rb: Indexing of the FASTA format file is now supported with various types of definition lines. * bin/dbget: Removed (moved under sample directory because the port of the dbget server is now closed). * install.rb: Changed to use setup 3.1.4 to avoid installing the CVS/ directory. * sample/goslim.rb: Added a sample to generate histogram from GO slim. * sample/tdiary.rb: Added for tDiary users. Have fun. :) 2003-01-28 KATAYAMA Toshiaki * BioRuby version 0.4.0 released. * bin/bioflat: * newly added for the BioFlat indexing * lib/bio/io/flatfile.rb, flatfile/{indexer.rb,index.rb,bdb.rb}: * flatfile indexing is supported by N. Goto * lib/bio/db/genbank.rb: changed to contain common methods only * lib/bio/db/genbank/genbank.rb * lib/bio/db/genbank/genpept.rb * lib/bio/db/genbank/refseq.rb * lib/bio/db/genbank/ddbj.rb * lib/bio/db/embl.rb: changed to contain common methods only * lib/bio/db/embl/embl.rb * lib/bio/db/embl/sptr.rb * lib/bio/db/embl/swissprot.rb * lib/bio/db/embl/trembl.rb * lib/bio/appl/emboss.rb: * added - just a generic wrapper, no specific parsers yet. * lib/bio/appl/hmmer.rb: * added - execution wrapper * lib/bio/appl/hmmer/report.rb: * added - parsers for hmmsearch, hmmpfam contributed by H. Suga * lib/bio/db.rb: open method added for easy use of flatfile. * lib/bio/db/kegg/genes.rb: * fixed bug in codon_usage method in the case of long sequence >999 * eclinks, splinks, pathways, gbposition, chromosome methods added * lib/bio/db/aaindex.rb: * adapted for the new AAindex2 format (release >= 6.0).
* lib/bio/db/fasta.rb: entry_id is changed to return first word only * lib/bio/data/na.rb, aa.rb, keggorg.rb: * moved under class NucleicAcid, AminoAcid, KEGG (!) * in the test codes, DBGET is replaced by BioFetch 2002-08-30 Yoshinori K. Okuji * lib/bio/matrix.rb: Removed. * lib/bio/db/aaindex.rb: Require matrix instead of bio/matrix. * lib/bio/db/transfac.rb: Likewise. * lib/bio/pathway.rb: Likewise. (Pathway#dump_matrix): Don't use Matrix#dump. 2002-07-30 KATAYAMA Toshiaki * BioRuby version 0.3.9 released. * lib/bio/location.rb: * Locations#length (size) methods added (contributed by N. Goto) * Locations#relative method added (contributed by N. Goto) * Locations#absolute method is renamed from offset * Locations#offset, offset_aa methods removed * use absolute/relative(n, :aa) for _aa * Locations#[], range methods added * Location#range method added * lib/bio/db/embl.rb: * fix accession method. * lib/bio/db/genpept.rb: * temporarily added - it will be refactored in the next release. * lib/bio/reference.rb: * in bibtex and bibitem format, "PMIDnum" is changed to "PMID:num". * lib/bio/io/pubmed.rb: * esearch, efetch methods are added. * lib/bio/db/aaindex.rb: * fix serious bug in the index method to support negative values. * lib/bio/db.rb: * fix fetch method to cut tag without fail. * lib/bio/extend.rb: * added first_line_only option for the prefix in fill method. * doc/Tutorial.rd.ja: * added docs on BibTeX etc. 2002-06-26 KATAYAMA Toshiaki * BioRuby version 0.3.8 released. * lib/bio/sequence.rb: * normalize! method added to clean up the object itself. * 'to_seq' method was renamed to 'seq' (!) * to_xxxx should be used when the class of the object changes. * lib/bio/appl/blast/xmlparser.rb: * each_iteration, each_hit, each, hits, statistics, message methods are added in Report class. * statistics, message methods are added in Iteration class. * methods compatible with Fasta::Report::Hit are added in Hit class.
* lib/bio/appl/blast/rexml.rb: * many APIs were changed to follow the xmlparser.rb's. (!) * lib/bio/appl/{blast.rb,fasta.rb}: * class method parser() is added for loading specified Report class. * etc/bioinformatics/seqdatabase.ini: added for OBDA (!) * sample setup for BioRegistry - Open Bio Sequence Database Access. * lib/bio/extend.rb: added (!) * This module adds some functionality to the existing classes and is not loaded by default. Users should require it explicitly if needed. * lib/bio/util/*: removed and merged into lib/bio/extend.rb (!) * lib/bio/id.rb: removed (!) * lib/bio/db/{embl.rb,sptr.rb,transfac.rb}: added entry_id * lib/bio/data/keggorg.rb: updated * sample/genes2* sample/genome2*: updated * doc/Tutorial.rd.ja: updated 2002-06-19 KATAYAMA Toshiaki * BioRuby version 0.3.7 released. * lib/bio/sequence.rb: Sequence inherits String again (!) * lib/bio/db.rb, db/embl.rb, db/sptr.rb: moved EMBL specific methods 2002-06-18 KATAYAMA Toshiaki * lib/bio/feature.rb: Bio::Feature#[] method added * doc/Tutorial.rd.ja: changed to use Feature class 2002-05-28 KATAYAMA Toshiaki * lib/bio/appl/fasta.rb: parser separated, API renewal (!) * lib/bio/appl/fasta/format10.rb: moved from fasta.rb * lib/bio/appl/blast.rb: parser separated, API renewal (!)
* lib/bio/appl/blast/format8.rb: newly added * lib/bio/appl/blast/rexml.rb: newly added * lib/bio/appl/blast/xmlparser.rb: moved from blast.rb 2002-05-16 KATAYAMA Toshiaki * lib/bio/sequence.rb: added alias 'Seq' for class Sequence * lib/bio/db/fasta.rb: entry method added 2002-05-15 KATAYAMA Toshiaki * lib/bio/io/dbget.rb: bug fixed for pfam (# lines were wrongly skipped) * lib/bio/location.rb: offset method added, eased range check 2002-04-26 KATAYAMA Toshiaki * sample/biofetch.rb: new 'info=' option added 2002-04-22 KATAYAMA Toshiaki * lib/bio/appl/fasta.rb: follow changes made at fasta.genome.ad.jp * sample/gb2tab.rb: fixed to use authors.inspect for reference 2002-04-15 KATAYAMA Toshiaki * sample/gb2fasta.rb: changed to follow new genbank.rb spec. * sample/gt2fasta.rb: changed to follow new genbank.rb spec. * sample/gbtab2mysql.rb: added for loading tab delimited data. 2002/04/08 * version 0.3.6 released -k * fixed inconsistency among db.rb, genbank.rb, genome.rb -k * lib/bio/db/genbank.rb : serious bug fixed in locus method -k * lib/bio/feature.rb : method name 'type' has changed -k 2002/03/27 * sample/gb2tab.rb changed to follow new genbank.rb w/ new schema -k 2002/03/26 * sample/gb2tab.rb use ruby instead of perl in the example -o * sample/gb2fasta.rb updated -o 2002/03/11 * version 0.3.5 released -k 2002/03/04 * lib/bio/sequence.rb to_a, to_ary methods renamed to names, codes -k * sample/biofetch.rb added for BioFetch server -k * bin/biofetch added for BioFetch client -k * lib/bio/io/fetch.rb added for BioFetch library -k * lib/bio/io/sql.rb added for BioSQL -k * lib/bio/io/registry.rb added for BioDirectory/Registry -k * lib/bio/feature.rb added for BioSQL, GenBank, EMBL etc.
-k * lib/bio/db/genbank.rb rewritten to use Features, References -k * lib/bio/db/{genes,genome}.rb clean up -k * lib/bio/reference.rb added class References -k 2002/02/05 * changed to use 'cgi' instead of 'cgi-lib' -n,k 2002/01/31 * version 0.3.4 released -k * lib/bio/db/genbank.rb -k * fix for multiple 'allele' in the feature key. (thanks Lixin) 2002/01/07 * lib/bio/appl/blast.rb -n * remote blast support etc. 2001/12/18 * lib/bio/id.rb -k * newly created * lib/bio/io/brdb.rb -k * newly created * lib/bio/db.rb -k * template methods are deleted * detailed document added * lib/bio/sequence.rb -k * to_fasta, complement, translate fixed (due to the changes made in 0.3.3) * Sequence::NA#initialize doesn't replace 'u' with 't' any longer * gc_percent, complement, translate, to_re, molecular_weight methods are adapted to this change * molecular_weight changed to calculate more precisely * test code added * lib/bio.rb -k * rescue for require 'bio/appl/blast' is deleted 2001/12/15 * lib/bio/sequence.rb -o * Sequence#to_str added 2001/12/15 * version 0.3.3 released -k
bio-2.0.3/doc/bioruby.css
/* body */ body { color: #000000; background-color: #ffffff; margin-left: 5%; margin-right: 5%; font-family: verdana, arial; } em { font-weight: bold; font-style: normal; } /* link */ a:link { color: #00ca65; text-decoration: none; } a:visited { color: #49ba18; text-decoration: none; } a:hover, a:focus { color: #c2fe20; background-color: #ffffff; text-decoration: underline; } /* header */ h1 { color: #000000; background-color: transparent; border-color: #49ba18; /* border-width: 1px 0px 1px 0px;*/ border-width: 0px 0px 5px 0px; border-style: solid; padding-bottom: 3px; padding-right: 10%; text-align: left; } h2 { color: #000000; /* background-color: transparent;*/ background-color: #b0ffb0; border-color: #b0ffb0; border-style: none; border-width: 1px 0px 1px 0px; margin-bottom: 0px; margin-top: 1em; padding: 3px; } h3
{ padding-top: 2px; padding-left: 5px; font-weight: bold; border-style: solid; border-color: #d8ffd8; /* border-width: 2px 0px 0px 10px;*/ border-width: 2px 0px 0px 2px; } h4 { padding-bottom: 1px; padding-left: 5px; font-weight: bold; font-weight: bold; border-style: solid; border-color: #b0ffb0; border-width: 0px 0px 2px 0px; /* margin: 1.5em 10px 0px*/ } h5 { padding-top: 2px; padding-left: 5px; font-weight: bold; border-style: dotted; border-color: #d8ffd8; border-width: 2px 0px 0px 2px; /* margin: 1.5em 20px 0px*/ } h6 { padding-left: 5px; font-weight: bold; border-style: solid; border-color: #b0ffb0; border-width: 0px 0px 0px 5px; /* margin: 1.5em 20px 0px*/ } /* paragraph */ p { margin-left: 0em! important; text-indent: 0em } /* line */ hr { border-style: solid; border-color: #00ca65; border-width: 1px; } /* list */ dl { } dt { padding-left: 5px; font-size: 110%; border-style: solid; border-color: #B0FFB0; border-width: 0px 0px 0px 5px; font-weight: bold; } dd { } li { /* list-style-type: disc; */ } ul,ol{ } /* table */ th { padding: 5px; text-align: left; } td { padding: 5px; } /* quote */ pre { color: #000000; background-color: #d8ffd8; margin-left: 20px; padding: 8px; border-style: solid; border-color: #b0ffb0; border-width: 1px 5px 1px 5px; white-space: pre; } blockquote { color: #008080; background-color: #ffffff; margin-left: 20px; padding: 8px; border-style: solid; border-color: #38c868; border-width: 3px 1px 3px 1px; } /* image */ img { border-width: 0px; } /* form */ input, select { color: #000000; background-color: transparent; padding: 2px; border-style: solid; border-color: #71e63e; border-width: 1px; } textarea { color: #000000; background-color: #ffffff; border-style: solid; border-color: #00ca65; border-width: 1px; font-family: monospace; } /* reviz */ table.reviz { width: 100%; } th.file { background-color: #008080; width: 15%; } th.rev, th.age, th.author { background-color: #38c868; width: 10%; } th.log { background-color: #c2fe20; } td.dir 
{ background-color: #b0ffb0; background-image: url(reviz/dir.gif); background-position: center left; background-repeat: no-repeat; text-indent: 15px; } td.file { background-color: #b0ffb0; background-image: url(reviz/file.gif); background-position: center left; background-repeat: no-repeat; text-indent: 15px; } td.rev, td.age, td.author, td.log { background-color: #d8ffd8; } /* rwiki */ .navi { text-align: right; } .headerURL { text-align: right; font-size: 10pt; } address { color: gray; text-align: right; font-style: normal; font-variant: normal; font-weight: normal; } /* hiki */ ins.added { font-weight: bold; } del.deleted { text-decoration: line-through; } form.update textarea.keyword { width: 15em; height: 3em; } div.adminmenu { text-align: right; } div.caption { text-align: right; } div.footer { text-align: right; } /* top */ span.title { font-size: +2 } span.lead { text-decoration: underline } span.expire { color: #c04040 } span.ruby { font-weight: bold }
bio-2.0.3/doc/ChangeLog-before-1.4.2
commit 510dc41f32a72b86df1aa9021de140a64e6393de Author: Naohisa Goto Date: Fri Aug 26 15:03:14 2011 +0900 BioRuby 1.4.2 is released. ChangeLog | 954 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 954 insertions(+), 0 deletions(-) commit 3acc1e098839cacbe85b5c23367ab14e0c4fe3ea Author: Naohisa Goto Date: Fri Aug 26 15:01:49 2011 +0900 Preparation for bioruby-1.4.2 release. bioruby.gemspec | 2 +- lib/bio/version.rb | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) commit bf69125192fa01ae3495e094e7ef1b5e895954ad Author: Naohisa Goto Date: Fri Aug 26 14:42:02 2011 +0900 updated bioruby.gemspec bioruby.gemspec | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) commit e0a3ead917812199c6a0e495f3afa6a636bbf0c5 Author: Naohisa Goto Date: Fri Aug 26 14:39:54 2011 +0900 Added PLUGIN section to README.rdoc, and some changes made.
README.rdoc | 20 +++++++++++++++++--- 1 files changed, 17 insertions(+), 3 deletions(-) commit 1da0a1ce6eddcef8f8fc811b0a2cf8d58f880642 Author: Naohisa Goto Date: Fri Aug 26 14:38:13 2011 +0900 updated doc/Tutorial.rd.html doc/Tutorial.rd.html | 41 ++++++++++++++++++++--------------------- 1 files changed, 20 insertions(+), 21 deletions(-) commit a8b90367b830e58b397536be3dada10cdde97aab Author: Naohisa Goto Date: Fri Aug 26 14:36:24 2011 +0900 Removed sections contain obsolete (404 Not Found) URL in Tutorial.rd. doc/Tutorial.rd | 12 ------------ 1 files changed, 0 insertions(+), 12 deletions(-) commit e17546cb90a012cd1f51674ceb4c8da5dd516bdf Author: Michael O'Keefe Date: Tue Aug 23 20:15:44 2011 -0400 Updated tutorial * Updated tutorial (original commit id: 7b9108657961cf2354278e04971c32059b3ed4e2 and some preceding commits) doc/Tutorial.rd | 55 ++++++++++++++++++++++++++++++++----------------------- 1 files changed, 32 insertions(+), 23 deletions(-) commit de8a394129c752a0b9a5975a73c5eb582d9681d3 Author: Naohisa Goto Date: Fri Aug 26 13:24:27 2011 +0900 fix typo and change order of lines RELEASE_NOTES.rdoc | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) commit 1cf2a11199655e4c9f5fc49c5a588b99c18ab7ca Author: Naohisa Goto Date: Fri Aug 26 13:16:11 2011 +0900 RELEASE_NOTE.rdoc modified to reflect recent changes RELEASE_NOTES.rdoc | 14 ++++++++++++++ 1 files changed, 14 insertions(+), 0 deletions(-) commit b44871a5866eeb2d379f080b39b09693c9e9e3cc Author: Naohisa Goto Date: Fri Aug 26 13:15:14 2011 +0900 In BioRuby Shell, getent() fails when EMBOSS seqret does not found. lib/bio/shell/plugin/entry.rb | 10 +++++++--- 1 files changed, 7 insertions(+), 3 deletions(-) commit 179e7506b008a220d5dd42ce1a6c7ce589c3fcda Author: Naohisa Goto Date: Fri Aug 26 12:26:52 2011 +0900 New methods Bio::NCBI::REST::EFetch.nucleotide and protein * New methods Bio::NCBI::REST::EFetch.nucleotide and protein, to get data from "nucleotide" and "protein" database respectively. 
Because NCBI changed not to accept "gb" format for the database "sequence", the two new methods are added for convenience. * In BioRuby Shell, efetch method uses the above new methods. lib/bio/io/ncbirest.rb | 122 +++++++++++++++++++++++++++++++++++++- lib/bio/shell/plugin/ncbirest.rb | 6 ++- 2 files changed, 126 insertions(+), 2 deletions(-) commit 99b31379bb41c7cad34c1e7dc00f802da37de1cd Author: Naohisa Goto Date: Thu Aug 25 19:03:43 2011 +0900 New method Bio::Fastq#to_s * New method Bio::Fastq#to_s. Thanks to Tomoaki NISHIYAMA who wrote a patch. (https://github.com/bioruby/bioruby/pull/37) lib/bio/db/fastq.rb | 14 ++++++++++++++ test/unit/bio/db/test_fastq.rb | 14 ++++++++++++++ 2 files changed, 28 insertions(+), 0 deletions(-) commit 8ab772b37850c3874b55cf37d091046394cda5bd Author: Naohisa Goto Date: Thu Aug 25 15:23:00 2011 +0900 RELEASE_NOTES.rdoc changed to reflect recent changes. RELEASE_NOTES.rdoc | 16 ++++++++++++++++ 1 files changed, 16 insertions(+), 0 deletions(-) commit 8db6abdcc81db6a58bdd99e7f8d410b1a74496b1 Author: Naohisa Goto Date: Thu Aug 25 14:28:42 2011 +0900 A test connecting to DDBJ BLAST web service is enabled. test/functional/bio/appl/test_blast.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit 121ad93c0c1f018ee389972ac5e5e8cc395f00d1 Author: Naohisa Goto Date: Thu Aug 25 14:15:23 2011 +0900 Bio::DDBJ::REST::*. new classes for DDBJ REST web services. * Bio::DDBJ::REST::*: new classes for DDBJ REST web services (WABI). Currently not all services are covered. (lib/bio/io/ddbjrest.rb) * autoload of the above (lib/bio/db/genbank/ddbj.rb, lib/bio.rb) * Tests for the above (but still incomplete) (test/functional/bio/io/test_ddbjrest.rb) * Remote BLAST using DDBJ server now uses REST interface instead of SOAP, for Ruby 1.9.x support. 
(lib/bio/appl/blast/ddbj.rb) lib/bio.rb | 1 + lib/bio/appl/blast/ddbj.rb | 33 +--- lib/bio/db/genbank/ddbj.rb | 3 +- lib/bio/io/ddbjrest.rb | 344 +++++++++++++++++++++++++++++++ test/functional/bio/io/test_ddbjrest.rb | 47 +++++ 5 files changed, 399 insertions(+), 29 deletions(-) create mode 100644 lib/bio/io/ddbjrest.rb create mode 100644 test/functional/bio/io/test_ddbjrest.rb commit 7e8ba7c1388204daa5245d2128d01f6f40298185 Author: Naohisa Goto Date: Thu Aug 18 00:08:51 2011 +0900 In Fastq formatter, default width value changed to nil * In Bio::Sequence#output(:fastq) (Fastq output formatter), default width value is changed from 70 to nil, which means "without wrapping". close [Feature #3191] (https://redmine.open-bio.org/issues/3191) RELEASE_NOTES.rdoc | 8 ++++++-- lib/bio/db/fastq/format_fastq.rb | 4 ++-- test/unit/bio/db/test_fastq.rb | 12 ++++++++++++ 3 files changed, 20 insertions(+), 4 deletions(-) commit 0fb65211519febff18413c589fe7af753ee2e61d Author: Naohisa Goto Date: Wed Aug 17 22:02:03 2011 +0900 Bug fix: Bio::SPTR follow-up of UniProtKB format changes * Bug fix: Bio::SPTR follow-up of UniProtKB format changes. * Tests are added about the fix. * Bug fix: Bio::SPTR#cc_web_resource should be private. * Incompatible changes in Bio::SPTR#cc("WEB RESOURCE") is documented in RELEASE_NOTES.rdoc. * KNOWN_ISSUES.rdoc: description about incompleteness of the fix. * Thanks to Nicholas Letourneau who reports the issue. 
(https://github.com/bioruby/bioruby/pull/36) KNOWN_ISSUES.rdoc | 5 + RELEASE_NOTES.rdoc | 20 ++- lib/bio/db/embl/sptr.rb | 214 +++++++++++++++++++++--- test/unit/bio/db/embl/test_sptr.rb | 12 +- test/unit/bio/db/embl/test_uniprot_new_part.rb | 208 +++++++++++++++++++++++ 5 files changed, 430 insertions(+), 29 deletions(-) create mode 100644 test/unit/bio/db/embl/test_uniprot_new_part.rb commit 0d066ab6b8fc19f1cf6e66e07c2065775739cccd Author: Naohisa Goto Date: Sat Aug 13 00:58:51 2011 +0900 preparation for release: alpha test version 1.4.2-alpha1 bioruby.gemspec | 23 +++++++++++++++++++++-- lib/bio/version.rb | 4 ++-- 2 files changed, 23 insertions(+), 4 deletions(-) commit 55ece17775f5d24cf62f93d54ded5dc6eed53584 Author: Naohisa Goto Date: Fri Aug 12 21:57:25 2011 +0900 Test bug fix: use sort command in PATH * Test bug fix: FuncTestCommandQuery: use sort command in PATH. Thanks to Tomoaki Nishiyama who reports the issue. (https://github.com/bioruby/bioruby/pull/13) test/functional/bio/test_command.rb | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) commit 2f464aae016387cd50031f9d9664e78e220e2d01 Author: Naohisa Goto Date: Fri Aug 12 20:37:18 2011 +0900 RELEASE_NOTES.rdoc is updated following recent changes. RELEASE_NOTES.rdoc | 21 ++++++++++++++------- 1 files changed, 14 insertions(+), 7 deletions(-) commit d1a193684afdfd4c632ef75a978d4f3680d1bdf3 Author: Naohisa Goto Date: Fri Aug 12 20:30:53 2011 +0900 README.rdoc: changed required Ruby version etc. * README.rdoc: now Ruby 1.8.6 or later is required. * README.rdoc: removed old obsolete descriptions. * README.rdoc: modified about RubyGems. * KNOWN_ISSUES.rdoc: moved descriptions about older RubyGems and CVS from README.rdoc. * KNOWN_ISSUES.rdoc: modified about end-of-life Ruby versions. 
KNOWN_ISSUES.rdoc | 40 ++++++++++++++++++++++++++++++---- README.rdoc | 61 ++++++++++++++++++++-------------------------------- 2 files changed, 59 insertions(+), 42 deletions(-) commit b5cbdc6ab7e81aae4db9aeb708fac57ffbce5636 Author: Naohisa Goto Date: Sat Jul 16 00:12:17 2011 +0900 Added topics for the release notes RELEASE_NOTES.rdoc | 39 ++++++++++++++++++++++++++++++++++++++- 1 files changed, 38 insertions(+), 1 deletions(-) commit f062b5f37a6d8ad35b5b10c942fd61e1a4d37e08 Author: Naohisa Goto Date: Sat Jul 2 01:05:42 2011 +0900 Speedup of Bio::RestrictionEnzyme::Analysis.cut. * Speedup of Bio::RestrictionEnzyme::Analysis.cut. The new code is 50 to 80 fold faster than the previous code when cutting 1Mbp sequence running on Ruby 1.9.2p180. * Thanks to Tomoaki NISHIYAMA who wrote the first version of the patch. Thanks to ray1729 (on GitHub) who reports the issue. (https://github.com/bioruby/bioruby/issues/10) lib/bio/util/restriction_enzyme.rb | 3 + .../restriction_enzyme/range/sequence_range.rb | 14 ++-- .../range/sequence_range/calculated_cuts.rb | 75 +++++++++++++++----- .../range/sequence_range/fragment.rb | 4 +- 4 files changed, 69 insertions(+), 27 deletions(-) commit 735379421d9d6b7ceb06b91dcfcca6d5ff841236 Author: Naohisa Goto Date: Sat Jul 2 00:59:58 2011 +0900 New classes (for internal use only) for restriction enzyme classes * New classes Bio::RestrictionEnzyme::SortedNumArray and Bio::RestrictionEnzyme::DenseIntArray. Both of them are for Bio::RestrictionEnzyme internal use only. They will be used for the speedup of restriction enzyme analysis. 
lib/bio/util/restriction_enzyme/dense_int_array.rb | 195 ++++++++++++++ .../util/restriction_enzyme/sorted_num_array.rb | 219 +++++++++++++++ .../restriction_enzyme/test_dense_int_array.rb | 201 ++++++++++++++ .../restriction_enzyme/test_sorted_num_array.rb | 281 ++++++++++++++++++++ 4 files changed, 896 insertions(+), 0 deletions(-) create mode 100644 lib/bio/util/restriction_enzyme/dense_int_array.rb create mode 100644 lib/bio/util/restriction_enzyme/sorted_num_array.rb create mode 100644 test/unit/bio/util/restriction_enzyme/test_dense_int_array.rb create mode 100644 test/unit/bio/util/restriction_enzyme/test_sorted_num_array.rb commit 6cbb0c230d1a0bf3125c3b0fdb9ec3333d9564f8 Author: Naohisa Goto Date: Thu Jun 30 20:47:26 2011 +0900 A sample benchmark script for Bio::RestrictionEnzyme::Analysis.cut sample/test_restriction_enzyme_long.rb | 4403 ++++++++++++++++++++++++++++++++ 1 files changed, 4403 insertions(+), 0 deletions(-) create mode 100644 sample/test_restriction_enzyme_long.rb commit 413442bd7424f837c73d8170ced8e01a01f87a59 Author: Naohisa Goto Date: Tue May 24 23:26:41 2011 +0900 Added a test for Bio::FastaFormat#entry_overrun etc. * Added a test for Bio::FastaFormat#entry_overrun. * Removed a void test class. test/unit/bio/db/test_fasta.rb | 24 ++++++++++++------------ 1 files changed, 12 insertions(+), 12 deletions(-) commit b74020ff9b5c9fc8531c584898a329987008870e Author: Naohisa Goto Date: Tue May 24 22:21:17 2011 +0900 Bug fix: Bio::FastaFormat#query passes nil to the given factory * Bug fix: Bio::FastaFormat#query passes nil to the given factory object. Thanks to Philipp Comans who reports the bug. (https://github.com/bioruby/bioruby/issues/35) * Test method for Bio::FastaFormat#query is added. 
lib/bio/db/fasta.rb | 2 +- test/unit/bio/db/test_fasta.rb | 22 ++++++++++++++++++++++ 2 files changed, 23 insertions(+), 1 deletions(-) commit 80e49373e0e9013442680ba33499be80c58471db Author: Naohisa Goto Date: Tue May 17 22:33:56 2011 +0900 Changed database name in the example. * Changed database name in the example. Thanks to Philipp Comans who reports the issue. lib/bio/appl/blast.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit 7427d1f1355a6c190c6bf8522978e462dea64134 Author: Naohisa Goto Date: Thu May 12 22:15:37 2011 +0900 Bug fix: changed GenomeNet remote BLAST URL. * Bug fix: changed GenomeNet remote BLAST host name and path. Thanks to Philipp Comans who reports the bug. ( https://github.com/bioruby/bioruby/issues/34 ) lib/bio/appl/blast/genomenet.rb | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) commit c1c231b0a17c06ec042534245ed903e0256a59ed Author: Naohisa Goto Date: Tue May 10 20:57:17 2011 +0900 updated doc/Tutorial.rd.html doc/Tutorial.rd.html | 34 ++++++++++++++++++---------------- 1 files changed, 18 insertions(+), 16 deletions(-) commit 5261c926cae8dac890d7d0380e84f2eb88912417 Author: Pjotr Prins Date: Thu May 5 12:07:54 2011 +0200 Tutorial: Fixed URL and the intro doc/Tutorial.rd | 34 ++++++++++++++++++++-------------- 1 files changed, 20 insertions(+), 14 deletions(-) commit 71de394053376f4759d705c52e6f16eca3da9d62 Author: Pjotr Prins Date: Wed Mar 9 10:26:53 2011 +0100 Tutorial: Added a commnet for rubydoctest, changed Ruby version * Added a comment for rubydoctest * Changed example Ruby version representation * This is part of commit ba5b9c2d29223860252451110a99d4ff0250395d and modified to merge with the current HEAD. 
doc/Tutorial.rd | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) commit 1d27153065b8e8595a470b2201961b0a39bf8ca1 Author: Naohisa Goto Date: Thu Apr 28 23:58:57 2011 +0900 updated doc/Tutorial.rd.html doc/Tutorial.rd.html | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit ae9beff3bc43db3724a292b10a214583d9fbc111 Author: Michael O'Keefe Date: Wed Apr 6 11:46:54 2011 -0400 Updated through the section on Homology searching with BLAST doc/Tutorial.rd | 64 ++++++++++++++++++++-------------------------- doc/Tutorial.rd.html | 68 ++++++++++++++++++++++---------------------------- 2 files changed, 58 insertions(+), 74 deletions(-) commit 971da799b16628a927abd7dd6c218994506f8fd8 Author: Michael O'Keefe Date: Thu Mar 24 18:29:20 2011 -0400 Updated the html file generated from the RDoc doc/Tutorial.rd.html | 224 ++++++++++++++++++++++++++------------------------ 1 files changed, 117 insertions(+), 107 deletions(-) commit c6afd7eeed121926b56d300cb4170b5024f29eb0 Author: Michael O'Keefe Date: Thu Mar 24 18:08:17 2011 -0400 Finished updating the tutorial doc/Tutorial.rd | 16 ++++++++-------- 1 files changed, 8 insertions(+), 8 deletions(-) commit 7349ac550ec03e2c5266496297becbdb3f4e0edd Author: Michael O'Keefe Date: Thu Mar 24 15:56:37 2011 -0400 Edited tutorial up through the extra stuff section doc/Tutorial.rd | 28 ++++++++++++++-------------- 1 files changed, 14 insertions(+), 14 deletions(-) commit 32ba3b15ad00d02b12ef2b44636505e23caaf620 Author: Michael O'Keefe Date: Thu Mar 24 15:26:48 2011 -0400 Updated tutorial up through BioSQL doc/Tutorial.rd | 172 ++++++++++++++++++------------------------------------- 1 files changed, 57 insertions(+), 115 deletions(-) commit 249580edb49a13545708fdcb559104217e37f162 Author: Michael O'Keefe Date: Thu Mar 24 12:09:03 2011 -0400 Updated tutorial through the section on alignments doc/Tutorial.rd | 40 +++++++++++++++++++++++++++++++++------- 1 files changed, 33 insertions(+), 7 deletions(-) commit 
54f7b54044bb245ec5953dc7426f1c434b41f24f Author: Michael O'Keefe Date: Thu Mar 24 11:51:23 2011 -0400 Updated the tutorial (mostly grammar fixes) up until GenBank doc/Tutorial.rd | 31 +++++++++++++++---------------- 1 files changed, 15 insertions(+), 16 deletions(-) commit f046a52081a8af0e9afbf65fd2673c29689be769 Author: Naohisa Goto Date: Tue Feb 8 12:58:50 2011 +0900 Added a test protein sequence data for BLAST test. test/data/fasta/EFTU_BACSU.fasta | 8 ++++++++ 1 files changed, 8 insertions(+), 0 deletions(-) create mode 100644 test/data/fasta/EFTU_BACSU.fasta commit f61a5f4bdde16fa051f43cbe3efef4570b415a6a Author: Anthony Underwood Date: Mon Jan 31 12:44:55 2011 +0000 Bug fix: GenBank sequence output should format date as 27-JAN-2011 * Bug fix: GenBank sequence output should format date as 27-JAN-2011 rather than 2011-01-27 as specified by offical GenBank specs. lib/bio/db/genbank/format_genbank.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit be144e75a059058ab000a55d7bf535597e7e2617 Author: Naohisa Goto Date: Thu Feb 3 20:28:03 2011 +0900 Added tests for remote BLAST execution via GenomeNet and DDBJ. * Added tests for remote BLAST execution via GenomeNet and DDBJ. Currently, a test for DDBJ BLAST web API is disabled because it takes relatively long time. * Tests to retrieve remote BLAST database information for GenomeNet and DDBJ servers are also added. test/functional/bio/appl/blast/test_remote.rb | 93 +++++++++++++++++++++++++ test/functional/bio/appl/test_blast.rb | 61 ++++++++++++++++ 2 files changed, 154 insertions(+), 0 deletions(-) create mode 100644 test/functional/bio/appl/blast/test_remote.rb create mode 100644 test/functional/bio/appl/test_blast.rb commit 67314f1f1a248954c030f7ffe048faf862bf07d2 Author: Naohisa Goto Date: Thu Feb 3 20:19:11 2011 +0900 Updated _parse_databases following the changes in the DDBJ server * Updated _parse_databases following the changes in the DDBJ server. 
Changed to use (NT) or (AA) in the tail of each description. Thanks to DDBJ to improve their web service API. lib/bio/appl/blast/ddbj.rb | 29 ++++++++++++++++++++--------- 1 files changed, 20 insertions(+), 9 deletions(-) commit d6aad2f4cc53c1227c86b6b573644cca15c9ed82 Author: Naohisa Goto Date: Wed Feb 2 00:02:32 2011 +0900 Release notes for the next release is added. RELEASE_NOTES.rdoc | 38 ++++++++++++++++++++++++++++++++++++++ 1 files changed, 38 insertions(+), 0 deletions(-) create mode 100644 RELEASE_NOTES.rdoc commit b4a30cc8ac9472b9e1c2a298afc624d0229c64c9 Author: Naohisa Goto Date: Tue Feb 1 23:33:18 2011 +0900 Bug fix: Execution failure due to the changes of DDBJ BLAST server lib/bio/appl/blast/ddbj.rb | 4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) commit d30cb5975febd8b526088612c4fb9689a6cc46ba Author: Naohisa Goto Date: Tue Feb 1 23:01:34 2011 +0900 Support for database "mine-aa" and "mine-nt" with KEGG organism codes * Added support for database "mine-aa" and "mine-nt" combined with KEGG organism codes. When database name starts with mine-aa or mine-nt, space-separated list of KEGG organism codes can be given. For example, "mine-aa eco bsu hsa". lib/bio/appl/blast/genomenet.rb | 11 +++++++++++ 1 files changed, 11 insertions(+), 0 deletions(-) commit abcba798ccf57894dcd570a6578ef78db30a3e25 Author: Naohisa Goto Date: Tue Feb 1 22:20:02 2011 +0900 RELEASE_NOTES.rdoc is renamed to doc/RELEASE_NOTES-1.4.1.rdoc RELEASE_NOTES.rdoc | 104 ------------------------------------------ doc/RELEASE_NOTES-1.4.1.rdoc | 104 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 104 insertions(+), 104 deletions(-) delete mode 100644 RELEASE_NOTES.rdoc create mode 100644 doc/RELEASE_NOTES-1.4.1.rdoc commit 8719cf4e06fc8a8cd0564aeb0b95372a7a0bcefb Author: Naohisa Goto Date: Tue Feb 1 22:07:32 2011 +0900 Bug: Options "-v" and "-b" should be used for the limit of hits. 
* Bug: Options "-v" and "-b" should be used for the limit of hits, and "-V" and "-B" should not be used for the purpose. lib/bio/appl/blast/genomenet.rb | 8 ++++---- 1 files changed, 4 insertions(+), 4 deletions(-) commit 974b640badae9837fe9fc173d690b27c9b045454 Author: Naohisa Goto Date: Tue Feb 1 20:34:50 2011 +0900 Bug fix: Workaround for a change in the GenomeNet BLAST site. * Bug fix: Workaround for a change in the GenomeNet BLAST site. Thanks to the anonymous reporter. The patch was originally made by Toshiaki Katayama. lib/bio/appl/blast/genomenet.rb | 7 +++---- 1 files changed, 3 insertions(+), 4 deletions(-) commit 001d3e3570c77185cece4aed1be5be2ed6f94f7e Author: Naohisa Goto Date: Thu Jan 6 23:39:19 2011 +0900 Added tests to check the previous Bio::Reference#endnote fix. test/unit/bio/test_reference.rb | 30 ++++++++++++++++++++++++++++++ 1 files changed, 30 insertions(+), 0 deletions(-) commit e1cd766abe24dbcc08a42103127c75ad0ab929aa Author: Naohisa Goto Date: Thu Jan 6 23:07:35 2011 +0900 Bio::Reference#pubmed_url is updated to follow recent NCBI changes. lib/bio/reference.rb | 5 ++--- test/unit/bio/test_reference.rb | 5 +++++ 2 files changed, 7 insertions(+), 3 deletions(-) commit 48024313a7568a38f4291618708541ae1dac312c Author: Naohisa Goto Date: Thu Jan 6 22:56:37 2011 +0900 Bug fix: Bio::Reference#endnote fails when url is not set * Bug fix: Bio::Reference#endnote fails when url is not set. Thanks to Warren Kibbe who reports the bug. (https://github.com/bioruby/bioruby/issues#issue/15) lib/bio/reference.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit 577278a95340abfa32d3e67415d3a10bc74b82c0 Author: Pjotr Prins Date: Fri Dec 17 12:16:31 2010 +0100 Bug fix: In Bio::MEDLINE#reference, doi field should be filled. * Bug fix: In Bio::MEDLINE#reference, doi field should be filled. 
(https://github.com/bioruby/bioruby/issues#issue/29) lib/bio/db/medline.rb | 1 + test/unit/bio/db/test_medline.rb | 1 + 2 files changed, 2 insertions(+), 0 deletions(-) commit daa20c85681576d3bfbdc8f87580a4b6227b122c Author: Naohisa Goto Date: Thu Jan 6 20:25:03 2011 +0900 Bug fix: Bio::Newick#reparse failure * Bug fix: Bio::Newick#reparse failure. Thanks to jdudley who reports the bug. (https://github.com/bioruby/bioruby/issues#issue/28) * Tests are added to confirm the bug fix. lib/bio/db/newick.rb | 4 +++- test/unit/bio/db/test_newick.rb | 12 ++++++++++++ 2 files changed, 15 insertions(+), 1 deletions(-) commit 16117aefdf57ac3ae16b5568f462f7b919ef005f Author: Naohisa Goto Date: Thu Jan 6 20:14:18 2011 +0900 Use setup for the preparation of adding more test methods. test/unit/bio/db/test_newick.rb | 14 ++++++++++---- 1 files changed, 10 insertions(+), 4 deletions(-) commit e5dc5896a5c6249e2a6cb03d63a3c2ade36b67e7 Author: Naohisa Goto Date: Fri Nov 19 21:07:13 2010 +0900 Ruby 1.9 support: suppressed warning "mismatched indentations" test/unit/bio/db/pdb/test_pdb.rb | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) commit 1bce41c7ebed46ac6cf433b047fe6a4c3a538089 Author: Naohisa Goto Date: Fri Nov 19 21:05:06 2010 +0900 Ruby 1.9 support: Suppressed warning "shadowing outer local variable" lib/bio/db/pdb/residue.rb | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) commit 0f31727769833ccf9d6891ae192da1bc180223e0 Author: Naohisa Goto Date: Fri Nov 19 21:04:07 2010 +0900 Ruby 1.9 support: Suppressed warning "shadowing outer local variable" lib/bio/location.rb | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) commit 70c1135b171fcf47dd9dc1bc396d15d1c3acfa62 Author: Naohisa Goto Date: Fri Nov 19 21:02:57 2010 +0900 Ruby 1.9 support: Suppressed warning "shadowing outer local variable" lib/bio/db/pdb/pdb.rb | 14 +++++++------- 1 files changed, 7 insertions(+), 7 deletions(-) commit a77e4ab78211a85aa052ca6645a2051a4f3b76d8 Author: Naohisa Goto Date: 
Fri Nov 19 20:42:22 2010 +0900 Ruby 1.9 support: use Array#join instead of Array#to_s lib/bio/db/pdb/pdb.rb | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) commit 734c2e54613e3ed5efd95e1212feab8f014d5f19 Author: Naohisa Goto Date: Fri Nov 19 20:38:11 2010 +0900 Changed to use assert_instance_of * Changed to use assert_instance_of(klass, obj) instead of assert_equal(klass, obj.class). test/unit/bio/db/pdb/test_pdb.rb | 16 ++++++++-------- 1 files changed, 8 insertions(+), 8 deletions(-) commit 490286018c8ce314da441c646ea9c5fb3f765c95 Author: Naohisa Goto Date: Fri Nov 19 20:24:36 2010 +0900 Float::EPSILON was too small for the delta tolerance. test/unit/bio/db/pdb/test_pdb.rb | 20 ++++++++++---------- 1 files changed, 10 insertions(+), 10 deletions(-) commit e92d225cadabe63fe23c7c32a4d1d50a371366cc Author: Naohisa Goto Date: Fri Nov 19 20:16:30 2010 +0900 Ruby 1.9 support: use assert_in_delta test/unit/bio/db/pdb/test_pdb.rb | 24 +++++++++++++++--------- 1 files changed, 15 insertions(+), 9 deletions(-) commit 41452971a132ef55de3486022962fa2c333b4c85 Author: Naohisa Goto Date: Fri Nov 19 13:19:39 2010 +0900 Fixed Object#id problem and suppressed warning messages. * Changed not to call nil.id (==4) invoked from chain.id. * Suppressed warning message about useless use of a variable. * Suppressed warning about conflict of IDs when testing addResidue, addLigand and addChain methods. test/unit/bio/db/pdb/test_pdb.rb | 119 +++++++++++++++++++------------------- 1 files changed, 59 insertions(+), 60 deletions(-) commit b4af5826f77002933de9d3c2ddfcc5a7cb5629db Author: Naohisa Goto Date: Wed Nov 17 22:26:26 2010 +0900 Adjusted copyright line test/unit/bio/db/pdb/test_pdb.rb | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) commit 06fd989072b2287a3accbb60684b8a029bfc0ac3 Author: Naohisa Goto Date: Wed Nov 17 21:54:05 2010 +0900 A module name is changed to avoid potential name conflict. * A module name is changed to avoid potential name conflict. 
* Removed a Shift_JIS character (Zenkaku space) in a comment line. test/unit/bio/db/pdb/test_pdb.rb | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) commit 09007f93abe8c9c5e7561f082b55ca307a7d4a1e Author: Kazuhiro Hayashi Date: Thu Jul 15 21:06:28 2010 +0900 Added more unit tests for Bio::PDB * Added more unit tests for Bio::PDB. * This is part of combinations of the 13 commits: * 555f7b49a43e7c35c82cd48b199af96ca93d4179 * 2b5c87a9ada248597e0517e22191bb4c88be2a42 * a16e24fa35debdcacd11cf36fdf0e60fe82b3684 * e3607c0f7154a4156fd53ed17470aa3628cd2586 * 4e74c9107f09c5eb3fc7fc9ec38d9d773fe89367 * 605fb0a222f70eeaa1e576d31af484a9a6a2ac27 * 2c8b2b5496fee04b886bfcbd11fb99948816cc28 * 202cf2b1b57fbcac215aa135cf6343af6a49d2ef * f13c070c763c9035451c3639e6e29c9a156947cd * 843378e608bd1ef27a713d9be2d50f0f56915b0b * a9078b8a586b66d8026af612d9a5999c6c77de33 * f0174a8ca3ee136271c51047fce12597d3fbb58c * 6675fd930718e41ad009f469b8167f81c9b2ad52 test/unit/bio/db/pdb/test_pdb.rb | 3281 +++++++++++++++++++++++++++++++++++++- 1 files changed, 3276 insertions(+), 5 deletions(-) commit 2f6a1d29b14d89ac39b408582c9865ad06560ae1 Author: Naohisa Goto Date: Sat Nov 6 00:41:21 2010 +0900 Adjusted test data file path, required files and header descriptions. test/unit/bio/db/test_litdb.rb | 8 ++++---- 1 files changed, 4 insertions(+), 4 deletions(-) commit d3394e69c98b3be63c8287af84dc530830fb977a Author: Kazuhiro Hayashi Date: Fri Jun 18 17:11:25 2010 +0900 added unit test for Bio::LITDB with a sample file test/data/litdb/1717226.litdb | 13 +++++ test/unit/bio/db/test_litdb.rb | 95 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 108 insertions(+), 0 deletions(-) create mode 100644 test/data/litdb/1717226.litdb create mode 100644 test/unit/bio/db/test_litdb.rb commit cef1d2c824f138fe268d20eaa9ffd85223c85ef9 Author: Naohisa Goto Date: Fri Nov 5 23:54:31 2010 +0900 Adjusted test data file path, required files and header descriptions. 
test/unit/bio/db/test_nbrf.rb | 8 ++++---- 1 files changed, 4 insertions(+), 4 deletions(-) commit e892e7f9b9b4daadeff44c9e479e6f51f02e383e Author: Kazuhiro Hayashi Date: Fri Jun 25 13:20:46 2010 +0900 Added unit tests for Bio::NBRF with test data. * Added unit tests for Bio::NBRF with test data. * This is part of combinations of the two commits: * 53873a82182e072e738da20381dcb2bfd8bc9e96 (Modified the unit test for Bio::NBRF) * 4675cf85aa9c0b4de9f527f9c6bb80804fdaaaa9 (Modified Bio::TestNBRF and Bio::TestTRANSFAC.) test/data/pir/CRAB_ANAPL.pir | 6 +++ test/unit/bio/db/test_nbrf.rb | 82 +++++++++++++++++++++++++++++++++++++++++ 2 files changed, 88 insertions(+), 0 deletions(-) create mode 100644 test/data/pir/CRAB_ANAPL.pir create mode 100644 test/unit/bio/db/test_nbrf.rb commit 4922d5151138312d5a09ac60a06419c23978ba3c Author: Naohisa Goto Date: Fri Nov 5 23:07:13 2010 +0900 Mock class for testing is moved under the test class * Mock class for testing is moved under the test class, to avoid potential name conflicts. test/unit/bio/db/genbank/test_common.rb | 25 ++++++++++++------------- 1 files changed, 12 insertions(+), 13 deletions(-) commit 8f630700bc6dd8c183d08291c66c665394873586 Author: Naohisa Goto Date: Fri Nov 5 23:01:32 2010 +0900 Adjusted header descriptions. test/unit/bio/db/genbank/test_common.rb | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) commit ab3d6384ca721fb6004efb5988461eecefad4d6b Author: Naohisa Goto Date: Fri Nov 5 22:58:21 2010 +0900 Adjusted test data file path and header descriptions. test/unit/bio/db/genbank/test_genpept.rb | 8 ++++---- 1 files changed, 4 insertions(+), 4 deletions(-) commit 1d24aecfac9dbc190cdc3eef0956451cc88cfe4f Author: Naohisa Goto Date: Fri Nov 5 22:54:19 2010 +0900 Adjusted test data file path, required files and header descriptions. 
test/unit/bio/db/genbank/test_genbank.rb | 12 ++++++++---- 1 files changed, 8 insertions(+), 4 deletions(-) commit 09275d8661f3d49a7e40be59a086ba33659b2448 Author: Kazuhiro Hayashi Date: Thu Jun 17 22:08:15 2010 +0900 Added unit tests for Bio::GenPept newly. * Added unit tests for Bio::GenPept newly. * This is part of the commit 8e46ff42b627791f259033d5a20c1610e32cfa62 (Added unit tests for NBRF and GenPept newly.) test/data/genbank/CAA35997.gp | 48 ++++++++++++++++++ test/unit/bio/db/genbank/test_genpept.rb | 81 ++++++++++++++++++++++++++++++ 2 files changed, 129 insertions(+), 0 deletions(-) create mode 100644 test/data/genbank/CAA35997.gp create mode 100644 test/unit/bio/db/genbank/test_genpept.rb commit 2cde22cb358a2b7ec8197866fe35a0b46ebf9b00 Author: Kazuhiro Hayashi Date: Thu Jun 24 18:52:37 2010 +0900 Added unit tests for Bio::NCBIDB::Common * Added unit tests for Bio::NCBIDB::Common. * This is part of combination of the 4 commits: * 7da8d557e8ee53da9d93c6fadfd0d8f493977c81 (added test/unit/bio/db/genbank/test_common.rb newly) * 2b5c87a9ada248597e0517e22191bb4c88be2a42 (Modified a few lines of Bio::NCBIDB::TestCommon, Bio::TestPDBRecord and Bio::TestPDB) * 10c043535dd7bf5b9682b4060183f494742c53df (Modified unit test for Bio::GenBank::Common) * 0af08fb988e08948a54e33273861b5460b7f6b2d (Modified the unit test for Bio::GenBank) test/unit/bio/db/genbank/test_common.rb | 275 +++++++++++++++++++++++++++++++ 1 files changed, 275 insertions(+), 0 deletions(-) create mode 100644 test/unit/bio/db/genbank/test_common.rb commit f775d9b7f7deda2e30d4196d4cf507b59936a654 Author: Kazuhiro Hayashi Date: Sat Jun 26 17:54:32 2010 +0900 Added unit tests for Bio::GenBank with test data. * Added unit tests for Bio::GenBank with test data. * This is part of combination of the two commits: * 555f7b49a43e7c35c82cd48b199af96ca93d4179 (added test_genbank.rb and test_go.rb with the test files. 
modified test_pdb.rb) * a46f895bf378ce08143ff031ddda302f970c270a (Modified Bio::GenBank and Bio::Nexus) test/data/genbank/SCU49845.gb | 167 +++++++++++++ test/unit/bio/db/genbank/test_genbank.rb | 397 ++++++++++++++++++++++++++++++ 2 files changed, 564 insertions(+), 0 deletions(-) create mode 100644 test/data/genbank/SCU49845.gb create mode 100644 test/unit/bio/db/genbank/test_genbank.rb commit 33621c1f4c16173efd05861759f577b2c4733a53 Author: Naohisa Goto Date: Fri Oct 29 16:45:22 2010 +0900 Bio::BIORUBY_EXTRA_VERSION is changed to ".5000". bioruby.gemspec | 2 +- lib/bio/version.rb | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) commit cfb2c744f3762689077f5bf2092f715d25e066ed Author: Naohisa Goto Date: Fri Oct 22 13:02:03 2010 +0900 BioRuby 1.4.1 is released. ChangeLog | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 55 insertions(+), 0 deletions(-) commit 92cfda14c08b270ed1beca33153125141f88510e Author: Naohisa Goto Date: Fri Oct 22 13:00:09 2010 +0900 Preparation for bioruby-1.4.1 release. bioruby.gemspec | 2 +- lib/bio/version.rb | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) commit d7999539392bba617b041e3120b5b2d785301f24 Author: Naohisa Goto Date: Fri Oct 22 10:27:02 2010 +0900 Newly added issue is copied from KNOWN_ISSUES.rdoc to the release note. RELEASE_NOTES.rdoc | 15 +++++++++++++++ 1 files changed, 15 insertions(+), 0 deletions(-) commit a9f287658441038a4e9bb220502523de039417f9 Author: Naohisa Goto Date: Fri Oct 22 10:26:44 2010 +0900 updated description of an issue KNOWN_ISSUES.rdoc | 12 +++++++----- 1 files changed, 7 insertions(+), 5 deletions(-) commit bb946d1c97d1eb0de62c8b509bbfb02d67efffeb Author: Naohisa Goto Date: Thu Oct 21 23:17:25 2010 +0900 Added an issue about command-line string escaping on Windows with Ruby 1.9. 
KNOWN_ISSUES.rdoc | 8 ++++++++ 1 files changed, 8 insertions(+), 0 deletions(-) commit fe7d26516cc6b9a3cf8c16e6f8204a4d5eb5e5ae Author: Naohisa Goto Date: Thu Oct 21 20:34:32 2010 +0900 Added descriptions. RELEASE_NOTES.rdoc | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 52 insertions(+), 0 deletions(-) commit fd5da3b47ebce1df46922f20d013439faef483e9 Author: Naohisa Goto Date: Thu Oct 21 18:27:44 2010 +0900 ChangeLog is updated. ChangeLog | 1657 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 1657 insertions(+), 0 deletions(-) commit fab16977d23bb3a5fdfc976eece14dfdabdcac4d Author: Naohisa Goto Date: Thu Oct 21 18:07:43 2010 +0900 preparation for release candidate 1.4.1-rc1 bioruby.gemspec | 40 ++++++++++++++++++++++++++++++++++++++-- lib/bio/version.rb | 4 ++-- 2 files changed, 40 insertions(+), 4 deletions(-) commit 119cc3bf5582735a5df574450ec685fd2f989b5d Author: Naohisa Goto Date: Thu Oct 21 18:05:13 2010 +0900 Temporarily removed for packaging new version. It will be reverted later. doc/howtos/sequence_codon.txt | 38 -------------------------------------- 1 files changed, 0 insertions(+), 38 deletions(-) delete mode 100644 doc/howtos/sequence_codon.txt commit 1b1b3752e3c98a29caf837bfc12c1ed79a04dba2 Author: Naohisa Goto Date: Thu Oct 21 16:48:43 2010 +0900 Fixed typo, reported by Tomoaki NISHIYAMA. 
KNOWN_ISSUES.rdoc | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit 47ed7e5eaca4a261ef0fd4f76909c930e52aadd5 Merge: c002142 548cb58 Author: Naohisa Goto Date: Thu Oct 21 16:17:59 2010 +0900 Merge branch 'test-defline-by-jtprince' commit 548cb58aaad06bb9161e09f7b4ae45729898ca5e Author: Naohisa Goto Date: Thu Oct 21 16:16:28 2010 +0900 adjusted filename in header test/unit/bio/db/fasta/test_defline_misc.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit 95be260708ef21be7848a5d4b7c494cc6bb3d81f Author: Naohisa Goto Date: Thu Oct 21 16:14:58 2010 +0900 Renamed to test_defline_misc.rb to resolve the file name conflict. test/unit/bio/db/fasta/test_defline.rb | 490 --------------------------- test/unit/bio/db/fasta/test_defline_misc.rb | 490 +++++++++++++++++++++++++++ 2 files changed, 490 insertions(+), 490 deletions(-) delete mode 100644 test/unit/bio/db/fasta/test_defline.rb create mode 100644 test/unit/bio/db/fasta/test_defline_misc.rb commit 1e7628e2c396330743d4904b100d62d2c2773bf0 Author: Naohisa Goto Date: Thu Oct 21 16:11:14 2010 +0900 Test bug fix: mistake in test_get method in two classes. test/unit/bio/db/fasta/test_defline.rb | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) commit c479f56f14fb531e31c7e5fdd02f6c934ac468fa Author: Naohisa Goto Date: Thu Oct 21 16:06:46 2010 +0900 Test bug fix: test classes should inherit Test::Unit::TestCase. test/unit/bio/db/fasta/test_defline.rb | 10 +++++----- 1 files changed, 5 insertions(+), 5 deletions(-) commit 0e8ea46e5a239df5c1da3c63e602376c04191ef4 Author: Naohisa Goto Date: Thu Oct 21 15:55:02 2010 +0900 Bug fix: syntax error in Ruby 1.8.7 due to a comma. 
test/unit/bio/db/fasta/test_defline.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit 62a2c1d7c47fbef7a7e7c4f1c079f98fa74e5099 Author: John Prince Date: Tue Oct 19 11:20:16 2010 -0600 added individual unit tests for Bio::FastaDefline test/unit/bio/db/fasta/test_defline.rb | 490 ++++++++++++++++++++++++++++++++ 1 files changed, 490 insertions(+), 0 deletions(-) create mode 100644 test/unit/bio/db/fasta/test_defline.rb commit c002142cdb478b0ad08b7bd5e3331c7b643222f1 Author: Naohisa Goto Date: Thu Oct 21 15:36:58 2010 +0900 Adjusted file paths and the copyright line. test/unit/bio/db/fasta/test_defline.rb | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) commit a5818c5f8ae07e4ec4bdcc2229df9a59bded63f0 Author: Kazuhiro Hayashi Date: Thu Jun 17 11:54:15 2010 +0900 Newly added unit tests for Bio::FastaDefline * Newly added unit tests for Bio::FastaDefline. * This is part of combination of the two commits: bd2452caf0768e7000d19d462465b1772e3c030b (modified test file for Bio::FastaDefline) cae1b6c00cdb9044cb0dfb4db58e6acfe9b7d246 (Added test/unit/bio/db/fasta/test_defline.rb and test/unit/bio/db/kegg/test_kgml.rb with the sample file newly.) test/unit/bio/db/fasta/test_defline.rb | 160 ++++++++++++++++++++++++++++++++ 1 files changed, 160 insertions(+), 0 deletions(-) create mode 100644 test/unit/bio/db/fasta/test_defline.rb commit 4addb906df442adf4ed20275428070b651abbf07 Author: Naohisa Goto Date: Thu Oct 21 15:08:46 2010 +0900 Added note for a dead link, updated a URL, and added a new reference. lib/bio/db/fasta/defline.rb | 8 ++++++-- 1 files changed, 6 insertions(+), 2 deletions(-) commit e636f123adf28688748cc5bbbc6e0c817358d475 Author: John Prince Date: Thu Oct 14 15:38:07 2010 -0600 included TREMBL prefix to list of NSIDs (tr|) * Included TREMBL prefix to list of NSIDs (tr|). This is a standard prefix found in UniprotKB FASTA files. 
lib/bio/db/fasta/defline.rb | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) commit 5277eb0b5376a0dc217dc051c49993c505956400 Author: Naohisa Goto Date: Thu Oct 21 14:16:03 2010 +0900 Bug fix: Bio::ClustalW::Report#get_sequence may fail * Bug fix: Bio::ClustalW::Report#get_sequence may fail when the second argument of Bio::ClustalW::Report.new is specified. lib/bio/appl/clustalw/report.rb | 5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) commit 81b9238abb643573a4051dc0f10c4f9a2cff40fa Author: Naohisa Goto Date: Thu Oct 21 14:21:52 2010 +0900 Added a test class to test the second argument of Bio::ClustalW::Report.new. test/unit/bio/appl/clustalw/test_report.rb | 19 +++++++++++++++++++ 1 files changed, 19 insertions(+), 0 deletions(-) commit 3e9b149aec91585732a34efaa960c96bcec2eef8 Author: Naohisa Goto Date: Thu Oct 21 13:22:38 2010 +0900 Ruby 1.9.2 support: defined Bio::RestrictionEnzyme::DoubleStranded::AlignedStrands#initialize * Bio::RestrictionEnzyme::DoubleStranded::AlignedStrands#initialize is explicitly defined, due to the behavior change of argument number check in the default initialize method in Ruby 1.9.2. .../double_stranded/aligned_strands.rb | 5 +++++ 1 files changed, 5 insertions(+), 0 deletions(-) commit cfe31c02d4bd0d97e588d25dc30188da6be81e85 Author: Naohisa Goto Date: Thu Oct 21 11:49:08 2010 +0900 Ruby 1.9.2 support: assert_in_delta for a float value. test/unit/bio/db/test_aaindex.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit 3f5d2ccb9ac8bc44febc88d441f47aeddf7f12ff Author: Naohisa Goto Date: Thu Oct 21 11:41:38 2010 +0900 Ruby 1.9.2 support: using assert_in_delta for float values. * Ruby 1.9.2 support: using assert_in_delta for float values. The patch is written by Tomoaki NISHIYAMA during BH2010.10. 
test/unit/bio/util/test_contingency_table.rb | 14 +++++++------- 1 files changed, 7 insertions(+), 7 deletions(-) commit f357929bc5dcf8295b0a11a09b4025e3592d9eda Author: Naohisa Goto Date: Thu Oct 21 11:16:37 2010 +0900 Small changes for README.rdoc. README.rdoc | 5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) commit c6c567fe9602ae8d7d343a5773f51d8aa22c8876 Author: Naohisa Goto Date: Thu Oct 21 11:12:17 2010 +0900 Shows message when running "ruby setup.rb test" with Ruby1.9. setup.rb | 5 +++++ 1 files changed, 5 insertions(+), 0 deletions(-) commit 9b66463c4150a679e63289e0cee3c4d1200c7d0f Author: Naohisa Goto Date: Wed Oct 20 17:53:36 2010 +0900 Added description about the incompatible change in Bio::AAindex2. RELEASE_NOTES.rdoc | 6 ++++++ 1 files changed, 6 insertions(+), 0 deletions(-) commit 327ea878d4e15b99711d8121a54698da29d4b0aa Author: Naohisa Goto Date: Wed Oct 20 17:35:53 2010 +0900 Changed the expected return values in the unit tests, following the last change to Bio::AAindex2. * Changed the expected return values in the unit tests, following the last change to Bio::AAindex2. * The patch is written by Tomoaki NISHIYAMA during BH2010.10. test/unit/bio/db/test_aaindex.rb | 15 ++++++++------- 1 files changed, 8 insertions(+), 7 deletions(-) commit 31963b43daab2801087f5f6d23b04e357bb7b1e2 Author: Naohisa Goto Date: Wed Oct 20 17:32:26 2010 +0900 Ruby 1.9.2 support: Incompatible change: the symmetric elements for triangular matrix should be copied * Ruby 1.9.2 support: Incompatible change: the symmetric elements for triangular matrix should be copied. The patch is written by Tomoaki NISHIYAMA during BH2010.10. lib/bio/db/aaindex.rb | 12 +++++++++++- 1 files changed, 11 insertions(+), 1 deletions(-) commit e8a1d65984781466eff9d5a262f18cb1c3e01056 Author: Naohisa Goto Date: Wed Oct 20 16:10:51 2010 +0900 Test bug fix: confusion between assert and assert_equal * Test bug fix: the assert should be assert_equal. The bug was found with Ruby 1.9.2-p0. 
test/unit/bio/db/embl/test_sptr.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit feb2cda47beab91e2fc3dddf99d5cc1cacf3fbae Author: Naohisa Goto Date: Wed Oct 20 14:59:40 2010 +0900 Test bug fix: confusion between assert and assert_equal, and apparently wrong expected values. * Test bug fix: the assert should be assert_equal. The bug was found with Ruby 1.9.2-p0. * In the test_rates_hundred_and_fiftieth_position method, the index for @example_rates and the expected value of the third assertion were apparently wrong. * Reported by Tomoaki NISHIYAMA during BH2010.10. test/unit/bio/appl/paml/codeml/test_rates.rb | 12 ++++++------ 1 files changed, 6 insertions(+), 6 deletions(-) commit ffc03a11a4ef7b36ea78de58d4c8d4e9259093c4 Author: Naohisa Goto Date: Sat Oct 16 01:06:36 2010 +0900 Tests for Bio::KEGG::PATHWAY are improved with new test data. test/data/KEGG/ec00072.pathway | 23 + test/data/KEGG/hsa00790.pathway | 59 ++ test/data/KEGG/ko00312.pathway | 16 + test/data/KEGG/rn00250.pathway | 114 ++++ test/unit/bio/db/kegg/test_pathway.rb | 1055 +++++++++++++++++++++++++++++++++ 5 files changed, 1267 insertions(+), 0 deletions(-) create mode 100644 test/data/KEGG/ec00072.pathway create mode 100644 test/data/KEGG/hsa00790.pathway create mode 100644 test/data/KEGG/ko00312.pathway create mode 100644 test/data/KEGG/rn00250.pathway commit 1e1d974c2c72ddf5a45e41c6f2510729fb65a4ad Author: Toshiaki Katayama Date: Tue Jul 20 11:46:52 2010 +0900 Added methods for parsing KEGG PATHWAY fields. * Added methods for parsing KEGG PATHWAY fields. * This is part of commit e2abe5764f3ded91c82689245f19a0412d3a7afb and modified to merge with the current HEAD (original commit message: Changes for TogoWS). 
lib/bio/db/kegg/pathway.rb | 146 +++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 145 insertions(+), 1 deletions(-) commit 957c8ee630538a8c49c52339cb3c0364e5328378 Author: Naohisa Goto Date: Sat Oct 16 00:50:42 2010 +0900 Private method strings_as_hash is moved to Bio::KEGG::Common::StringsAsHash. lib/bio/db/kegg/common.rb | 18 ++++++++++++++++++ lib/bio/db/kegg/module.rb | 19 +++++-------------- 2 files changed, 23 insertions(+), 14 deletions(-) commit b7fd85382bccceaa29958d5daf98ca0a513e5a9a Author: Naohisa Goto Date: Fri Oct 15 22:01:05 2010 +0900 Renamed Bio::KEGG::*#pathway_modules to modules, etc. * Renamed following methods in Bio::KEGG::ORTHOLOGY and Bio::KEGG:PATHWAY classes: pathway_modules to modules, pathway_modules_as_strings to modules_as_strings, and pathway_modules_as_hash to modules_as_hash. * Unit tests are also modified. lib/bio/db/kegg/common.rb | 18 +++++++++--------- lib/bio/db/kegg/orthology.rb | 10 +++++----- lib/bio/db/kegg/pathway.rb | 10 +++++----- test/unit/bio/db/kegg/test_orthology.rb | 12 ++++++------ test/unit/bio/db/kegg/test_pathway.rb | 20 ++++++++++---------- 5 files changed, 35 insertions(+), 35 deletions(-) commit 02aea9f18ff6e3079309a76d04d02ea1f2902e7b Author: Naohisa Goto Date: Fri Oct 15 00:20:48 2010 +0900 Modified tests for Bio::KEGG::GENES following the changes of the class. test/unit/bio/db/kegg/test_genes.rb | 30 +++++++++++++++++++++++++++++- 1 files changed, 29 insertions(+), 1 deletions(-) commit f4b45ea629734ecff820483475d83fef6cbe068e Author: Naohisa Goto Date: Thu Oct 14 23:57:18 2010 +0900 Reverted Bio::KEGG::GENES#genes, gene and motif methods and modified. * Reverted Bio::KEGG::GENES#genes, gene and motif methods which are removed in the last commit. To avoid code duplication, they are also modified to use other methods, and RDoc is added about the deprecation or change of the methods. * Modified RDoc. 
lib/bio/db/kegg/genes.rb | 32 +++++++++++++++++++++++++++++++- 1 files changed, 31 insertions(+), 1 deletions(-) commit dd987911cb4a84e23565bb37707611d054c22101 Author: Toshiaki Katayama Date: Tue Jul 20 11:46:52 2010 +0900 New methods Bio::KEGG::GENES#keggclass etc. * New methods and aliases are added: Bio::KEGG::GENES#keggclass, keggclasses, names_as_array, names, motifs_as_strings, motifs_as_hash, motifs. * Removed Bio::KEGG::GENES#genes, gene and motif methods. * Added a comment about deprecation of CODON_USAGE lines. * This is part of commit e2abe5764f3ded91c82689245f19a0412d3a7afb (original commit message: Changes for TogoWS). lib/bio/db/kegg/genes.rb | 41 ++++++++++++++++++++++++++--------------- 1 files changed, 26 insertions(+), 15 deletions(-) commit b2575f5acfeca269c93a35baa3809fdac17a7271 Author: Naohisa Goto Date: Wed Oct 13 23:12:33 2010 +0900 Release notes for the upcoming release version. RELEASE_NOTES.rdoc | 31 +++++++++++++++++++++++++++++++ 1 files changed, 31 insertions(+), 0 deletions(-) create mode 100644 RELEASE_NOTES.rdoc commit 83992875c45a1fdd54d042c923dee51119026e49 Author: Naohisa Goto Date: Wed Oct 13 23:11:21 2010 +0900 Renamed RELEASE_NOTES.rdoc to doc/RELEASE_NOTES-1.4.0.rdoc. RELEASE_NOTES.rdoc | 167 ------------------------------------------ doc/RELEASE_NOTES-1.4.0.rdoc | 167 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 167 insertions(+), 167 deletions(-) delete mode 100644 RELEASE_NOTES.rdoc create mode 100644 doc/RELEASE_NOTES-1.4.0.rdoc commit f649629eb6216aeabbd2020bcac9b7f870b12395 Author: Naohisa Goto Date: Wed Oct 13 21:58:58 2010 +0900 Added acknowledgement to Kozo Nishida for KEGG parsers. RELEASE_NOTES.rdoc | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) commit 379f177edc0f95dee1ec0c2d2cf679c27918e41b Author: Naohisa Goto Date: Fri Oct 8 16:31:58 2010 +0900 Fixed a variable name mistake in Bio::Command, and English grammar fix. * Fixed a variable name mistake in Bio::Command#no_fork?. 
* English grammar fix for comments. Thanks to Andrew Grimm who reports the fix. lib/bio/command.rb | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) commit 1344c27c90438d8c8840ee507d0ab43224f89054 Author: Naohisa Goto Date: Wed Oct 6 23:47:26 2010 +0900 Bug fix: fork(2) is called on platforms that do not support it. * Bug fix: fork(2) is called on platforms that do not support it. Thanks to Andrew Grimm who reports the bug (fork() is called on platforms that do not support it; http://github.com/bioruby/bioruby/issues#issue/6). * Bio::Command#call_command and query_command can now fall back into using popen when fork(2) is not implemented. * Detection of Windows platform is improved. The idea of the code is taken from Redmine's platform.rb. lib/bio/command.rb | 98 +++++++++++++++++++++++++++++------ test/functional/bio/test_command.rb | 9 +-- 2 files changed, 84 insertions(+), 23 deletions(-) commit 0bfa1c3a8d7b8d03919d54a2a241ca96a79bad83 Author: Naohisa Goto Date: Wed Oct 6 15:49:51 2010 +0900 Bug fix: Bio::MEDLINE#reference is changed not to put empty values * Bug fix: Bio::MEDLINE#reference is changed not to put empty values in the returned Bio::Reference object. I think the original behavior is a bug. This is an incompatible change but the effect is very small. lib/bio/db/medline.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit 8e0bc03d79a1f20743c29f0a44e273d362eaa2cd Author: Naohisa Goto Date: Wed Oct 6 15:39:34 2010 +0900 Bug fix: Bio::MEDLINE#initialize should handle continuation of lines. * Bug fix: Bio::MEDLINE#initialize should handle continuation of lines. Thanks to Steven Bedrick who reports the bug (Bio::MEDLINE#initialize handles multi-line MeSH terms incorrectly; http://github.com/bioruby/bioruby/issues#issue/7). 
lib/bio/db/medline.rb | 6 +++++- 1 files changed, 5 insertions(+), 1 deletions(-) commit 728de78b438108e44066a7ce7490632c81108fb6 Author: Naohisa Goto Date: Wed Oct 6 15:29:10 2010 +0900 Added unit tests for Bio::MEDLINE with test data. * Added unit tests for Bio::MEDLINE with test data. The data is taken from NCBI and the abstract was removed to avoid possible copyright problem. The choice of the data (PMID: 20146148) is suggested by Steven Bedrick in a bug report (Bio::MEDLINE#initialize handles multi-line MeSH terms incorrectly). test/data/medline/20146148_modified.medline | 54 ++++++++++ test/unit/bio/db/test_medline.rb | 148 +++++++++++++++++++++++++++ 2 files changed, 202 insertions(+), 0 deletions(-) create mode 100644 test/data/medline/20146148_modified.medline commit 930095817ce60793ac909a4d01731d1f97bc4fa5 Author: Naohisa Goto Date: Wed Sep 29 20:51:12 2010 +0900 Bug fix: NoMethodError in Bio::Tree#collect_edge! * Bug fix: NoMethodError in Bio::Tree#collect_edge!. Thanks to Kazuhiro Hayashi who reports the bug. lib/bio/tree.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit c7927ec4743ddc4ec4501790bbed097b69f616e7 Author: Naohisa Goto Date: Wed Sep 29 20:49:44 2010 +0900 Modified and improved tests for Bio::Tree. test/unit/bio/test_tree.rb | 393 +++++++++++++++++++++++++++----------------- 1 files changed, 240 insertions(+), 153 deletions(-) commit 0161148c9b4d9ea404af92b4baf8241239a283de Author: Kazuhiro Hayashi Date: Fri Jul 16 00:09:39 2010 +0900 Modified unit tests for Bio::Tree * Modified unit tests for Bio::Tree. 
* This is part of combination of the two commits: * 6675fd930718e41ad009f469b8167f81c9b2ad52 (Modified unit tests and classes) * a6dc63ffe3460ea8d8980b3d6c641356881e0862 (Modified unit test for Bio::Tree) test/unit/bio/test_tree.rb | 174 +++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 173 insertions(+), 1 deletions(-) commit 31ded691a9329e45fe563e5f70138648d3b30bbf Author: Kazuhiro Hayashi Date: Thu Jul 15 21:06:28 2010 +0900 Bug fix: Bio::Tree#remove_edge_if did not work. * Bug fix: Bio::Tree#remove_edge_if did not work. * This is part of commit 6675fd930718e41ad009f469b8167f81c9b2ad52 (original commit message: Modified unit tests and classes) and removed a comment line. lib/bio/tree.rb | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) commit d0a3af23c74004688a8fc0b5be3d09f7144e33a1 Author: Naohisa Goto Date: Wed Sep 22 19:23:50 2010 +0900 Renamed test/data/go/part_of_* to avoid possible confusion * Renamed test/data/go/part_of_* to selected_* to avoid possible confusion: The word "part_of" is a keyword in Gene Ontology. 
test/data/go/part_of_component.ontology | 12 ---------- test/data/go/part_of_gene_association.sgd | 31 ---------------------------- test/data/go/part_of_wikipedia2go | 13 ----------- test/data/go/selected_component.ontology | 12 ++++++++++ test/data/go/selected_gene_association.sgd | 31 ++++++++++++++++++++++++++++ test/data/go/selected_wikipedia2go | 13 +++++++++++ test/unit/bio/db/test_go.rb | 6 ++-- 7 files changed, 59 insertions(+), 59 deletions(-) delete mode 100644 test/data/go/part_of_component.ontology delete mode 100644 test/data/go/part_of_gene_association.sgd delete mode 100644 test/data/go/part_of_wikipedia2go create mode 100644 test/data/go/selected_component.ontology create mode 100644 test/data/go/selected_gene_association.sgd create mode 100644 test/data/go/selected_wikipedia2go commit e4f82da52402f8175bd92b50209b09bc83bfddd6 Author: Naohisa Goto Date: Wed Sep 22 19:21:36 2010 +0900 Removed unused test/data/go/wikipedia2go.txt. test/data/go/wikipedia2go.txt | 728 ----------------------------------------- 1 files changed, 0 insertions(+), 728 deletions(-) delete mode 100644 test/data/go/wikipedia2go.txt commit 5003fd53b0a3852fa23b76ad6ec8e9e76d5850fc Author: Naohisa Goto Date: Thu Sep 16 22:37:31 2010 +0900 Adjusted test data file paths and header lines in test_go.rb. * Adjusted test data file paths. * Adjusted copyright and description in the header. test/unit/bio/db/test_go.rb | 26 +++++++++++++++++--------- 1 files changed, 17 insertions(+), 9 deletions(-) commit 540cb7ab27e79634f5436476cce51cc20ca0f70f Author: Kazuhiro Hayashi Date: Thu Jul 15 21:06:28 2010 +0900 Added tests for Bio::GO classes. * Added tests for Bio::GO classes. * This is part of combination of the three commits: * 555f7b49a43e7c35c82cd48b199af96ca93d4179 (added test_genbank.rb and test_go.rb with the test files. 
modified test_pdb.rb) * e966f17546427b8ad39cb9942807ceb8a068d746 (modified test/unit/bio/db/test_go.rb and added the test files for each GO class) * 6675fd930718e41ad009f469b8167f81c9b2ad52 (Modified unit tests and classes) test/unit/bio/db/test_go.rb | 163 +++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 163 insertions(+), 0 deletions(-) create mode 100644 test/unit/bio/db/test_go.rb commit 5ff01f7dfbc3661d8c66b44874a2ba4ff2f96b56 Author: Kazuhiro Hayashi Date: Fri Jun 11 21:02:29 2010 +0900 Added test data for Bio::GO classes. * Added test data for Bio::GO classes. * This is part of combination of the three commits: * 555f7b49a43e7c35c82cd48b199af96ca93d4179 (added test_genbank.rb and test_go.rb with the test files. modified test_pdb.rb) * e966f17546427b8ad39cb9942807ceb8a068d746 (modified test/unit/bio/db/test_go.rb and added the test files for each GO class) * 6675fd930718e41ad009f469b8167f81c9b2ad52 (Modified unit tests and classes) * License for the test data is the public domain. ( http://wiki.geneontology.org/index.php/Legal_FAQ ) test/data/go/part_of_component.ontology | 12 + test/data/go/part_of_gene_association.sgd | 31 ++ test/data/go/part_of_wikipedia2go | 13 + test/data/go/wikipedia2go.txt | 728 +++++++++++++++++++++++++++++ 4 files changed, 784 insertions(+), 0 deletions(-) create mode 100644 test/data/go/part_of_component.ontology create mode 100644 test/data/go/part_of_gene_association.sgd create mode 100644 test/data/go/part_of_wikipedia2go create mode 100644 test/data/go/wikipedia2go.txt commit d4210673a1a696bfb02c93b7743e60dea1a5fcc8 Author: Kazuhiro Hayashi Date: Thu Jul 15 21:06:28 2010 +0900 Bug fix: Typo and missing field in Bio::GO::GeneAssociation#to_str. * Bug fix: Typo and missing field in Bio::GO::GeneAssociation#to_str. * This is part of commit 6675fd930718e41ad009f469b8167f81c9b2ad52 (original commit message: Modified unit tests and classes) and modified. 
The bug is also reported by Ralf Stephan ([BioRuby] [PATCH] GO annotations fixes and improvements). lib/bio/db/go.rb | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) commit acab0bb4a4e0f970f8f6be3aea2c371f63a49fa7 Author: Naohisa Goto Date: Wed Aug 25 22:58:42 2010 +0900 Database names used in tests are changed, following the change of TogoWS. * Database names used in tests are changed, following the change of TogoWS: "gene" to "kegg-genes" and "enzyme" to "kegg-enzyme". test/functional/bio/io/test_togows.rb | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) commit 1fb1f1cc5ca3edb42de03874b3527ce0cf0de294 Author: Toshiaki Katayama Date: Tue Jul 20 11:46:52 2010 +0900 Database name used in tests is changed, following the change of TogoWS. * The database name "genbank" is changed to "nucleotide", following the change in TogoWS. * This is part of commit e2abe5764f3ded91c82689245f19a0412d3a7afb. (Original commit message: Changes for TogoWS) test/functional/bio/io/test_togows.rb | 10 +++++----- 1 files changed, 5 insertions(+), 5 deletions(-) commit 1db4b8011f4fee158aeb78ec2d76c76688714788 Author: Naohisa Goto Date: Wed Aug 11 23:33:49 2010 +0900 New method Bio::Fastq#mask for masking low score regions. * New method Bio::Fastq#mask for masking low score regions is added with unit tests. This method is implemented as a shortcut of Bio::Sequence#mask_with_quality_score method. lib/bio/db/fastq.rb | 15 +++++++++++++++ test/unit/bio/db/test_fastq.rb | 36 ++++++++++++++++++++++++++++++++++++ 2 files changed, 51 insertions(+), 0 deletions(-) commit 72b47b2391a01c5f4214fd188abe0857cd3ed166 Author: Naohisa Goto Date: Wed Aug 11 23:04:57 2010 +0900 New module Bio::Sequence::SequenceMasker to help masking a sequence. * New module Bio::Sequence::SequenceMasker to help masking a sequence. The module is only expected to be included in Bio::Sequence. 
In the future, methods in this module might be moved to Bio::Sequence or other module and this module might be removed. * Unit tests for Bio::Sequence::SequenceMasker are also added. lib/bio/sequence.rb | 2 + lib/bio/sequence/sequence_masker.rb | 95 +++++++++++++ test/unit/bio/sequence/test_sequence_masker.rb | 169 ++++++++++++++++++++++++ 3 files changed, 266 insertions(+), 0 deletions(-) create mode 100644 lib/bio/sequence/sequence_masker.rb create mode 100644 test/unit/bio/sequence/test_sequence_masker.rb commit a2b21fa31c87fc47ae375380fb34958460414107 Author: Naohisa Goto Date: Wed Aug 11 22:59:46 2010 +0900 New method Bio::Sequence#output_fasta, a replacement for to_fasta. * New method Bio::Sequence#output_fasta, a replacement for Bio::Sequence#to_fasta. This is also implemented as a shortcut of Bio::Sequence#output(:fasta). lib/bio/sequence/format.rb | 14 ++++++++++++++ 1 files changed, 14 insertions(+), 0 deletions(-) commit d139a1e3e7f77317102eaa24649515541761a212 Author: Toshiaki Katayama Date: Tue Jul 20 11:46:52 2010 +0900 File format autodetection for Bio::KEGG::PATHWAY and Bio::KEGG::MODULE. * Added file format autodetection for Bio::KEGG::PATHWAY and Bio::KEGG::MODULE. * This is part of commit e2abe5764f3ded91c82689245f19a0412d3a7afb. (Original commit message: Changes for TogoWS) lib/bio/io/flatfile/autodetection.rb | 6 ++++++ 1 files changed, 6 insertions(+), 0 deletions(-) commit 920d92c13b44921a3f58ddbd8566e7a90dd59996 Author: Toshiaki Katayama Date: Tue Jul 20 11:46:52 2010 +0900 Added autoload for Bio::KEGG::PATHWAY and Bio::KEGG::MODULE. * Added autoload for Bio::KEGG::PATHWAY and Bio::KEGG::MODULE. * This is part of commit e2abe5764f3ded91c82689245f19a0412d3a7afb. (Original commit message: Changes for TogoWS) lib/bio.rb | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) commit bf342c28e0c75c9b48770144f421dd12babd9d0e Author: Naohisa Goto Date: Tue Aug 17 23:19:47 2010 +0900 Unit tests for Bio::KEGG::MODULE is modified and improved. 
test/unit/bio/db/kegg/test_module.rb | 194 ++++++++++++++++++++++++++++++---- 1 files changed, 173 insertions(+), 21 deletions(-) commit 1742568a4f27e75a19441e4a4437ca3f1c0251f8 Author: Naohisa Goto Date: Tue Aug 17 23:15:44 2010 +0900 In Bio::KEGG::MODULE, an internal-only method is changed to private. lib/bio/db/kegg/module.rb | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-) commit 8f5ff66cca678ac6be75a7dda1ff840ac3111f42 Author: Naohisa Goto Date: Tue Aug 17 23:10:39 2010 +0900 Removed unused comments. lib/bio/db/kegg/module.rb | 32 -------------------------------- 1 files changed, 0 insertions(+), 32 deletions(-) commit f9d23fb32eeb15dc57580e25653d3f2fff5fa1dc Author: Naohisa Goto Date: Tue Aug 17 22:59:18 2010 +0900 Reverted Bio::KEGG::MODULE#keggclass. * Reverted Bio::KEGG::MODULE#keggclass. * Removed keggclasses and keggclasses_as_array methods, because they are inconsistent with Bio::KEGG::ORTHOLOGY#keggclasses. lib/bio/db/kegg/module.rb | 6 +----- 1 files changed, 1 insertions(+), 5 deletions(-) commit 94188d23ad843c7cb998c99f46371e540ce457dc Author: Toshiaki Katayama Date: Tue Jul 20 11:41:50 2010 +0900 For Bio::KEGG::MODULE, methods are added and modified. * For Bio::KEGG::MODULE, methods are added and modified. * New methods: definition, etc. * Removed methods: pathway, orthologies, keggclass, etc. * Changed methods: reactions, compounds, etc. * (Original commit message: Changes for TogoWS) lib/bio/db/kegg/module.rb | 136 +++++++++++++++++++++++++++++++++++++++++--- 1 files changed, 126 insertions(+), 10 deletions(-) commit 92efc03707a49fa0b2c02e7b2f8b53749a75ad59 Author: Kozo Nishida Date: Thu Feb 4 22:59:20 2010 +0900 New class Bio::KEGG::MODULE, parser for KEGG MODULE (Pathway Module). 
lib/bio/db/kegg/module.rb | 83 ++++++++++++++++++++++++++++++ test/data/KEGG/M00118.module | 44 ++++++++++++++++ test/unit/bio/db/kegg/test_module.rb | 94 ++++++++++++++++++++++++++++++++++ 3 files changed, 221 insertions(+), 0 deletions(-) create mode 100644 lib/bio/db/kegg/module.rb create mode 100644 test/data/KEGG/M00118.module create mode 100644 test/unit/bio/db/kegg/test_module.rb commit b7c75cc6023d5dc9096111fde99a6e89db2e4bdc Author: Naohisa Goto Date: Wed May 12 01:13:52 2010 +0900 Improvement of tests for Bio::KEGG::ORTHOLOGY using updated test data. test/unit/bio/db/kegg/test_orthology.rb | 95 +++++++++++++++++++++++++++++++ 1 files changed, 95 insertions(+), 0 deletions(-) commit f61c232371f6f673044960b0626486b2e8e160b8 Author: Naohisa Goto Date: Wed May 12 01:11:16 2010 +0900 Updated test data K02338.orthology to follow KEGG format changes. test/data/KEGG/K02338.orthology | 232 ++++++++++++++++++++++++++++++--------- 1 files changed, 180 insertions(+), 52 deletions(-) commit 2aa060f42263392877683a47ef9bd744ef4de7f8 Author: Naohisa Goto Date: Wed May 12 01:03:18 2010 +0900 Incompatible change of Bio::KEGG::ORTHOLOGY#pathways, and added new methods * Incompatible change of Bio::KEGG::ORTHOLOGY#pathways due to the changes of KEGG ORTHOLOGY format changes: Because PATHWAY field is added, the method is changed to return a hash. The pathway method of old behavior is renamed to pathways_in_keggclass for compatibility. * New methods are added to Bio::KEGG::ORTHOLOGY: references, pathways_as_strings, pathways_as_hash, pathway_modules, pathway_modules_as_hash, pathway_modules_as_strings. lib/bio/db/kegg/orthology.rb | 41 ++++++++++++++++++++++++++++++++++++++++- 1 files changed, 40 insertions(+), 1 deletions(-) commit 2e6754f2598f66f29afb573c3dc83592089b411c Author: Naohisa Goto Date: Wed May 12 00:59:26 2010 +0900 Changed to use Bio::KEGG::Common::PathwayModulesAsHash. 
lib/bio/db/kegg/pathway.rb | 25 ++++++++----------------- 1 files changed, 8 insertions(+), 17 deletions(-) commit 527920da990f4374e20333d6852b810ea73ead02 Author: Naohisa Goto Date: Wed May 12 00:55:10 2010 +0900 New module Bio::KEGG::Common::PathwayModulesAsHash (internal use only) * New module Bio::KEGG::Common::PathwayModulesAsHash is added, based on Bio::KEGG::PATHWAY#pathway_modules_as_hash method. Note that the method is Bio::KEGG::* internal use only. lib/bio/db/kegg/common.rb | 22 ++++++++++++++++++++++ 1 files changed, 22 insertions(+), 0 deletions(-) commit 36041377dbafce642180eb1c664ee36ef21d3bfb Author: Naohisa Goto Date: Fri Mar 19 23:25:56 2010 +0900 New method Bio::KEGG::PATHWAY#references. * New method Bio::KEGG::PATHWAY#references. * Additional unit tests for Bio::KEGG::PATHWAY with test data. lib/bio/db/kegg/pathway.rb | 8 ++ test/data/KEGG/map00030.pathway | 37 ++++++++++ test/unit/bio/db/kegg/test_pathway.rb | 120 ++++++++++++++++++++++++++++++++- 3 files changed, 162 insertions(+), 3 deletions(-) create mode 100644 test/data/KEGG/map00030.pathway commit d743892d298786eb9e88e2a51ac9f7774785848f Author: Naohisa Goto Date: Fri Mar 19 00:06:20 2010 +0900 Improvement of Bio::KEGG::Common::References#references. * Improvement of Bio::KEGG::Common::References#references: added support for parsing "journal (year)" style. lib/bio/db/kegg/common.rb | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-) commit 35807ae22c9ad9a3ce37ed5c655d1c080f8d2334 Author: Naohisa Goto Date: Fri Mar 19 00:02:10 2010 +0900 Implementation of Bio::KEGG::GENOME#references is moved. * Implementation of Bio::KEGG::GENOME#references is moved to Bio::KEGG::Common::References#references, which will be shared with Bio::KEGG::Pathway and other classes. 
lib/bio/db/kegg/common.rb | 61 +++++++++++++++++++++++++++++++++++++++++++- lib/bio/db/kegg/genome.rb | 62 +++++--------------------------------------- 2 files changed, 67 insertions(+), 56 deletions(-) commit 263c37a07203b87e3b33d35adef3aa3ddcf89601 Author: Naohisa Goto Date: Wed Mar 17 00:26:56 2010 +0900 Bug fix: Bio::KEGG::GENES#pathway may fail, and other parse issues due to the format changes of KEGG GENES. * Bug fix: Bio::KEGG::GENES#pathway may return unexpected value after calling pathways, pathways_as_hash or pathways_as_string methods. * Bio::KEGG::GENES#eclinks, Bio::KEGG::Common::PathwaysAsHash, and Bio::KEGG::Common::OrthologsAsHash are modified due to the file format changes of KEGG::GENES. lib/bio/db/kegg/common.rb | 9 +++++---- lib/bio/db/kegg/genes.rb | 17 +++++++++++------ 2 files changed, 16 insertions(+), 10 deletions(-) commit 364cd405a10d0742091281c5a16b77cb54a8087e Author: Naohisa Goto Date: Wed Mar 17 00:25:51 2010 +0900 New methods Bio::Location#== and Bio::Locations#==. lib/bio/location.rb | 39 +++++++++++++++++++++++++++++++++++++++ 1 files changed, 39 insertions(+), 0 deletions(-) commit 2c7ffd6808e572cf35b82d6e74790447d44d08cc Author: Naohisa Goto Date: Wed Mar 17 00:23:00 2010 +0900 Improved unit tests for Bio::KEGG::GENES with new test data. test/data/KEGG/b0529.gene | 47 +++++++ test/unit/bio/db/kegg/test_genes.rb | 254 ++++++++++++++++++++++++++++++++++- 2 files changed, 300 insertions(+), 1 deletions(-) create mode 100644 test/data/KEGG/b0529.gene commit 764869fd42d1e3f96885b3499844bf4fadde80f1 Author: Naohisa Goto Date: Wed Mar 17 00:10:51 2010 +0900 Bug fix: Bio::KEGG::GENOME parser issues for PLASMID, REFERENCE, and ORIGINAL_DB fields. * Bug fix: Fixed parse error for PLASMID fields due to the changes of the KEGG GENOME file format. For the bug fix, tag_get and tag_cut methods are redefined. * Bug fix: Fixed parse error for REFERENCE fields due to the changes of the file format. 
* New method Bio::KEGG::GENOME#original_databases is added to get ORIGINAL_DB record as an Array of String objects. lib/bio/db/kegg/genome.rb | 69 +++++++++++++++++++++++++++++++++++++++----- 1 files changed, 61 insertions(+), 8 deletions(-) commit 75db7c6c7132f19e212be36d06643a0f48a7df44 Author: Naohisa Goto Date: Wed Mar 17 00:09:09 2010 +0900 New method Bio::Reference#==. lib/bio/reference.rb | 24 ++++++++++++++++++++++++ 1 files changed, 24 insertions(+), 0 deletions(-) commit 64a6bfb52ca1bb27bd38c86c060e2925f38924fb Author: Naohisa Goto Date: Wed Mar 17 00:07:43 2010 +0900 Newly added unit tests for Bio::KEGG::GENOME with test data. test/data/KEGG/T00005.genome | 140 ++++++++++++ test/data/KEGG/T00070.genome | 34 +++ test/unit/bio/db/kegg/test_genome.rb | 408 ++++++++++++++++++++++++++++++++++ 3 files changed, 582 insertions(+), 0 deletions(-) create mode 100644 test/data/KEGG/T00005.genome create mode 100644 test/data/KEGG/T00070.genome create mode 100644 test/unit/bio/db/kegg/test_genome.rb commit 21c92bb991c83dce27a4411382c456cdd6029a82 Author: Naohisa Goto Date: Tue Mar 9 23:24:15 2010 +0900 Renamed Bio::KEGG::PATHWAY#keggmodules to pathway_modules_as_strings, etc. * Bio::KEGG::PATHWAY#keggmodules is renamed to pathway_modules_as_strings. * New method pathway_modules_as_hash and its alias method pathway_modules is added. * Bio::KEGG::PATHWAY#rel_pathways is renamed to rel_pathways_as_strings. * New method rel_pathways_as_hash is added, and rel_pathways is changed to be the alias of the rel_pathways_as_hash method. * Unit tests are also changed. lib/bio/db/kegg/pathway.rb | 42 +++++++++++++++++++++++++++++++- test/unit/bio/db/kegg/test_pathway.rb | 29 ++++++++++++++++++++-- 2 files changed, 66 insertions(+), 5 deletions(-) commit fa10f38716ec2eecd6fa8e8b027085377e9ee421 Author: Naohisa Goto Date: Tue Mar 9 21:13:18 2010 +0900 Fixed text indentations. 
 lib/bio/db/kegg/pathway.rb            |    4 ++--
 test/unit/bio/db/kegg/test_pathway.rb |   25 +++++++++++++------------
 2 files changed, 15 insertions(+), 14 deletions(-)

commit 0916b0cac5d17ce47ef5cc3382e3167293bcf4c2
Author: Kozo Nishida
Date:   Tue Feb 2 17:34:17 2010 +0900

    Newly added Bio::KEGG::PATHWAY with test code and test data.

 lib/bio/db/kegg/pathway.rb            |   73 +++++++++++++++++++++++++++++++++
 test/data/KEGG/map00052.pathway       |   13 ++++++
 test/unit/bio/db/kegg/test_pathway.rb |   57 +++++++++++++++++++++++++
 3 files changed, 143 insertions(+), 0 deletions(-)
 create mode 100644 lib/bio/db/kegg/pathway.rb
 create mode 100644 test/data/KEGG/map00052.pathway
 create mode 100644 test/unit/bio/db/kegg/test_pathway.rb

commit c3ceea339164754071f03ce13da4f65e08230f40
Author: Naohisa Goto
Date:   Fri Feb 19 00:43:38 2010 +0900

    Tutorial.rd.html is regenerated.

 doc/Tutorial.rd.html |   55 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 54 insertions(+), 1 deletions(-)

commit 315da0213edfece696d22cc4648cb7a74f18ad34
Author: Ra
Date:   Sun Feb 7 10:38:36 2010 +0100

    Added BioSQL docs links

 doc/Tutorial.rd |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

commit 22374415873906f4bcd3e84950c14b5f0b6c7e61
Author: Ra
Date:   Sun Feb 7 02:39:16 2010 +0100

    Added link to BioSQL install doc.

 doc/Tutorial.rd |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

commit 4d18dd2f5a3f18348e5f4aa07b14c104d3a65f5b
Author: Ra
Date:   Fri Feb 5 21:30:36 2010 +0100

    Added other examples about BioSQL

 doc/Tutorial.rd |   36 ++++++++++++++++++++++++++++--------
 1 files changed, 28 insertions(+), 8 deletions(-)

commit b704f01cd0799ab1a7e3975119e9d6139ddfbd51
Author: Ra
Date:   Wed Jan 27 21:08:25 2010 +0100

    BioSQL tutorial continue...

 doc/Tutorial.rd |   18 ++++++++++++++++--
 1 files changed, 16 insertions(+), 2 deletions(-)

commit 1993a1566b5ade937703d0291c4eaf2de673d170
Author: Ra
Date:   Wed Jan 27 20:32:23 2010 +0100

    BioSQL tutorial initial draft.
doc/Tutorial.rd | 25 ++++++++++++++++++++++++- 1 files changed, 24 insertions(+), 1 deletions(-) commit 09047b664a03492d7546d92b619faacee72d0cd5 Author: Jan Aerts Date: Sun Feb 7 17:58:59 2010 +0900 Added code example that will serve as basis for sequence/codon howto doc/howtos/sequence_codon.txt | 38 ++++++++++++++++++++++++++++++++++++++ 1 files changed, 38 insertions(+), 0 deletions(-) create mode 100644 doc/howtos/sequence_codon.txt commit c1e2165ba801cccd52135b13ed36713517e1fa8a Author: Naohisa Goto Date: Fri Feb 5 12:50:38 2010 +0900 Suppressed "warning: parenthesize argument(s) for future version" in Ruby 1.8.5. lib/bio/appl/paml/codeml/report.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit f7ce9ba6a2f4e680ee40017a21aa95d05baf34f4 Author: Naohisa Goto Date: Thu Feb 4 20:26:00 2010 +0900 Added :startdoc: and removed an empty line for RDoc. lib/bio/appl/paml/codeml/report.rb | 4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) commit 6d4d2e1f37efb1e53091fbc9a0977568996788ff Author: Naohisa Goto Date: Thu Feb 4 16:58:10 2010 +0900 New unit test for Bio::PAML::Codeml::Report. * New unit test for Bio::PAML::Codeml::Report and related classes. The test code is copied from the examples described in lib/bio/appl/paml/codeml/report.rb and modified for the unit test. test/unit/bio/appl/paml/codeml/test_report.rb | 253 +++++++++++++++++++++++++ 1 files changed, 253 insertions(+), 0 deletions(-) create mode 100644 test/unit/bio/appl/paml/codeml/test_report.rb commit 8418549811293c3e20b91d4e95da2cb2a282a064 Author: Naohisa Goto Date: Thu Feb 4 16:47:37 2010 +0900 Changes due to the rename to report_single.rb. .../bio/appl/paml/codeml/test_report_single.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit 7bfb428da237709b243c8d0e4646bd41710d1519 Author: Naohisa Goto Date: Thu Feb 4 16:39:10 2010 +0900 Renamed codeml/test_report.rb to codeml/test_report_single.rb. 
* Renamed test/unit/bio/appl/paml/codeml/test_report.rb to test_report_single.rb. test/unit/bio/appl/paml/codeml/test_report.rb | 46 -------------------- .../bio/appl/paml/codeml/test_report_single.rb | 46 ++++++++++++++++++++ 2 files changed, 46 insertions(+), 46 deletions(-) delete mode 100644 test/unit/bio/appl/paml/codeml/test_report.rb create mode 100644 test/unit/bio/appl/paml/codeml/test_report_single.rb commit 762d38b1564da7d846e3dcd461cf465aa685a1ae Author: Pjotr Prins Date: Tue Jan 12 10:13:35 2010 +0100 Modified output of Bio::PAML::Codeml::PositiveSites#graph_to_s * Modified output of Bio::PAML::Codeml::PositiveSites#graph_to_s. (Part of commit ea350da85e5db2ba35cb8dd1e86e3d4323ee3fd1. Original commit message is: HtmlPaml: fixed some missing output use real greek omega in output) lib/bio/appl/paml/codeml/report.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit f88645cd783b7027950133c0badb0a8da8e4fb95 Author: Pjotr Prins Date: Tue Jan 12 09:24:46 2010 +0100 Codeml: no negative gaps lib/bio/appl/paml/codeml/report.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit 978b21cf90d0280e6e6c7d6e4fa65c49692bdd69 Author: Pjotr Prins Date: Mon Jan 11 17:31:45 2010 +0100 Codeml: always raise an error when significance can not be calculated lib/bio/appl/paml/codeml/report.rb | 15 ++++++++++----- 1 files changed, 10 insertions(+), 5 deletions(-) commit 12b5895f6f1819252d616bb0a38aa88a7828daff Author: Pjotr Prins Date: Mon Jan 11 17:22:34 2010 +0100 Codeml: oops lib/bio/appl/paml/codeml/report.rb | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) commit a8ff0a07fdbef72f72103f0bceb9c24a63162fc6 Author: Pjotr Prins Date: Mon Jan 11 17:19:26 2010 +0100 Codeml: added significance testing for a few model combinations lib/bio/appl/paml/codeml/report.rb | 57 +++++++++++++++++++++++++++++++++++- 1 files changed, 56 insertions(+), 1 deletions(-) commit 0e11af19450faca3568f89b23d5bd764688f75c0 Author: Pjotr Prins Date: Mon Jan 11 16:24:51 
2010 +0100 Codeml: raise error instead of a 'nil' error when buffer is incomplete lib/bio/appl/paml/codeml/report.rb | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) commit 1cb2aaaa701a1613812dd479201d27c1d7dcf016 Author: Pjotr Prins Date: Mon Jan 11 14:37:52 2010 +0100 Bio::PAML::Codeml::PositiveSites#graph_to_s gets fill character * Bio::PAML::Codeml::PositiveSites#graph_to_s gets fill character as an argument. (Part of commit d67259c9f203dc92c68ad04b4112329a7093a259. Original commit message is: HtmlPaml: show colors for probabilities of positive selection) lib/bio/appl/paml/codeml/report.rb | 15 +++++++++------ 1 files changed, 9 insertions(+), 6 deletions(-) commit ae5b9cf9ee697cc237c77335b12b57709a0e7a46 Author: Pjotr Prins Date: Mon Jan 11 13:37:52 2010 +0100 Codeml: return correct buffer lib/bio/appl/paml/codeml/report.rb | 10 +++++----- 1 files changed, 5 insertions(+), 5 deletions(-) commit 44f2e28c3e0d382505b067ec3c7aa55cbb9f0a38 Author: Pjotr Prins Date: Mon Jan 11 13:10:40 2010 +0100 Improvement of Bio::PAML::Codeml::PositiveSites#initialize, etc * Improved target analysis location detection in Bio::PAML::Codeml::PositiveSites#initialize. * Changed description inside Bio::PAML::Codeml::Report#nb_sites and sites methods. * This is part of commit e88ff474748b3295a8a4089356d3086638200d64. 
(Original commit message: HtmlPaml: improved output) lib/bio/appl/paml/codeml/report.rb | 20 ++++++++++++-------- 1 files changed, 12 insertions(+), 8 deletions(-) commit ee8973696d0434c591ceaffc580f1aa30fd036f9 Author: Pjotr Prins Date: Mon Jan 11 12:54:40 2010 +0100 Codeml: fixed doctests lib/bio/appl/paml/codeml/report.rb | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) commit f7bbb0859e28cb51137d0a8f8d962821eb67db91 Author: Pjotr Prins Date: Mon Jan 11 12:51:25 2010 +0100 New method Bio::PAML::Codeml::PositiveSites#to_s * New method Bio::PAML::Codeml::PositiveSites#to_s (part of the commit 82e933fd1961a2b31873bc37cbf3205adbf0a6de, original commit message: HtmlPaml: add facility for color output) lib/bio/appl/paml/codeml/report.rb | 6 ++++++ 1 files changed, 6 insertions(+), 0 deletions(-) commit d4f3dbaf78f623d870a1a76ab1353d786e0fb73b Author: Pjotr Prins Date: Mon Jan 11 12:34:59 2010 +0100 Codeml: HtmlPaml: minor tweaks lib/bio/appl/paml/codeml/report.rb | 7 ++----- 1 files changed, 2 insertions(+), 5 deletions(-) commit 7d41b6acb41c5913622fde127a030f940a432cc5 Author: Pjotr Prins Date: Mon Jan 11 12:19:15 2010 +0100 Codeml: add short description to positive sites line lib/bio/appl/paml/codeml/report.rb | 13 +++++++++++++ 1 files changed, 13 insertions(+), 0 deletions(-) commit a9d6765b3a5d23be7e8cf59954d67cd2354e5878 Author: Pjotr Prins Date: Mon Jan 11 12:13:55 2010 +0100 Codeml: fixed bug in graph output lib/bio/appl/paml/codeml/report.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit af124dcaa2adb446e273456f8dd0b84aff9b00db Author: Pjotr Prins Date: Mon Jan 11 12:11:00 2010 +0100 Codeml: added graph_seq, which shows the AA of the first sequence at positive sites lib/bio/appl/paml/codeml/report.rb | 11 +++++++++-- 1 files changed, 9 insertions(+), 2 deletions(-) commit 1924dcd951a9e655726bc1af72626526ed223258 Author: Pjotr Prins Date: Fri Jan 8 10:13:37 2010 +0100 Codeml: added :stopdoc: directive for rdoc 
lib/bio/appl/paml/codeml/report.rb | 21 ++++++++++++--------- 1 files changed, 12 insertions(+), 9 deletions(-) commit 7b68fd9d785723935f90144f67999fcf74bcc7c0 Author: Pjotr Prins Date: Fri Jan 8 10:01:49 2010 +0100 Codeml: fixed the doctests and added some info. all tests pass & lib/bio/appl/paml/codeml/report.rb | 43 ++++++++++++++++++++++++----------- 1 files changed, 29 insertions(+), 14 deletions(-) commit ce212c507e9e81120e7ad12be6df955e90d0ad33 Author: Pjotr Prins Date: Fri Jan 8 09:46:05 2010 +0100 Codeml: exclude TestFile class from RDoc generated documentation lib/bio/appl/paml/codeml/report.rb | 9 ++++++--- 1 files changed, 6 insertions(+), 3 deletions(-) commit 9d64a70b628cb3cda9cc1576b9966dae242e5230 Author: Pjotr Prins Date: Fri Jan 8 09:39:02 2010 +0100 Codeml: added many comments lib/bio/appl/paml/codeml/report.rb | 87 +++++++++++++++++++++++++++++------ 1 files changed, 72 insertions(+), 15 deletions(-) commit 2b92df64016865a2ab40c93650409cfd67a2a98e Author: Pjotr Prins Date: Mon Jan 4 17:57:32 2010 +0100 codeml: Added parser for full Bayesian sites all tests pass & lib/bio/appl/paml/codeml/report.rb | 52 ++++++++++++++++++++++++++++++----- 1 files changed, 44 insertions(+), 8 deletions(-) commit 198f0c014f7993fbe22825d0be67e9b1aa19d2de Author: Pjotr Prins Date: Mon Jan 4 17:37:15 2010 +0100 codeml: show graph lib/bio/appl/paml/codeml/report.rb | 46 ++++++++++++++++++++++++++++++++++- 1 files changed, 44 insertions(+), 2 deletions(-) commit 0843bb4d79dc4d94a22d79a797a39dc2866222c5 Author: Pjotr Prins Date: Mon Jan 4 17:09:27 2010 +0100 Codeml: added full support for positive selection sites doctests + unit tests pass & lib/bio/appl/paml/codeml/report.rb | 150 ++++++++++++++++++++++++++++-------- 1 files changed, 118 insertions(+), 32 deletions(-) commit 1610c5d86cddd53f7f0300d0f7a137daaa61ef94 Author: Pjotr Prins Date: Mon Jan 4 12:33:20 2010 +0100 codeml: added M3 classes lib/bio/appl/paml/codeml/report.rb | 28 +++++++++++++++++++++++++--- 1 files 
changed, 25 insertions(+), 3 deletions(-) commit efc939fbd300ceb371b342172382cdec9fcc74b7 Author: Pjotr Prins Date: Mon Jan 4 12:02:47 2010 +0100 codeml: adding compatibility layer for single model (old type) unit tests pass & lib/bio/appl/paml/codeml/report.rb | 44 +++++++++++++++++++++++++++++++----- 1 files changed, 38 insertions(+), 6 deletions(-) commit b89990daa3ea26a9f9195ec16044fb2070bcdd1a Author: Pjotr Prins Date: Sun Jan 3 12:34:53 2010 +0100 Implementation parsing one model - doctests for M0 pass lib/bio/appl/paml/codeml/report.rb | 90 ++++++++++++++++++++++++++++++++---- 1 files changed, 81 insertions(+), 9 deletions(-) commit dce447d3e81e738323e6fb6b2d28324e1fa62e7d Author: Pjotr Prins Date: Sun Jan 3 11:27:54 2010 +0100 Codeml: use BioTestFile for locating test data in the doctest lib/bio/appl/paml/codeml/report.rb | 9 ++++++++- 1 files changed, 8 insertions(+), 1 deletions(-) commit 27a7b558d60b7ec127df2c351542433c321704ac Author: Pjotr Prins Date: Sat Jan 2 23:36:47 2010 +0100 Codeml: split new type report and old type report lib/bio/appl/paml/codeml/report.rb | 19 +++++++++++++++---- 1 files changed, 15 insertions(+), 4 deletions(-) commit 027808e4723ca77af3e15b461ddcc09faf692732 Author: Pjotr Prins Date: Sat Jan 2 23:25:22 2010 +0100 Added example files for PAML codeml dual model runs test/data/paml/codeml/models/aa.aln | 26 ++ test/data/paml/codeml/models/aa.dnd | 13 + test/data/paml/codeml/models/aa.ph | 13 + test/data/paml/codeml/models/alignment.phy | 49 ++++ test/data/paml/codeml/models/results0-3.txt | 312 ++++++++++++++++++++++++ test/data/paml/codeml/models/results7-8.txt | 340 +++++++++++++++++++++++++++ 6 files changed, 753 insertions(+), 0 deletions(-) create mode 100644 test/data/paml/codeml/models/aa.aln create mode 100644 test/data/paml/codeml/models/aa.dnd create mode 100644 test/data/paml/codeml/models/aa.ph create mode 100644 test/data/paml/codeml/models/alignment.phy create mode 100644 test/data/paml/codeml/models/results0-3.txt 
 create mode 100644 test/data/paml/codeml/models/results7-8.txt

commit 1d35e616ce411bf643ab6dcb7126a6e1aca1e186
Author: Pjotr Prins
Date:   Sat Jan 2 17:04:56 2010 +0100

    Codeml::Report Added new description and reference

 lib/bio/appl/paml/codeml/report.rb |  113 ++++++++++++++++++++++++++++++------
 1 files changed, 96 insertions(+), 17 deletions(-)

commit d21b26044e776fab44dbc95f181afd04b67abe28
Author: Naohisa Goto
Date:   Mon Feb 1 22:31:21 2010 +0900

    Bug fix and Ruby 1.9 support: Bio::Command.call_command_fork etc.

    * Bug fix: In Bio::Command.call_command_fork, thread switching
      is disabled in the child process. Thanks to Andrew Grimm who
      reports the bug ([BioRuby] Thread-safety of alignment).
      Note that call_command_fork no longer works in Ruby 1.9
      because it is changed to use Thread.critical which is removed
      in Ruby 1.9.
    * Ruby 1.9 support: In Ruby 1.9, Bio::Command.call_command_popen
      bypasses shell execution by passing command-line as an Array,
      which is a new feature added in Ruby 1.9. Now, call_command_popen
      is safe and robust enough with Ruby 1.9.
    * Ruby 1.9 support: In Ruby 1.9, Bio::Command.call_command and
      query_command use call_command_popen and query_command_popen,
      respectively.
    * RDoc for the above and related methods is modified.

 lib/bio/command.rb                  |   80 +++++++++++++++++++++++++++++++----
 test/functional/bio/test_command.rb |    4 ++
 2 files changed, 76 insertions(+), 8 deletions(-)

commit 981dc1c89049bf00e56a9e83ef352cb4c4b45d6a
Author: Naohisa Goto
Date:   Tue Feb 2 22:47:36 2010 +0900

    Bug fix: Bio::FastaNumericFormat#to_biosequence bug fix

    * Bug fix: New method Bio::FastaNumericFormat#to_biosequence is
      defined to avoid a NoMethodError raised in the superclass's method.
      For the purpose, a new module
      Bio::Sequence::Adapter::FastaNumericFormat is added.
      Thanks to Hiroyuki MISHIMA who reports the bug ([BioRuby] trouble
      on the FASTA.QUAL format (Bio::FastaNumericFormat)).
    * Newly added unit test for Bio::FastaNumericFormat#to_biosequence.
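
    Note on the Ruby 1.9 entry for Bio::Command.call_command_popen above
    (commit d21b2604): the sketch below illustrates the underlying Ruby
    feature only, namely that IO.popen accepts the command line as an
    Array and then starts the program without going through a shell. It
    is not BioRuby's implementation; the helper name run_without_shell is
    an assumption made for this example, and the child process is simply
    the running Ruby interpreter located via RbConfig.ruby.

```ruby
require 'rbconfig'

# Hypothetical helper (not BioRuby's API): run a command given as an Array,
# optionally write +input+ to its stdin, and return its stdout as a String.
def run_without_shell(cmd, input = nil)
  IO.popen(cmd, "r+") do |io|
    io.write(input) if input
    io.close_write          # signal end of input to the child
    io.read                 # block value becomes the method's return value
  end
end

# An argument full of shell metacharacters reaches the child verbatim,
# because no shell ever parses the command line.
arg = "a b; echo $HOME"
out = run_without_shell([RbConfig.ruby, "-e", "print ARGV[0]", arg])

# Data written to stdin is read back by the child and echoed in upper case.
echoed = run_without_shell([RbConfig.ruby, "-e", "print STDIN.read.upcase"], "acgt")
```

    Passing an Array avoids quoting and escaping problems for file names
    with spaces or special characters, which is presumably why the entry
    calls the popen path "safe and robust enough" on Ruby 1.9.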
lib/bio/db/fasta/qual.rb | 24 ++++++++++++++++++++++++ lib/bio/db/fasta/qual_to_biosequence.rb | 29 +++++++++++++++++++++++++++++ lib/bio/sequence/adapter.rb | 1 + test/unit/bio/db/test_qual.rb | 11 +++++++++-- 4 files changed, 63 insertions(+), 2 deletions(-) create mode 100644 lib/bio/db/fasta/qual_to_biosequence.rb commit 29ed6870e453f54aac2ce9dcb7891186eb01c40d Author: Ben J Woodcroft Date: Wed Jan 13 14:38:13 2010 +1000 Bug fix: fixed uniprot GN parsing issue lib/bio/db/embl/sptr.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit a3002a79ec012559f5847ba8ebe4faf6e7fa609e Author: Naohisa Goto Date: Fri Jan 8 22:14:15 2010 +0900 Tutorial.rd.html is regenerated. doc/Tutorial.rd.html | 29 ++++++++++++++--------------- 1 files changed, 14 insertions(+), 15 deletions(-) commit 9238c3cb0e8f1156d23a5dfb3ce4e299a91b9f23 Author: Pjotr Prins Date: Fri Jan 8 09:04:23 2010 +0100 Tutorial: removed bad links doc/Tutorial.rd | 10 +--------- 1 files changed, 1 insertions(+), 9 deletions(-) commit 60542fd9863c5fc1240a15cc76f8fa90644a15c8 Author: Naohisa Goto Date: Wed Jan 6 20:38:25 2010 +0900 Changed header and the depth of loading helper due to the rename. test/unit/bio/appl/clustalw/test_report.rb | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) commit 68924e736df76fe3c77d9fe132b6df01fc0621fe Author: Naohisa Goto Date: Wed Jan 6 20:34:10 2010 +0900 Renamed test/unit/bio/db/test_clustalw.rb to test/unit/bio/appl/clustalw/test_report.rb. 
test/unit/bio/appl/clustalw/test_report.rb | 61 ++++++++++++++++++++++++++++ test/unit/bio/db/test_clustalw.rb | 61 ---------------------------- 2 files changed, 61 insertions(+), 61 deletions(-) create mode 100644 test/unit/bio/appl/clustalw/test_report.rb delete mode 100644 test/unit/bio/db/test_clustalw.rb commit 8368eee50de51f6218ffc7b1bf1aad332702c4ba Author: Pjotr Prins Date: Tue Jan 5 12:54:43 2010 +0100 Clustal: unit tests according to Naohisa lib/bio/appl/clustalw/report.rb | 6 +++--- test/unit/bio/db/test_clustalw.rb | 10 +++++----- 2 files changed, 8 insertions(+), 8 deletions(-) commit ad525a01fa17052e9b7e9b7f30639c48596552ba Author: Pjotr Prins Date: Tue Jan 5 12:50:17 2010 +0100 Clustal: unit test uses File.read test/unit/bio/db/test_clustalw.rb | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) commit 7ab517e05cc470b9ca57273092599adb8c00dc11 Author: Pjotr Prins Date: Tue Jan 5 12:49:21 2010 +0100 Clustal: unit test, changed class name and copyright header test/unit/bio/db/test_clustalw.rb | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) commit 0829ee91a97976eb6671a2feec7edfc524f44b2c Author: Pjotr Prins Date: Tue Jan 5 12:46:37 2010 +0100 Clustal: Changed [] to get_sequence, with method description * Clustal: Added copyright. * Changed [] to get_sequence, with method description. 
lib/bio/appl/clustalw/report.rb | 12 ++++++++++-- 1 files changed, 10 insertions(+), 2 deletions(-) commit 3b8968a6b7b98e0f03b0822849594262a8f4ac99 Author: Pjotr Prins Date: Sun Dec 27 16:44:30 2009 +0100 ClustalW: Added [] method to reach sequence + definition lib/bio/appl/clustalw/report.rb | 9 +++++++++ test/unit/bio/db/test_clustalw.rb | 6 ++---- 2 files changed, 11 insertions(+), 4 deletions(-) commit 3926fabbcc0636c6e4ed08233af3d647c620cd5b Author: Pjotr Prins Date: Sun Dec 27 16:22:38 2009 +0100 ClustalW: Add ALN parser unit test test/data/clustalw/example1.aln | 58 ++++++++++++++++++++++++++++++++++ test/unit/bio/db/test_clustalw.rb | 63 +++++++++++++++++++++++++++++++++++++ 2 files changed, 121 insertions(+), 0 deletions(-) create mode 100644 test/data/clustalw/example1.aln create mode 100644 test/unit/bio/db/test_clustalw.rb commit 2ef97f945b122dc279eb0ec0a34a2adb0c5f0cff Author: Pjotr Prins Date: Sat Jan 2 13:24:33 2010 +0100 Tutorial: Fixed URLs doc/Tutorial.rd | 10 +++++----- 1 files changed, 5 insertions(+), 5 deletions(-) commit 567ca8b010e15cbea9398ee74c78eae01fc6671d Author: Pjotr Prins Date: Fri Jan 1 12:08:50 2010 +0100 Tutorial: Added info on gem install doc/Tutorial.rd | 9 +++++++-- 1 files changed, 7 insertions(+), 2 deletions(-) commit 21070ab4928d9c7446d58f3003d43ee6235046aa Author: Pjotr Prins Date: Thu Dec 31 11:41:54 2009 +0100 Tutorial.rd: Added Naohisa's Ruby replacement for sed conversion doc/Tutorial.rd | 6 +++++- 1 files changed, 5 insertions(+), 1 deletions(-) commit ebded2364f716fa03b0fdbec9887f807836eb789 Author: Naohisa Goto Date: Wed Jan 6 10:59:39 2010 +0900 Bio::BIORUBY_EXTRA_VERSION is changed to ".5000". bioruby.gemspec | 2 +- lib/bio/version.rb | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) commit a1bda9088662edec55af0106b4292c39e51c8b7b Author: Naohisa Goto Date: Mon Dec 28 21:56:33 2009 +0900 BioRuby 1.4.0 is released. 
ChangeLog | 32 ++++++++++++++++++++++++++++++++ bioruby.gemspec | 3 ++- 2 files changed, 34 insertions(+), 1 deletions(-) commit 5c88896357e1eff0686ceb06cbec0a7837f85050 Author: Naohisa Goto Date: Mon Dec 28 21:55:41 2009 +0900 Preparation for bioruby-1.4.0 release. lib/bio/version.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit 11f56d3d8efc2cf5d9408da865af044fa099b925 Author: Naohisa Goto Date: Mon Dec 28 21:52:25 2009 +0900 Added about ChangeLog which is replaced by git-log. RELEASE_NOTES.rdoc | 5 +++++ 1 files changed, 5 insertions(+), 0 deletions(-) commit 17d5b1825b6c73d710d72903d8710caa9996353a Author: Naohisa Goto Date: Mon Dec 28 20:11:49 2009 +0900 ChangeLog is autogenerated from git log. * ChangeLog is autogenerated from git log with the following command: % git log --stat --summary \ 3d1dfcc0e13ad582b9c70c7fdde3a89d0bacdc80..HEAD > ChangeLog ChangeLog | 2306 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 2306 insertions(+), 0 deletions(-) create mode 100644 ChangeLog commit 02bf77af589ea62df81e9634df6fe949df2fd3ef Author: Naohisa Goto Date: Mon Dec 28 19:25:39 2009 +0900 test_output_size is disabled because it depends on html decorations test/functional/bio/appl/test_pts1.rb | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) commit 5781fb402e85e73fd47948b4466c8129355b714b Author: Naohisa Goto Date: Mon Dec 28 19:21:21 2009 +0900 The PTS1 Predictor's URL had been changed. * The PTS1 Predictor's URL had been changed. * Changed to use @uri instead of @host and @cgi_path. lib/bio/appl/pts1.rb | 6 ++---- 1 files changed, 2 insertions(+), 4 deletions(-) commit a4e691d913e1ae51eadb1a871efc2c8718ef5587 Author: Naohisa Goto Date: Mon Dec 28 18:33:00 2009 +0900 Preparation of ChangeLog autogeneration: old ChangeLog is moved to doc/ChangeLog-before-1.3.1. 
ChangeLog | 3961 -------------------------------------------- doc/ChangeLog-before-1.3.1 | 3961 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 3961 insertions(+), 3961 deletions(-) delete mode 100644 ChangeLog create mode 100644 doc/ChangeLog-before-1.3.1 commit c011604766baa3cdf5ca2f4a776aa67c56460d29 Author: Naohisa Goto Date: Mon Dec 28 17:53:51 2009 +0900 Tutorial.rd.html is regenerated. doc/Tutorial.rd.html | 70 +++++++++++++------------------------------------ 1 files changed, 19 insertions(+), 51 deletions(-) commit 6e2cdd13d61970aa4704475bfb5aefb70719c2e1 Author: Naohisa Goto Date: Mon Dec 28 17:42:25 2009 +0900 Added Bio::NCBI.default_email= in the example, and examples using deprecated Bio::PubMed methods are temporarily commented out. doc/Tutorial.rd | 8 ++++++++ 1 files changed, 8 insertions(+), 0 deletions(-) commit 8e6d5e9baf98be7e58f4dda8b5d043a42149874b Author: Naohisa Goto Date: Mon Dec 28 17:15:09 2009 +0900 Reinserted "==>" for Blast example, and removed duplicated Ruby Ensembl API example. doc/Tutorial.rd | 25 ++----------------------- 1 files changed, 2 insertions(+), 23 deletions(-) commit 849edd7e8c5b26923cab47e7f5542948fab2b1fb Author: Pjotr Prins Date: Sun Dec 27 09:49:14 2009 +0100 Tutorial: Added info on how to run rubydoctest Removed bioruby> prefix for one failing BLAST test doc/Tutorial.rd | 69 ++++++++++++++++++++++++++++++++++++++---------------- 1 files changed, 48 insertions(+), 21 deletions(-) commit a39fcf0ca1a5265789110f42cc616fc5d3c16414 Author: Naohisa Goto Date: Fri Dec 25 12:30:18 2009 +0900 Modified for release notes and fixed typo. RELEASE_NOTES.rdoc | 29 +++++++++++++++-------------- 1 files changed, 15 insertions(+), 14 deletions(-) commit 3fa8b68f19fc2b6aaf8f54eb10517cc761b2193b Author: Naohisa Goto Date: Fri Dec 25 12:10:34 2009 +0900 Changes following the rename to RELEASE_NOTES.rdoc. 
README.rdoc | 2 +- bioruby.gemspec | 6 +++--- bioruby.gemspec.erb | 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) commit fd692a1165d368b9bdbe068ea6bf63fd91c9925c Author: Naohisa Goto Date: Fri Dec 25 12:03:41 2009 +0900 Renamed doc/Changes-1.4.rdoc to RELEASE_NOTES.rdoc. RELEASE_NOTES.rdoc | 160 ++++++++++++++++++++++++++++++++++++++++++++++++++ doc/Changes-1.4.rdoc | 160 -------------------------------------------------- 2 files changed, 160 insertions(+), 160 deletions(-) create mode 100644 RELEASE_NOTES.rdoc delete mode 100644 doc/Changes-1.4.rdoc commit 0e37f04dd8d34517693fdd4bc27f8bdada7c2f13 Author: Naohisa Goto Date: Thu Dec 24 21:48:52 2009 +0900 Changed Bio::PhyloXML::Parser.new to open, and regenerated html. doc/Tutorial.rd | 10 ++-- doc/Tutorial.rd.html | 125 ++++++++++++++++++++++++++++++++++++++++++------- 2 files changed, 112 insertions(+), 23 deletions(-) commit aeacbbd425c2e88369c171bd92c60bf8e520a9e5 Author: Naohisa Goto Date: Thu Dec 24 19:26:49 2009 +0900 bioruby.gemspec is regenerated bioruby.gemspec | 8 +++++++- 1 files changed, 7 insertions(+), 1 deletions(-) commit 1034205c199a638c359780922293f8b39c467356 Author: Naohisa Goto Date: Thu Dec 24 19:24:56 2009 +0900 Version number changed to 1.4.0-rc1 lib/bio/version.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit 04bf2da43f78fbb702b67323f3be1fe3bd2d0351 Author: Naohisa Goto Date: Thu Dec 24 19:22:41 2009 +0900 Issues added and modified. KNOWN_ISSUES.rdoc | 35 +++++++++++++++++++++++++++++++++-- 1 files changed, 33 insertions(+), 2 deletions(-) commit f1a76157b009fb0ca94d9a0e0f8a85522c383b19 Author: Naohisa Goto Date: Thu Dec 24 19:22:19 2009 +0900 Added news and incompatible changes. 
doc/Changes-1.4.rdoc | 102 ++++++++++++++++++++++++++++++++++++++++++++++++-- 1 files changed, 98 insertions(+), 4 deletions(-) commit 9c8ef18a20c49f17d5b89aa1db5819b2c8ee9b1d Author: Naohisa Goto Date: Thu Dec 24 19:10:02 2009 +0900 Email address for NCBI Entrez is given with Bio::NCBI.default_email=. bin/bioruby | 5 ++++- sample/demo_ncbi_rest.rb | 2 ++ sample/demo_pubmed.rb | 2 ++ sample/pmfetch.rb | 2 ++ sample/pmsearch.rb | 2 ++ test/functional/bio/io/test_pubmed.rb | 4 ++++ 6 files changed, 16 insertions(+), 1 deletions(-) commit 7a7179665694da35ab0970909bfbda9ad1b057da Author: Naohisa Goto Date: Thu Dec 24 19:09:09 2009 +0900 Changed autoload hierarchy of Bio::NCBI. lib/bio.rb | 10 ++++++---- lib/bio/io/ncbisoap.rb | 3 ++- 2 files changed, 8 insertions(+), 5 deletions(-) commit f8dc0268d9edf699fd3f0cf18dd55a2b10ec3bcc Author: Naohisa Goto Date: Thu Dec 24 18:58:18 2009 +0900 New singleton methods Bio::NCBI.default_email=, default_tool=, etc. * New singleton methods Bio::NCBI.default_email=, default_email, default_tools=, default_tools, etc., because email and tool parameters will be mandatory in Entrez eUtils. * Changed to raise error when email or tool is empty. Note that default email is nil and library users should always set their email address. * Default tool name is changed to include $0 and bioruby version ID. * Added multi-thread support for Bio::NCBI::REST#ncbi_access_wait. lib/bio/io/ncbirest.rb | 161 ++++++++++++++++++++++++++++++++++++++--------- 1 files changed, 130 insertions(+), 31 deletions(-) commit 2e311dc44290ef6bda48f0bcba09a3c22bf32d9a Author: Naohisa Goto Date: Mon Dec 21 22:24:52 2009 +0900 Description about the incompatible change of Bio::KEGG::REACTION#rpairs. doc/Changes-1.4.rdoc | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) commit d57ace3a89077caae3c681743da4b92d16b90af8 Author: Naohisa Goto Date: Mon Dec 21 22:17:46 2009 +0900 Bio::KEGG::REACTION#rpairs is changed to return a hash.
lib/bio/db/kegg/reaction.rb | 65 ++++++++++++++++++++++++-------- test/unit/bio/db/kegg/test_reaction.rb | 27 ++++++++++++- 2 files changed, 74 insertions(+), 18 deletions(-) commit 60e4c77d184ee81c51668b446518cfbc9256be50 Author: Naohisa Goto Date: Mon Dec 21 22:15:44 2009 +0900 Document bug fix: return value mistake. lib/bio/db/kegg/genes.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit 6376dd55aa4995769746e556ca719d37f02975d6 Author: Naohisa Goto Date: Sun Dec 20 17:32:52 2009 +0900 Added README.txt for FASTQ example data. test/data/fastq/README.txt | 109 ++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 109 insertions(+), 0 deletions(-) create mode 100644 test/data/fastq/README.txt commit 8dec18794c846726733d66c5a22170f5b2c4bb1a Author: Naohisa Goto Date: Tue Dec 15 13:51:13 2009 +0900 Newly added unit tests for Bio::KEGG::GLYCAN with test data. test/data/KEGG/G00024.glycan | 47 ++++++ test/data/KEGG/G01366.glycan | 18 +++ test/unit/bio/db/kegg/test_glycan.rb | 260 ++++++++++++++++++++++++++++++++++ 3 files changed, 325 insertions(+), 0 deletions(-) create mode 100644 test/data/KEGG/G00024.glycan create mode 100644 test/data/KEGG/G01366.glycan create mode 100644 test/unit/bio/db/kegg/test_glycan.rb commit 90b97bfbcfb3f7e3d5c28b195bdb9b9c058df887 Author: Naohisa Goto Date: Tue Dec 15 11:42:39 2009 +0900 Newly added unit test for Bio::KEGG::DRUG with test data. test/data/KEGG/D00063.drug | 104 +++++++++++++++++++ test/unit/bio/db/kegg/test_drug.rb | 194 ++++++++++++++++++++++++++++++++++++ 2 files changed, 298 insertions(+), 0 deletions(-) create mode 100644 test/data/KEGG/D00063.drug create mode 100644 test/unit/bio/db/kegg/test_drug.rb commit 443f778795b82a7f572cb8b85d2a8a8b3cea1334 Author: Naohisa Goto Date: Tue Dec 15 11:38:59 2009 +0900 New method Bio::KEGG::DRUG#products * New method Bio::KEGG::DRUG#products. * Improved RDoc. 
lib/bio/db/kegg/drug.rb | 50 +++++++++++++++++++++++++++++++++++++--------- 1 files changed, 40 insertions(+), 10 deletions(-) commit 48184d96b989f909ac0effb759cbc4b1ddc98dd1 Author: Naohisa Goto Date: Fri Dec 11 01:36:54 2009 +0900 Methods in Bio::KEGG::Common::* are changed to cache return values in instance variables. lib/bio/db/kegg/common.rb | 62 ++++++++++++++++++++++++++------------------ 1 files changed, 37 insertions(+), 25 deletions(-) commit f364ea609f1e01ca5270a5bd7404e0bbf752bc89 Author: Naohisa Goto Date: Fri Dec 11 01:23:42 2009 +0900 Version is changed to 1.4.0-alpha1, and bioruby.gemspec is regenerated. bioruby.gemspec | 142 ++++++++++++++++++++++++++++++++++++++++++++++++++- bioruby.gemspec.erb | 4 +- lib/bio/version.rb | 4 +- 3 files changed, 145 insertions(+), 5 deletions(-) commit 096b5fbf6b7ff906203aabf93eb9a0bd56ae9ba2 Author: Naohisa Goto Date: Fri Dec 11 01:22:59 2009 +0900 Added documents about Bio::KEGG incompatible changes. doc/Changes-1.4.rdoc | 48 ++++++++++++++++++++++++++++++++++++++++++------ 1 files changed, 42 insertions(+), 6 deletions(-) commit 72ed277fe30bb1033cbc16d462f137510afb84e6 Author: Naohisa Goto Date: Fri Dec 11 01:21:26 2009 +0900 Newly added unit tests for Bio::KEGG::ENZYME with test data. test/data/KEGG/1.1.1.1.enzyme | 935 ++++++++++++++++++++++++++++++++++ test/unit/bio/db/kegg/test_enzyme.rb | 241 +++++++++ 2 files changed, 1176 insertions(+), 0 deletions(-) create mode 100644 test/data/KEGG/1.1.1.1.enzyme create mode 100644 test/unit/bio/db/kegg/test_enzyme.rb commit b99fcb39f7c5d2857cbb65283d85ea868ae8561d Author: Naohisa Goto Date: Fri Dec 11 01:09:03 2009 +0900 Changed Bio::KEGG::*#dblinks, pathways, orthologs, genes methods. * In Bio::KEGG::COMPOUND, DRUG, ENZYME, GLYCAN and ORTHOLOGY, the method dblinks is changed to return a Hash. The old methods are renamed to dblinks_as_strings. * In Bio::KEGG::COMPOUND, DRUG, ENZYME, GENES, GLYCAN and REACTION, the method pathways is changed to return a Hash. 
The old methods are renamed to pathways_as_strings except for GENES. * In Bio::KEGG::ENZYME, GENES, GLYCAN and REACTION, the method orthologs is changed to return a Hash. The old methods are renamed to orthologs_as_strings. * Bio::KEGG::ENZYME#genes and Bio::KEGG::ORTHOLOGY#genes are changed to return a Hash. The old methods are renamed to genes_as_strings. * Added Bio::KEGG::REACTION#rpairs_as_tokens, older behavior of rpairs. * Modules in lib/bio/db/kegg/common.rb are moved under Bio::KEGG::Common namespace. * Refactoring. * Added documents. * Tests modified. lib/bio/db/kegg/common.rb | 40 +++++++++++++++++++++++++------ lib/bio/db/kegg/compound.rb | 10 ++++--- lib/bio/db/kegg/drug.rb | 27 +++++++++++++++------ lib/bio/db/kegg/enzyme.rb | 31 ++++++++++++++++++++---- lib/bio/db/kegg/genes.rb | 39 +++++++++++++++++++------------ lib/bio/db/kegg/glycan.rb | 22 +++++++++++++++-- lib/bio/db/kegg/orthology.rb | 25 +++++++------------ lib/bio/db/kegg/reaction.rb | 16 +++++++++--- test/unit/bio/db/kegg/test_compound.rb | 27 ++++++++++++-------- test/unit/bio/db/kegg/test_reaction.rb | 13 +++++---- 10 files changed, 170 insertions(+), 80 deletions(-) commit 2cc9d4e2f28f6b2bbcb8f714f9e2eb144c594fbf Author: Naohisa Goto Date: Thu Dec 10 16:02:54 2009 +0900 Bio::KEGG::GENES#structure no longer adds PDB: prefix. * Bio::KEGG::GENES#structure no longer adds PDB: prefix. * Added Bio::KEGG::GENES#structures as an alias of structure. lib/bio/db/kegg/genes.rb | 7 +++---- test/unit/bio/db/kegg/test_genes.rb | 7 ++++--- 2 files changed, 7 insertions(+), 7 deletions(-) commit a8ceb23bdf19d6649aa4d879cba76a9e3f91d1d4 Author: Naohisa Goto Date: Thu Dec 10 15:28:33 2009 +0900 Refactoring of Bio::KEGG::Orthology#dblinks and genes. * Refactoring of Bio::KEGG::Orthology#dblinks and genes: no need to treat @data because lines_fetch internally does so.
lib/bio/db/kegg/orthology.rb | 10 ++-------- 1 files changed, 2 insertions(+), 8 deletions(-) commit 720e0bccdfdc6fac6222cac1a9f05d6e2419896c Author: Naohisa Goto Date: Wed Dec 9 16:39:03 2009 +0900 Changed dummy lines for RDoc. lib/bio/db/kegg/compound.rb | 4 ++-- lib/bio/db/kegg/orthology.rb | 2 +- lib/bio/db/kegg/reaction.rb | 4 ++-- 3 files changed, 5 insertions(+), 5 deletions(-) commit 20f8c03af92e5cfedcb49e8ed9fc6fda2b86e9c9 Author: Naohisa Goto Date: Wed Dec 9 15:17:39 2009 +0900 Refactoring of Bio::KEGG::REACTION#orthologs. * Refactoring of Bio::KEGG::REACTION#orthologs: no need to treat @data because lines_fetch internally does so. lib/bio/db/kegg/reaction.rb | 5 +---- 1 files changed, 1 insertions(+), 4 deletions(-) commit b924601bacd643f66b37dd991913e6862df704a9 Author: Naohisa Goto Date: Sun Dec 6 15:51:03 2009 +0900 Bio::KEGG::GENES#pathways is changed to return raw lines as an Array of strings. * Bio::KEGG::GENES#pathways is changed to return raw lines as an Array of strings. * RDoc is added for Bio::KEGG::GENES. lib/bio/db/kegg/genes.rb | 99 ++++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 96 insertions(+), 3 deletions(-) commit 4c840dc6a539db1d854b23991269b3e6515f637e Author: Kozo Nishida Date: Wed Dec 2 17:02:00 2009 +0900 Added test methods. test/unit/bio/db/kegg/test_compound.rb | 47 ++++++++++++++++++++++++++++++++ 1 files changed, 47 insertions(+), 0 deletions(-) commit 105efa1ecd1bc99a54aac32710a97df15035119d Author: Naohisa Goto Date: Wed Dec 2 23:31:07 2009 +0900 Refactoring: to use lib/bio/db/kegg/common.rb for dblinks_as_hash method. lib/bio/db/kegg/orthology.rb | 16 +++++----------- 1 files changed, 5 insertions(+), 11 deletions(-) commit c394ead051c3a13ceb534f93816af7ad35be932a Author: Naohisa Goto Date: Wed Dec 2 23:07:23 2009 +0900 Bio::KEGG::REACTION#orthologies is renamed to orthologs_as_hash with changing its return value to a hash. 
* Bio::KEGG::REACTION#orthologies is renamed to orthologs_as_hash with changing its return value to a hash. * The code of the orthologs_as_hash method is moved to lib/bio/db/kegg/common.rb. * Added new method Bio::KEGG::REACTION#orthologs, copied from lib/bio/db/kegg/glycan.rb. lib/bio/db/kegg/common.rb | 18 +++++++++++++++++- lib/bio/db/kegg/reaction.rb | 14 ++++++-------- sample/demo_kegg_reaction.rb | 6 ++++-- test/unit/bio/db/kegg/test_reaction.rb | 12 ++++++++++-- 4 files changed, 37 insertions(+), 13 deletions(-) commit 4e01fda27166faf066104ab9897904fd46f57123 Author: Naohisa Goto Date: Wed Dec 2 22:48:06 2009 +0900 Added Bio::KEGG::REACTION#pathways_as_hash and reverted pathways method. * New method Bio::KEGG::REACTION#pathways_as_hash, using a module in lib/bio/db/kegg/common.rb. * Bio::KEGG::REACTION#pathways is reverted to return an array of string. lib/bio/db/kegg/reaction.rb | 18 +++++++++++------- test/unit/bio/db/kegg/test_reaction.rb | 8 +++++++- 2 files changed, 18 insertions(+), 8 deletions(-) commit 0c2ce4b8462792d496ab3f58206fdbd47143e280 Author: Naohisa Goto Date: Wed Dec 2 22:35:21 2009 +0900 New methods Bio::KEGG::COMPOUND#dblinks_as_hash and pathways_as_hash, using modules in lib/bio/db/kegg/common.rb. lib/bio/db/kegg/compound.rb | 14 +++++++++++ test/unit/bio/db/kegg/test_compound.rb | 38 ++++++++++++++++++++++++++++++++ 2 files changed, 52 insertions(+), 0 deletions(-) commit 63df07e030120eb43de22555277529822b072270 Author: Naohisa Goto Date: Wed Dec 2 22:25:20 2009 +0900 Methods commonly used from Bio::KEGG::* classes. * Modules containing methods commonly used from Bio::KEGG::* classes. The "dblinks_as_hash" method is copied from lib/bio/db/kegg/orthology.rb. The "pathways_as_hash" method is derived from the dblinks_as_hash and Bio::KEGG::REACTION#pathways methods. 
lib/bio/db/kegg/common.rb | 60 +++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 60 insertions(+), 0 deletions(-) create mode 100644 lib/bio/db/kegg/common.rb commit 0e55c6701b09a52356ac55300181ee656773826f Author: Naohisa Goto Date: Wed Dec 2 21:39:06 2009 +0900 Bio::KEGG::COMPOUND#dblinks is reverted to return an array of string. lib/bio/db/kegg/compound.rb | 11 ++--------- test/unit/bio/db/kegg/test_compound.rb | 8 +++++++- 2 files changed, 9 insertions(+), 10 deletions(-) commit a05adcddf6c7ed67c042f31ecd86848af1ba8a22 Author: Naohisa Goto Date: Wed Dec 2 21:13:39 2009 +0900 Bug fix: fixed a copy-and-paste mistake. lib/bio/db/kegg/drug.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit 86925f3c80730e3ea3377a23a70cadb3876258c4 Author: Naohisa Goto Date: Tue Dec 1 21:31:40 2009 +0900 Bio::KEGG::ORTHOLOGY#dblinks_as_hash should preserve database names. doc/Changes-1.4.rdoc | 4 ++++ lib/bio/db/kegg/orthology.rb | 2 +- test/unit/bio/db/kegg/test_orthology.rb | 2 +- 3 files changed, 6 insertions(+), 2 deletions(-) commit 60847cd2d0701fa38a499578649cb216c93993a2 Author: Naohisa Goto Date: Tue Dec 1 20:41:51 2009 +0900 Test class names are changed to avoid potential class name conflict. test/unit/bio/db/kegg/test_compound.rb | 2 +- test/unit/bio/db/kegg/test_genes.rb | 4 ++-- test/unit/bio/db/kegg/test_orthology.rb | 2 +- test/unit/bio/db/kegg/test_reaction.rb | 2 +- 4 files changed, 5 insertions(+), 5 deletions(-) commit 2bda62af7a020c22379dd9ec3a42496d2a5b94cb Author: Kozo Nishida Date: Tue Dec 1 04:38:27 2009 +0900 Added unit tests for Bio::KEGG::ORTHOLOGY. 
test/data/KEGG/K02338.orthology | 902 +++++++++++++++++++++++++++++++ test/unit/bio/db/kegg/test_orthology.rb | 50 ++ 2 files changed, 952 insertions(+), 0 deletions(-) create mode 100644 test/data/KEGG/K02338.orthology create mode 100644 test/unit/bio/db/kegg/test_orthology.rb commit acad9497caf5d737394568e911691fdad11ca091 Author: Naohisa Goto Date: Mon Nov 30 21:39:32 2009 +0900 Changed to use BioRubyTestDataPath instead of __FILE__. test/unit/bio/db/kegg/test_compound.rb | 3 +-- test/unit/bio/db/kegg/test_reaction.rb | 3 +-- 2 files changed, 2 insertions(+), 4 deletions(-) commit 8e95f3fb60cd61b2bfad8e66caf03d3ff02a6dca Author: Naohisa Goto Date: Sun Nov 29 16:37:21 2009 +0900 Bio::Fastq::QualityScore is renamed to Bio::Sequence::QualityScore. * Bio::Fastq::QualityScore is renamed to Bio::Sequence::QualityScore. * Changes of filenames due to the previous file move. lib/bio/db/fasta/format_qual.rb | 18 ++++++++-------- lib/bio/db/fastq.rb | 7 ++--- lib/bio/sequence.rb | 3 +- lib/bio/sequence/quality_score.rb | 25 +++++++++++------------ test/unit/bio/sequence/test_quality_score.rb | 28 +++++++++++++------------- 5 files changed, 40 insertions(+), 41 deletions(-) commit 2b29654c1d7e927e445e7acdd525835a873c2a2a Author: Naohisa Goto Date: Sun Nov 29 16:15:42 2009 +0900 lib/bio/db/fastq/quality_score.rb is moved to lib/bio/sequence/. The unit test is also moved. * lib/bio/db/fastq/quality_score.rb is moved to lib/bio/sequence/. * test/unit/bio/db/fastq/test_quality_score.rb is moved to test/unit/bio/sequence/. * The file contents will be modified with the following commit. 
lib/bio/db/fastq/quality_score.rb | 206 ---------------- lib/bio/sequence/quality_score.rb | 206 ++++++++++++++++ test/unit/bio/db/fastq/test_quality_score.rb | 330 -------------------------- test/unit/bio/sequence/test_quality_score.rb | 330 ++++++++++++++++++++++++++ 4 files changed, 536 insertions(+), 536 deletions(-) delete mode 100644 lib/bio/db/fastq/quality_score.rb create mode 100644 lib/bio/sequence/quality_score.rb delete mode 100644 test/unit/bio/db/fastq/test_quality_score.rb create mode 100644 test/unit/bio/sequence/test_quality_score.rb commit aa8d49bf31f90dd2796c18ee0aa6291979284ec2 Author: Naohisa Goto Date: Sun Nov 29 15:20:36 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_gff1.rb. lib/bio/db/gff.rb | 17 ----------------- sample/demo_gff1.rb | 49 +++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 49 insertions(+), 17 deletions(-) create mode 100644 sample/demo_gff1.rb commit 76fffd2d2429346478fb3d8c88cdcd878a1047b1 Author: Naohisa Goto Date: Sun Nov 29 15:06:41 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_tmhmm_report.rb. lib/bio/appl/tmhmm/report.rb | 36 ---------------------- sample/demo_tmhmm_report.rb | 68 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 68 insertions(+), 36 deletions(-) create mode 100644 sample/demo_tmhmm_report.rb commit dfafb0a2bcec4c0b4cd3640374e151e2039056dc Author: Naohisa Goto Date: Sun Nov 29 14:59:27 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_targetp_report.rb. lib/bio/appl/targetp/report.rb | 105 +------------------------------ sample/demo_targetp_report.rb | 135 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 136 insertions(+), 104 deletions(-) create mode 100644 sample/demo_targetp_report.rb commit 75f7c8527546f8ea3079f53b90a9b4d8260b4de0 Author: Naohisa Goto Date: Sun Nov 29 14:33:28 2009 +0900 Follow-up of the SOSUI server URL change. 
lib/bio/appl/sosui/report.rb | 6 ++++-- 1 files changed, 4 insertions(+), 2 deletions(-) commit 8022696295dc296462f73b40cc74ad5259bee387 Author: Naohisa Goto Date: Sun Nov 29 14:32:11 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_sosui_report.rb. lib/bio/appl/sosui/report.rb | 53 +------------------------ sample/demo_sosui_report.rb | 89 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 90 insertions(+), 52 deletions(-) create mode 100644 sample/demo_sosui_report.rb commit 4acfe7f565039b34a036682912a75f55da808b45 Author: Naohisa Goto Date: Sun Nov 29 14:02:32 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_hmmer_report.rb. lib/bio/appl/hmmer/report.rb | 100 ---------------------------- sample/demo_hmmer_report.rb | 149 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 149 insertions(+), 100 deletions(-) create mode 100644 sample/demo_hmmer_report.rb commit 4f7bd1b7628d90661d8b557ca854b14cc44fb99c Author: Naohisa Goto Date: Thu Nov 26 15:49:21 2009 +0900 Demo codes in the "if __FILE__ == $0" are removed because they are very short. lib/bio/appl/fasta/format10.rb | 14 -------------- lib/bio/appl/hmmer.rb | 16 +--------------- lib/bio/io/flatfile.rb | 8 +------- 3 files changed, 2 insertions(+), 36 deletions(-) commit c2a72d195189755532e7e206af34d152ab6332d8 Author: Naohisa Goto Date: Thu Nov 26 15:20:28 2009 +0900 Bug fix: Failure of Bio::Fasta.remote due to the remote site changes. lib/bio/appl/fasta.rb | 5 ++++- 1 files changed, 4 insertions(+), 1 deletions(-) commit 549112fb4dfb5f6b2fe3491fb161887a9f5262ac Author: Naohisa Goto Date: Thu Nov 26 15:13:10 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_fasta_remote.rb. 
lib/bio/appl/fasta.rb | 18 --------------- sample/demo_fasta_remote.rb | 51 +++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 51 insertions(+), 18 deletions(-) create mode 100644 sample/demo_fasta_remote.rb commit 0e4ca0db83692fdbbe93e90272a07bcbac89192c Author: Naohisa Goto Date: Thu Nov 26 10:17:47 2009 +0900 Text indents for some comment lines are changed. sample/demo_blast_report.rb | 4 ++-- sample/demo_kegg_compound.rb | 4 ++-- sample/demo_prosite.rb | 4 ++-- sample/demo_sirna.rb | 4 ++-- 4 files changed, 8 insertions(+), 8 deletions(-) commit c0cf91fe2a9247bc3705b20515f9d4fa14288d5a Author: Naohisa Goto Date: Thu Nov 26 10:13:26 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_keggapi.rb. * Demo codes in the "if __FILE__ == $0" are moved to sample/demo_keggapi.rb. * Commented out demonstrations of deprecated methods: get_neighbors_by_gene, get_similarity_between_genes, get_ko_members, get_oc_members_by_gene, get_pc_members_by_gene. * Commented out demonstrations of methods internally using the deprecated methods: get_all_neighbors_by_gene, get_all_oc_members_by_gene, get_all_pc_members_by_gene. lib/bio/io/keggapi.rb | 442 ------------------------------------------ sample/demo_keggapi.rb | 502 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 502 insertions(+), 442 deletions(-) create mode 100644 sample/demo_keggapi.rb commit 8b8206c1d8ee699185fdd19d3329311c85ee003c Author: Naohisa Goto Date: Thu Nov 26 01:50:06 2009 +0900 Fixed the license line. lib/bio/db/prosite.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit ebfeec8243abd4e2f65335fda1ead18efff66897 Author: Naohisa Goto Date: Thu Nov 26 01:41:58 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_ncbi_rest.rb. 
lib/bio/io/ncbirest.rb | 101 ------------------------------------ sample/demo_ncbi_rest.rb | 128 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 128 insertions(+), 101 deletions(-) create mode 100644 sample/demo_ncbi_rest.rb commit 5a0f8379a374650d12fc88fbbd5b28c38ae96395 Author: Naohisa Goto Date: Thu Nov 26 01:33:07 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_prosite.rb. lib/bio/db/prosite.rb | 95 +------------------------------------- sample/demo_prosite.rb | 120 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 121 insertions(+), 94 deletions(-) create mode 100644 sample/demo_prosite.rb commit c560a5d0ba9d4919dbcca156ea620056dcb8f725 Author: Naohisa Goto Date: Thu Nov 26 01:14:37 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_psort.rb. lib/bio/appl/psort.rb | 111 --------------------------------------- sample/demo_psort.rb | 138 +++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 138 insertions(+), 111 deletions(-) create mode 100644 sample/demo_psort.rb commit 1299a55d214784a536ae3cd8bfabdfd61fe1da86 Author: Naohisa Goto Date: Thu Nov 26 01:04:29 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_psort_report.rb. * Demo codes in the "if __FILE__ == $0" are moved to sample/demo_psort_report.rb, without any checks. lib/bio/appl/psort/report.rb | 46 +--------------------------- sample/demo_psort_report.rb | 70 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 71 insertions(+), 45 deletions(-) create mode 100644 sample/demo_psort_report.rb commit a2686fe3c5a93947c94d4602514a62a808c182d5 Author: Naohisa Goto Date: Thu Nov 26 00:53:54 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_genscan_report.rb. 
lib/bio/appl/genscan/report.rb | 176 ---------------------------------- sample/demo_genscan_report.rb | 202 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 202 insertions(+), 176 deletions(-) create mode 100644 sample/demo_genscan_report.rb commit 22f662ba69dd2d4a2273562dd7ea921f5cdd84bd Author: Naohisa Goto Date: Thu Nov 26 00:28:01 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_ddbjxml.rb. lib/bio/io/ddbjxml.rb | 182 +----------------------------------------- sample/demo_ddbjxml.rb | 212 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 213 insertions(+), 181 deletions(-) create mode 100644 sample/demo_ddbjxml.rb commit ed3b34b6598f632c7b9b3f1a17b42406c19ca32d Author: Naohisa Goto Date: Thu Nov 26 00:12:33 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_pubmed.rb. * Demo codes in the "if __FILE__ == $0" are moved to sample/demo_pubmed.rb. * Codes using Entrez CGI are disabled in the demo. lib/bio/io/pubmed.rb | 88 ------------------------------------- sample/demo_pubmed.rb | 116 +++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 116 insertions(+), 88 deletions(-) create mode 100644 sample/demo_pubmed.rb commit 9e6d720f383e88e247eacab6f0e43f38140a62f2 Author: Naohisa Goto Date: Wed Nov 25 23:59:10 2009 +0900 Demo codes in the "if __FILE__ == $0" are removed. * Demo codes in the "if __FILE__ == $0" are removed because their functions have already been moved to sample/demo_blast_report.rb.
lib/bio/appl/blast/format0.rb | 193 -------------------------------------- lib/bio/appl/blast/report.rb | 149 +----------------------------- lib/bio/appl/blast/wublast.rb | 208 ----------------------------------------- 3 files changed, 2 insertions(+), 548 deletions(-) commit bbba2812fa9131d01fc655eb174d84f06facd8b8 Author: Naohisa Goto Date: Wed Nov 25 23:49:36 2009 +0900 New demo code of BLAST parser based on codes in "if __FILE__ ==$0" * Newly added sample/demo_blast_report.rb, demonstration of BLAST parsers Bio::Blast::Report, Bio::Blast::Default::Report, and Bio::Blast::WU::Report. It is based on the demonstration codes in the "if __FILE__ == $0" in lib/bio/appl/blast/report.rb, lib/bio/appl/blast/format0.rb, and lib/bio/appl/blast/wublast.rb. sample/demo_blast_report.rb | 285 +++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 285 insertions(+), 0 deletions(-) create mode 100644 sample/demo_blast_report.rb commit 5235ed15db8d3ba3e59d8dc3bbbcf1b5b9c58281 Author: Naohisa Goto Date: Wed Nov 25 21:57:08 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_das.rb. * Demo codes in the "if __FILE__ == $0" are moved to sample/demo_das.rb. * Demo codes using UCSC DAS server is added. * Demo using the WormBase DAS server is temporarily disabled because it does not work well possibly because of the server trouble. lib/bio/io/das.rb | 44 ---------------------- sample/demo_das.rb | 105 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 105 insertions(+), 44 deletions(-) create mode 100644 sample/demo_das.rb commit b7b0f7bef0505b9678673e54bb863d4ff7897dd5 Author: Naohisa Goto Date: Wed Nov 25 20:58:35 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_kegg_taxonomy.rb, although it does not work correctly now. 
lib/bio/db/kegg/taxonomy.rb | 53 +----------------------- sample/demo_kegg_taxonomy.rb | 92 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 93 insertions(+), 52 deletions(-) create mode 100644 sample/demo_kegg_taxonomy.rb commit 23da98ca19fce1f0b487e1f955ef4cd896839590 Author: Naohisa Goto Date: Wed Nov 25 20:11:12 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_kegg_reaction.rb. lib/bio/db/kegg/reaction.rb | 16 +---------- sample/demo_kegg_reaction.rb | 64 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 65 insertions(+), 15 deletions(-) create mode 100644 sample/demo_kegg_reaction.rb commit 9c0bfb857a6b41d8e6a42ff2cbf7b06ca1d38d78 Author: Naohisa Goto Date: Wed Nov 25 19:12:00 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_kegg_orthology.rb. lib/bio/db/kegg/orthology.rb | 23 +-------------- sample/demo_kegg_orthology.rb | 62 +++++++++++++++++++++++++++++++++++++++++ 2 files changed, 63 insertions(+), 22 deletions(-) create mode 100644 sample/demo_kegg_orthology.rb commit 6f6f1eb3d87dea588ea333708c4d4486ac7136b6 Author: Naohisa Goto Date: Wed Nov 25 12:19:26 2009 +0900 Commented out demo for nonexistent method "bindings". sample/demo_kegg_glycan.rb | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) commit 98a6d904058b5af4808f16bcb710d73bd97c9764 Author: Naohisa Goto Date: Wed Nov 25 12:18:31 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_kegg_glycan.rb. lib/bio/db/kegg/glycan.rb | 21 ------------- sample/demo_kegg_glycan.rb | 72 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 72 insertions(+), 21 deletions(-) create mode 100644 sample/demo_kegg_glycan.rb commit d26e835ca9def2287f1050f1b048892e3cafdaa0 Author: Naohisa Goto Date: Wed Nov 25 11:49:05 2009 +0900 Added references. 
lib/bio/db/kegg/genome.rb | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) commit c3c460462481b5b8d6e9441216bcf6370b4890ef Author: Naohisa Goto Date: Wed Nov 25 11:45:31 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_kegg_genome.rb. lib/bio/db/kegg/genome.rb | 42 +------------------------ sample/demo_kegg_genome.rb | 74 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 75 insertions(+), 41 deletions(-) create mode 100644 sample/demo_kegg_genome.rb commit 0d8e709b66bf18ead5944c27a50eb6cf2c47862f Author: Naohisa Goto Date: Wed Nov 25 11:43:24 2009 +0900 Added document about downloading sample data. sample/demo_kegg_drug.rb | 13 ++++++++++++- 1 files changed, 12 insertions(+), 1 deletions(-) commit 0608893198e9bc88521b6c013069d8c7a13bb0e5 Author: Naohisa Goto Date: Wed Nov 25 00:10:48 2009 +0900 Added documents. lib/bio/db/kegg/drug.rb | 15 +++++++++++++++ 1 files changed, 15 insertions(+), 0 deletions(-) commit 0ecdc1ee0460f16dba1e4cd5ab575c92e1c6b1ac Author: Naohisa Goto Date: Wed Nov 25 00:06:02 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_kegg_drug.rb. lib/bio/db/kegg/drug.rb | 18 +-------------- sample/demo_kegg_drug.rb | 54 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 55 insertions(+), 17 deletions(-) create mode 100644 sample/demo_kegg_drug.rb commit b0c349103f01a26f4741999bd696bf5b1c032e06 Author: Naohisa Goto Date: Tue Nov 24 23:51:13 2009 +0900 Added documents. lib/bio/db/kegg/compound.rb | 10 ++++++++++ 1 files changed, 10 insertions(+), 0 deletions(-) commit e965b454c553ed9670bc83962a2a9d7c5de49929 Author: Naohisa Goto Date: Tue Nov 24 23:45:15 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_kegg_compound.rb. 
lib/bio/db/kegg/compound.rb | 19 +------------- sample/demo_kegg_compound.rb | 57 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 58 insertions(+), 18 deletions(-) create mode 100644 sample/demo_kegg_compound.rb commit 7454db7c8b8ef7202736d311356d4ca350af336f Author: Naohisa Goto Date: Tue Nov 24 23:06:21 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_litdb.rb. lib/bio/db/litdb.rb | 17 +---------------- sample/demo_litdb.rb | 42 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 43 insertions(+), 16 deletions(-) create mode 100644 sample/demo_litdb.rb commit fde284248e013e44184ee2ba7da85e5b83155a69 Author: Naohisa Goto Date: Tue Nov 24 22:57:01 2009 +0900 Ruby 1.9 support: String#each_line instead of String#each lib/bio/db/go.rb | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) commit 8b60099615790fe372b4fde27a391dedc767aab2 Author: Naohisa Goto Date: Tue Nov 24 22:53:12 2009 +0900 Sample code bug fix: fixed method names, and workaround for Zlib error. * Sample code bug fix: Following method name changes. * Workaround for Zlib::DataError. sample/demo_go.rb | 13 +++++++++---- 1 files changed, 9 insertions(+), 4 deletions(-) commit 737fec3db555811d127d2356e5ceef63b0413fb8 Author: Naohisa Goto Date: Tue Nov 24 19:47:14 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_go.rb. lib/bio/db/go.rb | 70 +--------------------------------------- sample/demo_go.rb | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 94 insertions(+), 69 deletions(-) create mode 100644 sample/demo_go.rb commit 8264b15690132d9e766f16d0829bb12cd122b900 Author: Naohisa Goto Date: Tue Nov 24 19:20:52 2009 +0900 Document bug fix: Changed Bio::Bl2seq to Bio::Blast::Bl2seq in the RDoc. * Document bug fix: Changed Bio::Bl2seq to Bio::Blast::Bl2seq in the RDoc. * Modified copyright line. 
lib/bio/appl/bl2seq/report.rb | 18 +++++++++--------- 1 files changed, 9 insertions(+), 9 deletions(-) commit c572ff022fee43505355608f0a0e3ba2181e87e2 Author: Naohisa Goto Date: Tue Nov 24 19:17:04 2009 +0900 Bug fix: Failed to read Bio::Blast::Bl2seq::Report data by using Bio::FlatFile. lib/bio/appl/bl2seq/report.rb | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) commit 4f6b080623442dfcc5864e2aefde7e53ace068e8 Author: Naohisa Goto Date: Tue Nov 24 19:15:11 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_bl2seq_report.rb. lib/bio/appl/bl2seq/report.rb | 194 +------------------------------------ sample/demo_bl2seq_report.rb | 220 +++++++++++++++++++++++++++++++++++++++++ 2 files changed, 221 insertions(+), 193 deletions(-) create mode 100644 sample/demo_bl2seq_report.rb commit 2f03e8757383e0d1a26c0f6942c74a30f3b26d90 Author: Naohisa Goto Date: Tue Nov 24 18:24:43 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_genbank.rb. * Demo codes in the "if __FILE__ == $0" are moved to sample/demo_genbank.rb, and modified as below. * To get sequences from the NCBI web service. * By default, arguments are sequence IDs (accession numbers). * New option "--files" (or "-files", "--file", or "-file") to read sequences from file(s). lib/bio/db/genbank/genbank.rb | 87 +-------------------------- sample/demo_genbank.rb | 132 +++++++++++++++++++++++++++++++++++++++++ 2 files changed, 133 insertions(+), 86 deletions(-) create mode 100644 sample/demo_genbank.rb commit a2981c28fdb629a655c71c920f6588f8b80aff06 Author: Naohisa Goto Date: Tue Nov 24 15:06:50 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_aaindex.rb. 
lib/bio/db/aaindex.rb | 39 +--------------------------- sample/demo_aaindex.rb | 67 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 68 insertions(+), 38 deletions(-) create mode 100644 sample/demo_aaindex.rb commit b741d17ec5c5ac234bab35b8716fee072635de1a Author: Naohisa Goto Date: Tue Nov 24 12:45:43 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_sirna.rb * Demo codes in the "if __FILE__ == $0" are moved to sample/demo_sirna.rb, and modified for reading normal sequence files instead of a raw sequence. lib/bio/util/sirna.rb | 24 +------------------ sample/demo_sirna.rb | 63 +++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 64 insertions(+), 23 deletions(-) create mode 100644 sample/demo_sirna.rb commit 7cc778e78bc63ef73796ee15d6f0db8d6967aefe Author: Naohisa Goto Date: Mon Nov 23 23:00:42 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_pathway.rb. lib/bio/pathway.rb | 171 ----------------------------------------- sample/demo_pathway.rb | 196 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 196 insertions(+), 171 deletions(-) create mode 100644 sample/demo_pathway.rb commit 7e5510587abc0b50b6851f005a3236bf9dc79d08 Author: Naohisa Goto Date: Mon Nov 23 22:49:13 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_locations.rb. lib/bio/location.rb | 73 ---------------------------------- sample/demo_locations.rb | 99 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 99 insertions(+), 73 deletions(-) create mode 100644 sample/demo_locations.rb commit f1c02666f4b11d5cf208d6beb592d8ac962ce2da Author: Naohisa Goto Date: Mon Nov 23 22:35:50 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_codontable.rb. 
lib/bio/data/codontable.rb | 96 +----------------------------------- sample/demo_codontable.rb | 119 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 120 insertions(+), 95 deletions(-) create mode 100644 sample/demo_codontable.rb commit c11a7793f85faf3d66d630833c38358ffa34a698 Author: Naohisa Goto Date: Mon Nov 23 16:35:16 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_nucleicacid.rb. lib/bio/data/na.rb | 27 +----------------------- sample/demo_nucleicacid.rb | 49 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 50 insertions(+), 26 deletions(-) create mode 100644 sample/demo_nucleicacid.rb commit de41b67c3f65baa0f122689b2e9f479d8a247934 Author: Naohisa Goto Date: Mon Nov 23 16:25:05 2009 +0900 Demo codes in the "if __FILE__ == $0" are moved to sample/demo_aminoacid.rb. lib/bio/data/aa.rb | 78 +----------------------------------- sample/demo_aminoacid.rb | 101 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 102 insertions(+), 77 deletions(-) create mode 100644 sample/demo_aminoacid.rb commit e652dd44ecb6b6dad652e33a398f92bb8373e7dd Author: Naohisa Goto Date: Mon Nov 23 16:12:48 2009 +0900 Added an error message about encoding in Ruby 1.9.1 KNOWN_ISSUES.rdoc | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) commit 003133b0d4e2234c27927c9d10b75185c354102e Author: Naohisa Goto Date: Mon Nov 23 15:52:21 2009 +0900 changed recommended Ruby version README.rdoc | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) commit 408483d36b713678361cecf6c77ff7a2098f71fc Author: Naohisa Goto Date: Sun Nov 22 17:07:37 2009 +0900 added information about doc/Changes-1.4.rdoc README.rdoc | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) commit ef342933839e8c6cef9883045fcaf468aff5da23 Author: Naohisa Goto Date: Sun Nov 22 16:50:55 2009 +0900 In PhyloXML support, added a link to GNOME Libxml2 and fixed RDoc syntax. 
README.rdoc | 6 ++++-- 1 files changed, 4 insertions(+), 2 deletions(-) commit 0237ef42d60c7a76cadf8ea78f4251bcfe89c95f Author: Naohisa Goto Date: Thu Nov 19 09:43:15 2009 +0900 Ruby 1.9 support: String#each_line instead of String#each lib/bio/appl/meme/mast/report.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit ec935ea9b19415bf3325bcc0763fbc22f3c71a3d Author: Naohisa Goto Date: Thu Nov 19 09:40:49 2009 +0900 The "libpath magic" is replaced by loading helper routine. test/unit/bio/appl/meme/mast/test_report.rb | 11 ++++++----- test/unit/bio/appl/meme/test_mast.rb | 11 ++++++----- test/unit/bio/appl/meme/test_motif.rb | 8 +++++--- 3 files changed, 17 insertions(+), 13 deletions(-) commit 3f65eeb503f3b2ef866cab4c73d2d700ca572835 Author: Adam Kraut Date: Tue Mar 17 19:41:31 2009 -0400 Added basic support for MEME/MAST applications lib/bio/appl/meme/mast.rb | 156 +++++++++++++++++++++++++++ lib/bio/appl/meme/mast/report.rb | 91 ++++++++++++++++ lib/bio/appl/meme/motif.rb | 48 ++++++++ test/data/meme/mast.out | 13 +++ test/data/meme/meme.out | 3 + test/unit/bio/appl/meme/mast/test_report.rb | 45 ++++++++ test/unit/bio/appl/meme/test_mast.rb | 102 +++++++++++++++++ test/unit/bio/appl/meme/test_motif.rb | 36 ++++++ 8 files changed, 494 insertions(+), 0 deletions(-) create mode 100644 lib/bio/appl/meme/mast.rb create mode 100644 lib/bio/appl/meme/mast/report.rb create mode 100644 lib/bio/appl/meme/motif.rb create mode 100644 test/data/meme/db create mode 100644 test/data/meme/mast create mode 100644 test/data/meme/mast.out create mode 100644 test/data/meme/meme.out create mode 100644 test/unit/bio/appl/meme/mast/test_report.rb create mode 100644 test/unit/bio/appl/meme/test_mast.rb create mode 100644 test/unit/bio/appl/meme/test_motif.rb commit 3862f54fda0caec2a07e563a1f8a11913baca2e3 Author: Naohisa Goto Date: Wed Nov 18 20:29:56 2009 +0900 New version of PhyloXML schema, version 1.10. 
* Upgraded to New version of PhyloXML schema, version 1.10, developed by Christian M Zmasek. lib/bio/db/phyloxml/phyloxml.xsd | 1155 +++++++++++++++++++------------------- 1 files changed, 582 insertions(+), 573 deletions(-) commit 45ffd9228d513b3dbf29e1011c6a6689a8bd1b08 Author: Naohisa Goto Date: Wed Nov 18 00:26:44 2009 +0900 Newly added sample script to test big PhyloXML data * Newly added a sample script to test big PhyloXML data based on Diana Jaunzeikare's work. (http://github.com/latvianlinuxgirl/bioruby/blob/ 20627fc5a443d6c2e3dc73ed50e9c578ffcbc330/ test/unit/bio/db/test_phyloxml_big.rb). sample/test_phyloxml_big.rb | 205 +++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 205 insertions(+), 0 deletions(-) create mode 100644 sample/test_phyloxml_big.rb commit 828a8971e057919b80508cf29fd9518828b74a2f Author: Naohisa Goto Date: Tue Nov 17 23:54:37 2009 +0900 Speed up of Bio::Tree#children and parent: caching node's parent. * For speed up of Bio::Tree#children and parent, internal cache of the parent for each node is added. The cache is automatically cleared when the tree is modified. Note that the cache can only be accessed from inside Bio::Tree. * Bio::Tree#parent is changed to directly raise IndexError when both of the root specified in the argument and preset in the tree are nil (previously, the same error is raised in the path method which is internally called from the parent method). * Bio::Tree#path is changed not to call bfs_shortest_path if the node1 and node2 are adjacent. lib/bio/tree.rb | 75 +++++++++++++++++++++++++++++++++++++++++++++++++++--- 1 files changed, 70 insertions(+), 5 deletions(-) commit 75862212e6bb807a570338e39e19d527219b6f13 Author: Naohisa Goto Date: Mon Nov 16 22:11:15 2009 +0900 Documented incompatible changes of Bio::KEGG::COMPOUND and Bio::KEGG:REACTION. 
doc/Changes-1.4.rdoc | 10 ++++++++++ 1 files changed, 10 insertions(+), 0 deletions(-) commit c74cfabd6414c8b50db0251739f967accd90773f Author: Naohisa Goto Date: Mon Nov 16 21:20:42 2009 +0900 Ruby compatibility issue: Enumerable#each_slice(4).each does not work in Ruby 1.8.5. lib/bio/db/kegg/reaction.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit e6a920e401a2b06c355174ccdc9b993a38f9d7ec Author: Mitsuteru Nakao Date: Wed Jul 22 22:50:22 2009 +0900 Added new method Bio::KEGG::GENES#structure with the unit tests. lib/bio/db/kegg/genes.rb | 12 ++++++++++++ test/unit/bio/db/kegg/test_genes.rb | 25 +++++++++++++++++++++++++ 2 files changed, 37 insertions(+), 0 deletions(-) commit 9fee0c133d069348857014410983f682e468c1c7 Author: Naohisa Goto Date: Mon Nov 16 21:04:02 2009 +0900 The "libpath magic" is replaced by loading helper routine. test/unit/bio/db/kegg/test_compound.rb | 6 ++++-- test/unit/bio/db/kegg/test_reaction.rb | 6 ++++-- 2 files changed, 8 insertions(+), 4 deletions(-) commit 1199330eab95b8434303e92c5f792818e96db814 Author: Kozo Nishida Date: Sat Nov 14 09:03:50 2009 +0900 Newly added unit tests for Bio::KEGG::COMPOUND and Bio::KEGG::REACTION * Newly added unit tests for Bio::KEGG::COMPOUND and Bio::KEGG::REACTION with test data. (Note that this is a combination of several commits made by Kozo Nishida, merged from git://github.com/kozo2/bioruby.git ). 
test/data/KEGG/C00025.compound | 102 ++++++++++++++++++++++++++++++++ test/data/KEGG/R00006.reaction | 14 ++++ test/unit/bio/db/kegg/test_compound.rb | 49 +++++++++++++++ test/unit/bio/db/kegg/test_reaction.rb | 57 ++++++++++++++++++ 4 files changed, 222 insertions(+), 0 deletions(-) create mode 100644 test/data/KEGG/C00025.compound create mode 100644 test/data/KEGG/R00006.reaction create mode 100644 test/unit/bio/db/kegg/test_compound.rb create mode 100644 test/unit/bio/db/kegg/test_reaction.rb commit 1b47640665d4332bafd9e9709628ee9722f1f3f4 Author: Kozo Nishida Date: Sat Nov 14 09:03:50 2009 +0900 Bio::KEGG::COMPOUND#dblinks changed to return hash list * Bio::KEGG::COMPOUND#dblinks is changed to return hash list (array containing hashes). lib/bio/db/kegg/compound.rb | 11 +++++++++-- 1 files changed, 9 insertions(+), 2 deletions(-) commit 2aa43a0aa765ee4502923c2102e352826a9a7abd Author: Kozo Nishida Date: Sat Nov 14 07:29:19 2009 +0900 Bio::KEGG::REACTION#rpair and pathways changed to return hash list, and added orthologies method. * New method: Bio::KEGG::REACTION#orthologies * Bio::KEGG::REACTION#rpair and pathways are changed to return hash list (array containing hashes). lib/bio/db/kegg/reaction.rb | 33 ++++++++++++++++++++++++++++++--- 1 files changed, 30 insertions(+), 3 deletions(-) commit a82f5d228370beeeb397be07e07394652fd7837e Author: Naohisa Goto Date: Mon Nov 16 20:03:19 2009 +0900 Changed not to modify given argument lib/bio/util/restriction_enzyme/single_strand.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit ff10d5540759a5e7eaaa71da020d95170b98e007 Author: Naohisa Goto Date: Mon Nov 16 19:50:08 2009 +0900 Newly added a document for incompatible and/or important changes of the new release version. * Newly added a document for incompatible and/or important changes of the new release version. * Added description about Bio::RestrictionEnzyme validation is disabled (although very small change).
doc/Changes-1.4.rdoc | 16 ++++++++++++++++ 1 files changed, 16 insertions(+), 0 deletions(-) create mode 100644 doc/Changes-1.4.rdoc commit 629e537f90e0825fadeec2e0207f8caddfbed59a Author: trevor <> Date: Sat Sep 19 11:03:23 2009 -0500 speed-up serial calls to RestrictionEnzyme lib/bio/db/rebase.rb | 2 +- lib/bio/util/restriction_enzyme/single_strand.rb | 3 +- .../util/restriction_enzyme/test_single_strand.rb | 24 ++++++++++--------- .../test_single_strand_complement.rb | 24 ++++++++++--------- 4 files changed, 29 insertions(+), 24 deletions(-) commit 9b55a92d5300294bef7b624d0f9aa3edd3e8d7fc Author: trevor <> Date: Sat Sep 19 10:46:21 2009 -0500 speed-up rebase library lib/bio/db/rebase.rb | 9 ++++----- 1 files changed, 4 insertions(+), 5 deletions(-) commit 4aaa24b3fc3cf2d1f7cf8b6d974d2115958b5a1b Author: Naohisa Goto Date: Mon Nov 16 15:08:31 2009 +0900 Ruby 1.9 support: Array#to_s is changed to join('') lib/bio/db/sanger_chromatogram/scf.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit 1cf924b81150545e807169144aeaca6a75f9731c Author: Naohisa Goto Date: Mon Nov 16 12:59:10 2009 +0900 Ruby 1.9 support: Array#nitems (counts the number of non-nil elements) is removed in 1.9. * Ruby 1.9 support: Array#nitems (counts the number of non-nil elements) is removed in Ruby 1.9. In scf.rb, it seems that nil would never be included in the array, and simply replaced by Array#size. lib/bio/db/sanger_chromatogram/scf.rb | 8 ++++---- 1 files changed, 4 insertions(+), 4 deletions(-) commit a95994d9cfbf4fc89fa716358ac5b92d42a1307b Author: Naohisa Goto Date: Sun Nov 15 19:23:12 2009 +0900 Bug fix: error when quality_scores are larger than the sequence length, and added a require line. * Bug fix: error when sequence.quality_scores are larger than the sequence length. * Added a require line. 
lib/bio/db/fasta/format_qual.rb | 5 ++++- 1 files changed, 4 insertions(+), 1 deletions(-) commit c5aafca19b58b1651080e81699b7020cd3fd3f47 Author: Naohisa Goto Date: Sun Nov 15 19:21:49 2009 +0900 Newly added unit tests for Bio::Sequence::Format::Formatter::Fasta_numeric and Qual. test/unit/bio/db/fasta/test_format_qual.rb | 346 ++++++++++++++++++++++++++++ 1 files changed, 346 insertions(+), 0 deletions(-) create mode 100644 test/unit/bio/db/fasta/test_format_qual.rb commit 6e24170e29d2576ff69b18eaadc94e9769b8612a Author: Naohisa Goto Date: Sat Nov 14 02:41:13 2009 +0900 Newly added Bio::Sequence::Format::Formatter::Qual and Fasta_numeric, formatter for Qual format and FastaNumericFormat. lib/bio/db/fasta/format_qual.rb | 201 +++++++++++++++++++++++++++++++++++++++ lib/bio/sequence/format.rb | 7 ++ 2 files changed, 208 insertions(+), 0 deletions(-) create mode 100644 lib/bio/db/fasta/format_qual.rb commit 6959fd359040b6ca9570111d515118dc2d472029 Author: Naohisa Goto Date: Sat Nov 14 02:19:17 2009 +0900 Split quality score methods in Bio::Fastq::FormatData into separate modules * Quality score calculation methods in Bio::Fastq::FormatData in lib/bio/db/fastq.rb are split into separate modules Bio::Fastq::QualityScore::Converter, Phred, and Solexa in lib/bio/db/fastq/quality_score.rb. * Unit tests for Bio::Fastq::QualityScore::* are newly added in test/unit/bio/db/fastq/test_quality_score.rb. * Possible bug fix: probability should be 0 <= p <= 1.
lib/bio/db/fastq.rb | 112 +-------- lib/bio/db/fastq/quality_score.rb | 206 ++++++++++++++++ test/unit/bio/db/fastq/test_quality_score.rb | 330 ++++++++++++++++++++++++++ 3 files changed, 544 insertions(+), 104 deletions(-) create mode 100644 lib/bio/db/fastq/quality_score.rb create mode 100644 test/unit/bio/db/fastq/test_quality_score.rb commit 98f7703c28f0c2c34e4fe1631de227e20b9666c3 Author: Naohisa Goto Date: Fri Nov 13 23:48:19 2009 +0900 When no error_probabilities in the sequence and quality_score_type is nil, Fastq formatter implicitly assumes that the quality_score_type is :phred. * When no error_probabilities in the sequence and quality_score_type is nil, Fastq formatter implicitly assumes that the quality_score_type is :phred. * Bug fix: fixed typo in lib/bio/db/fastq/format_fastq.rb. lib/bio/db/fastq/format_fastq.rb | 5 ++++- lib/bio/sequence.rb | 3 +++ 2 files changed, 7 insertions(+), 1 deletions(-) commit f85a6aee9827bc573dcb735f4a1a1827926cc66c Author: Naohisa Goto Date: Fri Nov 13 23:29:19 2009 +0900 Bug fix: fixed typo for Bio::Sequence#quality_score_type. lib/bio/sequence.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit d81a611d7c7c46a789b86e99cebe064ba559e3e0 Author: Naohisa Goto Date: Fri Nov 13 20:42:12 2009 +0900 Splitting lib/bio/db/fasta.rb: FastaNumericFormat is moved to a new file, etc. * Splitting lib/bio/db/fasta.rb as follows: * Bio::FastaNumericFormat is moved to lib/bio/db/fasta/qual.rb. * Demo codes in the "if __FILE__ == $0" are moved to sample/demo_fastaformat.rb. * Unit tests for Bio::FastaNumericFormat are moved from test/unit/bio/db/test_fasta.rb to test/unit/bio/db/test_qual.rb. * lib/bio.rb is also modified for the autoload. * Bug fix: fixed incorrect autoload path for Bio::FastaDefline. 
lib/bio.rb | 4 +- lib/bio/db/fasta.rb | 135 +--------------------------------------- lib/bio/db/fasta/qual.rb | 102 ++++++++++++++++++++++++++++++ sample/demo_fastaformat.rb | 105 +++++++++++++++++++++++++++++++ test/unit/bio/db/test_fasta.rb | 43 ------------- test/unit/bio/db/test_qual.rb | 63 +++++++++++++++++++ 6 files changed, 273 insertions(+), 179 deletions(-) create mode 100644 lib/bio/db/fasta/qual.rb create mode 100644 sample/demo_fastaformat.rb create mode 100644 test/unit/bio/db/test_qual.rb commit c70bed5c3f828c94084fdeabe255fbb3930097d0 Author: Andrew Grimm Date: Sun Aug 16 19:49:38 2009 +1000 Removed use of uninitialized variable in FastaNumericFormat. lib/bio/db/fasta.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit ce6dcc344a3d7beec544d7164308dd97bafa8a19 Author: Naohisa Goto Date: Fri Nov 13 13:04:11 2009 +0900 User data type should be stored as is, even if unknown data type. lib/bio/db/sanger_chromatogram/abif.rb | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) commit ca96e59f151c2e10b5cd8c0690b8297979e52036 Author: Naohisa Goto Date: Fri Nov 13 12:50:59 2009 +0900 Removed Bio::Abif#method_missing and added alternative method * Removed Bio::Abif#method_missing, because method_missing can hide many errors related to method calls e.g. method name typo, and it is not suitable for only getting data. * New method Bio::Abif#data is added to get data (alternative of the method_missing). lib/bio/db/sanger_chromatogram/abif.rb | 19 ++++++++++++------- 1 files changed, 12 insertions(+), 7 deletions(-) commit 8b0da27523998cb9a9df07f5e907cda6e3cef0dc Author: Naohisa Goto Date: Fri Nov 13 12:04:41 2009 +0900 removed a non-ascii character in comment lib/bio/db/sanger_chromatogram/chromatogram.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit 55c1a180fec97338bae8e3c5b5d5ceec64aed0f6 Author: Naohisa Goto Date: Fri Nov 13 12:02:53 2009 +0900 Bug fix: Bio::SangerChromatogram#complement fails when the object is frozen. 
lib/bio/db/sanger_chromatogram/chromatogram.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit 4e7d8b0ba304d1ff01364fad68035f7ec9463fb9 Author: Naohisa Goto Date: Thu Nov 12 22:23:47 2009 +0900 fixed a typo in a copyright line test/unit/bio/util/test_sirna.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit f376124112d23ba9b0491dbd427d328edc81d872 Author: Naohisa Goto Date: Thu Nov 12 22:05:37 2009 +0900 The "libpath magic" in tests are replaced by the load of helper routine. * In all unit tests, the "libpath magic" are replaced by the load of helper routine. * Changed to use a constant BioRubyTestDataPath for generating test data file path. * Some "require" lines are modified. * "File.open(...).read" in some tests are replaced by "File.read(...)". * Header comment lines of some tests with wrong filename and/or class/module name information are fixed. test/functional/bio/appl/test_pts1.rb | 6 ++++-- test/functional/bio/io/test_ensembl.rb | 7 ++++--- test/functional/bio/io/test_pubmed.rb | 8 +++++--- test/functional/bio/io/test_soapwsdl.rb | 9 +++++---- test/functional/bio/io/test_togows.rb | 9 +++++---- test/functional/bio/sequence/test_output_embl.rb | 10 ++++++---- test/functional/bio/test_command.rb | 10 +++++----- test/runner.rb | 8 +++++--- test/unit/bio/appl/bl2seq/test_report.rb | 9 +++++---- test/unit/bio/appl/blast/test_ncbioptions.rb | 6 ++++-- test/unit/bio/appl/blast/test_report.rb | 9 +++++---- test/unit/bio/appl/blast/test_rpsblast.rb | 9 +++++---- test/unit/bio/appl/gcg/test_msf.rb | 10 +++++----- test/unit/bio/appl/genscan/test_report.rb | 17 ++++++++--------- test/unit/bio/appl/hmmer/test_report.rb | 9 +++++---- test/unit/bio/appl/iprscan/test_report.rb | 11 ++++++----- test/unit/bio/appl/mafft/test_report.rb | 11 ++++++----- test/unit/bio/appl/paml/codeml/test_rates.rb | 9 +++++---- test/unit/bio/appl/paml/codeml/test_report.rb | 9 +++++---- test/unit/bio/appl/paml/test_codeml.rb | 9 +++++---- 
test/unit/bio/appl/sim4/test_report.rb | 9 +++++---- test/unit/bio/appl/sosui/test_report.rb | 11 ++++++----- test/unit/bio/appl/targetp/test_report.rb | 8 +++++--- test/unit/bio/appl/test_blast.rb | 9 +++++---- test/unit/bio/appl/test_fasta.rb | 6 ++++-- test/unit/bio/appl/test_pts1.rb | 6 ++++-- test/unit/bio/appl/tmhmm/test_report.rb | 11 ++++++----- test/unit/bio/data/test_aa.rb | 8 +++++--- test/unit/bio/data/test_codontable.rb | 9 +++++---- test/unit/bio/data/test_na.rb | 8 +++++--- test/unit/bio/db/biosql/tc_biosql.rb | 6 +++++- test/unit/bio/db/embl/test_common.rb | 6 ++++-- test/unit/bio/db/embl/test_embl.rb | 12 ++++++------ test/unit/bio/db/embl/test_embl_rel89.rb | 12 ++++++------ test/unit/bio/db/embl/test_embl_to_bioseq.rb | 15 +++++++-------- test/unit/bio/db/embl/test_sptr.rb | 14 ++++++-------- test/unit/bio/db/embl/test_uniprot.rb | 11 ++++++----- test/unit/bio/db/kegg/test_genes.rb | 8 +++++--- test/unit/bio/db/pdb/test_pdb.rb | 6 ++++-- test/unit/bio/db/sanger_chromatogram/test_abif.rb | 3 ++- test/unit/bio/db/sanger_chromatogram/test_scf.rb | 3 ++- test/unit/bio/db/test_aaindex.rb | 12 ++++++------ test/unit/bio/db/test_fasta.rb | 8 +++++--- test/unit/bio/db/test_fastq.rb | 10 +++++----- test/unit/bio/db/test_gff.rb | 6 ++++-- test/unit/bio/db/test_lasergene.rb | 12 +++++++----- test/unit/bio/db/test_medline.rb | 6 ++++-- test/unit/bio/db/test_newick.rb | 12 ++++++------ test/unit/bio/db/test_nexus.rb | 6 ++++-- test/unit/bio/db/test_phyloxml.rb | 14 +++++++------- test/unit/bio/db/test_phyloxml_writer.rb | 15 +++++++-------- test/unit/bio/db/test_prosite.rb | 11 ++++++----- test/unit/bio/db/test_rebase.rb | 8 +++++--- test/unit/bio/db/test_soft.rb | 13 +++++++------ test/unit/bio/io/flatfile/test_autodetection.rb | 13 ++++++------- test/unit/bio/io/flatfile/test_buffer.rb | 11 ++++++----- test/unit/bio/io/flatfile/test_splitter.rb | 8 ++++---- test/unit/bio/io/test_ddbjxml.rb | 7 ++++--- test/unit/bio/io/test_ensembl.rb | 8 +++++--- 
test/unit/bio/io/test_fastacmd.rb | 7 ++++--- test/unit/bio/io/test_flatfile.rb | 11 ++++++----- test/unit/bio/io/test_soapwsdl.rb | 7 ++++--- test/unit/bio/io/test_togows.rb | 6 ++++-- test/unit/bio/sequence/test_aa.rb | 8 +++++--- test/unit/bio/sequence/test_common.rb | 6 ++++-- test/unit/bio/sequence/test_compat.rb | 6 ++++-- test/unit/bio/sequence/test_dblink.rb | 8 +++++--- test/unit/bio/sequence/test_na.rb | 6 ++++-- test/unit/bio/shell/plugin/test_seq.rb | 8 +++++--- test/unit/bio/test_alignment.rb | 8 +++++--- test/unit/bio/test_command.rb | 7 ++++--- test/unit/bio/test_db.rb | 8 +++++--- test/unit/bio/test_feature.rb | 6 ++++-- test/unit/bio/test_location.rb | 6 ++++-- test/unit/bio/test_map.rb | 8 +++++--- test/unit/bio/test_pathway.rb | 6 ++++-- test/unit/bio/test_reference.rb | 6 ++++-- test/unit/bio/test_sequence.rb | 8 +++++--- test/unit/bio/test_shell.rb | 8 +++++--- test/unit/bio/test_tree.rb | 12 ++++++------ .../analysis/test_calculated_cuts.rb | 6 ++++-- .../restriction_enzyme/analysis/test_cut_ranges.rb | 6 ++++-- .../analysis/test_sequence_range.rb | 6 ++++-- .../double_stranded/test_aligned_strands.rb | 6 ++++-- .../double_stranded/test_cut_location_pair.rb | 6 ++++-- .../test_cut_location_pair_in_enzyme_notation.rb | 6 ++++-- .../double_stranded/test_cut_locations.rb | 6 ++++-- .../test_cut_locations_in_enzyme_notation.rb | 6 ++++-- .../test_cut_locations_in_enzyme_notation.rb | 6 ++++-- .../bio/util/restriction_enzyme/test_analysis.rb | 6 ++++-- .../bio/util/restriction_enzyme/test_cut_symbol.rb | 6 ++++-- .../restriction_enzyme/test_double_stranded.rb | 6 ++++-- .../util/restriction_enzyme/test_single_strand.rb | 6 ++++-- .../test_single_strand_complement.rb | 6 ++++-- .../restriction_enzyme/test_string_formatting.rb | 6 ++++-- test/unit/bio/util/test_color_scheme.rb | 8 +++++--- test/unit/bio/util/test_contingency_table.rb | 8 +++++--- test/unit/bio/util/test_restriction_enzyme.rb | 6 ++++-- test/unit/bio/util/test_sirna.rb | 8 +++++--- 99 
files changed, 479 insertions(+), 343 deletions(-) commit f4fa0a5edc6ff6fc35577d84bda86363014a57a4 Author: Naohisa Goto Date: Wed Nov 11 17:04:43 2009 +0900 test_chromatogram.rb is splitted into test_abif.rb and test_scf.rb test/unit/bio/db/sanger_chromatogram/test_abif.rb | 75 +++++++++++++++ .../db/sanger_chromatogram/test_chromatogram.rb | 101 -------------------- test/unit/bio/db/sanger_chromatogram/test_scf.rb | 97 +++++++++++++++++++ 3 files changed, 172 insertions(+), 101 deletions(-) create mode 100644 test/unit/bio/db/sanger_chromatogram/test_abif.rb delete mode 100644 test/unit/bio/db/sanger_chromatogram/test_chromatogram.rb create mode 100644 test/unit/bio/db/sanger_chromatogram/test_scf.rb commit d9cc613273cadc7f9fdfe2bafbd933efb1f403ca Author: Naohisa Goto Date: Wed Nov 11 17:01:37 2009 +0900 Newly added unit test helper routine which aims to replace the libpath magic test/bioruby_test_helper.rb | 61 +++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 61 insertions(+), 0 deletions(-) create mode 100644 test/bioruby_test_helper.rb commit 10e76db2a8ec37bde541157d0735303b4ca8b3b8 Author: Naohisa Goto Date: Tue Nov 10 20:59:01 2009 +0900 Bio::SangerChromatogram#to_s is renamed to sequence_string. lib/bio/db/sanger_chromatogram/chromatogram.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit 0ec2c2f38c4b4a3e451841dc32540dfa10743bc2 Author: Naohisa Goto Date: Fri Oct 30 22:49:19 2009 +0900 Renamed/moved files/directories following the rename of class names. 
* renamed: lib/bio/db/chromatogram.rb -> lib/bio/db/sanger_chromatogram/chromatogram.rb * renamed: lib/bio/db/chromatogram/abi.rb -> lib/bio/db/sanger_chromatogram/abif.rb * renamed: lib/bio/db/chromatogram/scf.rb -> lib/bio/db/sanger_chromatogram/scf.rb * renamed: lib/bio/db/chromatogram/chromatogram_to_biosequence.rb -> lib/bio/db/sanger_chromatogram/chromatogram_to_biosequence.rb * renamed: test/unit/bio/db/test_chromatogram.rb -> test/unit/bio/db/sanger_chromatogram/test_chromatogram.rb * renamed: test/data/chromatogram/test_chromatogram_abi.ab1 -> test/data/sanger_chromatogram/test_chromatogram_abif.ab1 * renamed: test/data/chromatogram/*.scf -> test/data/sanger_chromatogram/*.scf lib/bio/db/chromatogram.rb | 133 ------------- lib/bio/db/chromatogram/abi.rb | 114 ----------- .../db/chromatogram/chromatogram_to_biosequence.rb | 32 --- lib/bio/db/chromatogram/scf.rb | 210 -------------------- lib/bio/db/sanger_chromatogram/abif.rb | 114 +++++++++++ lib/bio/db/sanger_chromatogram/chromatogram.rb | 133 +++++++++++++ .../chromatogram_to_biosequence.rb | 32 +++ lib/bio/db/sanger_chromatogram/scf.rb | 210 ++++++++++++++++++++ test/data/chromatogram/test_chromatogram_abi.ab1 | Bin 228656 -> 0 bytes .../data/chromatogram/test_chromatogram_scf_v2.scf | Bin 47503 -> 0 bytes .../data/chromatogram/test_chromatogram_scf_v3.scf | Bin 47503 -> 0 bytes .../sanger_chromatogram/test_chromatogram_abif.ab1 | Bin 0 -> 228656 bytes .../test_chromatogram_scf_v2.scf | Bin 0 -> 47503 bytes .../test_chromatogram_scf_v3.scf | Bin 0 -> 47503 bytes .../db/sanger_chromatogram/test_chromatogram.rb | 101 ++++++++++ test/unit/bio/db/test_chromatogram.rb | 101 ---------- 16 files changed, 590 insertions(+), 590 deletions(-) delete mode 100644 lib/bio/db/chromatogram.rb delete mode 100644 lib/bio/db/chromatogram/abi.rb delete mode 100644 lib/bio/db/chromatogram/chromatogram_to_biosequence.rb delete mode 100644 lib/bio/db/chromatogram/scf.rb create mode 100644 
lib/bio/db/sanger_chromatogram/abif.rb create mode 100644 lib/bio/db/sanger_chromatogram/chromatogram.rb create mode 100644 lib/bio/db/sanger_chromatogram/chromatogram_to_biosequence.rb create mode 100644 lib/bio/db/sanger_chromatogram/scf.rb delete mode 100644 test/data/chromatogram/test_chromatogram_abi.ab1 delete mode 100644 test/data/chromatogram/test_chromatogram_scf_v2.scf delete mode 100644 test/data/chromatogram/test_chromatogram_scf_v3.scf create mode 100644 test/data/sanger_chromatogram/test_chromatogram_abif.ab1 create mode 100644 test/data/sanger_chromatogram/test_chromatogram_scf_v2.scf create mode 100644 test/data/sanger_chromatogram/test_chromatogram_scf_v3.scf create mode 100644 test/unit/bio/db/sanger_chromatogram/test_chromatogram.rb delete mode 100644 test/unit/bio/db/test_chromatogram.rb commit 49bfe319e535c8414be32b47c07fe5204a24b398 Author: Naohisa Goto Date: Fri Oct 30 22:02:08 2009 +0900 Renamed Chromatogram to SangerChromatogram and Abi to Abif, and preparation of filename changes. * Renamed Chromatogram to SangerChromatogram because the word "chromatogram" may be used by various experimental methods other than the Sanger chromatogram. * Renamed Abi to Abif because Applied Biosystems who determined the file format says that its name is ABIF. * Preparation of changing filenames. However, filenames are not really changed now because of recording history of file contents modification. The paths shown in the "require" lines and test data paths may not be existed now. 
lib/bio.rb | 6 ++-- lib/bio/db/chromatogram.rb | 24 +++++++------- lib/bio/db/chromatogram/abi.rb | 15 +++++---- .../db/chromatogram/chromatogram_to_biosequence.rb | 10 +++--- lib/bio/db/chromatogram/scf.rb | 15 +++++---- lib/bio/sequence/adapter.rb | 3 +- test/unit/bio/db/test_chromatogram.rb | 32 ++++++++++--------- 7 files changed, 57 insertions(+), 48 deletions(-) commit 6c020440663214014973ae8e5007ce2d31d8d45e Author: Naohisa Goto Date: Sat Oct 24 00:44:47 2009 +0900 New class method Bio::PhyloXML::Parser.open(filename) and API change of new(), etc. * New class methods to create parser object from various data source are added: Bio::PhyloXML::Parser.open(filename), for_io(io), open_uri(uri). * API change of Bio::PhyloXML::Parser.new(). Now, new(filename) is deprecated and it can only take a XML-formatted string. * Tests are added and modified to reflect the above changes. * test/unit/bio/db/test_phyloxml_writer.rb: avoid using WeakRef for temporary directory maintenance. lib/bio/db/phyloxml/phyloxml_parser.rb | 224 +++++++++++++++++++++++++++--- lib/bio/db/phyloxml/phyloxml_writer.rb | 4 +- test/unit/bio/db/test_phyloxml.rb | 178 ++++++++++++++++++++++-- test/unit/bio/db/test_phyloxml_writer.rb | 70 +++++----- 4 files changed, 408 insertions(+), 68 deletions(-) commit fca5e800fc051a38ac6d25652c684fdd4f9bff14 Author: Naohisa Goto Date: Fri Oct 23 15:13:25 2009 +0900 Rearrangement of require and autoload so as to correctly load PhyloXML classes lib/bio.rb | 11 +++++++---- lib/bio/db/phyloxml/phyloxml_elements.rb | 16 +++++++++++++++- lib/bio/db/phyloxml/phyloxml_parser.rb | 11 ++++++----- lib/bio/db/phyloxml/phyloxml_writer.rb | 5 ++++- test/unit/bio/db/test_phyloxml.rb | 5 ----- test/unit/bio/db/test_phyloxml_writer.rb | 3 --- 6 files changed, 32 insertions(+), 19 deletions(-) commit a291af62ef262ee04f3a0e1b6415d4e256c56a94 Author: Naohisa Goto Date: Fri Oct 23 00:08:44 2009 +0900 Fixed argument order of assert_equal(expected, actual), etc. 
* Test bug fix: Argument order of assert_equal must be assert_equal(expected, actual). * assert_instance_of() instead of assert_equal() in TestPhyloXML1#test_init. * Removed some commented-out tests which may not be needed. test/unit/bio/db/test_phyloxml.rb | 295 +++++++++++++++--------------- test/unit/bio/db/test_phyloxml_writer.rb | 8 +- 2 files changed, 147 insertions(+), 156 deletions(-) commit 152304dc9809102f56a2f1779c59111f84b9cd02 Author: Naohisa Goto Date: Sat Oct 17 01:40:49 2009 +0900 Improvement of tests for Bio::Fastq and related classes. test/unit/bio/db/test_fastq.rb | 372 ++++++++++++++++++++++++++-------------- 1 files changed, 245 insertions(+), 127 deletions(-) commit 61556223a469a5f8b1bb4f343eca92c88c66cb9a Author: Naohisa Goto Date: Sat Oct 17 01:38:52 2009 +0900 FASTQ output support is added to Bio::Sequence. lib/bio/db/fastq/format_fastq.rb | 172 ++++++++++++++++++++++++++++++++++++++ lib/bio/sequence/format.rb | 9 ++ 2 files changed, 181 insertions(+), 0 deletions(-) create mode 100644 lib/bio/db/fastq/format_fastq.rb commit ea4203ebb7ca268a5b6d6c50aeb63ed0eed5a803 Author: Naohisa Goto Date: Sat Oct 17 01:32:50 2009 +0900 New attributes for genome sequencer data are added to Bio::Sequence. * New attributes for genome sequencer data are added to Bio::Sequence class: quality_scores, quality_scores_type, error_probabilities. lib/bio/sequence.rb | 13 +++++++++++++ 1 files changed, 13 insertions(+), 0 deletions(-) commit fce158b2194519081361e12c170882ec2e87fc5e Author: Naohisa Goto Date: Sat Oct 17 01:13:27 2009 +0900 New methods Bio::Fastq#to_biosequence, etc. and improvement of tolerance for overflows * Bio::Fastq#to_biosequence is newly added. * New methods: Bio::Fastq#seq, entry_id, quality_score_type. * Default behavior of Bio::Fastq::FormatData#scores2str is changed not to raise error but to truncate saturated values. * Improvement of tolerance for overflows, and preventing to calculate log of negative number. 
lib/bio/db/fastq.rb | 105 ++++++++++++++++++++++++++++-- lib/bio/db/fastq/fastq_to_biosequence.rb | 40 +++++++++++ lib/bio/sequence/adapter.rb | 1 + 3 files changed, 139 insertions(+), 7 deletions(-) create mode 100644 lib/bio/db/fastq/fastq_to_biosequence.rb commit 0f189974d2027cecee575b27e969de7f62508309 Author: Naohisa Goto Date: Tue Oct 13 21:41:58 2009 +0900 Avoid using Numeric#fdiv because it can only be used in Ruby 1.8.7 or later lib/bio/db/fastq.rb | 6 +++--- test/unit/bio/db/test_fastq.rb | 10 +++++----- 2 files changed, 8 insertions(+), 8 deletions(-) commit 42999fc6230e52c4f241f411d299db941196f62e Author: Naohisa Goto Date: Tue Oct 13 21:30:25 2009 +0900 Bio::Fastq#qualities is renamed to quality_scores. * Bio::Fastq#qualities is renamed to Bio::Fastq#quality_scores, and the original method name is changed to be an alias of the new name. lib/bio/db/fastq.rb | 16 +++++++++------- test/unit/bio/db/test_fastq.rb | 30 +++++++++++++++--------------- 2 files changed, 24 insertions(+), 22 deletions(-) commit cc0ee2169f298046c5e55fcbadfeaac01f6bf704 Author: Naohisa Goto Date: Sun Oct 11 19:19:18 2009 +0900 Newly added unit tests for Bio::Fastq with test data * Newly added unit tests for Bio::Fastq with test data. The test data is created by P.J.A. Cock et al., and is also used in Biopython and BioPerl. 
test/data/fastq/error_diff_ids.fastq | 20 + test/data/fastq/error_double_qual.fastq | 22 + test/data/fastq/error_double_seq.fastq | 22 + test/data/fastq/error_long_qual.fastq | 20 + test/data/fastq/error_no_qual.fastq | 20 + test/data/fastq/error_qual_del.fastq | 20 + test/data/fastq/error_qual_escape.fastq | 20 + test/data/fastq/error_qual_null.fastq | Bin 0 -> 610 bytes test/data/fastq/error_qual_space.fastq | 21 + test/data/fastq/error_qual_tab.fastq | 21 + test/data/fastq/error_qual_unit_sep.fastq | 20 + test/data/fastq/error_qual_vtab.fastq | 20 + test/data/fastq/error_short_qual.fastq | 20 + test/data/fastq/error_spaces.fastq | 20 + test/data/fastq/error_tabs.fastq | 21 + test/data/fastq/error_trunc_at_plus.fastq | 19 + test/data/fastq/error_trunc_at_qual.fastq | 19 + test/data/fastq/error_trunc_at_seq.fastq | 18 + test/data/fastq/error_trunc_in_plus.fastq | 19 + test/data/fastq/error_trunc_in_qual.fastq | 20 + test/data/fastq/error_trunc_in_seq.fastq | 18 + test/data/fastq/error_trunc_in_title.fastq | 17 + .../fastq/illumina_full_range_as_illumina.fastq | 8 + .../data/fastq/illumina_full_range_as_sanger.fastq | 8 + .../data/fastq/illumina_full_range_as_solexa.fastq | 8 + .../illumina_full_range_original_illumina.fastq | 8 + test/data/fastq/longreads_as_illumina.fastq | 40 ++ test/data/fastq/longreads_as_sanger.fastq | 40 ++ test/data/fastq/longreads_as_solexa.fastq | 40 ++ test/data/fastq/longreads_original_sanger.fastq | 120 ++++ test/data/fastq/misc_dna_as_illumina.fastq | 16 + test/data/fastq/misc_dna_as_sanger.fastq | 16 + test/data/fastq/misc_dna_as_solexa.fastq | 16 + test/data/fastq/misc_dna_original_sanger.fastq | 16 + test/data/fastq/misc_rna_as_illumina.fastq | 16 + test/data/fastq/misc_rna_as_sanger.fastq | 16 + test/data/fastq/misc_rna_as_solexa.fastq | 16 + test/data/fastq/misc_rna_original_sanger.fastq | 16 + .../data/fastq/sanger_full_range_as_illumina.fastq | 8 + test/data/fastq/sanger_full_range_as_sanger.fastq | 8 + 
test/data/fastq/sanger_full_range_as_solexa.fastq | 8 + .../fastq/sanger_full_range_original_sanger.fastq | 8 + .../data/fastq/solexa_full_range_as_illumina.fastq | 8 + test/data/fastq/solexa_full_range_as_sanger.fastq | 8 + test/data/fastq/solexa_full_range_as_solexa.fastq | 8 + .../fastq/solexa_full_range_original_solexa.fastq | 8 + test/data/fastq/wrapping_as_illumina.fastq | 12 + test/data/fastq/wrapping_as_sanger.fastq | 12 + test/data/fastq/wrapping_as_solexa.fastq | 12 + test/data/fastq/wrapping_original_sanger.fastq | 24 + test/unit/bio/db/test_fastq.rb | 711 ++++++++++++++++++++ 51 files changed, 1652 insertions(+), 0 deletions(-) create mode 100644 test/data/fastq/error_diff_ids.fastq create mode 100644 test/data/fastq/error_double_qual.fastq create mode 100644 test/data/fastq/error_double_seq.fastq create mode 100644 test/data/fastq/error_long_qual.fastq create mode 100644 test/data/fastq/error_no_qual.fastq create mode 100644 test/data/fastq/error_qual_del.fastq create mode 100644 test/data/fastq/error_qual_escape.fastq create mode 100644 test/data/fastq/error_qual_null.fastq create mode 100644 test/data/fastq/error_qual_space.fastq create mode 100644 test/data/fastq/error_qual_tab.fastq create mode 100644 test/data/fastq/error_qual_unit_sep.fastq create mode 100644 test/data/fastq/error_qual_vtab.fastq create mode 100644 test/data/fastq/error_short_qual.fastq create mode 100644 test/data/fastq/error_spaces.fastq create mode 100644 test/data/fastq/error_tabs.fastq create mode 100644 test/data/fastq/error_trunc_at_plus.fastq create mode 100644 test/data/fastq/error_trunc_at_qual.fastq create mode 100644 test/data/fastq/error_trunc_at_seq.fastq create mode 100644 test/data/fastq/error_trunc_in_plus.fastq create mode 100644 test/data/fastq/error_trunc_in_qual.fastq create mode 100644 test/data/fastq/error_trunc_in_seq.fastq create mode 100644 test/data/fastq/error_trunc_in_title.fastq create mode 100644 test/data/fastq/illumina_full_range_as_illumina.fastq 
create mode 100644 test/data/fastq/illumina_full_range_as_sanger.fastq create mode 100644 test/data/fastq/illumina_full_range_as_solexa.fastq create mode 100644 test/data/fastq/illumina_full_range_original_illumina.fastq create mode 100644 test/data/fastq/longreads_as_illumina.fastq create mode 100644 test/data/fastq/longreads_as_sanger.fastq create mode 100644 test/data/fastq/longreads_as_solexa.fastq create mode 100644 test/data/fastq/longreads_original_sanger.fastq create mode 100644 test/data/fastq/misc_dna_as_illumina.fastq create mode 100644 test/data/fastq/misc_dna_as_sanger.fastq create mode 100644 test/data/fastq/misc_dna_as_solexa.fastq create mode 100644 test/data/fastq/misc_dna_original_sanger.fastq create mode 100644 test/data/fastq/misc_rna_as_illumina.fastq create mode 100644 test/data/fastq/misc_rna_as_sanger.fastq create mode 100644 test/data/fastq/misc_rna_as_solexa.fastq create mode 100644 test/data/fastq/misc_rna_original_sanger.fastq create mode 100644 test/data/fastq/sanger_full_range_as_illumina.fastq create mode 100644 test/data/fastq/sanger_full_range_as_sanger.fastq create mode 100644 test/data/fastq/sanger_full_range_as_solexa.fastq create mode 100644 test/data/fastq/sanger_full_range_original_sanger.fastq create mode 100644 test/data/fastq/solexa_full_range_as_illumina.fastq create mode 100644 test/data/fastq/solexa_full_range_as_sanger.fastq create mode 100644 test/data/fastq/solexa_full_range_as_solexa.fastq create mode 100644 test/data/fastq/solexa_full_range_original_solexa.fastq create mode 100644 test/data/fastq/wrapping_as_illumina.fastq create mode 100644 test/data/fastq/wrapping_as_sanger.fastq create mode 100644 test/data/fastq/wrapping_as_solexa.fastq create mode 100644 test/data/fastq/wrapping_original_sanger.fastq create mode 100644 test/unit/bio/db/test_fastq.rb commit 951d8f7303a5c28783a2c8b25c9fb347730c1a8f Author: Naohisa Goto Date: Sun Oct 11 19:10:15 2009 +0900 Bio::Fastq API changed. * Bio::Fastq API changed. 
Removed methods: phred_quality, solexa_quality. New methods: qualities, error_probabilities, format, format=, validate_format. * New exception classes Bio::Fastq::Error::* for errors. * Internal structure is also changed. Internal only classes Bio::Fastq::FormatData::* which store parameters for format variants. lib/bio/db/fastq.rb | 519 +++++++++++++++++++++++++++++++++++++++++++++++++-- 1 files changed, 501 insertions(+), 18 deletions(-) commit 9bb7f6ca762c615e50d98c35b60982a4caeea323 Author: Naohisa Goto Date: Fri Sep 25 23:36:13 2009 +0900 Bug fix: infinite loop in Bio::Fastq.new. Thanks to Hiroyuki Mishima for reporting the bug. lib/bio/db/fastq.rb | 16 ++++++++++------ 1 files changed, 10 insertions(+), 6 deletions(-) commit fca6aa5333a95db4dc87e8fc814bd028d5720de4 Author: Naohisa Goto Date: Fri Mar 20 11:52:33 2009 +0900 Added file format autodetection for Bio::Fastq lib/bio/io/flatfile/autodetection.rb | 6 ++++++ 1 files changed, 6 insertions(+), 0 deletions(-) commit 1ba21545e7d49ae8b775fbed7a4e92b1daa54ac6 Author: Naohisa Goto Date: Fri Mar 20 11:48:59 2009 +0900 Added autoload for Bio::Fastq lib/bio.rb | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) commit 380b99106d4c7955b9d07ee8668b53d384c974f4 Author: Naohisa Goto Date: Thu Mar 19 17:07:25 2009 +0900 Newly added FASTQ format parser (still a prototype) lib/bio/db/fastq.rb | 162 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 162 insertions(+), 0 deletions(-) create mode 100644 lib/bio/db/fastq.rb commit 2c5df2a5f1b5ae1ea9e61c1dccc8bcd2f496f6ce Author: Naohisa Goto Date: Sun Sep 20 19:08:55 2009 +0900 Removed "require 'rubygems'". lib/bio/db/phyloxml/phyloxml_parser.rb | 2 -- 1 files changed, 0 insertions(+), 2 deletions(-) commit 67818d2550e5d53eeee0f3d710f66f7506fb8127 Author: Naohisa Goto Date: Sat Sep 19 17:06:21 2009 +0900 Use Bio::PubMed.esearch and efetch, etc. * Changed to use Bio::PubMed.esearch and efetch instead of deprecated methods. 
* Regular expression for extracting option is changed. sample/pmfetch.rb | 15 +++++++++++---- sample/pmsearch.rb | 17 +++++++++++++---- 2 files changed, 24 insertions(+), 8 deletions(-) commit 0c95889bf69e3140b5f09ade1203d50136aee014 Author: Naohisa Goto Date: Fri Sep 18 17:58:17 2009 +0900 Changed to use temporary directory when writing a file, etc. * To avoid unexpected file corruption and possibly security risk, changed to use temporary directory when writing files. The temporary directory is normally removed when all tests end. To prevent removing the directory, set environment variable BIORUBY_TEST_DEBUG. * To avoid test class name conflict, TestPhyloXMLData is renamed to TestPhyloXMLWriterData. * Added a new test to check existence of libxml-ruby, and removed code to raise error when it is not found. The code of the new test is completely the same as of in test_phyloxml.rb, but it is added for the purpose when test_phyloxml_writer.rb is called independently. test/unit/bio/db/test_phyloxml_writer.rb | 161 +++++++++++++++++++++--------- 1 files changed, 112 insertions(+), 49 deletions(-) commit 520d0f5ed535f621aed60b71d8765a99e97306a6 Author: Naohisa Goto Date: Sun Sep 20 18:34:19 2009 +0900 Newly added internal-only class Bio::Command::Tmpdir to handle temporary directory * Newly added internal-only class Bio::Command::Tmpdir to handle temporary directory. It is BioRuby library internal use only. * Bio::Command.mktmpdir is changed to be completely compatible with Ruby 1.9.x's Dir.mktmpdir. lib/bio/command.rb | 104 +++++++++++++++++++++++++++++++--- test/functional/bio/test_command.rb | 49 ++++++++++++++++ 2 files changed, 143 insertions(+), 10 deletions(-) commit c813b60ae62f44d9688b21d47c84e4b7083547e6 Author: Naohisa Goto Date: Fri Sep 18 17:55:14 2009 +0900 Added new test to check existence of libxml-ruby, instead of raising error. 
test/unit/bio/db/test_phyloxml.rb | 30 ++++++++++++++++++++---------- 1 files changed, 20 insertions(+), 10 deletions(-) commit 1b71dd9624640f3f775baab360eef0be92a86677 Author: Diana Jaunzeikare Date: Fri Sep 18 21:43:18 2009 -0400 Renamed output files generated by phyloxml_writer unit tests. test/unit/bio/db/test_phyloxml_writer.rb | 13 ++++++++++--- 1 files changed, 10 insertions(+), 3 deletions(-) commit f8e138cb9e28996f1024fa9cf7c68c8f08603941 Author: Diana Jaunzeikare Date: Fri Sep 18 21:33:35 2009 -0400 Added ncbi_taxonomy_mollusca_short.xml test file .../data/phyloxml/ncbi_taxonomy_mollusca_short.xml | 65 ++++++++++++++++++++ 1 files changed, 65 insertions(+), 0 deletions(-) create mode 100644 test/data/phyloxml/ncbi_taxonomy_mollusca_short.xml commit be1be310b7581928581cde24303fe2e16c04e82f Author: Diana Jaunzeikare Date: Fri Sep 18 21:29:20 2009 -0400 Made the code compactible with libxml-ruby 1.1.3 (previous was 0.9.4) version. lib/bio/db/phyloxml/phyloxml_elements.rb | 58 +++++++++++++++--------------- lib/bio/db/phyloxml/phyloxml_parser.rb | 10 ++++- lib/bio/db/phyloxml/phyloxml_writer.rb | 8 +++-- 3 files changed, 42 insertions(+), 34 deletions(-) commit a3441afd5650069a5ada64b202a0714e8723e911 Author: Diana Jaunzeikare Date: Tue May 26 00:55:47 2009 -0400 Newly added PhyloXML support written by Diana Jaunzeikare. * Newly added PhyloXML support written by Diana Jaunzeikare. It have been written during the Google Summer of Code 2009 "Implementing phyloXML support in BioRuby", mentored by Christian Zmasek et al. with NESCent. For details of development, see git://github.com/latvianlinuxgirl/bioruby.git and BioRuby mailing list archives. * This is a combination of 119 commits. The last commit date was Mon Aug 17 10:30:10 2009 -0400. 
README.rdoc | 3 + doc/Tutorial.rd | 120 ++- lib/bio.rb | 6 + lib/bio/db/phyloxml/phyloxml.xsd | 573 ++++++++ lib/bio/db/phyloxml/phyloxml_elements.rb | 1160 +++++++++++++++++ lib/bio/db/phyloxml/phyloxml_parser.rb | 767 +++++++++++ lib/bio/db/phyloxml/phyloxml_writer.rb | 223 ++++ test/data/phyloxml/apaf.xml | 666 ++++++++++ test/data/phyloxml/bcl_2.xml | 2097 ++++++++++++++++++++++++++++++ test/data/phyloxml/made_up.xml | 144 ++ test/data/phyloxml/phyloxml_examples.xml | 415 ++++++ test/unit/bio/db/test_phyloxml.rb | 619 +++++++++ test/unit/bio/db/test_phyloxml_writer.rb | 258 ++++ 13 files changed, 7050 insertions(+), 1 deletions(-) create mode 100644 lib/bio/db/phyloxml/phyloxml.xsd create mode 100644 lib/bio/db/phyloxml/phyloxml_elements.rb create mode 100644 lib/bio/db/phyloxml/phyloxml_parser.rb create mode 100644 lib/bio/db/phyloxml/phyloxml_writer.rb create mode 100644 test/data/phyloxml/apaf.xml create mode 100644 test/data/phyloxml/bcl_2.xml create mode 100644 test/data/phyloxml/made_up.xml create mode 100644 test/data/phyloxml/phyloxml_examples.xml create mode 100644 test/unit/bio/db/test_phyloxml.rb create mode 100644 test/unit/bio/db/test_phyloxml_writer.rb commit fd8281f03423ddf23f7d409863b4df647f1b1564 Author: Naohisa Goto Date: Wed Sep 9 21:08:15 2009 +0900 Newly added Chromatogram classes contributed by Anthony Underwood. * Newly added Chromatogram classes contributed by Anthony Underwood. See git://github.com/aunderwo/bioruby.git for details of development before this merge. 
lib/bio.rb | 3 + lib/bio/db/chromatogram.rb | 133 +++++++++++++ lib/bio/db/chromatogram/abi.rb | 111 +++++++++++ .../db/chromatogram/chromatogram_to_biosequence.rb | 32 +++ lib/bio/db/chromatogram/scf.rb | 207 ++++++++++++++++++++ lib/bio/sequence/adapter.rb | 1 + test/data/chromatogram/test_chromatogram_abi.ab1 | Bin 0 -> 228656 bytes .../data/chromatogram/test_chromatogram_scf_v2.scf | Bin 0 -> 47503 bytes .../data/chromatogram/test_chromatogram_scf_v3.scf | Bin 0 -> 47503 bytes test/unit/bio/db/test_chromatogram.rb | 99 ++++++++++ 10 files changed, 586 insertions(+), 0 deletions(-) create mode 100644 lib/bio/db/chromatogram.rb create mode 100644 lib/bio/db/chromatogram/abi.rb create mode 100644 lib/bio/db/chromatogram/chromatogram_to_biosequence.rb create mode 100644 lib/bio/db/chromatogram/scf.rb create mode 100644 test/data/chromatogram/test_chromatogram_abi.ab1 create mode 100644 test/data/chromatogram/test_chromatogram_scf_v2.scf create mode 100644 test/data/chromatogram/test_chromatogram_scf_v3.scf create mode 100644 test/unit/bio/db/test_chromatogram.rb commit 78f9463b764687401ff4a7480c1383c5594e5133 Author: Naohisa Goto Date: Thu Sep 10 12:38:25 2009 +0900 Bio::BIORUBY_EXTRA_VERSION is changed to ".5000". bioruby.gemspec | 2 +- lib/bio/version.rb | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) commit e731c6e52bc9a672e4546eeca4f2d2d968bdba09 Author: Naohisa Goto Date: Wed Sep 2 15:24:00 2009 +0900 BioRuby 1.3.1 is released. ChangeLog is modified, and bioruby.gemspec is regenerated. 
ChangeLog | 11 +++++++++++ bioruby.gemspec | 2 +- 2 files changed, 12 insertions(+), 1 deletions(-)
bio-2.0.3/doc/ChangeLog-1.4.3
commit ad0d7a1712d8b02358763233d38e67a0fff54917 Author: Naohisa Goto Date: Wed Aug 22 00:18:14 2012 +0900 BioRuby 1.4.3 is re-released ChangeLog | 9 +++++++++ 1 files changed, 9 insertions(+), 0 deletions(-) commit 51ab2dec144c99a14ca9009c7b589b500f1cad5f Author: Naohisa Goto Date: Wed Aug 22 00:12:47 2012 +0900 Preparation to re-release BioRuby 1.4.3 ChangeLog | 22 ++++++++++++++++++++++ 1 files changed, 22 insertions(+), 0 deletions(-) commit 5ff159d12252393ff04afe52b59a315d15c63d18 Author: Naohisa Goto Date: Wed Aug 22 00:00:40 2012 +0900 Bug fix: bin/bioruby failed to save object * Bug fix: bin/bioruby: Failed to save object with error message "can't convert Symbol into String" on Ruby 1.9. RELEASE_NOTES.rdoc | 2 ++ lib/bio/shell/core.rb | 1 + 2 files changed, 3 insertions(+), 0 deletions(-) commit 74c6ce09413e7ddde1431d74e10cc9c4cdbb95ba Author: Naohisa Goto Date: Tue Aug 21 22:35:18 2012 +0900 BioRuby 1.4.3 is released.
ChangeLog | 21 +++++++++++++++++++++ 1 files changed, 21 insertions(+), 0 deletions(-) commit 61af85b6cfc7bb1f3668ed68232113eb0751e7ea Author: Naohisa Goto Date: Tue Aug 21 22:33:30 2012 +0900 preparation for BioRuby 1.4.3 release version bioruby.gemspec | 2 +- lib/bio/version.rb | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) commit 1ec68beac42a06e9ef0a9c953650ef4d599e4e65 Author: Naohisa Goto Date: Tue Aug 21 20:53:04 2012 +0900 ChangeLog modified; release candidate version 1.4.3-rc2 ChangeLog | 1353 ++++++++++++++++++++++++++++++++++++++++++++++++++++ bioruby.gemspec | 2 +- lib/bio/version.rb | 4 +- 3 files changed, 1356 insertions(+), 3 deletions(-) commit e0d570b237a8b96ae0c1e7b1ad72c7333be07c52 Author: Naohisa Goto Date: Mon Aug 20 20:35:58 2012 +0900 version changed to 1.4.3-rc1 bioruby.gemspec | 3 ++- lib/bio/version.rb | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) commit 511c81ba67f7b8dc9cff85cf68db654d2feaf52e Author: Naohisa Goto Date: Mon Aug 20 20:17:14 2012 +0900 document JRUBY-5678 (resolved) and related issue with the workaround. KNOWN_ISSUES.rdoc | 9 +++++++++ RELEASE_NOTES.rdoc | 9 +++++++++ 2 files changed, 18 insertions(+), 0 deletions(-) commit 2fdd7a3b3555a33dead31181c9526af22f24916f Author: Naohisa Goto Date: Mon Aug 20 19:44:39 2012 +0900 update recommended Ruby versions and the year in copyright lines README.rdoc | 7 +++---- 1 files changed, 3 insertions(+), 4 deletions(-) commit b156227749e5ada74330e837c9ce48a16e6a6a2f Author: Naohisa Goto Date: Mon Aug 20 19:16:25 2012 +0900 Bug fix: Bio::EMBL#os raises error, with incompatible change * Bug fix: Bio::EMBL#os raises error. The bug is reported by Marc P. Hoeppner in the BioRuby mailing list (https://redmine.open-bio.org/issues/3294). * Incompatible change: Bio::EMBL#os no longer splits the content with comma, and it no longer raises error even if the OS line is not in the "Genus species (name)" format. 
The changes may affect the parsing of old EMBL files which contain two or more species names in an OS line. * Unit tests are modified to catch up the above incompatible changes. RELEASE_NOTES.rdoc | 14 ++++++ lib/bio/db/embl/embl.rb | 74 ++++++++++++++++++++++++++++++ test/unit/bio/db/embl/test_embl.rb | 9 +--- test/unit/bio/db/embl/test_embl_rel89.rb | 9 +--- 4 files changed, 92 insertions(+), 14 deletions(-) commit 31c8b4cb6ce2364aacee8137ddec3aa5f7d2d0d8 Author: Naohisa Goto Date: Mon Aug 20 19:04:50 2012 +0900 Workaround for jruby-1.7.0.preview2 bugs JRUBY-6195, JRUBY-6818 * Workaroud for jruby-1.7.0.preview2 bugs JRUBY-6195 and JRUBY-6818. * Refactoring of call_command_popen: split _call_command_popen_ruby18 and _call_command_popen_ruby19, add _call_command_popen_jruby19. Note that _call_command_popen_jruby19 will be removed in the future after the bugs are fixed. lib/bio/command.rb | 98 ++++++++++++++++++++++++++++++++++++++++++++++----- 1 files changed, 88 insertions(+), 10 deletions(-) commit 05f51fa2e871e71c2b20559eb05e456768a4f7d6 Author: Naohisa Goto Date: Sat Aug 18 00:32:31 2012 +0900 New default etc/bioinformatics/seqdatabase.ini * New default etc/bioinformatics/seqdatabase.ini, with currently available services. 
etc/bioinformatics/seqdatabase.ini | 27 +++++++++++++++++++++++++++ 1 files changed, 27 insertions(+), 0 deletions(-) create mode 100644 etc/bioinformatics/seqdatabase.ini commit a4264cc3667b98289c09efc7ccba9c8e86f6d89c Author: Naohisa Goto Date: Sat Aug 18 00:31:10 2012 +0900 etc/bioinformatics/seqdatabase.ini is moved to sample/ etc/bioinformatics/seqdatabase.ini | 210 ------------------------------------ sample/seqdatabase.ini | 210 ++++++++++++++++++++++++++++++++++++ 2 files changed, 210 insertions(+), 210 deletions(-) delete mode 100644 etc/bioinformatics/seqdatabase.ini create mode 100644 sample/seqdatabase.ini commit 04b7a27b557576f5325b3ee420262922ab66ca3b Author: Naohisa Goto Date: Sat Aug 18 00:30:38 2012 +0900 known issue about http://bioruby.org/cgi-bin/biofetch.rb server down KNOWN_ISSUES.rdoc | 9 +++++++++ 1 files changed, 9 insertions(+), 0 deletions(-) commit 4a8193f7b91ff703c8f3dc6e6a6ae0c981a404e6 Author: Naohisa Goto Date: Fri Aug 17 23:45:41 2012 +0900 Update descriptions about JRuby and Rubinius bugs KNOWN_ISSUES.rdoc | 14 ++++++++++---- RELEASE_NOTES.rdoc | 14 ++++++++++---- 2 files changed, 20 insertions(+), 8 deletions(-) commit a2d8dd8ccebde84e91f82c59e531cc08fbf0f3fe Author: Naohisa Goto Date: Fri Aug 17 17:19:22 2012 +0900 Remove the suffix .rb in require, to avoid potential multiple loading. test/unit/bio/db/fasta/test_defline.rb | 2 +- test/unit/bio/db/genbank/test_genpept.rb | 2 +- test/unit/bio/db/kegg/test_drug.rb | 2 +- test/unit/bio/db/kegg/test_genome.rb | 2 +- test/unit/bio/db/kegg/test_glycan.rb | 2 +- test/unit/bio/util/test_restriction_enzyme.rb | 2 +- 6 files changed, 6 insertions(+), 6 deletions(-) commit 1d2e8b02db3699c2cd4f4890abc078ffd2b503aa Author: Ben J. 
Woodcroft Date: Wed Aug 8 09:41:20 2012 +1000 fill in missing piece of documentation in FastaFormat lib/bio/db/fasta.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit 83bf09d4d81803c8d06e0d45ca25e7c09016161c Author: Naohisa Goto Date: Wed Aug 8 00:08:26 2012 +0900 RELEASE_NOTE.rdoc modified to reflect recent changes RELEASE_NOTES.rdoc | 107 ++++++++++++++++++++++++++++++++++++++++++++------- 1 files changed, 92 insertions(+), 15 deletions(-) commit c3afb1eb98cf8777ee021624c3d2eab92b3543f2 Author: Naohisa Goto Date: Wed Aug 8 00:06:09 2012 +0900 Descriptions about JRuby, Rubinius, DDBJ Web API, SOAP4R etc. KNOWN_ISSUES.rdoc | 45 +++++++++++++++++++++++++++++++++++++++++++-- 1 files changed, 43 insertions(+), 2 deletions(-) commit 01da7401a011aa519c43a021f89f6e7f769b4649 Author: Naohisa Goto Date: Tue Aug 7 23:55:09 2012 +0900 regenerate bioruby.gemspec with rake regemspec bioruby.gemspec | 3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) commit 9f70c27d9b75408fddae8384a2a09715b959dcb5 Author: Naohisa Goto Date: Tue Aug 7 23:51:56 2012 +0900 improve documentation; version changed to 1.4.3-pre1 lib/bio/version.rb | 13 +++++++++++-- 1 files changed, 11 insertions(+), 2 deletions(-) commit c11f12c8aa56b8509cd082f3478e96374210e5d7 Author: Naohisa Goto Date: Tue Aug 7 23:31:41 2012 +0900 Remove autorequire which have been deprecated bioruby.gemspec.erb | 1 - 1 files changed, 0 insertions(+), 1 deletions(-) commit 7792b092033d2c819f2bcad0e206f27608481db5 Author: Ben J Woodcroft Date: Mon Aug 6 09:40:55 2012 +1000 flesh out FastaFormat documentation lib/bio/db/fasta.rb | 102 ++++++++++++++++++++++++------------------- lib/bio/db/fasta/defline.rb | 2 +- 2 files changed, 58 insertions(+), 46 deletions(-) commit 9a2fe67c247cdc7c9ddc9f8b8de771515ba76ac1 Author: Naohisa Goto Date: Fri Aug 3 22:36:12 2012 +0900 .travis.yml: restructure matrix, add allow_failures lines * Add allow_failures lines * Restructure matrix: remove many exclude lines and add some 
include lines. * When running jruby, Set TMPDIR to avoid known issue about FileUtils#remove_entry_secure. .travis.yml | 52 ++++++++++++++++++---------------------------------- 1 files changed, 18 insertions(+), 34 deletions(-) commit 553fd102c533c42675f93895557e3e00d36fd3e7 Author: Naohisa Goto Date: Fri Aug 3 22:05:39 2012 +0900 Improve tests for BLAST "-m 8" tabular format parser test/unit/bio/appl/blast/test_report.rb | 119 +++++++++++++++++++++++++++++++ 1 files changed, 119 insertions(+), 0 deletions(-) commit 3e1c062dbc168bd558ca8408a6da115aa570f3a7 Author: Naohisa Goto Date: Fri Aug 3 22:05:07 2012 +0900 Improve test and suppress warning: assigned but unused variable test/unit/bio/io/flatfile/test_buffer.rb | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) commit 7e29ce1f050e9e5b23299372d8ddfae781447dc3 Author: Naohisa Goto Date: Fri Aug 3 22:02:21 2012 +0900 Improve test and suppress warning: assigned but unused variable test/unit/bio/db/test_newick.rb | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) commit 1053b62069df74f336934e4ed0f3f217e4ad3312 Author: Naohisa Goto Date: Fri Jul 27 13:56:53 2012 +0900 Suppress warnings: shadowing outer local variable * Suppress warnings: shadowing outer local variable. Thanks to Andrew Grimm: https://github.com/bioruby/bioruby/pull/64 lib/bio/db/gff.rb | 8 ++++---- 1 files changed, 4 insertions(+), 4 deletions(-) commit e55794f65b3fb45c99e61d45220fe42f718426a3 Author: Naohisa Goto Date: Wed Jul 25 23:29:17 2012 +0900 Suppress warnings in lib/bio/alignment.rb:2322 * A space is inserted to suppress warnings in lib/bio/alignment.rb:2322. 
* warning: :' after local variable is interpreted as binary operator * warning: even though it seems like symbol literal lib/bio/alignment.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit 174a38ea8c4ecea70724bf6ec8e72b2e4259853b Author: Naohisa Goto Date: Wed Jul 25 23:12:51 2012 +0900 Modified to follow changes of GenomeNet BLAST site lib/bio/appl/blast/genomenet.rb | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) commit 93e24935840dcdec76984313719700134d69daf2 Author: Naohisa Goto Date: Wed Jul 25 15:21:32 2012 +0900 suppress warnings: instance variable @comment not initialized lib/bio/db/gff.rb | 6 ++++-- 1 files changed, 4 insertions(+), 2 deletions(-) commit 0ad3818fedb707a26e849877bde1f8dab006b848 Author: Naohisa Goto Date: Wed Jul 25 00:54:02 2012 +0900 suppress warnings: URI.escape/URI.unescape is obsolete lib/bio/db/gff.rb | 39 +++++++++++++++++++++++++++++++++------ 1 files changed, 33 insertions(+), 6 deletions(-) commit 1263938742e7eeedb4a877aff7314e304320eca9 Author: Naohisa Goto Date: Mon Jul 23 21:15:52 2012 +0900 Added link to blastall options reference * Added link to blastall options reference. Thanks to Gareth Rees who sent a pull request. (https://github.com/bioruby/bioruby/pull/49) lib/bio/appl/blast/genomenet.rb | 5 +++++ 1 files changed, 5 insertions(+), 0 deletions(-) commit 2ec5f4fd5abd0db7ec79ab3a9fd4adde7c9384a8 Author: Naohisa Goto Date: Mon Jul 23 17:26:45 2012 +0900 Next bioruby release version will be 1.4.3. 
RELEASE_NOTES.rdoc | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit 6cf1318507a5d82bb93acdfe33e96723a2e742fc Author: Naohisa Goto Date: Mon Jul 23 17:25:35 2012 +0900 fix typo README.rdoc | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit 2fd71cac315affe6e4d90b03dadac782f11553a5 Author: Naohisa Goto Date: Mon Jul 23 17:21:57 2012 +0900 Bug fix: Genomenet remote blast: catch up changes of the server lib/bio/appl/blast/genomenet.rb | 33 +++++++++++++++++++++++---------- 1 files changed, 23 insertions(+), 10 deletions(-) commit 69d9717da11b2fe81a8f840bbafcc5fbb0dbe688 Author: Naohisa Goto Date: Fri Jul 20 11:24:37 2012 +0900 regenerate bioruby.gemspec with rake regemspec bioruby.gemspec | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) commit 9683da186579dbfa5da1bb1a32edc49cfdc026b8 Author: Naohisa Goto Date: Wed Jul 18 23:19:33 2012 +0900 Incompatible changes in Bio::KEGG::KGML are documented. * Incompatible changes in Bio::KEGG::KGML are documented. * Next BioRuby release version will be 1.4.3. RELEASE_NOTES.rdoc | 44 +++++++++++++++++++++++++++++++++++++++++--- 1 files changed, 41 insertions(+), 3 deletions(-) commit 6cab377ae760d1abfda06caafe4a04ecd549e21d Author: Naohisa Goto Date: Wed Jul 18 22:56:00 2012 +0900 Incompatible changes: Bio::KEGG::KGML::Reaction#substrates, products * Incompatible changes: Bio::KEGG::KGML::Reaction#substrates and Bio::KEGG::KGML::Reaction#products are changed to return an array containing Bio::KEGG::KGML::Substrate and Bio::KEGG::KGML::Product objects, respectively. The aim of these changes is to store the IDs of substrates and products, which were discarded in previous versions.
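A rough sketch of what the incompatible change above may mean for calling code, assuming the previous return value was a plain array of name strings. SketchSubstrate is a hypothetical stand-in (it is not Bio::KEGG::KGML::Substrate, whose real attributes may differ), and the ids and names are made-up illustration values.

```ruby
# Hypothetical stand-in for a substrate/product object that carries both
# an id and a name. NOT the real Bio::KEGG::KGML::Substrate class.
SketchSubstrate = Struct.new(:id, :name)

# Map objects back to plain names, for code written against the old
# (assumed) array-of-strings return value.
def sketch_names(entries)
  entries.map { |entry| entry.name }
end

# Collect the ids, which this sketch assumes were unavailable before.
def sketch_ids(entries)
  entries.map { |entry| entry.id }
end

# Made-up example values, for illustration only.
substrates = [
  SketchSubstrate.new(10, "cpd:C00001"),
  SketchSubstrate.new(11, "cpd:C00002")
]

puts sketch_names(substrates).inspect
puts sketch_ids(substrates).inspect
```

Under these assumptions, callers that only need names can keep working by adding one map step, while new code can read the ids directly.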
lib/bio/db/kegg/kgml.rb | 48 ++++++++++++++-- test/unit/bio/db/kegg/test_kgml.rb | 104 +++++++++++++++++++++++++++++++++++- 2 files changed, 144 insertions(+), 8 deletions(-) commit 3cb1e09709d3c6b934028e28f9cafed149c9c751 Author: Naohisa Goto Date: Wed Jul 18 22:16:46 2012 +0900 Bio::KEGG::KGML#parse_* :use new attribute names * In Bio::KEGG::KGML#parse_* (private methods) new attribute method names should be used instead of deprecated old names. lib/bio/db/kegg/kgml.rb | 18 +++++++++--------- 1 files changed, 9 insertions(+), 9 deletions(-) commit c5ef981db6add98dc6778cd9809aff38a7071593 Author: Naohisa Goto Date: Wed Jul 18 22:14:33 2012 +0900 modified documentation for Bio::KEGG::KGML lib/bio/db/kegg/kgml.rb | 73 +++++++++++++++++++++++++++-------------------- 1 files changed, 42 insertions(+), 31 deletions(-) commit 5416b84eaa37b5abf15f905586a5eee65c4026f0 Author: Naohisa Goto Date: Wed Jul 18 15:01:58 2012 +0900 New class Bio::KEGG::KGML::Graphics with tests for Bio::KEGG::KGML * New class Bio::KEGG::KGML::Graphics for storing a graphics element. This fixes https://github.com/bioruby/bioruby/issues/51. * Unit tests for Bio::KEGG::KGML are added with mock test data. * Improve rdoc documentation for Bio::KEGG::KGML. * New method Bio::KEGG::KGML::Reaction#id * Attribute methods that were different from the KGML attribute names are renamed to the names of the KGML attribute names. Old method names are deprecated and are changed to aliases and will be removed in the future. * Bio::KEGG::KGML::Entry#id (old name: entry_id) * Bio::KEGG::KGML::Entry#type (old name: category) * Bio::KEGG::KGML::Entry#entry1 (old name: node1) * Bio::KEGG::KGML::Entry#entry2 (old name: node2) * Bio::KEGG::KGML::Entry#type (old name: rel) * Bio::KEGG::KGML::Reaction#name (old name: entry_id) * Bio::KEGG::KGML::Reaction#type (old name: direction) * Following attribute methods are deprecated because two or more graphics elements may exist in an entry element. 
They will be removed in the future. * Bio::KEGG::KGML::Entry#label * Bio::KEGG::KGML::Entry#shape * Bio::KEGG::KGML::Entry#x * Bio::KEGG::KGML::Entry#y * Bio::KEGG::KGML::Entry#width * Bio::KEGG::KGML::Entry#height * Bio::KEGG::KGML::Entry#fgcolor * Bio::KEGG::KGML::Entry#bgcolor lib/bio/db/kegg/kgml.rb | 321 ++++++++++--- test/data/KEGG/test.kgml | 37 ++ test/unit/bio/db/kegg/test_kgml.rb | 922 ++++++++++++++++++++++++++++++++++++ 3 files changed, 1223 insertions(+), 57 deletions(-) create mode 100644 test/data/KEGG/test.kgml create mode 100644 test/unit/bio/db/kegg/test_kgml.rb commit e5478363ef6969ec14c4e09c2bd7c6d27c12cf5b Author: Naohisa Goto Date: Tue Jul 17 22:23:28 2012 +0900 rdoc documentation for Bio::KEGG::KGML lib/bio/db/kegg/kgml.rb | 166 ++++++++++++++++++++++++++++++++++++++++++++--- 1 files changed, 157 insertions(+), 9 deletions(-) commit 4a97e7034cae835b3bbc8ef918b9c6c48910dec5 Author: Naohisa Goto Date: Wed Jul 11 15:16:49 2012 +0900 autoload should not be used for libraries outside bio lib/bio/db/kegg/kgml.rb | 3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) commit 338d4cd9913d70041349c5201f80f7a65e7135a6 Author: Naohisa Goto Date: Fri Jul 6 00:50:01 2012 +0900 remove unnecessary require "bio/db" in lib/bio/db/pdb.rb lib/bio/db/pdb.rb | 5 +---- 1 files changed, 1 insertions(+), 4 deletions(-) commit 87c806a480fcacb0fc610c9669de19e4cb661a9c Author: Naohisa Goto Date: Fri Jul 6 00:47:20 2012 +0900 workaround to avoid circular require about Bio::PDB lib/bio/db/pdb/atom.rb | 5 +++-- lib/bio/db/pdb/chain.rb | 5 ++--- lib/bio/db/pdb/chemicalcomponent.rb | 5 +++-- lib/bio/db/pdb/model.rb | 4 ++-- lib/bio/db/pdb/pdb.rb | 3 ++- lib/bio/db/pdb/residue.rb | 4 ++-- lib/bio/db/pdb/utils.rb | 11 +++++++---- 7 files changed, 21 insertions(+), 16 deletions(-) commit 874f35c3930506fa029b419aa84677d1fea6681a Author: Naohisa Goto Date: Fri Jul 6 00:24:24 2012 +0900 regenerate bioruby.gemspec with rake regemspec bioruby.gemspec | 1 + 1 files changed, 1 
insertions(+), 0 deletions(-) commit 090d4edb5698135f87df450a963ef35a307349c4 Author: Naohisa Goto Date: Fri Jul 6 00:19:54 2012 +0900 Tree output (formatter) methods moved to lib/bio/tree/output.rb * To avoid circular require about bio/tree, phylogenetic tree output (formatter) methods are moved to lib/bio/tree/output.rb. lib/bio/db/newick.rb | 244 -------------------------------------------- lib/bio/tree.rb | 3 +- lib/bio/tree/output.rb | 264 ++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 265 insertions(+), 246 deletions(-) create mode 100644 lib/bio/tree/output.rb commit b3d12b63097a5141b029bbfb3690870cd1935a60 Author: Naohisa Goto Date: Fri Jul 6 00:18:44 2012 +0900 Workaround to avoid circular require for Bio::Blast lib/bio/appl/bl2seq/report.rb | 6 +++--- lib/bio/appl/blast/ddbj.rb | 3 --- lib/bio/appl/blast/format0.rb | 3 +++ lib/bio/appl/blast/genomenet.rb | 2 -- lib/bio/appl/blast/ncbioptions.rb | 11 ++++++++--- lib/bio/appl/blast/remote.rb | 11 ++++++----- lib/bio/appl/blast/report.rb | 16 ++++++++++------ lib/bio/appl/blast/rpsblast.rb | 5 +++-- lib/bio/appl/blast/wublast.rb | 6 +++--- 9 files changed, 36 insertions(+), 27 deletions(-) commit 8f6c906c7b0d65b93ebf0a1e1307259e6eab8465 Author: Naohisa Goto Date: Thu Jul 5 23:29:42 2012 +0900 remove old require lines that are commented out lib/bio/appl/blast/format0.rb | 5 ----- 1 files changed, 0 insertions(+), 5 deletions(-) commit c632fbf2d0320860eadfacb196d51d80ed3a2b34 Author: Naohisa Goto Date: Thu Jul 5 23:16:49 2012 +0900 Remove old workaround of strscan.so for Ruby 1.7 or earlier lib/bio/appl/blast/format0.rb | 18 +----------------- 1 files changed, 1 insertions(+), 17 deletions(-) commit c81dce87f53d3ea7c7d2335e077fa609f2737779 Author: Naohisa Goto Date: Thu Jul 5 23:03:40 2012 +0900 .travis.yml: include ruby 1.9.2 test .travis.yml | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) commit 34709d114089c722b5da796028ffb91021761fdd Author: Naohisa Goto Date: Thu Jul 5 23:00:37 
2012 +0900 Remove old comment lines lib/bio/sequence/format.rb | 6 ------ 1 files changed, 0 insertions(+), 6 deletions(-) commit e0d5ed61e0101e2e72ad024dccd58c8c90def2b9 Author: Naohisa Goto Date: Thu Jul 5 22:42:17 2012 +0900 Finalizer for Bio::Command::Tmpdir is changed to suppress test failure * New class Bio::Command::Tmpdir::Remover for removing a temporary directory in the finalizer. This class is for BioRuby internal use only. Users should not use this class. * Finalizer for Bio::Command::Tmpdir is changed from a Proc object to an instance of the Remover class. * Test failure fix: In some environments, with Ruby 1.9.2, test_output_embl(Bio::FuncTestSequenceOutputEMBL) failed with "<#" that was raised in the finalizer callback of Bio::Command::Tmpdir. This commit fixes the problem. lib/bio/command.rb | 56 +++++++++++++++++++++++++++------------------------ 1 files changed, 30 insertions(+), 26 deletions(-) commit cca98d1378ce66d6db84cc9c1beadd39ed0e0fee Author: Naohisa Goto Date: Thu Jul 5 22:21:34 2012 +0900 Workaround to avoid circular require and JRuby autoload bug * "require" lines are modified to avoid circular require. * In files that would be required directly from outside bio/sequence (aa.rb, adapter.rb, common.rb, compat.rb, dblink.rb, generic.rb, na.rb, quality_score.rb, sequence_masker.rb), to avoid a potential mismatch of superclass and/or lack of some methods, bio/sequence.rb is required when Bio::Sequence is not defined.
* workaround to avoid JRuby autoload bug lib/bio/sequence.rb | 10 ++++++---- lib/bio/sequence/aa.rb | 8 +++----- lib/bio/sequence/adapter.rb | 12 ++++++------ lib/bio/sequence/common.rb | 2 ++ lib/bio/sequence/compat.rb | 9 ++------- lib/bio/sequence/dblink.rb | 11 ++++++----- lib/bio/sequence/generic.rb | 7 +++---- lib/bio/sequence/na.rb | 10 ++++------ lib/bio/sequence/quality_score.rb | 2 ++ lib/bio/sequence/sequence_masker.rb | 3 +++ 10 files changed, 37 insertions(+), 37 deletions(-) commit d2915c33ae7f330837688195a58c1e60fe78402a Author: Naohisa Goto Date: Thu Jul 5 21:04:28 2012 +0900 workaround to avoid circular require in Bio::RestrictionEnzyme * Workaround to avoid circular require in Bio::RestrictionEnzyme * Special care was needed for Bio::RestrictionEnzyme::Analysis because its method definitions are divided into two files: analysis.rb, analysis_basic.rb. lib/bio/util/restriction_enzyme/analysis.rb | 13 ++++++++----- lib/bio/util/restriction_enzyme/analysis_basic.rb | 7 ++++--- lib/bio/util/restriction_enzyme/cut_symbol.rb | 5 +++-- lib/bio/util/restriction_enzyme/dense_int_array.rb | 3 +++ lib/bio/util/restriction_enzyme/double_stranded.rb | 7 +++---- .../double_stranded/aligned_strands.rb | 7 +++---- .../double_stranded/cut_location_pair.rb | 7 +++---- .../cut_location_pair_in_enzyme_notation.rb | 7 +++---- .../double_stranded/cut_locations.rb | 7 +++---- .../cut_locations_in_enzyme_notation.rb | 7 +++---- lib/bio/util/restriction_enzyme/range/cut_range.rb | 7 +++---- .../util/restriction_enzyme/range/cut_ranges.rb | 7 +++---- .../range/horizontal_cut_range.rb | 7 +++---- .../restriction_enzyme/range/sequence_range.rb | 7 +++---- .../range/sequence_range/calculated_cuts.rb | 7 +++---- .../range/sequence_range/fragment.rb | 7 +++---- .../range/sequence_range/fragments.rb | 7 +++---- .../restriction_enzyme/range/vertical_cut_range.rb | 7 +++---- lib/bio/util/restriction_enzyme/single_strand.rb | 6 +++--- .../cut_locations_in_enzyme_notation.rb | 7 
+++---- .../restriction_enzyme/single_strand_complement.rb | 7 +++---- .../util/restriction_enzyme/sorted_num_array.rb | 3 +++ .../util/restriction_enzyme/string_formatting.rb | 7 +++---- 23 files changed, 75 insertions(+), 81 deletions(-) commit 7df4843288ffde6d7132a5651fe978301f8ebd2b Author: Naohisa Goto Date: Thu Jul 5 20:18:08 2012 +0900 workaround to avoid JRuby autoload bug lib/bio/util/restriction_enzyme.rb | 4 +--- 1 files changed, 1 insertions(+), 3 deletions(-) commit 97d95f2b400006d4229a7ce69d7d8a5cdce42764 Author: Naohisa Goto Date: Wed Jul 4 22:00:27 2012 +0900 changed require to autoload for the workaround of JRuby autoload bug lib/bio/feature.rb | 5 ++--- 1 files changed, 2 insertions(+), 3 deletions(-) commit 530b82a45731c2a71a110826341be425de1271e0 Author: Naohisa Goto Date: Wed Jul 4 22:00:06 2012 +0900 workaround to avoid JRuby autoload bug lib/bio/sequence/common.rb | 4 +--- 1 files changed, 1 insertions(+), 3 deletions(-) commit 8614f31b36fb93d6e49d109268d646ff3032cd1a Author: Naohisa Goto Date: Wed Jul 4 21:28:52 2012 +0900 workaround to avoid JRuby autoload bug * Workaround to avoid JRuby autoload bug. * Changed to require bio/db.rb because it is always loaded. 
lib/bio/db/kegg/genes.rb | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) commit ea500006ed56857139c858bdfeb98773e5ca541e Author: Naohisa Goto Date: Thu Jun 28 21:36:35 2012 +0900 Rakefile: use own mktmpdir Rakefile | 59 +++++++++++++++++++++++++++++++++++++++++++---------------- 1 files changed, 43 insertions(+), 16 deletions(-) commit 452fadcab61083dcb9d01ee05d300eae5cb23fee Author: Naohisa Goto Date: Thu Jun 28 20:37:59 2012 +0900 .travis.yml: remove "rake regemspec" from after_install .travis.yml | 2 -- 1 files changed, 0 insertions(+), 2 deletions(-) commit 3fad822af3d7e558a58b71fd8ec2a7061b49f9f2 Author: Naohisa Goto Date: Thu Jun 28 20:36:59 2012 +0900 regenerate bioruby.gemspec with rake regemspec bioruby.gemspec | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) commit ea6e96fc654c797664b118a6326a84e4f9b1a8a3 Author: Naohisa Goto Date: Thu Jun 28 20:35:49 2012 +0900 print message when doing Dir.chdir Rakefile | 17 +++++++++++------ 1 files changed, 11 insertions(+), 6 deletions(-) commit c2fcd5e8cc71da38dc3c6d1f8c8d0233e47398b3 Author: Naohisa Goto Date: Thu Jun 28 20:28:41 2012 +0900 In tar-install, removed dependency to regemspec Rakefile | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit 67a7e83d516aab5d60f8263525b359be8b0ffc0b Author: Naohisa Goto Date: Thu Jun 28 20:23:24 2012 +0900 Rakefile: give up using Dir.mktmpdir because of JRuby's behavior * Rakefile: give up using Dir.mktmpdir because of JRuby's behavior that may be related with http://jira.codehaus.org/browse/JRUBY-5678 Rakefile | 61 ++++++++++++++++++++++++++++++++++++++++++++++--------------- 1 files changed, 46 insertions(+), 15 deletions(-) commit cff098034a338bbe9579d6c7b4380c7132a38ef5 Author: Naohisa Goto Date: Thu Jun 28 19:23:57 2012 +0900 gem-integration-test, gem-install and gem-install-nodoc are removed * gem-integration-test, gem-install and gem-install-nodoc are removed because they are useless with Bundler Rakefile | 13 ------------- 1 files changed, 
0 insertions(+), 13 deletions(-) commit d5c054265af4f80318cbfa5a5bbdee6125219de2 Author: Naohisa Goto Date: Thu Jun 28 18:10:05 2012 +0900 .travis.yml: .gemspec is needed to install local gem .travis.yml | 1 + gemfiles/prepare-gemspec.rb | 25 +++++++++++++++++++++++++ 2 files changed, 26 insertions(+), 0 deletions(-) create mode 100644 gemfiles/prepare-gemspec.rb commit 05b6172123f42a1d8d46668d8a3d5f698c371704 Author: Naohisa Goto Date: Thu Jun 28 17:51:43 2012 +0900 remove 1.9.2; add tar/gem integration tests * Remove ruby version 1.9.2 from matrix for reducing builds * Add tar/gem integration tests * Add a new helper script gemfiles/modify-Gemfile.rb, modifying gemfile when running gem integration test. * Remove jruby version comments .travis.yml | 26 +++++++++++++++++--------- gemfiles/modify-Gemfile.rb | 28 ++++++++++++++++++++++++++++ 2 files changed, 45 insertions(+), 9 deletions(-) create mode 100644 gemfiles/modify-Gemfile.rb commit 6813f91893e7ddc3000047357c9ed2dafb32a722 Author: Naohisa Goto Date: Thu Jun 28 17:06:28 2012 +0900 descriptions are modified for danger operations Rakefile | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) commit a209688952c922d9ba45c227874990bccd3da7c0 Author: Naohisa Goto Date: Mon Jun 25 23:25:51 2012 +0900 regenerate bioruby.gemspec with rake regemspec bioruby.gemspec | 5 +++++ 1 files changed, 5 insertions(+), 0 deletions(-) commit 8f6459497be0e9ca7dc3eb2eb9606e42d97ad60c Author: Naohisa Goto Date: Mon Jun 25 21:01:06 2012 +0900 rake tasks added and default task is changed * New tasks: * gem-install: build gem and install it * gem-install-nodoc: build gem and install it with --no-ri --no-rdoc. 
* gem-test: test installed bioruby gem installed with gem-install (or gem-install-nodoc) * gem-integration-test: build gem, install and run test (with --no-ri --no-rdoc) * tar-install: DANGER: build tar and install by using setup.rb * installed-test: test installed bioruby * tar-integration-test: DANGER: build tar, install and run test * see-env: see BIORUBY_RAKE_DEFAULT_TASK environment variable and invoke the specified task. If the variable did not exist, it invokes "test" which is previously the default task. It is added for selecting task on Travis-ci. It is not recommended to invoke the task explicitly by hand. * Default task is changed from "test" to "see-env". Rakefile | 110 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 files changed, 107 insertions(+), 3 deletions(-) commit 3b400042cd361e1ab6d0fb0d8c8cce14a6c2ae10 Author: Naohisa Goto Date: Mon Jun 25 20:58:13 2012 +0900 BIORUBY_TEST_LIB is always added on the top of $LOAD_PATH * When BIORUBY_TEST_LIB is specified, the specified directory name is always added on the top of $LOAD_PATH even if it is already included in the middle of $LOAD_PATH. test/bioruby_test_helper.rb | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) commit 848304b6f90310f8fa15c80ba06655ae5cae5053 Author: Naohisa Goto Date: Mon Jun 25 20:42:07 2012 +0900 New env BIORUBY_TEST_GEM and BIORUBY_TEST_LIB behavior changed * New environment variable BIORUBY_TEST_GEM for testing installed bio-X.X.X gem. Version number can be specified. Example with version number: % env BIORUBY_TEST_GEM=1.4.2.5000 ruby test/runner.rb Example without version number: % env BIORUBY_TEST_GEM="" ruby test/runner.rb * When BIORUBY_TEST_LIB is empty, it no longer add an empty string to $LOAD_PATH. Moreover, when BIORUBY_TEST_GEM is set, the variable is ignored. 
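The load-path rules described in the two test-helper commits above can be sketched as a small pure function. This is a hypothetical illustration, not the code in test/bioruby_test_helper.rb: the name sketch_load_path is invented, gem activation for BIORUBY_TEST_GEM is left out, and keeping the later duplicate entry is an assumption.

```ruby
# Hypothetical sketch of the described rules; returns a new load path
# array instead of mutating $LOAD_PATH.
def sketch_load_path(load_path, env)
  # When BIORUBY_TEST_GEM is set (even to an empty string),
  # BIORUBY_TEST_LIB is ignored. Gem activation itself is omitted here.
  return load_path.dup if env.key?("BIORUBY_TEST_GEM")

  lib = env["BIORUBY_TEST_LIB"]
  # An unset or empty BIORUBY_TEST_LIB adds nothing to the load path.
  return load_path.dup if lib.nil? || lib.empty?

  # The directory always goes on top, even if it already appears later.
  # Assumption: the later duplicate entry is left in place.
  [lib] + load_path
end

current = ["/usr/lib/ruby", "/work/bioruby/lib"]
puts sketch_load_path(current, "BIORUBY_TEST_LIB" => "/work/bioruby/lib").inspect
puts sketch_load_path(current, "BIORUBY_TEST_LIB" => "").inspect
```

Passing the environment as a plain Hash keeps the sketch easy to exercise without touching the real ENV.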
test/bioruby_test_helper.rb | 49 ++++++++++++++++++++++++++++++++---------- 1 files changed, 37 insertions(+), 12 deletions(-) commit 9453a6773c24f866698370195fd8e767443a38b9 Author: Tomoaki NISHIYAMA Date: Fri Jun 1 18:06:40 2012 +0900 broader FASTQ file recognition * Because PacBio RS sequencer may produce kilobases long reads and read buffer size (default 31 lines) for file format detection may not be sufficient to find the second id line starting with "+", the regular expression for FASTQ is truncated only to check the first id line starting with "@". * Test code is added. lib/bio/io/flatfile/autodetection.rb | 2 +- test/unit/bio/io/flatfile/test_autodetection.rb | 6 ++++++ 2 files changed, 7 insertions(+), 1 deletions(-) commit 120e780c023cba06b83899c2f8a17c8fc1de4faa Author: Naohisa Goto Date: Fri Jun 8 15:36:29 2012 +0900 Retry sequence randomize test up to 10 times when fails * To suppress rare failure of chi-square equiprobability tests for Bio::Sequence::Common#randomize, test code changed to retry up to 10 times if the chi-square test fails. The assertion fails if the chi-square test fails 10 consecutive times, and this strongly suggests bugs in codes or in the random number generator. * The chi-square equiprobability tests are separated into a new test class. test/unit/bio/sequence/test_common.rb | 40 +++++++++++++++++++++++++++++--- 1 files changed, 36 insertions(+), 4 deletions(-) commit 20dde52f7da784d4d9ac551957700cd96e842ef6 Author: Naohisa Goto Date: Sat May 19 18:14:19 2012 +0900 libxml-ruby is disabled because of build error on Travis-ci gemfiles/Gemfile.travis-jruby1.8 | 3 ++- gemfiles/Gemfile.travis-jruby1.9 | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) commit 3c5c1cc277d30737815c7e44a2abbb308f5324b0 Author: Clayton Wheeler Date: Mon May 14 21:48:41 2012 -0400 Use libxml-ruby instead of libxml-jruby to fix JRuby test failures. 
The travis-ci Gemfiles currently call for libxml-jruby; this appears not to support the same API as libxml-ruby, resulting in several tests in test/unit/bio/db/test_phyloxml.rb failing with "NameError: uninitialized constant LibXMLJRuby::XML::Parser::Options". Switching to the C libxml-ruby library allows these tests to pass under JRuby in 1.8 mode. JRuby in 1.9 mode still fails a few PhyloXML tests due to https://jira.codehaus.org/browse/JRUBY-6662. gemfiles/Gemfile.travis-jruby1.8 | 2 +- gemfiles/Gemfile.travis-jruby1.9 | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) commit 01a618242d67f0d00fe681dfd85e68bb393513fc Author: Clayton Wheeler Date: Thu May 10 23:13:56 2012 -0400 test_tree.rb: to use %f instead of %g to prevent odd behavior. test/unit/bio/test_tree.rb | 22 +++++++++++----------- 1 files changed, 11 insertions(+), 11 deletions(-) commit 5e80e4394bf2a5e4ee472fe84ab76239b293e1b5 Author: Clayton Wheeler Date: Thu May 10 23:04:55 2012 -0400 Fixed spurious JRuby failures in test_tree.rb due to floating point differences. test/unit/bio/test_tree.rb | 14 +++++++------- 1 files changed, 7 insertions(+), 7 deletions(-) commit 459d4da894e9a9b9db0d793e3711dc45bae2089b Author: Artem Tarasov Date: Thu May 10 16:23:13 2012 +0400 Test bug fix: order of hash keys are not guaranteed * Test bug fix: Bio::TestSOFT#test_dataset: order of hash keys are not guaranteed. 
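The kind of test bug described above can be illustrated with a short, hedged example. The hash below is made-up stand-in data, not taken from test/unit/bio/db/test_soft.rb, and the helper name is hypothetical. On Ruby 1.8 the iteration order of a Hash is unspecified, so comparing keys against a fixed ordered array is fragile; sorting first makes the comparison order-independent.

```ruby
# Hypothetical helper: compare the keys of a hash with an expected list
# without depending on the hash's iteration order.
def sketch_same_keys?(hash, expected_keys)
  hash.keys.sort == expected_keys.sort
end

# Made-up stand-in for a parsed record.
parsed = { "sample_b" => 2, "sample_a" => 1 }

# Fragile form (depends on iteration order):
#   parsed.keys == ["sample_a", "sample_b"]
puts sketch_same_keys?(parsed, ["sample_a", "sample_b"])
```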
test/unit/bio/db/test_soft.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit 7e730691d6ec597a610dc0d4665db3598fcfde59 Author: Naohisa Goto Date: Thu May 10 00:06:19 2012 +0900 removed potential circular require about Bio::Sequence::Format lib/bio/db/embl/format_embl.rb | 4 ---- lib/bio/db/fasta/format_fasta.rb | 4 ---- lib/bio/db/fasta/format_qual.rb | 5 ----- lib/bio/db/fastq/format_fastq.rb | 1 - lib/bio/db/genbank/format_genbank.rb | 4 ---- lib/bio/sequence/format_raw.rb | 4 ---- 6 files changed, 0 insertions(+), 22 deletions(-) commit f1c398fdc3488bd18bd13ac864920ce6db4dab9e Author: Naohisa Goto Date: Wed May 9 15:54:20 2012 +0900 .travis.yml: comment out apt-get lines * .travis.yml: comment out apt-get lines because libxml2-dev and libexpat1-dev are already installed. .travis.yml | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) commit bc5ef4959e51f4a199d9f740b07812e9b8216255 Author: Naohisa Goto Date: Wed May 9 15:47:11 2012 +0900 travis-ci: comment out soap4r-ruby1.9 in Gemfile because of error * travis-ci: soap4r-ruby1.9 gem in Gemfile.travis-ruby1.9 and Gemfile.travis-jruby1.9 is commented out because of an error "uninitialized constant XML::SaxParser". 
gemfiles/Gemfile.travis-jruby1.9 | 4 +++- gemfiles/Gemfile.travis-ruby1.9 | 4 +++- 2 files changed, 6 insertions(+), 2 deletions(-) commit 7e8153c09660c31d6286c1924680b8c5073a10b6 Author: Naohisa Goto Date: Tue May 1 18:11:09 2012 +0900 config files for Travis CI continuous integration service .travis.yml | 73 ++++++++++++++++++++++++++++++++++++++ gemfiles/Gemfile.travis-jruby1.8 | 6 +++ gemfiles/Gemfile.travis-jruby1.9 | 7 ++++ gemfiles/Gemfile.travis-ruby1.8 | 7 ++++ gemfiles/Gemfile.travis-ruby1.9 | 8 ++++ 5 files changed, 101 insertions(+), 0 deletions(-) create mode 100644 .travis.yml create mode 100644 gemfiles/Gemfile.travis-jruby1.8 create mode 100644 gemfiles/Gemfile.travis-jruby1.9 create mode 100644 gemfiles/Gemfile.travis-ruby1.8 create mode 100644 gemfiles/Gemfile.travis-ruby1.9 commit f1ecae7763648cb735a885ddb6c46d71c59b0694 Author: Naohisa Goto Date: Fri Mar 23 01:36:59 2012 +0900 Test bug fix: tests affected by the bug of Bio::NucleicAcid.to_re("s") test/unit/bio/data/test_na.rb | 2 +- test/unit/bio/sequence/test_na.rb | 2 +- test/unit/bio/test_sequence.rb | 4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) commit 3fd9384b1b59140a929c81dcc4b07cb3c2e47525 Author: Trevor Wennblom Date: Sat Feb 25 15:26:27 2012 -0600 Bug fix: Bio::NucleicAcid.to_re("s") typo lib/bio/data/na.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit c552aa3a6773139b14ae95e79e0fb43a2f91c6fb Author: Naohisa Goto Date: Thu Jan 12 22:24:37 2012 +0900 Bug fix: GenomeNet BLAST server URI changed. * Bug fix: GenomeNet BLAST server URI changed. Reported by joaocardoso via GitHub. 
( https://github.com/bioruby/bioruby/issues/44 ) lib/bio/appl/blast/genomenet.rb | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) commit f33abf9bbd90c3c1e320f06447fdb54ffd094c5d Author: peterjc Date: Fri Nov 25 11:20:08 2011 +0000 Mark echoarg2.bat and echoarg2.sh as world executable 0 files changed, 0 insertions(+), 0 deletions(-) mode change 100644 => 100755 test/data/command/echoarg2.bat mode change 100644 => 100755 test/data/command/echoarg2.sh commit d2d66f833d0b20647e8d761d2a240b99b206eaa8 Author: Naohisa Goto Date: Thu Nov 24 13:32:37 2011 +0900 Bug fix: rake aborted without git bioruby.gemspec.erb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) commit c2139739988ef731d61bf1a8cdba2dc5c48393bd Author: Naohisa Goto Date: Thu Nov 24 13:07:10 2011 +0900 regenerate bioruby.gemspec with rake regemspec. bioruby.gemspec | 18 ++++++++++-------- 1 files changed, 10 insertions(+), 8 deletions(-) commit 6213b45d28bfea2cc8c838813b524d48c369266b Author: Naohisa Goto Date: Thu Nov 24 13:05:07 2011 +0900 Added workaround for changes of a module name and file names to require. Rakefile | 21 +++++++++++++++++++-- 1 files changed, 19 insertions(+), 2 deletions(-) commit 39f847cf8d453476275361078b831da43d400816 Author: Naohisa Goto Date: Thu Nov 24 12:08:47 2011 +0900 Use binary mode to open files. Rakefile | 6 ++++-- 1 files changed, 4 insertions(+), 2 deletions(-) commit 688779e71a27e861fb01e07f816384561b8cfe45 Author: Naohisa Goto Date: Thu Nov 24 11:49:30 2011 +0900 Rakefile: new tasks: test-all to run all tests, etc. * Rakefile: new tasks: test-all to run all tests, and test-network to run tests in test/network. Rakefile | 10 ++++++++++ 1 files changed, 10 insertions(+), 0 deletions(-) commit 53719535defcb0fefb3cf8bebe3fad6716bf7de2 Author: Naohisa Goto Date: Thu Nov 24 11:28:38 2011 +0900 test/runner.rb: Run tests only in test/unit and test/functional. 
test/runner.rb | 22 ++++++++++++++++------ 1 files changed, 16 insertions(+), 6 deletions(-) commit fb9ee403db6b447aee73ebb7f12ff5a5b73d6c52 Author: Naohisa Goto Date: Wed Nov 23 20:36:36 2011 +0900 A test class using network connection is moved under test/network/. test/functional/bio/test_command.rb | 16 ---------------- test/network/bio/test_command.rb | 35 +++++++++++++++++++++++++++++++++++ 2 files changed, 35 insertions(+), 16 deletions(-) create mode 100644 test/network/bio/test_command.rb commit a6dda2215aa686a9ca4af7484aa190f726d51e69 Author: Naohisa Goto Date: Wed Nov 23 20:28:58 2011 +0900 Tests using network connections are moved to test/network/ * Tests using network connections are moved to test/network/. * renamed: test/functional/bio/appl -> test/network/bio/appl * renamed: test/functional/bio/io -> test/network/bio/io test/functional/bio/appl/blast/test_remote.rb | 93 --------- test/functional/bio/appl/test_blast.rb | 61 ------ test/functional/bio/appl/test_pts1.rb | 117 ----------- test/functional/bio/io/test_ddbjrest.rb | 47 ----- test/functional/bio/io/test_ensembl.rb | 230 --------------------- test/functional/bio/io/test_pubmed.rb | 135 ------------- test/functional/bio/io/test_soapwsdl.rb | 53 ----- test/functional/bio/io/test_togows.rb | 268 ------------------------- test/network/bio/appl/blast/test_remote.rb | 93 +++++++++ test/network/bio/appl/test_blast.rb | 61 ++++++ test/network/bio/appl/test_pts1.rb | 117 +++++++++++ test/network/bio/io/test_ddbjrest.rb | 47 +++++ test/network/bio/io/test_ensembl.rb | 230 +++++++++++++++++++++ test/network/bio/io/test_pubmed.rb | 135 +++++++++++++ test/network/bio/io/test_soapwsdl.rb | 53 +++++ test/network/bio/io/test_togows.rb | 268 +++++++++++++++++++++++++ 16 files changed, 1004 insertions(+), 1004 deletions(-) delete mode 100644 test/functional/bio/appl/blast/test_remote.rb delete mode 100644 test/functional/bio/appl/test_blast.rb delete mode 100644 test/functional/bio/appl/test_pts1.rb delete 
mode 100644 test/functional/bio/io/test_ddbjrest.rb delete mode 100644 test/functional/bio/io/test_ensembl.rb delete mode 100644 test/functional/bio/io/test_pubmed.rb delete mode 100644 test/functional/bio/io/test_soapwsdl.rb delete mode 100644 test/functional/bio/io/test_togows.rb create mode 100644 test/network/bio/appl/blast/test_remote.rb create mode 100644 test/network/bio/appl/test_blast.rb create mode 100644 test/network/bio/appl/test_pts1.rb create mode 100644 test/network/bio/io/test_ddbjrest.rb create mode 100644 test/network/bio/io/test_ensembl.rb create mode 100644 test/network/bio/io/test_pubmed.rb create mode 100644 test/network/bio/io/test_soapwsdl.rb create mode 100644 test/network/bio/io/test_togows.rb commit ec747aa33d06e08a6469dfd330360161d1b0f8e2 Author: Naohisa Goto Date: Wed Nov 23 15:03:08 2011 +0900 Test bug fix: use binmode to disable CR/LF conversion (fail on Windows) test/unit/bio/appl/blast/test_rpsblast.rb | 1 + test/unit/bio/io/flatfile/test_buffer.rb | 1 + 2 files changed, 2 insertions(+), 0 deletions(-) commit 07ce32da009baa2c4e81f6d96f45e3dac49da183 Author: Naohisa Goto Date: Wed Nov 23 14:47:33 2011 +0900 Test bug fix: Read Sanger chromatogram files with binary mode * Test bug fix: Read Sanger chromatogram files with binary mode. Fix error/failure on Windows due to default text mode reading. test/unit/bio/db/sanger_chromatogram/test_abif.rb | 3 ++- test/unit/bio/db/sanger_chromatogram/test_scf.rb | 6 ++++-- 2 files changed, 6 insertions(+), 3 deletions(-) commit 20d9068643214e3482d18c36028e50b3c9109755 Author: Naohisa Goto Date: Wed Nov 23 14:17:25 2011 +0900 Incompatible change: Bio::FlatFile.open and auto use binary mode * Incompatible change: Bio::FlatFile.open and auto use binary mode (binmode) unless text mode option is explicitly given. 
RELEASE_NOTES.rdoc | 7 ++ lib/bio/io/flatfile/buffer.rb | 84 ++++++++++++++++++ test/unit/bio/io/flatfile/test_buffer.rb | 139 ++++++++++++++++++++++++++++++ 3 files changed, 230 insertions(+), 0 deletions(-) commit 48bd150a6180d59879872bd85dd95c7ddf1a19c0 Author: Naohisa Goto Date: Tue Nov 22 17:32:23 2011 +0900 Test bug fix: fixed incomplete Windows platform detection. test/unit/bio/test_command.rb | 13 +++++++++---- 1 files changed, 9 insertions(+), 4 deletions(-) commit d499bcee7956b1a0a4c04aeb106e50a0839167b0 Author: Naohisa Goto Date: Tue Nov 22 16:15:05 2011 +0900 FuncTestCommandCall is changed to test various command-lines. * New file test/data/command/echoarg2.sh shell script, which acts like echoarg2.bat for Windows. * FuncTestCommandCall is changed to test various command-lines. test/data/command/echoarg2.sh | 4 ++ test/functional/bio/test_command.rb | 70 +++++++++++++++++++++++++++++------ 2 files changed, 62 insertions(+), 12 deletions(-) create mode 100644 test/data/command/echoarg2.sh commit d45e311c09ad2f4116770dd903f81e652a63ca2a Author: Naohisa Goto Date: Tue Nov 22 14:21:34 2011 +0900 Test bug fix: Opened files should be closed. * Test bug fix: Opened files should be closed. When finalizing writer tests, temporary files are not properly closed after verify reading, and removing the temporary files raises an error on Windows. test/unit/bio/db/test_phyloxml_writer.rb | 24 +++++++++++++++--------- 1 files changed, 15 insertions(+), 9 deletions(-) commit a9022c61b98746e98a83f1cfd902e0e6b11c7bbb Author: Naohisa Goto Date: Tue Nov 22 13:55:15 2011 +0900 New method Bio::PhyloXML::Parser#closed?, and Bio::PhyloXML::Parser.open with block. * New method Bio::PhyloXML::Parser#closed? to check if it is closed or not. * Bio::PhyloXML::Parser.open and open_uri now can get a block. When a block is given, a Bio::PhyloXML::Parser object is passed to the block as an argument. When the block terminates, the object is closed. * Added tests about the above changes.
lib/bio/db/phyloxml/phyloxml_parser.rb | 57 +++++++++++++++++++++++++++++--- test/unit/bio/db/test_phyloxml.rb | 56 ++++++++++++++++++++++++++++++- 2 files changed, 106 insertions(+), 7 deletions(-) commit 893cbe6ca993eca08427074059c2ba03621ea889 Author: Naohisa Goto Date: Sat Nov 5 00:49:10 2011 +0900 Ruby 1.9 should be fully supported, and optional requirements are revised. README.rdoc | 48 +++++++++++++++++++++++++++++++++--------------- 1 files changed, 33 insertions(+), 15 deletions(-) commit 38b1715c2d6bad39560e0846781ca903b1c16eda Author: Naohisa Goto Date: Fri Nov 4 22:12:38 2011 +0900 Added REFERENCE. README.rdoc | 12 ++++++++++++ 1 files changed, 12 insertions(+), 0 deletions(-) commit 9a766cd17236bbe1e28d6972001dd5e3ed596123 Author: Naohisa Goto Date: Fri Nov 4 21:39:20 2011 +0900 Removed "setup.rb test" and added about running tests. README.rdoc | 39 ++++++++++++++++++++++++++++++++++----- 1 files changed, 34 insertions(+), 5 deletions(-) commit 39737179b06366e1d5acf2e5ac930e41b3a4ee38 Author: Pjotr Prins Date: Fri Oct 14 08:58:01 2011 +0200 Tutorial: added info on biogems doc/Tutorial.rd | 16 ++++++++++++++++ doc/Tutorial.rd.html | 23 +++++++++++++++-------- 2 files changed, 31 insertions(+), 8 deletions(-) commit e84400c5e9e94d95d6a8d3c4b72388b94d204766 Author: Pjotr Prins Date: Fri Oct 14 08:49:41 2011 +0200 Tutorial: small updates doc/Tutorial.rd | 8 +++++--- doc/Tutorial.rd.html | 9 +++++---- 2 files changed, 10 insertions(+), 7 deletions(-) commit 9fe07345b3b7be890d5baad9a51f0752af5e0ac4 Author: Naohisa Goto Date: Tue Sep 13 23:05:39 2011 +0900 README_DEV.rdoc: added git tips and policies, etc. * Added Git tips about sending a patch or a pull request. * Added Git management policies for the blessed repository. * Added some coding styles. * Added descriptions about Ruby versions and OS. 
README_DEV.rdoc | 95 +++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 93 insertions(+), 2 deletions(-) commit 3c952c4a782501b21f36ece5bcab672dab12fc6d Author: Naohisa Goto Date: Tue Sep 13 13:21:20 2011 +0900 README.rdoc: for release notes and changelog, about sample files. README.rdoc | 10 +++++++++- 1 files changed, 9 insertions(+), 1 deletions(-) commit fba9a6c0f1f79dd567ca54ba085b6258ac8efb31 Author: Naohisa Goto Date: Tue Sep 13 13:20:05 2011 +0900 RELEASE_NOTES.rdoc: mentioned about removal of rdoc.zsh. RELEASE_NOTES.rdoc | 6 +++++- 1 files changed, 5 insertions(+), 1 deletions(-) commit 685b6bb7b98083e1b50e73baf4e7fa71bc9a39fa Author: Naohisa Goto Date: Mon Sep 12 21:23:34 2011 +0900 bioruby.gemspec.erb: LEGAL is added to rdoc files * bioruby.gemspec.erb: LEGAL is added to rdoc files. * bioruby.gemspec is updated by "rake regemspec". bioruby.gemspec | 9 ++++++--- bioruby.gemspec.erb | 6 +++++- 2 files changed, 11 insertions(+), 4 deletions(-) commit 414a6331f40fc99f554042e9a031689ea6d76da4 Author: Naohisa Goto Date: Mon Sep 12 20:54:06 2011 +0900 deleted rdoc.zsh which is obsolete and unused * Deleted rdoc.zsh which is obsolete and unused. To generate rdoc html, "rake rdoc" or "rake rerdoc". See "rake -T" for more information. rdoc.zsh | 8 -------- 1 files changed, 0 insertions(+), 8 deletions(-) delete mode 100644 rdoc.zsh commit 272d9106cec43b0f219edd92a6f7bd3f9875a761 Author: Naohisa Goto Date: Mon Sep 12 20:35:47 2011 +0900 Added new ChangeLog, showing changes after 1.4.2 release. * Added new ChangeLog, showing changes after 1.4.2 release. For the changes before 1.4.2, see doc/ChangeLog-before-1.4.2. For the changes before 1.3.1, see doc/ChangeLog-before-1.3.1. 
ChangeLog | 64 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 64 insertions(+), 0 deletions(-) create mode 100644 ChangeLog commit 941493378f9884978c81d5f63ee4ed5c175d4bea Author: Naohisa Goto Date: Mon Sep 12 20:28:28 2011 +0900 Rakefile: add new task :rechangelog to update ChangeLog using git log. * Rakefile: add new task :rechangelog to update ChangeLog using git log. Note that the tag name (currently 1.4.2) is hardcoded in Rakefile. Rakefile | 9 +++++++++ 1 files changed, 9 insertions(+), 0 deletions(-) commit 1c89e6546223c3c05ea79b8ade4b493580851efa Author: Naohisa Goto Date: Mon Sep 12 20:24:49 2011 +0900 renamed ChangeLog to doc/ChangeLog-before-1.4.2 ChangeLog | 5013 -------------------------------------------- doc/ChangeLog-before-1.4.2 | 5013 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 5013 insertions(+), 5013 deletions(-) delete mode 100644 ChangeLog create mode 100644 doc/ChangeLog-before-1.4.2 commit 2233fbada55034bd16fb5b9c642292b4b6ccca83 Author: Naohisa Goto Date: Mon Sep 12 20:22:49 2011 +0900 ChangeLog updated: add log about 1.4.2 release ChangeLog | 9 +++++++++ 1 files changed, 9 insertions(+), 0 deletions(-) commit 1c02ab0488e4097a2cf5c16180c3179c78e3d572 Author: Naohisa Goto Date: Mon Sep 12 19:40:54 2011 +0900 New RELEASE_NOTES.rdoc for next release version. 
RELEASE_NOTES.rdoc | 47 +++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 47 insertions(+), 0 deletions(-) create mode 100644 RELEASE_NOTES.rdoc commit 4e63e69e98c0c440ec476ef3407fcc8fd2411056 Author: Naohisa Goto Date: Mon Sep 12 19:32:48 2011 +0900 renamed RELEASE_NOTES.rdoc to doc/RELEASE_NOTES-1.4.2.rdoc RELEASE_NOTES.rdoc | 132 ------------------------------------------ doc/RELEASE_NOTES-1.4.2.rdoc | 132 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 132 insertions(+), 132 deletions(-) delete mode 100644 RELEASE_NOTES.rdoc create mode 100644 doc/RELEASE_NOTES-1.4.2.rdoc commit 9c5c8cafc3ec372ef80aa20d01d13034f94d5af2 Author: Naohisa Goto Date: Fri Sep 2 12:02:41 2011 +0900 Bio::BIORUBY_EXTRA_VERSION set to ".5000" (unstable version). lib/bio/version.rb | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) bio-2.0.3/doc/ChangeLog-1.5.00000644000175000017500000027535114141516614014611 0ustar nileshnileshcommit 01ac93ca3b341716c85c571f1194834db0a68e52 Author: Naohisa Goto Date: Wed Jul 1 02:21:26 2015 +0900 update ChangeLog by rake rechangelog ChangeLog | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) commit cac63ff501df6a71afead77175db9fb491c2985b Author: Naohisa Goto Date: Wed Jul 1 01:31:21 2015 +0900 .travis.yml: use Ruby 2.1.6 for tar and gem integration tests .travis.yml | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) commit 5318c99249f34dacc788b82c658ea0e256770db0 Author: Naohisa Goto Date: Wed Jul 1 00:43:39 2015 +0900 known issues added about new BLAST XML format and BLAST+ text format KNOWN_ISSUES.rdoc | 11 +++++++++++ RELEASE_NOTES.rdoc | 11 +++++++++++ 2 files changed, 22 insertions(+) commit b61d8df0300ef366539e1154c9a2dac2f1f4ff18 Author: Naohisa Goto Date: Tue Jun 30 23:57:01 2015 +0900 update ChangeLog with rake rechangelog ChangeLog | 47 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+) commit 75cf6c31c57239b2e39a171e536ad5dddcaec94a Author: Naohisa Goto 
Date: Tue Jun 30 23:56:08 2015 +0900 regenerate bioruby.gemspec with rake regemspec bioruby.gemspec | 1 + 1 file changed, 1 insertion(+) commit 608850beb33f3f7333f05307202b766adb350eb9 Author: Naohisa Goto Date: Tue Jun 30 23:54:59 2015 +0900 description about updating of Ruby's License RELEASE_NOTES.rdoc | 9 +++++++++ 1 file changed, 9 insertions(+) commit f54bcfc20d20935db4e342e5988c0b7f59c131b3 Author: Naohisa Goto Date: Tue Jun 30 23:16:51 2015 +0900 BSDL is referred in COPYING and COPYING.ja BSDL | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) create mode 100644 BSDL commit d1cbfb699259fd57af02181f4374d562dda3abe1 Author: Naohisa Goto Date: Tue Jun 30 23:14:42 2015 +0900 changes of Ruby's License is reflected. COPYING | 4 ++-- COPYING.ja | 72 ++++++++++++++++++++++++++++++------------------------------ 2 files changed, 38 insertions(+), 38 deletions(-) commit 2d9de9a0e2abe7fa9f193e54af0cbfc24bf2c37b Author: Naohisa Goto Date: Tue Jun 30 22:50:37 2015 +0900 ChangeLog is regenerated by using "rake rechangelog" ChangeLog | 2786 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 2780 insertions(+), 6 deletions(-) commit 70665a69a79d569d7bb37ef6d8c238534e6dae3a Author: Naohisa Goto Date: Tue Jun 30 22:49:55 2015 +0900 KNOWN_ISSUES.rdoc: change ruby versions and add descriptions KNOWN_ISSUES.rdoc | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) commit 1054106f93b973b5a92f993c5b83b1444f96fffe Author: Naohisa Goto Date: Tue Jun 30 22:48:51 2015 +0900 prepare to release BioRuby 1.5.0 bioruby.gemspec | 2 +- lib/bio/version.rb | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) commit 0f150586904f7e423455615313992ccf77d7e123 Author: Naohisa Goto Date: Tue Jun 30 22:44:58 2015 +0900 RELEASE_NOTES.rdoc: update many RELEASE_NOTES.rdoc | 151 +++++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 132 insertions(+), 19 deletions(-) commit 2924ca0b977da13d42f232f880fd2df0b2995677 Author: Naohisa Goto 
Date: Tue Jun 30 21:55:52 2015 +0900 Bug fix: Bio::UniProtKB#gene_name should not raise NoMethodError * Bug fix: Bio::UniProtKB#gene_name raised NoMethodError when gene_names method returns nil. It should return nil. Thanks to Jose Irizarry who reports and sends suggested fix. (https://github.com/bioruby/bioruby/pull/83 ) lib/bio/db/embl/uniprotkb.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 9bf9022007a0ff31a870b1ea08e423aebc487c17 Author: Naohisa Goto Date: Tue Jun 30 18:45:51 2015 +0900 regenerate bioruby.gemspec with rake regemspec bioruby.gemspec | 1 + 1 file changed, 1 insertion(+) commit a151e51a44b3dd93e5d075d71954f639eaec339e Author: Naohisa Goto Date: Tue Jun 30 18:10:55 2015 +0900 update docs; change recommended Ruby versions README.rdoc | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) commit e86e745c8a6c666c446fe2f9f47818140999e2db Author: Naohisa Goto Date: Tue Jun 30 18:08:53 2015 +0900 delete description about SOAP4R README.rdoc | 5 ----- 1 file changed, 5 deletions(-) commit 7e4bfb6b3757872691487b080bfd87363a4f9480 Author: Naohisa Goto Date: Tue Jun 30 03:22:35 2015 +0900 .travis.yml: test/unit no longer bundled with Ruby 2.2 * For Ruby 2.2, use a new Gemfile named Gemfile.travis-ruby2.2 that include 'gem "test-unit"' line because test/unit have been provided by bundled gem since Ruby 2.2. .travis.yml | 7 ++++--- gemfiles/Gemfile.travis-ruby2.2 | 9 +++++++++ 2 files changed, 13 insertions(+), 3 deletions(-) create mode 100644 gemfiles/Gemfile.travis-ruby2.2 commit cac85ca215ed781c80d49a5bf3d5d37d808c783b Author: Naohisa Goto Date: Tue Jun 30 02:51:16 2015 +0900 bump up version to 1.5.0-dev; simplify the versioning rules * Bump up version to 1.5.0-dev (1.5.0.20150630) * Simplify the versioning rules. * We will adopt the Semantic Versioning since BioRuby 1.5.1. 
bioruby.gemspec | 2 +- bioruby.gemspec.erb | 21 ++++----------------- lib/bio/version.rb | 17 ++++++++--------- 3 files changed, 13 insertions(+), 27 deletions(-) commit 1a24fb6840932499be833b5ec3bb36184b1334a1 Author: Naohisa Goto Date: Tue Jun 30 02:14:01 2015 +0900 Bug fix: update Bio::Hinv::BASE_URI * Bug fix: update Bio::Hinv::BASE_URI to follow the server URI change. * Update official documentation URL. lib/bio/io/hinv.rb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit a9a12fff70ca287aa098d1331a3146e2899cb709 Author: Naohisa Goto Date: Tue Jun 30 02:08:40 2015 +0900 delete $Id:$ line lib/bio/io/hinv.rb | 1 - 1 file changed, 1 deletion(-) commit 2bfa0f41969003f17c4b894b5279347616c8f187 Author: Naohisa Goto Date: Tue Jun 30 01:58:01 2015 +0900 delete sections about SOAP4R issues KNOWN_ISSUES.rdoc | 12 ------------ 1 file changed, 12 deletions(-) commit 9dbd83aa00acc5f78b5da68f000c305da9f31b66 Author: Naohisa Goto Date: Tue Jun 30 01:54:16 2015 +0900 remove commented-out lines of soap4r-ruby1.9 gemfiles/Gemfile.travis-jruby1.9 | 3 --- gemfiles/Gemfile.travis-rbx | 3 --- gemfiles/Gemfile.travis-ruby1.9 | 3 --- 3 files changed, 9 deletions(-) commit 14d2f3e2fa15f94faeff4d28c957f581461eac82 Author: Naohisa Goto Date: Tue Jun 30 01:50:30 2015 +0900 .travis.yml: update ruby versions, remove ruby 1.9.2 .travis.yml | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) commit 89f9b1fe2332584b5d63b1539b8e470d853478a3 Author: Naohisa Goto Date: Tue Jun 30 00:36:42 2015 +0900 about removal of Bio::SOAPWSDL, Bio::EBI::SOAP, Bio::HGC::HiGet RELEASE_NOTES.rdoc | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) commit 29516d3d6d2f907f65822bcf4146e95785773a3a Author: Naohisa Goto Date: Tue Jun 30 00:50:47 2015 +0900 regenerate bioruby.gemspec with rake regemspec bioruby.gemspec | 6 ------ 1 file changed, 6 deletions(-) commit 357a1afc5ef457326179142c163968aa5cd94864 Author: Naohisa Goto Date: Tue Jun 30 00:49:42 2015 +0900 not to load deleted file 
lib/bio/shell/plugin/soap.rb lib/bio/shell.rb | 1 - 1 file changed, 1 deletion(-) commit 956e475da52ea17f1022493f589489a3e7c06f93 Author: Naohisa Goto Date: Mon Jun 29 23:43:24 2015 +0900 deleted lib/bio/shell/plugin/soap.rb * deleted lib/bio/shell/plugin/soap.rb because Bio::SOAPWSDL and all SOAP client classes in BioRuby are removed. lib/bio/shell/plugin/soap.rb | 50 ------------------------------------------ 1 file changed, 50 deletions(-) delete mode 100644 lib/bio/shell/plugin/soap.rb commit 00acae3c3a8066891e08dc225eae2c22c3415191 Author: Naohisa Goto Date: Mon Jun 29 23:41:20 2015 +0900 not to load removed Bio::EBI::SOAP from lib/bio/io/ebisoap.rb lib/bio.rb | 4 ---- 1 file changed, 4 deletions(-) commit d4844b38b5ddaec7ec15b56ef66f6930f0e6cfc0 Author: Naohisa Goto Date: Mon Jun 29 23:38:26 2015 +0900 remove Bio::EBI::SOAP (lib/bio/io/ebisoap.rb) * Bio::EBI::SOAP (lib/bio/io/ebisoap.rb) is removed because Bio::SOAPWSDL is removed. lib/bio/io/ebisoap.rb | 158 ------------------------------------------------- 1 file changed, 158 deletions(-) delete mode 100644 lib/bio/io/ebisoap.rb commit 79b4705bac82fe17b12c649172a629d3de41cbdf Author: Naohisa Goto Date: Tue Jun 30 00:12:36 2015 +0900 not to load removed Bio::SOAPWSDL from lib/bio/io/soapwsdl.rb lib/bio.rb | 1 - 1 file changed, 1 deletion(-) commit 03ced6a70973557532517c70dac183775bd11fa7 Author: Naohisa Goto Date: Mon Jun 29 23:59:28 2015 +0900 remove Bio::SOAPWSDL (lib/bio/io/soapwsdl.rb) and tests * Bio::SOAPWSDL is removed because SOAP4R (SOAP/WSDL library in Ruby) is no longer bundled with Ruby since Ruby 1.9. For Ruby 1.9 or later, some gems of SOAP4R are available, but we think they are not well-maintained. Moreover, many SOAP servers have been retired (see previous commits). So, we give up maintaining Bio::SOAPWSDL. 
lib/bio/io/soapwsdl.rb | 119 ---------------------------------- test/network/bio/io/test_soapwsdl.rb | 53 --------------- test/unit/bio/io/test_soapwsdl.rb | 33 ---------- 3 files changed, 205 deletions(-) delete mode 100644 lib/bio/io/soapwsdl.rb delete mode 100644 test/network/bio/io/test_soapwsdl.rb delete mode 100644 test/unit/bio/io/test_soapwsdl.rb commit d927652e9f5d241e3c1b13b7d760f5a190b72e50 Author: Naohisa Goto Date: Mon Jun 29 23:35:38 2015 +0900 delete old comment-out lines about Bio::DDBJ::XML lib/bio.rb | 5 ----- 1 file changed, 5 deletions(-) commit b995251bf96b8983def36e77bc94d6f0c0f2c78c Author: Naohisa Goto Date: Mon Jun 29 23:29:47 2015 +0900 do not load Bio::HGC::HiGet from deleted lib/bio/io/higet.rb lib/bio.rb | 4 ---- 1 file changed, 4 deletions(-) commit 6191020ed1e150f9e70de687375528a899fcf8ef Author: Naohisa Goto Date: Mon Jun 29 23:27:41 2015 +0900 remove lib/bio/io/higet.rb because of the server down for a long time lib/bio/io/higet.rb | 73 --------------------------------------------------- 1 file changed, 73 deletions(-) delete mode 100644 lib/bio/io/higet.rb commit 5a527c5cdd513d72ad5817c66ac87e7613395e26 Author: Naohisa Goto Date: Sat Jun 27 02:33:46 2015 +0900 add/modify about removed features and incompatible changes RELEASE_NOTES.rdoc | 71 +++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 67 insertions(+), 4 deletions(-) commit 1886314d2b8dd7d4b3e86c7b93134facd881127a Author: Naohisa Goto Date: Sat Jun 27 01:24:36 2015 +0900 regenerate bioruby.gemspec with rake regemspec bioruby.gemspec | 1 - 1 file changed, 1 deletion(-) commit 724e9c1c039dcc7fa19fb15de0313218a87f9868 Author: Naohisa Goto Date: Thu Jun 25 23:34:44 2015 +0900 extconf.rb is deleted because no native extensions are included * extconf.rb is deleted because no native extensions are included in BioRuby and to avoid potential confusions. Nowadays, extconf.rb is usually used only for building native extensions. Use gem or setup.rb to install BioRuby. 
extconf.rb | 2 -- 1 file changed, 2 deletions(-) delete mode 100644 extconf.rb commit d42a1cb1df17e0c11ca0407dc05e1271cd74a0d7 Author: Naohisa Goto Date: Wed Jun 24 22:29:28 2015 +0900 Ruby 2.3 support: IO#close to closed IO object is allowed without error. test/unit/bio/io/flatfile/test_buffer.rb | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) commit 5ea39188ac3cc2609397b2d8864a2019ea6b93d2 Author: Naohisa Goto Date: Fri May 1 23:42:39 2015 +0900 s.license = "Ruby" * bioruby.gemspec.erb, bioruby.gemspec: s.license = "Ruby" Thanks to Peter Cock who reports a patch. (https://github.com/bioruby/bioruby/issues/101 ) bioruby.gemspec | 1 + bioruby.gemspec.erb | 1 + 2 files changed, 2 insertions(+) commit 2b18ae005a592ea4ae7b632f7e658d4bbf153fd8 Author: Naohisa Goto Date: Fri May 1 23:39:36 2015 +0900 remove deprecated Gem::Specification#rubyforge_project bioruby.gemspec | 2 +- bioruby.gemspec.erb | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) commit 3a1d89bde9af44793c850b1cde950e3e3042fb8d Author: Naohisa Goto Date: Sat Mar 28 01:52:31 2015 +0900 delete obsolete $Id:$ line lib/bio/db/gff.rb | 1 - 1 file changed, 1 deletion(-) commit 165ebf29ba192c7a7e7f1633809d34966c2aeed1 Author: Naohisa Goto Date: Sat Mar 28 01:51:47 2015 +0900 suppress "character class has duplicated range" warnings lib/bio/db/gff.rb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit 715ee5aa3a797737d390365b2c202cc9a0effea5 Author: Naohisa Goto Date: Sat Mar 28 01:37:35 2015 +0900 delete obsolete $Id:$ line lib/bio/appl/sosui/report.rb | 1 - 1 file changed, 1 deletion(-) commit 71e34938f1228911657ebf00720712a17bc89ea9 Author: Naohisa Goto Date: Sat Mar 28 01:36:44 2015 +0900 comment out a line to suppress warning: assigned but unused variable - tmh lib/bio/appl/sosui/report.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit fc518f3826bf60d70ebdbd70acdba512f1462c6f Author: Naohisa Goto Date: Sat Mar 28 01:34:22 2015 +0900 delete obsolete $Id:$ line 
lib/bio/db/sanger_chromatogram/chromatogram.rb | 1 - 1 file changed, 1 deletion(-) commit 516c467dfb245d99c4f7f77e251c77ffc5d274ca Author: Naohisa Goto Date: Sat Mar 28 01:33:19 2015 +0900 suppress warning: instance variable @aqual not initialized lib/bio/db/sanger_chromatogram/chromatogram.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 56d2e472196ba03ba6aa2a2bdf8d3de81272fa15 Author: Naohisa Goto Date: Sat Mar 28 01:30:26 2015 +0900 delete obsolete $Id:$ line lib/bio/db/kegg/module.rb | 1 - 1 file changed, 1 deletion(-) commit fb6b9b6578d08a87c1974e58f6d1f231b4ad52c0 Author: Naohisa Goto Date: Sat Mar 28 01:28:05 2015 +0900 suppress "instance variable @XXX not initialized" warnings lib/bio/db/kegg/module.rb | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) commit 9f70b8d54abd9adbd50d46a3176f23f51af01cc7 Author: Naohisa Goto Date: Sat Mar 28 01:25:50 2015 +0900 delete obsolete $Id:$ line lib/bio/db/kegg/pathway.rb | 1 - 1 file changed, 1 deletion(-) commit 3844b9bb69e1f657c9b85bb20a4d209828b78b12 Author: Naohisa Goto Date: Sat Mar 28 01:25:03 2015 +0900 suppress "instance variable @XXX not initialized" warnings lib/bio/db/kegg/pathway.rb | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) commit 8d857e246eacb6c9f8fbbceaa2fba7f1211e2b86 Author: Naohisa Goto Date: Sat Mar 28 01:20:13 2015 +0900 delete obsolete $Id:$ line lib/bio/db/fasta/defline.rb | 1 - 1 file changed, 1 deletion(-) commit aadf285bc9e618b7813b42fd39e0b1966a04385c Author: Naohisa Goto Date: Sat Mar 28 01:18:43 2015 +0900 suppress defline.rb:393: warning: character class has duplicated range lib/bio/db/fasta/defline.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 5297db11eb165885c4f15b914c2132c4122ae5a9 Author: Naohisa Goto Date: Sat Mar 28 01:11:43 2015 +0900 delete obsolete $Id:$ line test/unit/bio/test_db.rb | 1 - 1 file changed, 1 deletion(-) commit 20381ad45c674c0844a92891cb8ae71edaa6e333 Author: Naohisa Goto Date: Sat Mar 28 01:08:04 2015 +0900 
suppress "warning: instance variable @tagsize not initialized" * test/unit/bio/test_db.rb: to suppress "warning: instance variable @tagsize not initialized" when executing Bio::TestDB#test_fetch, @tagsize is set in setup. test/unit/bio/test_db.rb | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) commit d194edfc68bc10fde11f2cf014a59113ddc63b24 Author: Naohisa Goto Date: Sat Mar 28 00:59:21 2015 +0900 delete obsolete $Id:$ line lib/bio/data/codontable.rb | 1 - 1 file changed, 1 deletion(-) commit fac51f540dc7b33cd3ec51f97b5cb1ea587a461e Author: Naohisa Goto Date: Sat Mar 28 00:57:28 2015 +0900 suppress warning: instance variable @reverse not initialized lib/bio/data/codontable.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 4e85315f03e374157f832c8435d0d2f43cd969af Author: Naohisa Goto Date: Sat Mar 28 00:55:25 2015 +0900 delete obsolete $Id:$ line lib/bio/appl/iprscan/report.rb | 1 - 1 file changed, 1 deletion(-) commit dafa7ce62378ff1605a295f8c620eb3a0a4a3c57 Author: Naohisa Goto Date: Sat Mar 28 00:54:37 2015 +0900 suppress warning: instance variable @ipr_ids not initialized lib/bio/appl/iprscan/report.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 52b6073997c1b26fea9d4aae3154b37575944d4d Author: Naohisa Goto Date: Sat Mar 28 00:50:43 2015 +0900 suppress "method redefined" warnings and fill RDoc for some methods lib/bio/db/phyloxml/phyloxml_elements.rb | 46 +++++++++++++++++++++++------- 1 file changed, 35 insertions(+), 11 deletions(-) commit 3d2e99fe993d76d5ece5bdbcd2e9541fa098c4dd Author: Naohisa Goto Date: Sat Mar 28 00:36:51 2015 +0900 suppress "instance variable @XXX not initialized" warnings lib/bio/db/phyloxml/phyloxml_elements.rb | 88 +++++++++++++++--------------- 1 file changed, 44 insertions(+), 44 deletions(-) commit 02d4f98eae3934d8ad9c950b41132eb14653fe27 Author: Naohisa Goto Date: Thu Mar 26 20:33:35 2015 +0900 suppress warning: instance variable @uri not initialized lib/bio/db/phyloxml/phyloxml_elements.rb | 2 
+- 1 file changed, 1 insertion(+), 1 deletion(-) commit 94277712e9dd000c2d9bf5b6ebfd84d0f2fc3b59 Author: Naohisa Goto Date: Thu Mar 26 01:47:45 2015 +0900 suppress warning: instance variable @format not initialized lib/bio/db/fastq.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit e61e1071e4bb7dd9ee995c3a7f864c2ef4384edd Author: Naohisa Goto Date: Thu Mar 26 01:40:33 2015 +0900 suppress "instance variable not initialized" warnings * suppress warning: instance variable @sc_match not initialized * suppress warning: instance variable @sc_mismatch not initialized * suppress warning: instance variable @gaps not initialized * suppress warning: instance variable @hit_frame not initialized * suppress warning: instance variable @query_frame not initialized lib/bio/appl/blast/format0.rb | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) commit 08c458c74a7a34e340e09053cbc0f9c071e27395 Author: Naohisa Goto Date: Thu Mar 26 01:09:16 2015 +0900 suppress warning: instance variable @pattern not initialized lib/bio/appl/blast/format0.rb | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) commit 33d7eed180fd601972724f4b992f1a17c689ef62 Author: Naohisa Goto Date: Thu Mar 26 00:57:02 2015 +0900 Test bug fix: fix typo of test target method test/network/bio/test_command.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 76a98bce1affac03483c08f803d4314b42a0a3d3 Author: Naohisa Goto Date: Thu Mar 26 00:32:25 2015 +0900 Incompatible Change: Bio::Command.make_cgi_params rejects single String * Incompatible Change: Bio::Command.make_cgi_params no longer accepts a single String as a form. Use Hash or Array containing key-value pairs as String objects. This change also affects Bio::Command.post_form and Bio::Command.http_post_form which internally use this method. 
lib/bio/command.rb | 2 +- test/unit/bio/test_command.rb | 9 +++++---- 2 files changed, 6 insertions(+), 5 deletions(-) commit b1612545a7516befd850a6d5925aa73bbaa4b4b0 Author: Naohisa Goto Date: Wed Mar 25 02:36:41 2015 +0900 delete obsolete $Id:$ line lib/bio/io/togows.rb | 1 - 1 file changed, 1 deletion(-) commit 4d5a419cc78ff2a79cff2812adc6f16f286204e8 Author: Naohisa Goto Date: Wed Mar 25 02:35:45 2015 +0900 delete obsolete $Id:$ line test/network/bio/io/test_togows.rb | 1 - 1 file changed, 1 deletion(-) commit a8d2c4cac665b4bb8140df329a9cc1d6e5e2d02d Author: Naohisa Goto Date: Wed Mar 25 02:35:03 2015 +0900 delete obsolete $Id:$ line test/unit/bio/io/test_togows.rb | 1 - 1 file changed, 1 deletion(-) commit dd0967db3743789ea5aa48623df8d97f93062694 Author: Naohisa Goto Date: Wed Mar 25 02:33:49 2015 +0900 test_make_path: add test data using Symbol objects test/unit/bio/io/test_togows.rb | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) commit e07158a60ca666b5d625408bcf8fa602fd8114a8 Author: Naohisa Goto Date: Wed Mar 25 02:22:31 2015 +0900 Bio::TogoWS::REST#entry: comma between IDs should NOT be escaped to %2C lib/bio/io/togows.rb | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) commit 98546289b2f2da2dc7f9586fd5e2942da4d8f3a8 Author: Naohisa Goto Date: Wed Mar 25 02:00:17 2015 +0900 Bug fix: search with offset did not work due to TogoWS server change * lib/bio/io/togows.rb: Bug fix: Bio::TogoWS::REST#search with offset and limit did not work due to TogoWS server change about URI escape. For example, http://togows.org/search/nuccore/Milnesium+tardigradum/2%2C3 fails, http://togows.org/search/nuccore/Milnesium+tardigradum/2,3 works fine. 
lib/bio/io/togows.rb | 24 +++++++++++++++++++++--- 1 file changed, 21 insertions(+), 3 deletions(-) commit 7097f80e315a0a6332e7a76a5bb261649e8dcc1a Author: Naohisa Goto Date: Wed Mar 25 01:33:26 2015 +0900 Bug fix due to TogoWS convert method spec change * lib/bio/io/togows.rb: Bug fix: Bio::TogoWS::REST#convert did not work because of the spec change of TogoWS REST API. lib/bio/io/togows.rb | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) commit 1a9b1063af4c0b32cd287d4a2c2466343aeddb98 Author: Naohisa Goto Date: Wed Mar 25 01:30:34 2015 +0900 improve tests for bio/command.rb for methods using http protocol test/network/bio/test_command.rb | 67 ++++++++++++++++++++++++++++++++++++-- 1 file changed, 65 insertions(+), 2 deletions(-) commit c63920e4d8569e3eaef201d4d60fcddfa15f1f34 Author: Naohisa Goto Date: Wed Mar 25 01:30:06 2015 +0900 delete obsolete $Id:$ line lib/bio/command.rb | 1 - 1 file changed, 1 deletion(-) commit 1683edac0e9ecbf819ffcd332a6db2d25c2d596a Author: Naohisa Goto Date: Wed Mar 25 01:28:28 2015 +0900 new methods Bio::Command.http_post and Bio::Command.post to post raw data lib/bio/command.rb | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 61 insertions(+) commit a40157205282e148bf3a2e43aed1e08d713fb598 Author: Naohisa Goto Date: Tue Mar 24 00:46:23 2015 +0900 suppress warnings "instance variable @circular not initialized" lib/bio/util/restriction_enzyme/range/sequence_range.rb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit abcac8de85c9606f6a1879fe9d2ae559911708c9 Author: Naohisa Goto Date: Tue Mar 24 00:29:42 2015 +0900 delete obsolete $Id:$ line test/unit/bio/io/flatfile/test_autodetection.rb | 1 - 1 file changed, 1 deletion(-) commit 1b5bf586af238b712a9f640087421fd299376c2d Author: Naohisa Goto Date: Tue Mar 24 00:28:38 2015 +0900 suppress warning: assigned but unused variable - length test/unit/bio/io/flatfile/test_autodetection.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 
5497068d17c2794ab2b6ef1e603e5478a86537c6 Author: Naohisa Goto Date: Tue Mar 24 00:22:54 2015 +0900 add/modify assertions to suppress "unused variable" warnings test/unit/bio/db/genbank/test_genbank.rb | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) commit d5bafd8b7ee28ab0418b09fd6dd47abcb9eb1ee5 Author: Naohisa Goto Date: Mon Mar 23 23:57:56 2015 +0900 delete obsolete $Id:$ line lib/bio/appl/blast.rb | 1 - 1 file changed, 1 deletion(-) commit bbd60d1aae7c894f914b7265d2de22fea5eb3faf Author: Naohisa Goto Date: Mon Mar 23 23:56:42 2015 +0900 suppress warning: assigned but unused variable - dummy lib/bio/appl/blast.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 4a91502ccf14ab8655645144120aa97d0c8313a5 Author: Naohisa Goto Date: Mon Mar 23 20:32:59 2015 +0900 delete obsolete $Id:$ line lib/bio/shell/setup.rb | 1 - 1 file changed, 1 deletion(-) commit c437a4078ff8e2869b9c1ab3543022db373a93c3 Author: Naohisa Goto Date: Mon Mar 23 20:32:20 2015 +0900 suppress warning: instance variable @mode not initialized lib/bio/shell/setup.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 8967cf280d5ca8491d57a11e4f3ffab7369c4ea8 Author: Naohisa Goto Date: Mon Mar 23 20:28:50 2015 +0900 delete obsolete $Id:$ line lib/bio/shell/irb.rb | 1 - 1 file changed, 1 deletion(-) commit 42b5f030067be4bc9c53ccb4c06ccfc5e8d9df03 Author: Naohisa Goto Date: Mon Mar 23 20:28:27 2015 +0900 change deprecated method File.exists? to File.exist? lib/bio/shell/irb.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 389ad2f311f161f235db2373aeb2f5500b1ea65f Author: Naohisa Goto Date: Mon Mar 23 20:27:01 2015 +0900 delete obsolete $Id:$ line lib/bio/shell/interface.rb | 1 - 1 file changed, 1 deletion(-) commit de5949798d66c16d2b5e2cf8ba7192049ec99c5b Author: Naohisa Goto Date: Mon Mar 23 20:26:37 2015 +0900 change deprecated method File.exists? to File.exist? 
lib/bio/shell/interface.rb | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) commit c8907059a716a8778e333755c8fb53bb2a0c7158 Author: Naohisa Goto Date: Mon Mar 23 20:24:58 2015 +0900 delete obsolete $Id:$ line lib/bio/shell/core.rb | 1 - 1 file changed, 1 deletion(-) commit 1fe5903f8acd8045d203465a099a45218e7e3891 Author: Naohisa Goto Date: Mon Mar 23 20:24:25 2015 +0900 change deprecated method File.exists? to File.exist? lib/bio/shell/core.rb | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) commit 929207c6f186c81f076fab9b1bbbd23c4b966f4e Author: Naohisa Goto Date: Mon Mar 23 20:20:05 2015 +0900 delete obsolete $Id:$ line test/unit/bio/db/pdb/test_pdb.rb | 1 - 1 file changed, 1 deletion(-) commit e75c57fcd7abc56ba6fcbf1996e491aca890f5b1 Author: Naohisa Goto Date: Mon Mar 23 20:19:30 2015 +0900 suppress "assigned but unused variable" warnings test/unit/bio/db/pdb/test_pdb.rb | 21 ++++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-) commit b458301f47322c265fce27efd0ed71443c17d9d7 Author: Naohisa Goto Date: Mon Mar 23 18:34:12 2015 +0900 delete obsolete $Id:$ line lib/bio/shell/plugin/entry.rb | 1 - 1 file changed, 1 deletion(-) commit c3f909fe06b82b3cbd4bdcbcdef668fc0727be9d Author: Naohisa Goto Date: Mon Mar 23 18:33:30 2015 +0900 change deprecated method File.exists? to File.exist? lib/bio/shell/plugin/entry.rb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit 7ba6349c2446aa03b843a2a8fb49505c8f63c6ca Author: Naohisa Goto Date: Mon Mar 23 18:20:44 2015 +0900 change deprecated method File.exists? to File.exist? 
lib/bio/appl/meme/mast.rb | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) commit e7f78ea3c3fb1b78adcc6ae13f450cf2cda361cd Author: Naohisa Goto Date: Mon Mar 23 18:10:27 2015 +0900 delete obsolete $Id:$ line lib/bio/db/phyloxml/phyloxml_writer.rb | 1 - 1 file changed, 1 deletion(-) commit b32eae0050a73bd5a2931c17a6694f494ad00bb2 Author: Naohisa Goto Date: Mon Mar 23 18:07:54 2015 +0900 suppress warning: mismatched indentations at 'end' with 'def' at 166 lib/bio/db/phyloxml/phyloxml_writer.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit bd735347283ce5d332245d0349186f300800a43f Author: Naohisa Goto Date: Sat Mar 21 12:57:03 2015 +0900 remove duplicated line and suppress Ruby 2.2 warning setup.rb | 1 - 1 file changed, 1 deletion(-) commit 68ad10e178594691c77ba4b97c2449fecf0ac9de Author: Naohisa Goto Date: Sat Mar 21 12:50:46 2015 +0900 Ruby 1.9 support: suppress "shadowing outer local variable" warnings setup.rb | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) commit 2343482078aec8373f7a2eb8ed4d7c44119f809c Author: Naohisa Goto Date: Sat Mar 21 12:16:45 2015 +0900 Ruby 2.2 support: Config was renamed to RbConfig setup.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit d512712745142d6c6ebe9a6ef51c8c4773bd7c2c Author: Naohisa Goto Date: Sat Mar 21 11:52:47 2015 +0900 Ruby 1.9 support: suppress "shadowing outer local variable" warnings lib/bio/db/embl/format_embl.rb | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) commit bfa75df9047ab6855c931558f6bf9fdbb1c3c288 Author: Naohisa Goto Date: Sat Mar 21 11:36:01 2015 +0900 delete obsolete $Id:$ line lib/bio/io/flatfile/buffer.rb | 1 - 1 file changed, 1 deletion(-) commit d6fbaa0c555117ebadd46e284ae357586856102d Author: Naohisa Goto Date: Sat Mar 21 11:35:07 2015 +0900 Ruby 1.9 support: suppress warning: shadowing outer local variable - fobj lib/bio/io/flatfile/buffer.rb | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) commit 
0083d3284ec181f4bcc3144f76b12f9d52e3eff6 Author: Naohisa Goto Date: Sat Mar 21 11:29:39 2015 +0900 delete obsolete $Id:$ line lib/bio/io/fastacmd.rb | 1 - 1 file changed, 1 deletion(-) commit d4909c0e80e572a639edba07388e430c7f5d6ce8 Author: Naohisa Goto Date: Sat Mar 21 11:29:01 2015 +0900 remove old sample script in "if $0 == __FILE__" block lib/bio/io/fastacmd.rb | 15 --------------- 1 file changed, 15 deletions(-) commit 8171162d0a3991d5f0d9a8bccee57250248d6d3d Author: Naohisa Goto Date: Sat Mar 21 11:15:10 2015 +0900 delete obsolete $Id:$ line lib/bio/db/go.rb | 1 - 1 file changed, 1 deletion(-) commit ed7c9a5335ef59399f3098311f47b8dec519281a Author: Naohisa Goto Date: Sat Mar 21 11:14:19 2015 +0900 Ruby 1.9 support: suppress warnings: "shadowing outer local variable - goid" lib/bio/db/go.rb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit dd543068b046c9a0c2a40159c830c92b680244f1 Author: Naohisa Goto Date: Sat Mar 21 11:03:08 2015 +0900 delete obsolete $Id:$ line lib/bio/db/phyloxml/phyloxml_elements.rb | 1 - 1 file changed, 1 deletion(-) commit 4c74a6e3aeca30820b0be61e867c9201445542ec Author: Naohisa Goto Date: Sat Mar 21 10:24:40 2015 +0900 suppress warning: mismatched indentations at 'end' with 'class' lib/bio/db/phyloxml/phyloxml_elements.rb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit ee4ffdc748c1f9f45e97ff7f0da8350c5468c333 Author: Naohisa Goto Date: Sat Mar 21 10:08:30 2015 +0900 delete obsolete $Id:$ line lib/bio/db/phyloxml/phyloxml_parser.rb | 1 - 1 file changed, 1 deletion(-) commit 46a4edc8729ff836ae28d11f1503c9923275b9f6 Author: Naohisa Goto Date: Sat Mar 21 10:00:04 2015 +0900 Ruby 1.9 support: suppress warning "shadowing outer local variable - flag" lib/bio/db/phyloxml/phyloxml_parser.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit db3552c683edf79adbfa5ed897f5ef91e8417585 Author: Naohisa Goto Date: Fri Mar 20 16:33:45 2015 +0900 Bug fix: Bio::PhyloXML::Parser.open_uri did not return block return 
value * Bug fix: Bio::PhyloXML::Parser.open_uri did not return block return value when giving block. * Suppress warning "assigned but unused variable - ret" lib/bio/db/phyloxml/phyloxml_parser.rb | 1 + 1 file changed, 1 insertion(+) commit 84c2c4e94352cc9cef982d3b505b4f439617e01e Author: Naohisa Goto Date: Fri Mar 20 16:21:49 2015 +0900 delete obsolete $Id:$ line lib/bio/appl/genscan/report.rb | 1 - 1 file changed, 1 deletion(-) commit 05c55d0aaf1dc130ac04155622ccebb3394fc3c0 Author: Naohisa Goto Date: Fri Mar 20 16:21:06 2015 +0900 Ruby 1.9 support: suppress warning "shadowing outer local variable - i" lib/bio/appl/genscan/report.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 5edcc1c97ca7c292fa6551509570daf68ac36837 Author: Naohisa Goto Date: Fri Mar 20 16:13:57 2015 +0900 Ruby 1.9 support: suppress warning "shadowing outer local variable - y" lib/bio/appl/blast/format0.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 813d53a06258244a47784697e8fc95f1f15db8da Author: Naohisa Goto Date: Fri Mar 20 16:03:19 2015 +0900 delete obsolete $Id:$ line lib/bio/io/das.rb | 1 - 1 file changed, 1 deletion(-) commit b6ae4a423dd763969c8e18ca6a578fd0600d6159 Author: Naohisa Goto Date: Fri Mar 20 16:02:20 2015 +0900 Ruby 1.9 support: suppress "warning: shadowing outer local variable - e" lib/bio/io/das.rb | 80 ++++++++++++++++++++++++++--------------------------- 1 file changed, 40 insertions(+), 40 deletions(-) commit 7fa75a644167dd8c189f681e29c1cf5f1bf2fe0b Author: Naohisa Goto Date: Fri Mar 20 15:36:00 2015 +0900 delete obsolete $Id:$ line lib/bio/shell/plugin/seq.rb | 1 - 1 file changed, 1 deletion(-) commit 051aba1519d71f1205363c4421feb6c06881ab0c Author: Naohisa Goto Date: Fri Mar 20 15:29:02 2015 +0900 Bug fix: Ruby 1.9 support: did not yield the last part of the string * lib/bio/shell/plugin/seq.rb: Bug fix: Ruby 1.9 support: String#step and #skip (extended by bioruby-shell) did not yield the last part of the string due to a change from Ruby 
1.8 to 1.9. * Suppress warning message "shadowing outer local variable - i" lib/bio/shell/plugin/seq.rb | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) commit a9f2bff92de58c2ab4cefc67e721d3ad69e9de98 Author: Naohisa Goto Date: Fri Mar 20 15:09:16 2015 +0900 Ruby 2.2 support: suppress a "shadowing outer local variable" warning lib/bio/alignment.rb | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) commit d0bcc8766d91eb7cacea2a6d5b32b3e0b3c5ce56 Author: Naohisa Goto Date: Fri Mar 20 14:31:04 2015 +0900 delete obsolete $Id:$ line test/unit/bio/test_alignment.rb | 1 - 1 file changed, 1 deletion(-) commit 0c8fa8fd558088822a98e11b6fa4bec9b37ebec7 Author: Naohisa Goto Date: Fri Mar 20 14:26:38 2015 +0900 Ruby 2.2 support: comment out duplicated line to suppress warning * Ruby 2.2 support: test/unit/bio/test_alignment.rb: Suppress warning: duplicated key at line 182 ignored: "t" test/unit/bio/test_alignment.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit ab17c40e1ce492dc924205e8e2f90d31adae4464 Author: Naohisa Goto Date: Fri Mar 20 14:18:08 2015 +0900 Ruby 2.2 support: some tests did not run with test-unit gem * Ruby 2.2 support: test/unit/bio/db/test_fastq.rb Support for test-unit gem bundled in Ruby 2.2. See commit log b9488a64abb780c5e9b6cd28e8264bad399fa749 for details. test/unit/bio/db/test_fastq.rb | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) commit ea668d73c18e3df33625cba4352ad5f6966e0eb4 Author: Naohisa Goto Date: Fri Mar 20 14:03:43 2015 +0900 delete obsolete $Id:$ line test/unit/bio/appl/sim4/test_report.rb | 1 - 1 file changed, 1 deletion(-) commit 1abb8d362a0f2443b48923bcccba3d7d0caa1f1d Author: Naohisa Goto Date: Fri Mar 20 13:57:33 2015 +0900 Ruby 2.2 support: some tests did not run with test-unit gem * Ruby 2.2 support: test/unit/bio/appl/sim4/test_report.rb Support for test-unit gem bundled in Ruby 2.2. See commit log b9488a64abb780c5e9b6cd28e8264bad399fa749 for details. 
test/unit/bio/appl/sim4/test_report.rb | 62 +++++++++++++++++++++++--------- 1 file changed, 46 insertions(+), 16 deletions(-) commit b9488a64abb780c5e9b6cd28e8264bad399fa749 Author: Naohisa Goto Date: Fri Mar 20 13:13:28 2015 +0900 Ruby 2.2 support: some tests did not run with test-unit gem * Ruby 2.2 support: test/unit/bio/appl/blast/test_report.rb: With test-unit gem bundled in Ruby 2.2, test methods inherited from a parent class and executed in the parent class do not run in the child class. To avoid the behavior, test methods are moved to modules and test classes are changed to include the modules. test/unit/bio/appl/blast/test_report.rb | 156 ++++++++++++++++++++++--------- 1 file changed, 110 insertions(+), 46 deletions(-) commit febe8bbf614e530f597d7306d33df5f5f4ee6699 Author: Naohisa Goto Date: Thu Mar 19 00:55:09 2015 +0900 try to use bio-old-biofetch-emulator gem * bin/br_biofetch.rb: try to use bio-old-biofetch-emulator gem. Without bio-old-biofetch-emulator, the program exits with error message when default BioRuby BioFetch server is specified. bin/br_biofetch.rb | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) commit 08450e0a35cbf5596dd30238d23aa7a7296c8f67 Author: Naohisa Goto Date: Thu Mar 19 00:36:10 2015 +0900 do not repeat default_url and another_url bin/br_biofetch.rb | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) commit 8e39d3411405b09cc6ea55ba31e5206536ebf59d Author: Naohisa Goto Date: Wed Mar 18 23:57:59 2015 +0900 Revert e29fc5fadbe0dae6528cf49637496dc2df3ec0dc * bin/br_biofetch.rb: revert e29fc5fadbe0dae6528cf49637496dc2df3ec0dc because the old deprecated bioruby biofetch server can be emulated by bio-old-biofetch-emulator gem package. bin/br_biofetch.rb | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) commit 849c38931a64b7ff2ba7ec46a495e65a99a869fb Author: Ben J. 
Woodcroft Date: Wed Aug 8 09:44:09 2012 +1000 add FastaFormat#first_name method lib/bio/db/fasta.rb | 17 ++++++++++++++++ test/unit/bio/db/test_fasta.rb | 42 +++++++++++++++++++++++++++++++++++++++- 2 files changed, 58 insertions(+), 1 deletion(-) commit 888a70508c0392cae89021feba5c4a6a62228a11 Author: Naohisa Goto Date: Fri Nov 14 15:08:35 2014 +0900 fix typo * fix typo. Thanks to Iain Barnett who reported the bug in https://github.com/bioruby/bioruby/pull/93 (c4843d65447f6a434da523c9c313f34d025f36f8) lib/bio/sequence/compat.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit afc6df190109649e8eb11b2af1184ddfcf5327d3 Author: Naohisa Goto Date: Fri Nov 14 14:29:42 2014 +0900 add documentation when gc_percent is not enough lib/bio/sequence/na.rb | 8 ++++++++ 1 file changed, 8 insertions(+) commit bb63f67f2dfe6dba5c70ada033ca0cc1ecaa7783 Author: Naohisa Goto Date: Thu Nov 13 21:43:00 2014 +0900 Add tests for Bio::PubMed#search, query, pmfetch * Add tests for Bio::PubMed#search, query, pmfetch, although they will be deprecated in the future. * This commit and commit bfe4292c51bba5c4032027c36c35e98f28a9605a are inspired by https://github.com/bioruby/bioruby/pull/76 (though the pull request has not been merged), and the commits fix the issue https://github.com/bioruby/bioruby/issues/75. Thanks to Paul Leader who reported the issue and the pull request. test/network/bio/io/test_pubmed.rb | 49 ++++++++++++++++++++++++++++++++++++ 1 file changed, 49 insertions(+) commit 74edba100da83c27f2edb7a9edc9ec98265a7cff Author: Naohisa Goto Date: Thu Nov 13 12:05:12 2014 +0900 Change default tool and email values * Default "tool" and "email" values are changed to "bioruby" and "staff@bioruby.org" respectively. Now, the author of a script does not need to set his/her email address unless the script makes excess traffic to NCBI. 
* Update RDoc documentation lib/bio/io/ncbirest.rb | 48 +++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 43 insertions(+), 5 deletions(-) commit bfe4292c51bba5c4032027c36c35e98f28a9605a Author: Naohisa Goto Date: Thu Nov 13 11:54:53 2014 +0900 Bug fix: use NCBI E-Utilities instead of old deprecated API * Bio::PubMed#search, query, pmfetch: remove old code using deprecated and/or unofficial web API, and change use esearch or efetch methods which use NCBI E-utilities. These methods will be deprecated in the future. To indicate this, show warning message if $VERBOSE is true. * Update RDoc documentation lib/bio/io/pubmed.rb | 157 ++++++++++++++++++++++++-------------------------- 1 file changed, 76 insertions(+), 81 deletions(-) commit d78173a6eb6d8177e733decc0b8137fac067aa82 Author: Naohisa Goto Date: Tue Nov 11 17:41:32 2014 +0900 remove unused $Id:$ line bin/br_biofetch.rb | 1 - 1 file changed, 1 deletion(-) commit e29fc5fadbe0dae6528cf49637496dc2df3ec0dc Author: Naohisa Goto Date: Tue Nov 11 17:31:38 2014 +0900 Change default server to EBI Dbfetch server; remove BioRuby BioFetch server * Change default server to EBI Dbfetch server. * The BioRuby BioFetch server is removed. When "-r" option (force to use BioRuby server) is specified, warning message is shown, and the program exits with code 1 (abnormal exit). * Usage message is also changed. 
bin/br_biofetch.rb | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) commit 94ecac33e87e444d9fe991340c2d8f3709bc6d90 Author: Naohisa Goto Date: Tue Nov 11 17:19:30 2014 +0900 fix documentation reflecting recent changes of Bio::Fetch lib/bio/io/fetch.rb | 33 ++++++++++++++++++++------------- 1 file changed, 20 insertions(+), 13 deletions(-) commit 06a9db014614818ef35108928415ef18e8c8ae2c Author: Naohisa Goto Date: Tue Nov 11 16:41:26 2014 +0900 documentation about incompatible changes of Bio::Fetch RELEASE_NOTES.rdoc | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) commit 6d94e949b6d325f27b45b816a8305f828d049ec6 Author: Naohisa Goto Date: Tue Nov 11 16:35:50 2014 +0900 Issue about Bio::Fetch and BioRuby BioFetch server is resolved * Issue about Bio::Fetch is resolved by recent commits. * The BioRuby BioFetch server is deprecated. There is no hope to restart the service again. EBI Dbfetch server is an alternative. KNOWN_ISSUES.rdoc | 9 --------- 1 file changed, 9 deletions(-) commit 699cd3ff136310a551d30e0ddd7fbe66e483b5be Author: Naohisa Goto Date: Tue Nov 11 15:27:11 2014 +0900 update RDoc documents for Bio::Fetch lib/bio/io/fetch.rb | 61 +++++++++++++++++++++++++++++++++------------------ 1 file changed, 40 insertions(+), 21 deletions(-) commit c7837f8e5ee2cc1c3085da74567a2b25280bbb8f Author: Naohisa Goto Date: Tue Nov 11 14:48:48 2014 +0900 Incompatible change: remove Bio::Fetch.query; add Bio::Fetch::EBI.query * Incompatible change: remove a class method Bio::Fetch.query because default server URL in Bio::Fetch is deprecated. * New class method: Bio::Fetch::EBI.query. This can be used as an alternative method of deprecated Bio::Fetch.query method. 
lib/bio/io/fetch.rb | 35 ++++++++++++++++++----------------- 1 file changed, 18 insertions(+), 17 deletions(-) commit f9048684acaff0fcd00b458a946d5f692706325b Author: Naohisa Goto Date: Tue Nov 11 14:24:22 2014 +0900 Incompatible change: Default server in Bio::Fetch.new is deprecated * Incompatible change: Default server URL in Bio::Fetch.new is deprecated. Users must explicitly specify the URL. Alternatively, users must change their code to use Bio::Fetch::EBI. * New class Bio::Fetch::EBI, EBI Dbfetch client. This acts the same as Bio::Fetch.new(Bio::Fetch::EBI::URL) with default database name "ena_sequence". lib/bio/io/fetch.rb | 36 +++++++++++++++++++++++++++++++++--- 1 file changed, 33 insertions(+), 3 deletions(-) commit e8919f4f57fc545ca194bebb08c11159b36071cb Author: Naohisa Goto Date: Tue Nov 11 13:43:28 2014 +0900 removed unused variables lib/bio/io/fetch.rb | 1 - 1 file changed, 1 deletion(-) commit faec95656b846a7a17cd6a1dbc633dda63cb5b6e Author: Naohisa Goto Date: Tue Nov 11 11:44:00 2014 +0900 Updated URL of EMBL-EBI Dbfetch lib/bio/io/fetch.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 85be893655f68aafbf7e13badd20bf7f26cd7328 Author: Jose Irizarry Date: Mon Dec 24 12:30:55 2012 -0400 Update lib/bio/io/fetch.rb Use EBI's dbfetch endpoint as default since BioRuby's endpoint has been disabled for a while now. lib/bio/io/fetch.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 163cc06547beed653e19b8c6e71e829d85f2f99c Author: Naohisa Goto Date: Tue Oct 21 16:42:30 2014 +0900 Doc bug fix: wrong sample code lib/bio/appl/paml/codeml.rb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit 97143139d2d4a66366576a8e62518e93fa5afccf Author: Naohisa Goto Date: Tue Oct 21 15:47:52 2014 +0900 Prevent to repeat calculations of total bases * Bio::Sequence::NA#gc_content, at_content, gc_skew, at_skew: Prevent to repeat calculations of total bases. 
lib/bio/sequence/na.rb | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) commit b5dbd882e000842fef65e10290b379bfafdddf06 Author: Naohisa Goto Date: Tue Oct 21 15:41:13 2014 +0900 Documentation bug fix: Return value is Rational or Float. * Bio::Sequence::NA#gc_content, at_content, gc_skew, at_skew: Return value is Rational or Float in recent versions of Ruby. Documentation added for the treatment of "u" and to return 0.0 if there are no bases. Reported by ctSkennerton (https://github.com/bioruby/bioruby/issues/73 ). lib/bio/sequence/na.rb | 47 +++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 43 insertions(+), 4 deletions(-) commit 3ba98d52ce57488e604dd7ac388a874e5b40ae9d Author: Naohisa Goto Date: Tue Aug 12 00:58:38 2014 +0900 regenerate bioruby.gemspec with rake regemspec bioruby.gemspec | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) commit a9724d339582952b40c928beccf91376d4e63315 Author: Naohisa Goto Date: Tue Aug 5 19:20:42 2014 +0900 Update URIs * Update URIs. * Remove links to RubyForge and RAA which have already been closed. * Add some words for Ruby 1.9 or later. README.rdoc | 24 +++++++++++------------- 1 file changed, 11 insertions(+), 13 deletions(-) commit 5f3569faaf89ebcd2b2cf9cbe6b3c1f0544b2679 Author: Iain Barnett Date: Wed Mar 5 02:11:07 2014 +0000 Refactor Bio::AminoAcid#weight: Early return clearer and idiomatic. lib/bio/data/aa.rb | 26 ++++++++++++-------------- 1 file changed, 12 insertions(+), 14 deletions(-) commit c229a20887fcb6df9a7ba49ad5a23e175056fa8d Author: Iain Barnett Date: Wed Mar 5 02:02:45 2014 +0000 Fixed the stack level too deep errors by using Hash#invert. lib/bio/data/aa.rb | 18 +----------------- 1 file changed, 1 insertion(+), 17 deletions(-) commit 08dd928df30f5b39c255f9f70dbed8410d395cdf Author: Iain Barnett Date: Tue Mar 4 01:22:51 2014 +0000 Refactored to shorten, remove rescues, and clarify. 
lib/bio/alignment.rb | 19 +++++++------------ 1 file changed, 7 insertions(+), 12 deletions(-) commit 112aa284cb1ebecc1d5de186edf2b385649a7268 Author: Naohisa Goto Date: Wed Mar 19 14:48:32 2014 +0900 Bug fix: SEQRES serNum digits were extended in PDB v3.2 (2008) * Bug fix: SEQRES serNum digits were extended in PDB v3.2 (2008). Thanks to a researcher who reports the patch via email. lib/bio/db/pdb/pdb.rb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit ecd5e0c86b04aa918b71c859568425fa39ebbde5 Author: Naohisa Goto Date: Sat Jan 18 04:22:51 2014 +0900 suppress "source :rubygems is deprecated" warning gemfiles/Gemfile.travis-jruby1.8 | 2 +- gemfiles/Gemfile.travis-jruby1.9 | 2 +- gemfiles/Gemfile.travis-ruby1.8 | 2 +- gemfiles/Gemfile.travis-ruby1.9 | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) commit 4bda345fe3de9cf1b64c26f3dca1cb3727c946d0 Author: Naohisa Goto Date: Sat Jan 18 04:22:03 2014 +0900 gemfiles/Gemfile.travis-rbx: Gemfile for rbx (Rubinius) on Travis-ci * gemfiles/Gemfile.travis-rbx: Gemfile for rbx (Rubinius) on Travis-ci * .travis.yml is modified to use gemfile/Gemfile.travis-rbx for rbx. .travis.yml | 4 ++-- gemfiles/Gemfile.travis-rbx | 16 ++++++++++++++++ 2 files changed, 18 insertions(+), 2 deletions(-) create mode 100644 gemfiles/Gemfile.travis-rbx commit dcff544d6d0a967eb853b97ba9faa30eaa6fd9dc Author: Naohisa Goto Date: Sat Jan 18 04:13:50 2014 +0900 .travis.yml: fix mistakes .travis.yml | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) commit f0f67f295f05a5e1e30c479621c25498e2c8f6f2 Author: Naohisa Goto Date: Sat Jan 18 03:56:54 2014 +0900 Ruby 2.1 workaround: Array#uniq does not always choose the first item * Ruby 2.1 workaround: Array#uniq does not always choose the first item. Thanks to Andrew Grimm who reported the issue. (https://github.com/bioruby/bioruby/issues/92 ) Note that the behavior change is also regarded as a bug in Ruby and is fixed. 
(https://bugs.ruby-lang.org/issues/9340 ) test/unit/bio/test_pathway.rb | 35 +++++++++++++++++++++++++---------- 1 file changed, 25 insertions(+), 10 deletions(-) commit e92e09edf5904f51d3e73e61d13fce4159a543c5 Author: Naohisa Goto Date: Sat Jan 18 03:32:05 2014 +0900 .travis.yml: workaround to avoid error in Ruby 1.8.7 and jruby-18mode * workaround to avoid error in Ruby 1.8.7 and jruby-18mode (reference: https://github.com/rubygems/rubygems/pull/763 ) .travis.yml | 2 ++ 1 file changed, 2 insertions(+) commit 655a675096962710896fb458afcac9b5deb1fa5f Author: Naohisa Goto Date: Sat Jan 18 03:22:44 2014 +0900 .travis.yml: rbx version is changed to 2.2.3 * rbx version is changed to 2.2.3 * add dependent gems for rbx platforms, described in http://docs.travis-ci.com/user/languages/ruby/ .travis.yml | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) commit d2f5b882d5e2acf35d0c783a56aa47533b9f2bd5 Author: Naohisa Goto Date: Sat Jan 11 03:46:45 2014 +0900 .travis.yml: change ruby versions for tar and gem integration tests * In tar and gem integration tests, Ruby versions are changed to MRI 2.0.0 and jruby-19mode. * Add jruby-18mode * Add rbx-2.1.1 .travis.yml | 32 ++++++++++++++++++++++---------- 1 file changed, 22 insertions(+), 10 deletions(-) commit 71991af394b937d35e2bbbc84a21e65ffba7714d Author: Naohisa Goto Date: Thu Jan 9 00:57:25 2014 +0900 .travis.yml: Add 2.1.0 and 2.0.0, remove rbx-XXmode * Add 2.1.0 and 2.0.0 * Remove rbx-18mode and rbx-19mode * 1.9.2 is moved from "include" to "rvm". * 1.8.7 is moved from "rvm" to "include", and remove "gemfiles/Gemfile.travis-ruby1.8" line from "gemfile". * Remove "exclude" and simplify build matrix. 
* Suggested by agrimm in https://github.com/bioruby/bioruby/pull/91 .travis.yml | 27 +++++---------------------- 1 file changed, 5 insertions(+), 22 deletions(-) commit 80966bc875cc6e01978b6c9272f6ddd8f344aa62 Author: Brynjar Smari Bjarnason Date: Mon Dec 9 14:57:42 2013 +0100 Bug fix: Only do gsub on bio_ref.reference.authors if it exists. * Bug fix: Only do gsub on bio_ref.reference.authors if it exists. Fix https://github.com/bioruby/bioruby/issues/89 lib/bio/db/biosql/sequence.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 3337bbd3be2affcef44202a0c924b3e22dafd856 Author: Brynjar Smari Bjarnason Date: Mon Dec 9 14:55:24 2013 +0100 Bug fix: Missing require when generating genbank output for BioSQL sequence * Bug fix: Missing require when generating genbank output for BioSQL sequence. Partly fix https://github.com/bioruby/bioruby/issues/89 lib/bio/db/biosql/biosql_to_biosequence.rb | 1 + 1 file changed, 1 insertion(+) commit 1f829ae8e8c89c5c24e7bc7aa8ed5fa25e8ef6c2 Author: Naohisa Goto Date: Sat Nov 23 18:17:43 2013 +0900 Benchmark example1-seqnos.aln in addition to example1.aln * sample/benchmark_clustalw_report.rb: Benchmark parsing speed of example1-seqnos.aln in addition to example1.aln. sample/benchmark_clustalw_report.rb | 28 ++++++++++++++++++++++------ 1 file changed, 22 insertions(+), 6 deletions(-) commit c5d3e761859fa72c18f9301d84c31070f35e733e Author: Andrew Grimm Date: Tue Sep 17 21:15:56 2013 +1000 Add benchmark script for Bio::ClustalW::Report. sample/benchmark_clustalw_report.rb | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) create mode 100644 sample/benchmark_clustalw_report.rb commit 07c14e94cdb94cf9ba8a2bf050572ae1cbf24cff Author: Naohisa Goto Date: Sat Nov 23 17:49:54 2013 +0900 Bio::ClustalW::Report#do_parse speed optimization * Bio::ClustalW::Report#do_parse speed optimization. Thanks to Andrew Grimm who indicates the optimization point. 
(https://github.com/bioruby/bioruby/pull/86 ) * "$" in the regular expression is changed to "\z". In this context, the "$" was intended to be matched with only the end of the string. lib/bio/appl/clustalw/report.rb | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) commit 6a78028d4f595ecb5b4600d0f238b07a2d80bdd5 Author: Naohisa Goto Date: Sat Nov 23 15:52:15 2013 +0900 New test data: ClustalW with -SEQNOS=ON option * test/data/clustalw/example1-seqnos.aln: New test data: ClustalW running with -SEQNOS=ON option. * Bio::TestClustalWReport::TestClustalWReportSeqnos: new test class that parses the above data. test/data/clustalw/example1-seqnos.aln | 58 ++++++++++++++++++++++++++++ test/unit/bio/appl/clustalw/test_report.rb | 11 ++++++ 2 files changed, 69 insertions(+) create mode 100644 test/data/clustalw/example1-seqnos.aln commit f5da0bbb4b1639616bb8c63ff8c58840e140ef8b Author: Naohisa Goto Date: Sat Nov 23 15:35:31 2013 +0900 Simplify test data path in setup * Bio::TestClustalWReport::TestClustalWReport#setup: simplify test data filename path. * Modify indents and void lines. test/unit/bio/appl/clustalw/test_report.rb | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) commit 8f0bea1d3252d0de9e2d91dc31ef8a9552c2d758 Author: Naohisa Goto Date: Sat Nov 23 15:21:38 2013 +0900 Common test methods are moved to a module * New namespace module Bio::TestClustalWReport. * Common test methods are moved to CommonTestMethods, and test classes using the methods include it. * The test_sequences method is split into two methods CommonTestMethods#test_sequence0 and test_sequence1. test/unit/bio/appl/clustalw/test_report.rb | 97 +++++++++++++++------------- 1 file changed, 53 insertions(+), 44 deletions(-) commit edda65b8fb32c2eee6b0652074981c31aa68b0eb Author: Naohisa Goto Date: Fri Aug 23 23:51:59 2013 +0900 Test bug fix: Read test file with binary mode to avoid encoding error * Test bug fix: Read test file with binary mode to avoid string encoding error. 
Thanks to nieder (github.com/nieder) who reports the bug. (https://github.com/bioruby/bioruby/issues/84) test/unit/bio/db/test_phyloxml.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 011d6fe5f016408891c5da3143c83e2564ccbf27 Author: meso_cacase Date: Fri Apr 5 01:13:27 2013 +0900 Modified siRNA design rules by Ui-Tei and Reynolds * Ui-Tei rule: Avoided contiguous GCs 10 nt or more. * Reynolds rule: Total score of eight criteria is calculated. * Returns numerical score for functional siRNA instead of returning 'true'. * Returns 'false' for non-functional siRNA, as usual. * Unit tests are modified to reflect these changes. lib/bio/util/sirna.rb | 92 +++++++++++++++++++++++++++++++------- test/unit/bio/util/test_sirna.rb | 44 +++++++++--------- 2 files changed, 98 insertions(+), 38 deletions(-) commit b6e7953108ebf34d61bc79ee4bdae1092cfe339f Author: Naohisa Goto Date: Fri Jun 28 15:40:57 2013 +0900 Use Bio::UniProtKB instead of Bio::UniProt * Use Bio::UniProtKB instead of Bio::UniProt. * Test class names are also changed from UniProt to UniProtKB. test/unit/bio/db/embl/test_uniprotkb_new_part.rb | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) commit cddd35cf8d64abfff8bd6a8372d019fc4c32848c Author: Naohisa Goto Date: Fri Jun 28 15:26:20 2013 +0900 rename test/unit/bio/db/embl/test_uniprot_new_part.rb to test_uniprotkb_new_part.rb test/unit/bio/db/embl/test_uniprot_new_part.rb | 208 ---------------------- test/unit/bio/db/embl/test_uniprotkb_new_part.rb | 208 ++++++++++++++++++++++ 2 files changed, 208 insertions(+), 208 deletions(-) delete mode 100644 test/unit/bio/db/embl/test_uniprot_new_part.rb create mode 100644 test/unit/bio/db/embl/test_uniprotkb_new_part.rb commit 1b51d0940712a6f144f8268dc77048bc7ec7d983 Author: Naohisa Goto Date: Fri Jun 28 15:21:36 2013 +0900 Reflect the rename of Bio::UniProtKB from SPTR to UniProtKB. * Reflect the rename of Bio::UniProtKB from SPTR to UniProtKB. 
* Test class names are also changed. test/unit/bio/db/embl/test_uniprotkb.rb | 223 +++++++++++++++---------------- 1 file changed, 111 insertions(+), 112 deletions(-) commit 68494aa862c3495def713e6cad6fc478f223416f Author: Naohisa Goto Date: Fri Jun 28 15:01:15 2013 +0900 test_sptr.rb is renamed to test_uniprotkb.rb test/unit/bio/db/embl/test_sptr.rb | 1807 ------------------------------- test/unit/bio/db/embl/test_uniprotkb.rb | 1807 +++++++++++++++++++++++++++++++ 2 files changed, 1807 insertions(+), 1807 deletions(-) delete mode 100644 test/unit/bio/db/embl/test_sptr.rb create mode 100644 test/unit/bio/db/embl/test_uniprotkb.rb commit e1ed7fab4c0350e6866dd420a93e950c53063f38 Author: Naohisa Goto Date: Fri Jun 28 14:52:08 2013 +0900 Add autoload of Bio::UniProtKB, and modify comments of deprecated classes. lib/bio.rb | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) commit 7c78cb1b275a845e215f9a6da67026836efc5807 Author: Naohisa Goto Date: Fri Jun 28 14:28:02 2013 +0900 Bio::SwissProt and Bio::TrEMBL are deprecated * Bio::SwissProt and Bio::TrEMBL are deprecated. * Show warning messages when using these classes. lib/bio/db/embl/swissprot.rb | 41 ++++++++++++---------------------------- lib/bio/db/embl/trembl.rb | 43 +++++++++++++----------------------------- 2 files changed, 25 insertions(+), 59 deletions(-) commit b998ad13849ff7f1d69ed0c640a2e1bafe3fc957 Author: Naohisa Goto Date: Fri Jun 28 14:27:36 2013 +0900 Bio::UniProt is changed to be an alias of Bio::UniProtKB. lib/bio/db/embl/uniprot.rb | 41 ++++++++++++----------------------------- 1 file changed, 12 insertions(+), 29 deletions(-) commit f46324e2fb6a2bc3e4680c8064dc0fc3d89e6f24 Author: Naohisa Goto Date: Fri Jun 28 14:21:56 2013 +0900 Bio::SPTR is renamed as Bio::UniProtKB and changed to an alias * Bio::SPTR is renamed to Bio::UniProtKB. * For older programs which use Bio::SPTR, set Bio::SPTR as an alias of Bio::UniProtKB. 
lib/bio/db/embl/sptr.rb | 20 ++++++ lib/bio/db/embl/uniprotkb.rb | 147 +++++++++++++++++++++--------------------- 2 files changed, 93 insertions(+), 74 deletions(-) create mode 100644 lib/bio/db/embl/sptr.rb commit 70816d90a6ef290c7ca7f50d492e7c4f836aadd8 Author: Naohisa Goto Date: Thu Jun 27 18:16:38 2013 +0900 Rename lib/bio/db/embl/sptr.rb to uniprotkb.rb lib/bio/db/embl/sptr.rb | 1456 ------------------------------------------ lib/bio/db/embl/uniprotkb.rb | 1456 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 1456 insertions(+), 1456 deletions(-) delete mode 100644 lib/bio/db/embl/sptr.rb create mode 100644 lib/bio/db/embl/uniprotkb.rb commit 2a10ded8e1502e0db5ec3b2e060f658ee53aafd0 Author: Naohisa Goto Date: Thu Jun 27 16:36:58 2013 +0900 Bio::RefSeq and Bio::DDBJ are deprecated. Show warnings. * Bio::RefSeq and Bio::DDBJ are deprecated because they were only an alias of Bio::GenBank. Please use Bio::GenBank instead. * Show warning message when loading the classes and initializing a new instance. * Changed to require genbank.rb only when GenBank is not defined. This might reduce the possibility of circular require. 
lib/bio/db/genbank/ddbj.rb | 11 +++++++++-- lib/bio/db/genbank/refseq.rb | 14 +++++++++++--- 2 files changed, 20 insertions(+), 5 deletions(-) commit 118d0bff58b48f69505eef5dcc2f961ac6e0d9de Author: Naohisa Goto Date: Thu Jun 27 16:08:55 2013 +0900 Remove descriptions about DDBJ Web API (WABI) KNOWN_ISSUES.rdoc | 8 -------- 1 file changed, 8 deletions(-) commit fe8f976c7ced4d525a4eabd728269f71326cf001 Author: Naohisa Goto Date: Thu Jun 27 13:41:19 2013 +0900 Remove ddbjsoap method that uses Bio::DDBJ::XML lib/bio/shell/plugin/soap.rb | 28 ---------------------------- 1 file changed, 28 deletions(-) commit 54bef3fc48bb48eb198537a9fba6379f33f036cc Author: Naohisa Goto Date: Thu Jun 27 13:39:42 2013 +0900 Remove Bio::Blast::Remote::DDBJ from the comment line test/network/bio/appl/blast/test_remote.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit a7c5a656dab1bb8ada6b36ec003a89aec9e26671 Author: Naohisa Goto Date: Tue Jun 25 18:34:46 2013 +0900 Delete sample/demo_ddbjxml.rb which uses Bio::DDBJ::XML sample/demo_ddbjxml.rb | 212 ------------------------------------------------ 1 file changed, 212 deletions(-) delete mode 100644 sample/demo_ddbjxml.rb commit e55293b67d305382cfb30b45aa30af82a574b580 Author: Naohisa Goto Date: Tue Jun 25 18:29:54 2013 +0900 Remove Bio::Blast::Remote::DDBJ, Bio::Blast.ddbj and related components * Remove Bio::Blast::Remote::DDBJ, Bio::Blast.ddbj and related components which use Bio::DDBJ::XML or Bio::DDBJ::REST. 
lib/bio/appl/blast/ddbj.rb | 131 ---------------------------- lib/bio/appl/blast/remote.rb | 9 -- test/network/bio/appl/blast/test_remote.rb | 14 --- test/network/bio/appl/test_blast.rb | 12 --- 4 files changed, 166 deletions(-) delete mode 100644 lib/bio/appl/blast/ddbj.rb commit 19a5c992096a68a26f8684ee2ae128d17f2a49fd Author: Naohisa Goto Date: Tue Jun 25 16:52:05 2013 +0900 Remove Bio::DDBJ::XML and REST due to suspension of DDBJ Web API (WABI) * Remove Bio::DDBJ::XML and Bio::DDBJ::REST due to suspension of DDBJ Web API (WABI). DDBJ says that it is now under reconstruction and the API will be completely changed. Thus, I've decided to throw away current API client in Ruby and to implement new one with the new API. * Autoload lines in lib/bio/db/genbank/ddbj.rb are removed. * Tests are also removed. lib/bio/db/genbank/ddbj.rb | 3 - lib/bio/io/ddbjrest.rb | 344 ------------------------- lib/bio/io/ddbjxml.rb | 458 ---------------------------------- test/network/bio/io/test_ddbjrest.rb | 47 ---- test/unit/bio/io/test_ddbjxml.rb | 81 ------ 5 files changed, 933 deletions(-) delete mode 100644 lib/bio/io/ddbjrest.rb delete mode 100644 lib/bio/io/ddbjxml.rb delete mode 100644 test/network/bio/io/test_ddbjrest.rb delete mode 100644 test/unit/bio/io/test_ddbjxml.rb commit 1f852e0bf3c830aaa40dc7fc2bd535418af8dfd1 Author: Naohisa Goto Date: Sat May 25 03:00:08 2013 +0900 regenerate bioruby.gemspec with rake regemspec bioruby.gemspec | 2 -- 1 file changed, 2 deletions(-) commit 5b90959ab399f961823a7c4453392c75cf971333 Author: Naohisa Goto Date: Sat May 25 02:58:50 2013 +0900 Update files and directories used to create package without git bioruby.gemspec.erb | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) commit df29f057ded6ac73fbdce7ae04a70ead28f4cc9f Author: Naohisa Goto Date: Sat May 25 02:46:32 2013 +0900 Ruby 2.0 support: not to add ChangeLog and LEGAL to rdoc_files * Ruby 2.0 support: not to add ChangeLog and LEGAL to rdoc_files. 
Because ChangeLog is not rdoc format, rdoc bundled with Ruby 2.0 raises error during parsing. bioruby.gemspec.erb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit 930a5fcf5e38ae2bdfeee62eed9a46db1c519fae Author: Kenichi Kamiya Date: Thu Apr 4 17:29:33 2013 +0900 Remove unused variable in lib/bio/util/contingency_table.rb This commit removes below interpreter warning. * warning: assigned but unused variable lib/bio/util/contingency_table.rb | 2 -- 1 file changed, 2 deletions(-) commit 490b3f7ca3b987c1a17852b641aad3125fc565cd Author: Kenichi Kamiya Date: Thu Apr 4 17:28:30 2013 +0900 Rename unused variable in lib/bio/tree.rb This commit removes below interpreter warning. * warning: assigned but unused variable lib/bio/tree.rb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit c024fb972edb52e213165149273fc7ac4bec2f6e Author: Naohisa Goto Date: Thu May 16 21:26:44 2013 +0900 Refactoring to suppress "warning: assigned but unused variable" lib/bio/pathway.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit b3b2a268d6118307eed88fce1d805a61c6fb843d Author: Kenichi Kamiya Date: Thu Apr 4 17:18:44 2013 +0900 Remove unused variable in lib/bio/db/transfac.rb This commit removes below interpreter warning. * warning: assigned but unused variable lib/bio/db/transfac.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit dd8abf1f95af4a70cf0b86b0e719e3dcfd8abecf Author: Naohisa Goto Date: Thu May 16 21:13:34 2013 +0900 Refactoring to suppress warnings "assigned but unused variable" lib/bio/db/nexus.rb | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) commit b37512fb8028cf30bb2f813928aed49a5b39dce3 Author: Kenichi Kamiya Date: Thu Apr 4 17:15:59 2013 +0900 Rename unused variable in lib/bio/db/kegg/reaction.rb This commit removes below interpreter warning. 
* warning: assigned but unused variable lib/bio/db/kegg/reaction.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit a81fca3b1247ae4a3e05bfa912c8181efdfca81b Author: Kenichi Kamiya Date: Thu Apr 4 17:15:09 2013 +0900 Remove unused variable in lib/bio/db/go.rb This commit removes below interpreter warning. * warning: assigned but unused variable lib/bio/db/go.rb | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) commit 69b0c433e76faffba6a48dfc38dcc2b1444ce2b7 Author: Kenichi Kamiya Date: Thu Apr 4 17:13:24 2013 +0900 Rename unused variable in lib/bio/db/gff.rb This commit removes below interpreter warning. * warning: assigned but unused variable lib/bio/db/gff.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 88c214fe3183c161cda94a3a4cda442b3a769965 Author: Naohisa Goto Date: Thu May 9 23:46:28 2013 +0900 add a dummy line to suppress "warning: assigned but unused variable" lib/bio/db/embl/sptr.rb | 1 + 1 file changed, 1 insertion(+) commit 1ead12f9c951a983c6775f79ca1b6944f95a61b9 Author: Naohisa Goto Date: Thu May 9 23:41:54 2013 +0900 Refactoring to suppress warnings "assigned but unused variable" lib/bio/db/embl/embl.rb | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) commit 8d0eb5105eb2f419f5b4f4fbb191b8fb2032664b Author: Kenichi Kamiya Date: Thu Apr 4 17:01:27 2013 +0900 Remove unused variable in lib/bio/appl/paml/common This commit removes below interpreter warning. * warning: assigned but unused variable lib/bio/appl/paml/common.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit c6cf0d2e2a3a0b9062f9464dba0e363f460d04e4 Author: Naohisa Goto Date: Thu May 9 23:27:54 2013 +0900 suppress warning "assigned but unused variable" lib/bio/appl/paml/codeml/report.rb | 1 + 1 file changed, 1 insertion(+) commit 8834d50544b03a92a3ca816704b179e4333d1dfc Author: Kenichi Kamiya Date: Thu Apr 4 16:59:18 2013 +0900 Remove unused variable in lib/bio/appl/meme/mast/report This commit removes below interpreter warning. 
* warning: assigned but unused variable lib/bio/appl/meme/mast/report.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit fe51a49ee68c41a3ce0c48c39db6e8a28d1689ee Author: Kenichi Kamiya Date: Thu Apr 4 16:57:44 2013 +0900 Remove unused variable in lib/bio/appl/blast/report This commit removes below interpreter warning. * warning: assigned but unused variable lib/bio/appl/blast/report.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 622497ff309412fb986c5315d55d41c3ca48d362 Author: Kenichi Kamiya Date: Thu Apr 4 17:25:29 2013 +0900 Fix indent in lib/bio/map This commit removes below interpreter warning. * warning: mismatched indentations at ... lib/bio/map.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 3ea6bcaf229fd1a71a0192253cc47e817bb64b82 Author: Kenichi Kamiya Date: Thu Apr 4 18:05:04 2013 +0900 Remove unused variable in test/unit/bio/appl/blast/test_report This commit removes below interpreter warning. * warning: assigned but unused variable test/unit/bio/appl/blast/test_report.rb | 2 -- 1 file changed, 2 deletions(-) commit 178ca9e5244cc3aa7f0507c7d5528bb57b0858be Author: Kenichi Kamiya Date: Thu Apr 4 18:03:46 2013 +0900 Remove unused variable in test/unit/bio/appl/bl2seq/test_report This commit removes below interpreter warning. * warning: assigned but unused variable test/unit/bio/appl/bl2seq/test_report.rb | 1 - 1 file changed, 1 deletion(-) commit b8a5c1cb9f54d9199200b406f77e8152eef96f02 Author: Naohisa Goto Date: Thu May 9 21:20:10 2013 +0900 Add assertions and suppress "unused variable" warnings * Add assertions to check object id returned by forward_complement and reverse_complement methods. This change also aims to suppress "assigned but unused variable" warnings. 
test/unit/bio/sequence/test_na.rb | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) commit bd8fc9b197c54c108d74fea9161c8f0dd3b041fc Author: Kenichi Kamiya Date: Thu Apr 4 17:59:09 2013 +0900 Remove unused variable in test/unit/bio/io/flatfile/test_splitter This commit removes below interpreter warning. * warning: assigned but unused variable test/unit/bio/io/flatfile/test_splitter.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 0a87c9e265c4560453faf84fc009b60319c75416 Author: Kenichi Kamiya Date: Thu Apr 4 17:57:51 2013 +0900 Remove unused variable in test/unit/bio/db/test_phyloxml_writer This commit removes below interpreter warning. * warning: assigned but unused variable test/unit/bio/db/test_phyloxml_writer.rb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit 95b2614eb32eb12428df29360d0c1f146f39a469 Author: Naohisa Goto Date: Thu May 9 20:56:43 2013 +0900 Comment out some lines to suppress "unused variable" warnings test/unit/bio/db/test_gff.rb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit b8917841559fbd506c73fdf374a8097f23a1bc37 Author: Kenichi Kamiya Date: Thu Apr 4 17:51:11 2013 +0900 Remove unused variable in test/unit/bio/db/embl/test_sptr.rb * Remove warnings "warning: assigned but unused variable" * Note that the sequence in TestSPTRUniProtRel7_6#test_10000aa is a fragment of human p53 protein, and is not related with Q09165. test/unit/bio/db/embl/test_sptr.rb | 3 --- 1 file changed, 3 deletions(-) commit 6b46d324a545f509bbd238ae7ec009d586469314 Author: Kenichi Kamiya Date: Thu Apr 4 17:45:47 2013 +0900 Remove unused variable in test/unit/bio/db/embl/test_embl_rel89 This commit removes below interpreter warning. 
* warning: assigned but unused variable test/unit/bio/db/embl/test_embl_rel89.rb | 1 - 1 file changed, 1 deletion(-) commit f36eeb0107e7a8315c66888ec8292ed33bd959cc Author: Kenichi Kamiya Date: Thu Apr 4 17:45:21 2013 +0900 Remove unused variable in test/unit/bio/db/embl/test_embl This commit removes below interpreter warning. * warning: assigned but unused variable test/unit/bio/db/embl/test_embl.rb | 1 - 1 file changed, 1 deletion(-) commit a1a2ad9b963d9bb2da8d07ae7b182bd339bea88e Author: Kenichi Kamiya Date: Thu Apr 4 17:36:59 2013 +0900 Fix indent in test/unit/bio/sequence/test_dblink This commit removes below interpreter warning. * warning: mismatched indentations at ... test/unit/bio/sequence/test_dblink.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 345a8eb4408ca241c13c410a578490c905eb7391 Author: Kenichi Kamiya Date: Thu Apr 4 17:36:21 2013 +0900 Fix indent in test/unit/bio/db/test_phyloxml This commit removes below interpreter warning. * warning: mismatched indentations at ... test/unit/bio/db/test_phyloxml.rb | 58 ++++++++++++++++++------------------- 1 file changed, 29 insertions(+), 29 deletions(-) commit ae8c7a6705a30c0c18c57df9869979a968aa63ac Author: Kenichi Kamiya Date: Thu Apr 4 17:35:07 2013 +0900 Fix indent in test/unit/bio/db/genbank/test_genbank This commit removes below interpreter warning. * warning: mismatched indentations at ... test/unit/bio/db/genbank/test_genbank.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 872d8954e1351251fbace20e331035251ae5f806 Author: Kenichi Kamiya Date: Thu Apr 4 17:33:23 2013 +0900 Fix indent in test/unit/bio/appl/test_blast This commit removes below interpreter warning. * warning: mismatched indentations at ... 
test/unit/bio/appl/test_blast.rb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit bd973e084695c4d777c8ecf6d566788838158165 Author: Naohisa Goto Date: Wed Mar 27 03:03:49 2013 +0900 .travis.yml: rbx-18mode is moved to allow_failures .travis.yml | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) commit 63e93faba74a8143a0be9595fdf87329f3015745 Author: Andrew Grimm Date: Tue Mar 26 20:20:11 2013 +1100 Squash warning in alignment.rb: assigned but unused variable - oldkeys lib/bio/alignment.rb | 1 - 1 file changed, 1 deletion(-) commit bd735d6f9d6edfd1550a4279167ac06b372f847a Author: Andrew Grimm Date: Tue Mar 26 20:14:46 2013 +1100 Squash warning in alignment.rb: assigned but unused variable - lines lib/bio/alignment.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 3e7b27f96a901a3abfc338572f98d60a9e3be498 Author: Andrew Grimm Date: Tue Mar 26 19:44:49 2013 +1100 Squash warning in defline.rb: assigned but unused variable - idtype lib/bio/db/fasta/defline.rb | 1 - 1 file changed, 1 deletion(-) commit aafc03330fa79243cfa4097d356a7c304ddb7980 Author: Kenichi Kamiya Date: Sat Feb 16 21:22:55 2013 +0900 Simplify some regular expressions * /\w/ including /\d/ * /\s/ including [/\r/, /\t/, /\n/] lib/bio/appl/blast/format0.rb | 2 +- lib/bio/db/embl/common.rb | 2 +- lib/bio/db/embl/embl.rb | 2 +- lib/bio/db/embl/sptr.rb | 2 +- lib/bio/db/gff.rb | 2 +- 5 files changed, 5 insertions(+), 5 deletions(-) commit 623ad4011fa8b56f3c9f50a859d1fa26f6570700 Author: Naohisa Goto Date: Fri Jan 11 16:41:12 2013 +0900 Improvement of parameter checks and error output * Improvement of parameter checks * To avoid potential XSS in old MSIE which ignores content-type, always do CGI.escapeHTML for parameters in error messages sample/biofetch.rb | 91 ++++++++++++++++++++++++++++++++++++---------------- 1 file changed, 63 insertions(+), 28 deletions(-) commit 03d48c43f1de7ebc9104b9aa972f226774a0bf49 Author: Naohisa Goto Date: Fri Jan 11 15:32:05 2013 +0900 Add 
metadata cache * Add metadata cache. It caches the list of databases and a list of available formats for each database. Database entries are not cached. * charset=utf-8 in CGI header. sample/biofetch.rb | 110 +++++++++++++++++++++++++++++++++++++++++----------- 1 file changed, 87 insertions(+), 23 deletions(-) commit 114d29d4bdfc328f5e91adee9bea465622248e0d Author: Naohisa Goto Date: Fri Jan 11 09:10:08 2013 +0900 remove excess double quotations in html part sample/biofetch.rb | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) commit 949311648b92d51a2596f896fdae8d74ac0608a3 Author: Naohisa Goto Date: Fri Jan 11 08:59:18 2013 +0900 add magic comment: coding utf-8 sample/biofetch.rb | 1 + 1 file changed, 1 insertion(+) commit 4ae509273134c5deca7910847063ed07c56150db Author: Naohisa Goto Date: Thu Jan 10 23:27:09 2013 +0900 Rewrite to use TogoWS REST API instead of SOAP-based KEGG API. * Rewrite to use TogoWS REST API instead of deprecated SOAP-based KEGG API. * Examples are changed to fit with current TogoWS. * Now, the script does not depend on any non-standard libraries including BioRuby. This means that one can put this script on a server without installing BioRuby. * New constans SCRIPT_NAME and BASE_URL for easy customizing. * Many changes. See "git diff" for details. sample/biofetch.rb | 265 ++++++++++++++++++++++++++-------------------------- 1 file changed, 131 insertions(+), 134 deletions(-) commit bc98bc54c59be98425d66c64b19a3b9612993beb Author: Naohisa Goto Date: Thu Jan 10 15:17:42 2013 +0900 Add 'gem "rdoc"' to avoid "ERROR: 'rake/rdoctask' is obsolete..." gemfiles/Gemfile.travis-ruby1.9 | 1 + 1 file changed, 1 insertion(+) commit dfe54b2fbe303f56a868404173fe346724b7aa4a Author: Naohisa Goto Date: Thu Jan 10 14:06:45 2013 +0900 Add 'gem "rdoc"' to avoid "ERROR: 'rake/rdoctask' is obsolete..." 
gemfiles/Gemfile.travis-jruby1.8 | 1 + gemfiles/Gemfile.travis-jruby1.9 | 1 + 2 files changed, 2 insertions(+) commit f07ec6ac326d51c055496983abba54afd00c35d4 Author: Naohisa Goto Date: Thu Jan 10 01:38:00 2013 +0900 Add 'gem "rdoc"' to avoid "ERROR: 'rake/rdoctask' is obsolete..." gemfiles/Gemfile.travis-ruby1.8 | 1 + 1 file changed, 1 insertion(+) commit 4221d52055087f85daa1c23349d10ecdb4d01a31 Author: Naohisa Goto Date: Thu Jan 10 01:27:03 2013 +0900 Ruby 2.0 support: Set script encoding to US-ASCII for gff.rb. lib/bio/db/gff.rb | 1 + 1 file changed, 1 insertion(+) commit 1526df8273e9d2283fd4a921d4cf8c0c664fe71c Author: Naohisa Goto Date: Thu Jan 10 00:45:36 2013 +0900 Convert encoding of the Japanese tutorial files to UTF-8 doc/Tutorial.rd.ja | 1920 +++++++++++++++++++++++------------------------ doc/Tutorial.rd.ja.html | 1918 +++++++++++++++++++++++----------------------- 2 files changed, 1919 insertions(+), 1919 deletions(-) commit 3215570185a46dd0d6c4cd96d583b2487636b483 Author: Naohisa Goto Date: Thu Jan 10 00:41:51 2013 +0900 updated doc/Tutorial.rd.html and doc/Tutorial.rd.ja.html doc/Tutorial.rd.html | 19 ++--- doc/Tutorial.rd.ja.html | 202 +++++++++++++---------------------------------- 2 files changed, 63 insertions(+), 158 deletions(-) commit 8db12935a9cc15bae92bdb7183476cfea9e1f819 Author: Naohisa Goto Date: Thu Jan 10 00:38:18 2013 +0900 Set html title when generating tutorial html Rakefile | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) commit 644d438decceb072475877a749435fba543ff8ea Author: Naohisa Goto Date: Fri Jan 4 03:19:00 2013 +0900 regenerate bioruby.gemspec with rake regemspec bioruby.gemspec | 6 ------ 1 file changed, 6 deletions(-) commit 54dc9b9f68ee2de9ee005a772ce000277a073d97 Author: Naohisa Goto Date: Fri Jan 4 02:35:01 2013 +0900 Remove sample/psortplot_html.rb which depend on Bio::KEGG::API. * Remove sample/psortplot_html.rb because it strongly depend on removed Bio::KEGG::API and discontinued SOAP-based KEGG API. 
It is hard to re-write by using new REST-based KEGG API because the new API seems to lack color_pathway_by_objects that returns image URL. Moreover, there is no one-by-one API migration guide. sample/psortplot_html.rb | 214 ---------------------------------------------- 1 file changed, 214 deletions(-) delete mode 100644 sample/psortplot_html.rb commit dbdf2dad3dec9d10141b891a481b9b05e1561708 Author: Naohisa Goto Date: Fri Jan 4 02:34:05 2013 +0900 Remove descriptions about KEGG API and Bio::KEGG::API. doc/Tutorial.rd | 6 --- doc/Tutorial.rd.ja | 106 +--------------------------------------------------- 2 files changed, 1 insertion(+), 111 deletions(-) commit 3ca725dc1e07f794344c9fcae43d4972ed2895da Author: Naohisa Goto Date: Fri Jan 4 02:33:09 2013 +0900 Remove description about KEGG API and Bio::KEGG::API. README.rdoc | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) commit d4568788069f2d998a78ad72b1d906aae13e85f4 Author: Naohisa Goto Date: Thu Jan 3 23:55:58 2013 +0900 Remove KEGG API plugin of BioRuby Shell, due to the removal of Bio::KEGG::API. lib/bio/shell.rb | 1 - lib/bio/shell/plugin/keggapi.rb | 181 --------------------------------------- 2 files changed, 182 deletions(-) delete mode 100644 lib/bio/shell/plugin/keggapi.rb commit 22c8f4945d622f8f22c08b262c6caf81a0261284 Author: Naohisa Goto Date: Thu Jan 3 23:52:36 2013 +0900 Delete autoload lines for removed Bio::KEGG::API lib/bio.rb | 4 ---- 1 file changed, 4 deletions(-) commit b56ec0984d5001c3a4d3b4f0ba8fbbbf79835747 Author: Naohisa Goto Date: Thu Jan 3 23:51:24 2013 +0900 Remove Bio::KEGG::API and its sample code and documentation files. * Remove Bio::KEGG::API and its sample code and documentation files. 
* deleted: lib/bio/io/keggapi.rb * deleted: doc/KEGG_API.rd * deleted: doc/KEGG_API.rd.ja * deleted: sample/demo_keggapi.rb doc/KEGG_API.rd | 1843 ------------------------------------------------ doc/KEGG_API.rd.ja | 1834 ----------------------------------------------- lib/bio/io/keggapi.rb | 363 ---------- sample/demo_keggapi.rb | 502 ------------- 4 files changed, 4542 deletions(-) delete mode 100644 doc/KEGG_API.rd delete mode 100644 doc/KEGG_API.rd.ja delete mode 100644 lib/bio/io/keggapi.rb delete mode 100644 sample/demo_keggapi.rb commit 63af413c122b4531193153fbfee034deaf0a9606 Author: Naohisa Goto Date: Mon Oct 1 21:11:14 2012 +0900 Bug fix: parse error when subject sequence contains spaces * Bug fix: parse error when subject sequence contains spaces. Thanks to Edward Rice who reports the bug. (Bug #3385) (https://redmine.open-bio.org/issues/3385) lib/bio/appl/blast/format0.rb | 38 +++++++++++++++++++------------------- 1 file changed, 19 insertions(+), 19 deletions(-) commit 9f2f682ec6624ff356bea7aca76365ba95d33549 Author: Naohisa Goto Date: Fri Sep 7 16:50:44 2012 +0900 add an env line to be recognized in allow_failures .travis.yml | 1 + 1 file changed, 1 insertion(+) commit fead6dda526081db09c56c2262f111338b7d8cd7 Author: Naohisa Goto Date: Fri Sep 7 16:08:57 2012 +0900 environment variable TESTOPTS=-v for verbose output of rake test .travis.yml | 22 ++++++++++++++-------- 1 file changed, 14 insertions(+), 8 deletions(-) commit 3de19895140502898c77fc83d9ad6fae47331763 Author: Naohisa Goto Date: Thu Sep 6 18:17:22 2012 +0900 Remove Bio.method_missing because it is broken. * Bio.method_missing, the hook of undefined methods, providing shortcut of Bio::Shell methods, is now removed, because it does not work correctly, and because the use of method_missing should normally be avoided unless it is really necessary. Alternatively, use Bio::Shell.xxxxx (xxxxx is a method name). 
lib/bio.rb | 13 ------------- 1 file changed, 13 deletions(-) commit a358584c4a76be6a38ab38a18e6dc66840030450 Author: Naohisa Goto Date: Thu Sep 6 16:48:51 2012 +0900 Delete autoload line of a removed class Bio::NCBI::SOAP. lib/bio/io/ncbirest.rb | 1 - 1 file changed, 1 deletion(-) commit 340d665775b862da638e4d12751b84d2ccd83e82 Author: Naohisa Goto Date: Thu Sep 6 16:47:03 2012 +0900 Delete autoload lines of removed classes. lib/bio.rb | 4 ---- 1 file changed, 4 deletions(-) commit c7c29a672b38d2182cf4afc9a970b854af1149a7 Author: Naohisa Goto Date: Thu Sep 6 16:43:25 2012 +0900 regenerate bioruby.gemspec with rake regemspec bioruby.gemspec | 3 --- 1 file changed, 3 deletions(-) commit 09bb4b8a8b7e01a36dbe0cf44a5c2a6a6b5750f1 Author: Naohisa Goto Date: Thu Sep 6 16:23:17 2012 +0900 Remove Bio::Shell#ncbisoap which uses removed Bio::NCBI::SOAP. lib/bio/shell/plugin/soap.rb | 9 --------- 1 file changed, 9 deletions(-) commit a5e46acdaf06568bea6cb773200bbf3881b5670e Author: Naohisa Goto Date: Thu Sep 6 16:02:32 2012 +0900 Remove issues about removed classes Bio::NCBI::SOAP and Bio::KEGG::Taxonomy KNOWN_ISSUES.rdoc | 10 ---------- 1 file changed, 10 deletions(-) commit 529815acb1b57486bd506b81eec6be80277cbae7 Author: Naohisa Goto Date: Wed Sep 5 11:33:27 2012 +0900 Remove Bio::KEGG::Taxonomy which is old and broken * Remove Bio::KEGG::Taxonomy because it raises error or the data structure in the return value seems to be broken. In addition, running the sample script sample/demo_kegg_taxonomy.rb shows error or falls into infinite loop. Moreover, KEGG closes public FTP site and the target data file of the class ("taxonomy") can only be obtained by paid subscribers. From the above reasons, it seems there are no users of this class now. 
* Deleted files: lib/bio/db/kegg/taxonomy.rb, sample/demo_kegg_taxonomy.rb lib/bio/db/kegg/taxonomy.rb | 280 ------------------------------------------ sample/demo_kegg_taxonomy.rb | 92 -------------- 2 files changed, 372 deletions(-) delete mode 100644 lib/bio/db/kegg/taxonomy.rb delete mode 100644 sample/demo_kegg_taxonomy.rb commit dc47fb46e86bba15ba43de31075eaba3cf811fa3 Author: Naohisa Goto Date: Wed Sep 5 11:26:00 2012 +0900 Remove Bio::NCBI::SOAP which is broken * Remove Bio::NCBI::SOAP in lib/bio/io/ncbisoap.rb, because it always raises error during the parsing of WSDL files provided by NCBI, both with Ruby 1.8.X (with bundled SOAP4R) and Ruby 1.9.X (with soap4r-ruby1.9 gem). To solve the error, modifying SOAP4R may be needed, that seems very difficult. The alternative is Bio::NCBI::REST, REST client class for the NCBI EUtil web services. lib/bio/io/ncbisoap.rb | 156 ------------------------------------------------ 1 file changed, 156 deletions(-) delete mode 100644 lib/bio/io/ncbisoap.rb commit 314e06e54603bb238015c391904f414b3da48752 Author: Naohisa Goto Date: Tue Sep 4 11:13:47 2012 +0900 regenerate bioruby.gemspec with rake regemspec bioruby.gemspec | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) commit e929d5d23a9b489ef42f30b33959f059baf1e185 Author: Naohisa Goto Date: Tue Sep 4 11:09:36 2012 +0900 Remove issues about removed classes Bio::Ensembl and Bio::DBGET. KNOWN_ISSUES.rdoc | 16 ---------------- 1 file changed, 16 deletions(-) commit 550a5440490012f73b6d38d84238cd498f2ebb02 Author: Naohisa Goto Date: Tue Sep 4 10:57:20 2012 +0900 Remove Bio::Ensembl because it does not work * Remove Bio::Ensembl because it does not work after the renewal of the Ensembl web site in 2008. * Alternative is io-ensembl gem which supports current Ensembl API. http://rubygems.org/gems/bio-ensembl * Deleted files: lib/bio/io/ensembl.rb, test/network/bio/io/test_ensembl.rb, test/unit/bio/io/test_ensembl.rb. 
lib/bio/io/ensembl.rb | 229 ---------------------------------- test/network/bio/io/test_ensembl.rb | 230 ----------------------------------- test/unit/bio/io/test_ensembl.rb | 111 ----------------- 3 files changed, 570 deletions(-) delete mode 100644 lib/bio/io/ensembl.rb delete mode 100644 test/network/bio/io/test_ensembl.rb delete mode 100644 test/unit/bio/io/test_ensembl.rb commit 61301a8ec252f3623f994edd59f597360f73448b Author: Naohisa Goto Date: Tue Sep 4 10:47:52 2012 +0900 Remove obsolete Bio::DBGET * Remove Bio::DBGET because it uses old original protocol that was discontinued about 8 years ago. * Remove lib/bio/io/dbget.rb and sample/dbget. lib/bio/io/dbget.rb | 194 --------------------------------------------------- sample/dbget | 37 ---------- 2 files changed, 231 deletions(-) delete mode 100644 lib/bio/io/dbget.rb delete mode 100755 sample/dbget commit 3c5e288a8685ba3279a3ba73f1b31056c1b6f7a8 Author: Naohisa Goto Date: Thu Aug 23 00:25:43 2012 +0900 Refresh ChangeLog, showing changes after 1.4.3 release. * Refresh to the new ChangeLog, showing changes after 1.4.3 release. For the changes before 1.4.3, see doc/ChangeLog-1.4.3. For the changes before 1.4.2, see doc/ChangeLog-before-1.4.2. For the changes before 1.3.1, see doc/ChangeLog-before-1.3.1. ChangeLog | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 61 insertions(+) create mode 100644 ChangeLog commit 63c13ad8516b9dcacbe001137666c3468968542b Author: Naohisa Goto Date: Thu Aug 23 00:25:07 2012 +0900 Rakefile: Update hardcoded git tag name for updating of ChangeLog. 
Rakefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit b10c7ad2db24d88726ffb8c63078baa217aeac43 Author: Naohisa Goto Date: Thu Aug 23 00:20:01 2012 +0900 renamed ChangeLog to doc/ChangeLog-1.4.3 ChangeLog | 1478 --------------------------------------------------- doc/ChangeLog-1.4.3 | 1478 +++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 1478 insertions(+), 1478 deletions(-) delete mode 100644 ChangeLog create mode 100644 doc/ChangeLog-1.4.3 commit 0c20cb62ba6b253098e7198c14de1829f72474f5 Author: Naohisa Goto Date: Thu Aug 23 00:18:50 2012 +0900 ChangeLog updated: add log about 1.4.3 release. ChangeLog | 9 +++++++++ 1 file changed, 9 insertions(+) commit 5e88ccbe0fefdd4d57f144aaf9073f5e7d93281c Author: Naohisa Goto Date: Thu Aug 23 00:16:25 2012 +0900 New RELEASE_NOTES.rdoc for the next release version. RELEASE_NOTES.rdoc | 47 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+) create mode 100644 RELEASE_NOTES.rdoc commit e3d40b90d88ab0d0a91d8e32ebf97c16097f0996 Author: Naohisa Goto Date: Thu Aug 23 00:12:40 2012 +0900 Renamed RELEASE_NOTES.rdoc to doc/RELEASE_NOTES-1.4.3.rdoc RELEASE_NOTES.rdoc | 204 ------------------------------------------ doc/RELEASE_NOTES-1.4.3.rdoc | 204 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 204 insertions(+), 204 deletions(-) delete mode 100644 RELEASE_NOTES.rdoc create mode 100644 doc/RELEASE_NOTES-1.4.3.rdoc commit 08bcabecccb271385d38a0f807e8c408def5a128 Author: Naohisa Goto Date: Thu Aug 23 00:00:15 2012 +0900 Bio::BIORUBY_EXTRA_VERSION set to ".5000" (unstable version). bioruby.gemspec | 2 +- lib/bio/version.rb | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)

= Incompatible and important changes since the BioRuby 1.2.1 release

Many changes have been made to BioRuby since version 1.2.1 was released.

== New features

=== Support for sequence output with improvements of Bio::Sequence

Output of EMBL and GenBank formatted text is now supported in the Bio::Sequence class. See the documentation of Bio::Sequence#output for details. You can also create Bio::Sequence objects from many kinds of data, such as Bio::GenBank, Bio::EMBL, and Bio::FastaFormat, by using the to_biosequence method.

=== BioSQL support

BioSQL support has been completely rewritten using ActiveRecord.

=== Bio::Blast

Bio::Blast.reports can parse NCBI default (-m 0) format and tabular (-m 8) format, in addition to XML (-m 7) format.

Bio::Blast::Report now supports XML format with multiple query sequences generated by blastall 2.2.14 or later.

Bio::Blast.remote supports DDBJ, in addition to GenomeNet. In addition, a list of available blast databases on remote sites can be obtained by using the Bio::Blast::Remote::DDBJ.databases and Bio::Blast::Remote::GenomeNet.databases methods. Note that the above remote blast methods may be changed in the future to support NCBI.

Bio::Blast::RPSBlast::Report is newly added. It is a parser for NCBI RPS Blast (Reverse Position Specific Blast) default (-m 0 option) results.

=== Bio::GFF::GFF2 and Bio::GFF::GFF3

Output of GFF2/GFF3-formatted text is now supported. However, many incompatible changes have been made (see below for details).

=== Bio::Hinv

A client class for the H-Invitational Database web service (REST) is newly added.

=== Bio::NCBI::REST

A client class for NCBI E-Utilities is newly added.

=== Bio::PAML::Codeml and Bio::PAML::Codeml::Report

Bio::PAML::Codeml, a wrapper for the PAML codeml program, and Bio::PAML::Codeml::Report, a parser for codeml results, are newly added, though parts of them are still under construction and too specific to particular use cases.

=== Bio::Locations

A new method Bio::Locations#to_s is added to support output of features.

=== Bio::TogoWS::REST

A TogoWS REST client class is newly added.
Information about the TogoWS REST service can be found at http://togows.dbcls.jp/site/en/rest.html.

== Deprecated classes

=== Bio::Features

Bio::Features is obsolete and has been changed to an array of Bio::Feature objects with some backward compatibility methods. The backward compatibility methods will be removed soon.

=== Bio::References

Bio::References is obsolete and has been changed to an array of Bio::Reference objects with some backward compatibility methods. The backward compatibility methods will be removed soon.

== Incompatible changes

=== Bio::BIORUBY_VERSION

The definition of the constant Bio::BIORUBY_VERSION is moved from lib/bio.rb to lib/bio/version.rb. Normally, the autoload mechanism of Ruby correctly loads version.rb, but special scripts that use bio.rb directly may need to be changed.

Bio::BIORUBY_VERSION is now frozen.

New constants Bio::BIORUBY_EXTRA_VERSION and Bio::BIORUBY_VERSION_ID are added. See their RDoc for details.

=== Bio::Sequence

Bio::Sequence#date is removed. Use date_created or date_modified instead.

Bio::Sequence#taxonomy is changed to be an alias of classification, and its data type is changed to an array of strings.

=== Bio::Locations and Bio::Location

A caret in a location (e.g. "123^124") is now parsed, instead of being replaced by "..". To distinguish it from a normal "..", a new attribute Bio::Location#carat (note the attribute's spelling) is used.

"order(...)" and "group(...)" are also parsed, instead of being regarded as "join(...)". To distinguish them from "join(...)", a new attribute Bio::Locations#operator is used. For "order(...)" or "group(...)", the attribute is set to :order or :group, respectively. Note that "group(...)" is already deprecated in EMBL/GenBank/DDBJ.

=== Bio::Blast

The return value of Bio::Blast#exec_* is changed to a String instead of a Report object. Parsing of the string is now done in the Bio::Blast#query method.

Bio::Blast#exec_genomenet_tab and Bio::Blast#server="genomenet_tab" are deprecated.

Bio::Blast#options=() can now change the following attributes: program, db, format, matrix, and filter.

Bio::Blast.reports now supports default (-m 0) and tabular (-m 8) formats. The old implementation (which only supports XML) is renamed to Bio::Blast.reports_xml, to keep compatibility with older BLAST XML documents that might not be parsed by the new Bio::Blast.reports or Bio::FlatFile, although we are not sure whether such documents really exist.

=== Bio::Blast::Default::Report and Bio::Blast::WU::Report

Iteration#lambda, #kappa, #entropy, #gapped_lambda, #gapped_kappa, and #gapped_entropy, and the same methods in the Report class, are changed to return a float or nil instead of a string or nil.

=== Bio::Blat

When reading BLAT psl (or pslx) data by using Bio::FlatFile, it checks each query name and returns a new entry object when the query name changes from that of the previous queries. That is, data is stored in two or more Bio::Blat::Report objects, instead of the previous version's behavior (always reading all data at once and storing it in a single Bio::Blat::Report object).

=== Bio::GFF, Bio::GFF::GFF2 and Bio::GFF::GFF3

Bio::GFF::Record#comments is renamed to #comment, and #comments= is renamed to #comment=, because they only allow a single String (or nil) and the plural form "comments" may be confusing. The "comments" and "comments=" methods can still be used, but warning messages will be shown when they are used on GFF2::Record and GFF3::Record objects.

See below for GFF2 and/or GFF3 specific changes.

=== Bio::GFF::GFF2 and Bio::GFF::GFF3

Bio::GFF::GFF2::Record.new and Bio::GFF::GFF3::Record.new can also take 9 arguments corresponding to the GFF columns, which helps to create a Record object directly without formatted text.

Bio::GFF::GFF2::Record#start, #end, and #frame return Integer or nil, and #score returns Float or nil, instead of String or nil. The same changes are also made to Bio::GFF::GFF3::Record.

Bio::GFF::GFF2::Record#attributes and Bio::GFF::GFF3::Record#attributes are changed to return a nested Array containing [ tag, value ] pairs, in order to support multiple values with the same tag name. If you want to get a Hash, use the Record#attributes_to_hash method, though some tag-value pairs that share a tag name may be lost. Note that Bio::GFF::Record#attribute still returns a Hash for compatibility.

New methods for getting, setting and manipulating attributes are added to the Bio::GFF::GFF2::Record and Bio::GFF::GFF3::Record classes: attribute, get_attribute, get_attributes, set_attribute, replace_attributes, add_attribute, delete_attribute, delete_attributes, sort_attributes_by_tag!. It is recommended to use these methods instead of directly manipulating the array returned by Record#attributes.

Bio::GFF::GFF2#to_s, Bio::GFF::GFF3#to_s, Bio::GFF::GFF2::Record#to_s, and Bio::GFF::GFF3::Record#to_s are added to support output of GFF2/GFF3 data.

=== Bio::GFF::GFF2

GFF2 attribute values are now automatically unescaped. In addition, if the value of an attribute consists of two or more tokens delimited by spaces, an object of the new class Bio::GFF::GFF2::Record::Value is returned instead of a String. The new class Bio::GFF::GFF2::Record::Value aims to store a parsed value of an attribute. If you really want to get the unparsed string, Bio::GFF::GFF2::Record::Value#to_s can be used.

The metadata (lines beginning with "##") are parsed to Bio::GFF::GFF2::MetaData objects and are stored in Bio::GFF::GFF2#metadata as an array, except for the "##gff-version" line. The "##gff-version" version string is stored in Bio::GFF::GFF2#gff_version as a string.
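
The reason a nested Array is returned, and why converting it to a Hash can lose pairs, can be sketched in plain Ruby. The following is only an illustration under stated assumptions: it uses no BioRuby classes, the attribute data is made up, and Array#to_h (which keeps the last value for a repeated key) stands in for a Hash conversion. It does not claim to reproduce which value Record#attributes_to_hash actually keeps.

```ruby
# Plain-Ruby illustration; no BioRuby classes are used here.
# Hypothetical attribute data in the nested [tag, value] form.
attributes = [
  ["ID",     "gene0001"],
  ["Parent", "mRNA0001"],
  ["Parent", "mRNA0002"],  # the same tag may appear more than once
]

# Converting the pairs to a Hash keeps only one value per tag
# (Array#to_h keeps the last one), so "mRNA0001" is lost.
as_hash = attributes.to_h

# Selecting from the nested Array keeps every value for the tag.
parents = attributes.select { |tag, _value| tag == "Parent" }
                    .map { |_tag, value| value }

p as_hash
p parents  # → ["mRNA0001", "mRNA0002"]
```

When every value of a repeated tag matters, working on the pair list (or, in BioRuby, the attribute accessor methods listed above) avoids this loss.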

=== Bio::GFF::GFF3

Aliases of columns which are renamed in the GFF3 specification are added to the Bio::GFF::GFF3::Record class: seqid (column 1; alias of "seqname"), feature_type (column 3; alias of "feature"; in the GFF3 spec, it is called "type", but because "type" is already used by Ruby, we use "feature_type"), phase (column 8; formerly "frame"). Original names can still be used because they are only aliases.

Sequences bundled within GFF3 after "##FASTA" are now supported (Bio::GFF::GFF3#sequences).

GFF3 attribute keys and values are automatically unescaped. Each attribute value is stored as a string, except for the special attributes listed below:

* Bio::GFF::GFF3::Record::Target to store a "Target" attribute.
* Bio::GFF::GFF3::Record::Gap to store a "Gap" attribute.

The metadata (lines beginning with "##") are parsed to Bio::GFF::GFF3::MetaData objects and stored in Bio::GFF::GFF3#metadata as an array, except for "##gff-version", "##sequence-region", "###", and "##FASTA" lines.

* "##gff-version" version string is stored in Bio::GFF::GFF3#gff_version.
* "##sequence-region" lines are parsed to Bio::GFF::GFF3::SequenceRegion objects and stored in Bio::GFF::GFF3#sequence_regions as an array.
* "###" lines are parsed to Bio::GFF::GFF3::RecordBoundary objects.
* "##FASTA" is regarded as the beginning of bundled sequences.

=== Bio::Pathway

Bio::Pathway#cliquishness is changed to calculate cliquishness (clustering coefficient) not only for undirected graphs but also for directed graphs.

In the Bio::Pathway#to_matrix, dump_matrix, dump_list, and depth_first_search methods, to avoid depending on the order of objects in Hash#each (and each_key etc.), Bio::Pathway#index is used to specify preferences of nodes in a graph.

=== Bio::SQL and BioSQL related classes

BioSQL support has been completely rewritten using ActiveRecord. See the documents in lib/bio/io/sql.rb, lib/bio/io/biosql, and lib/bio/db/biosql for details of the changes and usage of the classes/modules.
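
As a rough illustration of what automatic unescaping of GFF3 attribute keys and values means, the sketch below decodes percent-escapes in a column 9 string using plain Ruby. The helper names unescape and parse_gff3_attributes are hypothetical and are not BioRuby methods, and the input string is made up. The sketch ignores multi-valued (comma-separated) attributes and the special Target and Gap attributes.

```ruby
# Plain-Ruby sketch; unescape and parse_gff3_attributes are hypothetical
# helper names, not BioRuby methods.

# Decode %XX escapes (e.g. "%3B" -> ";"). A literal "+" is left untouched.
def unescape(str)
  str.gsub(/%([0-9A-Fa-f]{2})/) { $1.hex.chr }
end

# Split a GFF3 column-9 string into [tag, value] pairs. Splitting is done
# on the raw ";" and "=" delimiters first; keys and values are unescaped
# afterwards, so escaped delimiters inside a value survive.
def parse_gff3_attributes(column9)
  column9.split(";").map do |field|
    tag, value = field.split("=", 2)
    [unescape(tag), unescape(value.to_s)]
  end
end

pairs = parse_gff3_attributes("ID=gene0001;Note=kinase%3B putative%2C 50%25 identity")
pairs.each { |tag, value| puts "#{tag}: #{value}" }
```

Unescaping after splitting is the important design point: decoding the whole column first would turn "%3B" into a real ";" and break the field boundaries.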
bio-2.0.3/doc/RELEASE_NOTES-1.4.1.rdoc0000644000175000017500000000603514141516614015727 0ustar nileshnilesh
= BioRuby 1.4.1 RELEASE NOTES

Many changes have been made in BioRuby 1.4.1 since version 1.4.0 was released. This document describes important and/or incompatible changes since the BioRuby 1.4.0 release.

For known problems, see KNOWN_ISSUES.rdoc.

== New features

=== PAML Codeml support is significantly improved

The PAML Codeml result parser is completely rewritten and is significantly improved. The code is developed by Pjotr Prins.

=== KEGG PATHWAY and KEGG MODULE parser

Parsers for KEGG PATHWAY and KEGG MODULE data are added. The code is developed by Kozo Nishida and Toshiaki Katayama.

=== Bio::KEGG improvements

The following new methods are added.

* Bio::KEGG::GENES#keggclass, keggclasses, names_as_array, names, motifs_as_strings, motifs_as_hash, motifs
* Bio::KEGG::GENOME#original_databases

=== Test codes are added and improved.

Test codes are added and improved. They were developed by Kazuhiro Hayashi, Kozo Nishida, John Prince, and Naohisa Goto.

=== Other new methods

* Bio::Fastq#mask
* Bio::Sequence#output_fasta
* Bio::ClustalW::Report#get_sequence
* Bio::Reference#==
* Bio::Location#==
* Bio::Locations#==
* Bio::FastaNumericFormat#to_biosequence

== Bug fixes

=== Bio::Tree

The following methods did not work correctly.

* Bio::Tree#collect_edge!
* Bio::Tree#remove_edge_if

=== Bio::KEGG::GENES and Bio::KEGG::GENOME

* Fixed bugs in Bio::KEGG::GENES#pathway.
* Fixed parser errors due to the format changes of KEGG GENES and KEGG GENOME.

=== Other bug fixes

* In Bio::Command, changed not to call fork(2) on platforms that do not support it.
* Bio::MEDLINE#initialize should handle continuation of lines.
* Typo and a missing field in Bio::GO::GeneAssociation#to_str.
* Bug fix of Bio::FastaNumericFormat#to_biosequence.
* Fixed UniProt GN parsing issue in Bio::SPTR.

== Incompatible changes

=== Bio::PAML::Codeml::Report

The code is completely rewritten.
See the RDoc for details.

=== Bio::KEGG::ORTHOLOGY

Bio::KEGG::ORTHOLOGY#pathways is changed to return a hash. The old pathway method is renamed to pathways_in_keggclass for compatibility.

=== Bio::AAindex2

Bio::AAindex2 now copies each symmetric element of a lower triangular matrix to the upper right part, because the Matrix class in Ruby 1.9.2 no longer accepts any dimension mismatches. We think the previous behavior was a bug.

=== Bio::MEDLINE

Bio::MEDLINE#reference no longer puts empty values in the returned Bio::Reference object. We think the previous behavior was a bug. We also think the effect is very small.

== Known issues

The following issues are added or updated. See KNOWN_ISSUES.rdoc for other already known issues.

=== String escaping of command-line arguments in Ruby 1.9.X on Windows

In BioRuby 1.4.1 and later, with Ruby 1.9.X running on Windows, escaping of command-line arguments is processed by the Ruby interpreter. In BioRuby 1.4.0 and earlier, the escaping was done in Bio::Command#escape_shell_windows, and its behavior differs from that of the Ruby interpreter. Currently, due to this change, test/functional/bio/test_command.rb may fail on Windows with Ruby 1.9.X.
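
To illustrate the Bio::AAindex2 change listed under the incompatible changes above, the following plain-Ruby sketch copies each element of a lower triangular matrix to its mirrored position so that every row has the same length. The helper name symmetrize and the sample values are hypothetical; this is not the Bio::AAindex2 implementation, only a minimal sketch of the idea.

```ruby
# Plain-Ruby sketch; symmetrize is a hypothetical helper, not a BioRuby method.

# Build a full square matrix from lower-triangular rows by copying
# element [i][j] (j <= i) to the mirrored position [j][i].
def symmetrize(lower_rows)
  n = lower_rows.size
  full = Array.new(n) { Array.new(n) }
  lower_rows.each_with_index do |row, i|
    row.each_with_index do |value, j|
      full[i][j] = value
      full[j][i] = value
    end
  end
  full
end

# Rows of a lower triangular matrix have lengths 1, 2, 3, ...
lower = [
  [1.0],
  [0.5, 2.0],
  [0.1, 0.3, 3.0],
]

full = symmetrize(lower)
full.each { |row| puts row.join(" ") }
```

Because all rows of the result have equal length, an array like this can be handed to a matrix constructor that rejects ragged rows.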
bio-2.0.3/ChangeLog0000644000175000017500000014713414141516614013402 0ustar nileshnileshcommit c16e230d15cf30478a3739563b4e4745dc57ef82 Author: Naohisa Goto Date: Fri Nov 5 23:44:45 2021 +0900 regenerate bioruby.gemspec with rake regemspec bioruby.gemspec | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) commit 1a8c44dc5b8d24342550273a594e5f75e41f41df Author: Naohisa Goto Date: Fri Nov 5 23:41:29 2021 +0900 prepare for BioRuby 2.0.3 release lib/bio/version.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 863242cdb9a748ba9efb3be3b39f92c17ed3ea84 Author: Naohisa Goto Date: Fri Nov 5 23:36:27 2021 +0900 update release notes for upcoming BioRuby 2.0.3 RELEASE_NOTES.rdoc | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) commit b97794d632c73dc45639cec000fb11238740eb30 Author: Naohisa Goto Date: Fri Nov 5 23:26:20 2021 +0900 Ruby 3.0 compatibility fix: Bio::Sequence::*#partition, #rpartiton * Behaviors of Bio::Sequence::*#partition and #rpartition in Ruby 3.0 are changed to mimic those in Ruby 2.x. lib/bio/sequence/common.rb | 17 +++++++ test/unit/bio/sequence/test_ruby3.rb | 94 ++++++++++++++++++++++++++++++++++++ 2 files changed, 111 insertions(+) commit 7ff79cd449ee03e0063120fb9abec50d9b08e979 Author: Naohisa Goto Date: Fri Nov 5 22:24:16 2021 +0900 remove resolved Ruby 3.0 issue KNOWN_ISSUES.rdoc | 8 -------- 1 file changed, 8 deletions(-) commit 24bb3c3cf417837ad63a27913db7237aef1414c6 Author: Naohisa Goto Date: Fri Nov 5 21:24:26 2021 +0900 Bio::Sequence::NA,AA,Generic: workaround for Ruby 3.0.0 incompatible change * Since Ruby 3.0.0, over 30 methods in subclass of String class are changed to return/yield String instance instead of the subclass instance. (https://github.com/ruby/ruby/blob/v3_0_0/NEWS.md ) * In this commit, workaround is made for the following methods: * * * capitalize * center * chomp * chop * delete * delete_prefix * delete_suffix * downcase * each_char * each_grapheme_cluster * each_line * gsub * gsub! 
* ljust * lstrip * revserse * rjust * rstrip * slice! * slice / [] * split * squeeze * strip * sub * sub! * succ / next * swapcase * tr * tr_s * upcase * Note: sub! and gsub! are not described in the NEWS.md but are also affected by this Ruby 3.0.0 incompatible changes. * The following methods are not patched i.e. they return/yield String instances. * dump * partition * rpartition * scrub * Note: In Ruby 2.7 or earlier, Bio::Sequence::NA#partition and #rpartition methods return an array that may contain mixture of Bio::Sequence::NA instances and String instances. * test/unit/bio/sequence/test_ruby3.rb: unit tests for the above methods. * Close https://github.com/bioruby/bioruby/issues/137 lib/bio/sequence/common.rb | 95 +++++++++ test/unit/bio/sequence/test_ruby3.rb | 368 +++++++++++++++++++++++++++++++++++ 2 files changed, 463 insertions(+) create mode 100644 test/unit/bio/sequence/test_ruby3.rb commit 0efc0a54685d43645daa9c53d7140bfb81577777 Author: kojix2 <2xijok@gmail.com> Date: Tue Aug 31 19:39:25 2021 +0900 Fix typos: Retrun -> Return lib/bio/appl/iprscan/report.rb | 6 +++--- lib/bio/appl/sosui/report.rb | 2 +- lib/bio/db/embl/uniprotkb.rb | 2 +- lib/bio/db/go.rb | 4 ++-- lib/bio/tree.rb | 2 +- 5 files changed, 8 insertions(+), 8 deletions(-) commit a291b5a72da12a5cc8b006d1dd63d002fda5dff3 Author: Naohisa Goto Date: Thu Dec 31 23:52:54 2020 +0900 BioRuby 2.0.2 is released ChangeLog | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 98 insertions(+) commit fd420b713ca364e677ef3551919ec907791df86d Author: Naohisa Goto Date: Thu Dec 31 23:51:08 2020 +0900 RELEASE_NOTES.rdoc: change some description RELEASE_NOTES.rdoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 0ed9b37f38fa1b00dcc1d422914e4cbdbbc5f6ab Author: Naohisa Goto Date: Thu Dec 31 23:46:46 2020 +0900 regenerate bioruby.gemspec with rake regemspec bioruby.gemspec | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 
fa84b8fac653c26b9f8db429321ef49202554a69 Author: Naohisa Goto Date: Thu Dec 31 23:45:38 2020 +0900 prepare for BioRuby 2.0.2 release lib/bio/version.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 32c5efccbad4e0d3440726383f863520ee242cc5 Author: Naohisa Goto Date: Thu Dec 31 23:39:18 2020 +0900 update release notes for upcoming BioRuby 2.0.2 RELEASE_NOTES.rdoc | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) commit c2f6e62ab64bc532f442bccc0d76ced5380664ec Author: Naohisa Goto Date: Thu Dec 31 23:38:43 2020 +0900 add a known issue about Ruby 3.0 KNOWN_ISSUES.rdoc | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) commit a733c0816da7c97ca0c23016b473b004fb755f54 Author: Naohisa Goto Date: Thu Dec 31 23:04:55 2020 +0900 remove deprecation warning of Gem::Specification#has_rdoc= Gem::Specification#has_rdoc= have been deprecated since RubyGems 1.3.3 in 2009. (https://blog.rubygems.org/2009/05/04/1.3.3-released.html ) RDoc is always generated regardless of the value, and the line is safely removed. This fixes https://github.com/bioruby/bioruby/issues/138 . Thanks to @jaysonvirissimo for reporting the issue. bioruby.gemspec | 1 - bioruby.gemspec.erb | 1 - 2 files changed, 2 deletions(-) commit bed6746ce62059795996eeb6e5ac65655bab12b5 Author: Naohisa Goto Date: Thu Dec 31 22:51:56 2020 +0900 require ruby's date library to avoid NameError for Date In Bio::Sequence#output(:embl), NameError (uninitialized constant Bio::Sequence::Format::INSDFeatureHelper::Date) is observed. The error message is misleading because Date is provided by Ruby's standard date library. This fixes https://github.com/bioruby/bioruby/issues/135 . Thanks to Dr. Mark Wilkinson for reporting the issue. 
lib/bio/sequence/format.rb | 1 + 1 file changed, 1 insertion(+) commit 5f3aa79fdaf6dd5551d51663ca2e9b6f5e56d855 Author: Naohisa Goto Date: Fri Nov 6 17:45:12 2020 +0900 fix mistaken URLs README.rdoc | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) commit d5e1670ee4863cc60d3aa08432a7ee3b1e445439 Author: Naohisa Goto Date: Fri Sep 6 15:48:45 2019 +0900 BioRuby 2.0.1 is released ChangeLog | 185 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 185 insertions(+) commit 21bf51a1ec8c18c9cdf8528ffc3c59c503cef042 Author: Naohisa Goto Date: Fri Sep 6 15:45:47 2019 +0900 RELEASE_NOTES.rdoc: describe notable changes since 2.0.0 RELEASE_NOTES.rdoc | 40 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 40 insertions(+) commit 9092a629e0e28b416ee7288d349fb9d73dd2b961 Author: Naohisa Goto Date: Fri Sep 6 15:06:11 2019 +0900 regenerate bioruby.gemspec with rake regemspec bioruby.gemspec | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) commit 59e24b6e55d2c9a8887e8e01a91999d33a008042 Author: Naohisa Goto Date: Fri Sep 6 15:04:33 2019 +0900 prepare for BioRuby 2.0.1 release lib/bio/version.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 9635a38a158db434fd2b6aff7a2ee75622ddecef Author: Naohisa Goto Date: Fri Sep 6 14:51:23 2019 +0900 sample/fastq2html.rb: A html visualization of FASTQ sequences * sample/fastq2html: A html visualization of FASTQ sequences. Each sequence is colored with the quality score. 
* sample/fastq2html.cwl: CWL workflow for the above sample script * sample/fastq2html.testdata.yaml: Test data for the above workflow sample/fastq2html.cwl | 23 ++++++++++ sample/fastq2html.rb | 94 +++++++++++++++++++++++++++++++++++++++++ sample/fastq2html.testdata.yaml | 5 +++ 3 files changed, 122 insertions(+) create mode 100644 sample/fastq2html.cwl create mode 100644 sample/fastq2html.rb create mode 100644 sample/fastq2html.testdata.yaml commit 6bbcf8b66310c225d686f2c59359680a0bc0b4b6 Author: Naohisa Goto Date: Fri Sep 6 14:42:19 2019 +0900 sample/rev_comp.rb: Generates reverse-complement sequences * sample/rev_comp.rb: Generates reverse-complement sequences of the given nucleotide sequences. * sample/rev_comp.cwl: CWL cowkflow for the sample script * sample/rev_comp.testdata.yaml: Test data for the above CWL workflow sample/rev_comp.cwl | 23 +++++++++++++++++++++++ sample/rev_comp.rb | 20 ++++++++++++++++++++ sample/rev_comp.testdata.yaml | 7 +++++++ 3 files changed, 50 insertions(+) create mode 100644 sample/rev_comp.cwl create mode 100644 sample/rev_comp.rb create mode 100644 sample/rev_comp.testdata.yaml commit ff0e6c3c6b6f1b56d81b5a4b579a6d0984bfc607 Author: Naohisa Goto Date: Fri Sep 6 14:40:45 2019 +0900 sample/color_scheme_(na|aa).rb: use String#each_char instead of each_byte sample/color_scheme_aa.rb | 6 +++--- sample/color_scheme_na.rb | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-) commit 6f7c1be09aa3d6cdb76fd029fc0f84efda31c907 Author: Naohisa Goto Date: Thu Sep 5 17:32:07 2019 +0900 sample/color_scheme_aa.rb: new sample based on color_scheme_na.rb sample/color_scheme_aa.rb | 82 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 82 insertions(+) create mode 100644 sample/color_scheme_aa.rb commit 51864c3857178f58133b759f7608b8d6d8991c44 Author: Naohisa Goto Date: Thu Sep 5 17:13:20 2019 +0900 sample/color_scheme_na.rb: use const_get instead of eval sample/color_scheme_na.rb | 2 +- 1 file changed, 1 insertion(+), 1 
deletion(-) commit ba0b554971a9a387a54fc04c5002853d91357347 Author: Naohisa Goto Date: Thu Sep 5 17:02:20 2019 +0900 sample/na2aa.cwl: inputBinding position -1 for the script * sample/na2aa.cwl: inputBinding position -1 is used for the script to emphasize that the argument is the first one. sample/na2aa.cwl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 26a27ec261e2251f3ff3a85007147d33682778d0 Author: Naohisa Goto Date: Thu Sep 5 16:56:08 2019 +0900 sample/color_scheme_na.rb: Supports more file formats * sample/color_scheme_na.rb: Supports more file formats other than fasta format, by using Bio::Flatfile. sample/color_scheme_na.rb | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) commit 5c053a606382bb578a2b6884ee639805154433e5 Author: Naohisa Goto Date: Thu Sep 5 12:32:00 2019 +0900 sample/na2aa.cwl: use inputBinding sample/na2aa.cwl | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) commit 6a3c3e02f08549d47dda00dca92d55bbadfc468f Author: Naohisa Goto Date: Wed Sep 4 22:59:21 2019 +0900 Sample CWL workflow to run sample/na2aa.rb * na2aa.cwl: A sample CWL workflow to run na2aa.rb in sample/ dir * na2aa.testdata.yaml: Test data for the workflow sample/na2aa.cwl | 20 ++++++++++++++++++++ sample/na2aa.testdata.yaml | 7 +++++++ 2 files changed, 27 insertions(+) create mode 100644 sample/na2aa.cwl create mode 100644 sample/na2aa.testdata.yaml commit 960b885036f549863e3cfe9c693c90f9bef27d3d Author: Naohisa Goto Date: Wed Sep 4 21:00:50 2019 +0900 LEGAL: na2aa.rb is now Ruby's License LEGAL | 1 - 1 file changed, 1 deletion(-) commit 7af9e81988939007eb36dab6b102a7422e8196d8 Author: Naohisa Goto Date: Wed Sep 4 14:34:05 2019 +0900 sample/na2aa.rb: Completely rewritten * sample/na2aa.rb: Completely rewritten. License is changed because old code is completely wiped out. Note that the old code always raises error due to a bug in the code. 
* The old code was trying to replace 'X' (any) to '-' (gap) but the new code does not modify translated sequences anymore. sample/na2aa.rb | 36 +++++++++++------------------------- 1 file changed, 11 insertions(+), 25 deletions(-) commit cf8cac5e32db42b6683c1a837adc9e1c04994062 Author: Naohisa Goto Date: Mon Sep 2 17:11:08 2019 +0900 Bug fix: Bio::GFF::GFF2::Record.parse did not return correct object lib/bio/db/gff.rb | 4 +++- test/unit/bio/db/test_gff.rb | 5 +++++ 2 files changed, 8 insertions(+), 1 deletion(-) commit 80b387e7e2bb8570d9204e389b6c5d90c6ea31de Author: Naohisa Goto Date: Fri Jun 14 14:33:19 2019 +0900 BioRuby 2.0.0 is released ChangeLog | 1051 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 1043 insertions(+), 8 deletions(-) commit 2e4046517fd8ee1c105ef53131e69f787d790099 Author: Naohisa Goto Date: Fri Jun 14 14:23:19 2019 +0900 Add "Recommended Plugins" section and description is moved to it README.rdoc | 22 +++++++++++++++------- 1 file changed, 15 insertions(+), 7 deletions(-) commit 7a533e4f57edcebb5dfe15fdddc9fbc986d2b7ec Author: Naohisa Goto Date: Fri Jun 14 14:17:08 2019 +0900 fix directory name README.rdoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 805266c9c900903156efd0baa8c1e6ee524a8147 Author: Naohisa Goto Date: Fri Jun 14 14:14:52 2019 +0900 add description about recommended plugins README.rdoc | 11 +++++++++++ 1 file changed, 11 insertions(+) commit 02b7d8b9bc5dcd56f501a15e5e820f450153aa1c Author: Naohisa Goto Date: Fri Jun 14 13:33:30 2019 +0900 prepare to release BioRuby 2.0.0 lib/bio/version.rb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit f1fed8dacb425d19c12abec5d4faeb733827f80f Author: Naohisa Goto Date: Fri Jun 14 13:31:08 2019 +0900 regenerate bioruby.gemspec with rake regemspec bioruby.gemspec | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) commit 3952ec9d5ce1e3ceea9734f667d36595808c4989 Author: Naohisa Goto Date: Fri Jun 14 13:28:19 2019 +0900 Remove 
xmlparser dependency from Gemfile and gemfiles/Gemfile.* Gemfile | 2 -- gemfiles/Gemfile.travis-rbx | 2 -- gemfiles/Gemfile.travis-ruby1.8 | 2 -- gemfiles/Gemfile.travis-ruby1.9 | 2 -- gemfiles/Gemfile.windows | 2 -- 5 files changed, 10 deletions(-) commit d4a8ee7ae3d3b13a8be4c57c1f8db5b29f2c4a13 Author: Naohisa Goto Date: Fri Jun 14 12:34:45 2019 +0900 RELEASE_NOTES.rdoc: update aboue new features and improvements RELEASE_NOTES.rdoc | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) commit 2d4170a2a0262f5d75cef5a54b5d6f3da298f145 Author: Naohisa Goto Date: Fri Jun 14 12:24:12 2019 +0900 Tests added in the previous commit is moved and modified * test/network/bio/db/kegg/test_genes_hsa7422.rb: tests added in the previous commit is moved to the file and modified to get data from the internet for avoiding KEGG data license issue. Note that some of the tests might be fail in the near future due to the database entry updates. test/network/bio/db/kegg/test_genes_hsa7422.rb | 91 ++++++++++++++++++++++++++ test/unit/bio/db/kegg/test_genes.rb | 51 --------------- 2 files changed, 91 insertions(+), 51 deletions(-) create mode 100644 test/network/bio/db/kegg/test_genes_hsa7422.rb commit 67f8105acf22e88a7624305743ad13802ffed124 Author: kojix2 <2xijok@gmail.com> Date: Mon Oct 22 00:46:31 2018 +0900 add DiseasesAsHash to KEGG/Common lib/bio/db/kegg/common.rb | 14 ++++++++++ lib/bio/db/kegg/genes.rb | 26 +++++++++++++++++++ lib/bio/db/kegg/pathway.rb | 16 ++++-------- test/unit/bio/db/kegg/test_genes.rb | 51 +++++++++++++++++++++++++++++++++++++ 4 files changed, 96 insertions(+), 11 deletions(-) commit 9dbb655e1c3ec7460b77f1d0ea475531ac3a9361 Author: Naohisa Goto Date: Fri Jun 14 11:37:11 2019 +0900 update documents for upcoming new release KNOWN_ISSUES.rdoc | 14 +++----- LEGAL | 9 ------ README.rdoc | 92 +++++++---------------------------------------------- README_DEV.rdoc | 10 +++--- RELEASE_NOTES.rdoc | 93 
+++++++++++++++++++++++++++++++++++++++++++----------- 5 files changed, 96 insertions(+), 122 deletions(-) commit 6f388019a035a41a8867c6a03ef7e2707d1edce4 Author: Naohisa Goto Date: Fri Jun 14 11:32:40 2019 +0900 .travis.yml: move 1.8.7 and 1.9.3 to allow_failures; update ruby versions .travis.yml | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) commit f2cbe9db9b78df653d774a7676e00f6f1a212b23 Author: Naohisa Goto Date: Fri Jun 14 11:18:27 2019 +0900 .travis.yml: Remove jobs using "tar-integration-test" .travis.yml | 9 --------- 1 file changed, 9 deletions(-) commit 68f28e81e3fa566843b548f1899549adcad5225a Author: Naohisa Goto Date: Fri Jun 14 11:10:18 2019 +0900 remove "rake tar-install" and "rake tar-integration-test" tasks * Rakefile: Remove "tar-install" and "tar-integration-test" tasks because they use setup.rb that is removed from the repository. Rakefile | 34 ---------------------------------- 1 file changed, 34 deletions(-) commit 0cbdb4586f2231a68579105dbc7f0fb413b38a96 Author: Naohisa Goto Date: Fri Jun 14 10:48:15 2019 +0900 next bioruby version will be 2.0.0 lib/bio/version.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 300d10b9791b7f0c0eff1d0544cae63fecc3b31a Author: Naohisa Goto Date: Fri Jun 14 10:41:12 2019 +0900 Remove setup.rb. Use RubyGems to install BioRuby. 
setup.rb | 1600 -------------------------------------------------------------- 1 file changed, 1600 deletions(-) delete mode 100644 setup.rb commit a74683d9acfc16d0d715b020839839afc8b43350 Author: Naohisa Goto Date: Fri Jun 14 02:28:31 2019 +0900 try to require "bio-blast-xmlparser" provided by separete gem lib/bio/appl/blast/report.rb | 8 ++++++++ 1 file changed, 8 insertions(+) commit de1c1e33aed392d4e2265a028b8acb50501f56bd Author: Naohisa Goto Date: Sat Sep 16 04:49:21 2017 +0900 check existance of a private method instead of XMLParser constant test/unit/bio/appl/blast/test_report.rb | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) commit 3f54d19c44411e845b32c522fc0deca4288dcf07 Author: Naohisa Goto Date: Sat Sep 16 04:39:19 2017 +0900 xml_set_parameter is moved from xmlparser.rb etc. * The method xml_set_parameter is moved from lib/bio/appl/blast/xmlparser.rb because it is used by the REXML parser. * The method Bio::Blast::Report.xmlparser is move to lib/bio/appl/blast/xmlparser.rb in the separate repo. * Use "defined? xmlparser_parse" for checking existance of the blast xmlparser component. * Removed line to require bio/appl/blast/xmlparser. lib/bio/appl/blast/report.rb | 40 ++++++++++++++++++++++++++++++++-------- 1 file changed, 32 insertions(+), 8 deletions(-) commit b19cd507c432739c5aaac700e222e6e4ecc63ddc Author: Naohisa Goto Date: Sat Sep 16 03:32:54 2017 +0900 lib/bio/appl/blast/xmlparser.rb is removed and moved to separate gem * lib/bio/appl/blast/xmlparser.rb is removed and moved to separate gem to eliminate dependency to xmlparser that includes native extension. 
lib/bio/appl/blast/xmlparser.rb | 236 ---------------------------------------- 1 file changed, 236 deletions(-) delete mode 100644 lib/bio/appl/blast/xmlparser.rb commit 525d3450ad3440bfbbe3a1540fe60d83c3845ec7 Author: Naohisa Goto Date: Sat Dec 15 11:33:08 2018 +0900 .travis.yml: remove jruby-18mode and jruby-19mode; add jruby and truffleruby .travis.yml | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) commit 5bc0042b7fc39c62222534e0e4129d3f9794fd8c Author: Naohisa Goto Date: Fri Dec 14 22:46:08 2018 +0900 appveyor.yml: regenerate bioruby.gemspec before creating gem appveyor.yml | 1 + 1 file changed, 1 insertion(+) commit 5582dc1db60ffc812211d9803d5adce9c0dd70d3 Author: Naohisa Goto Date: Fri Dec 14 22:39:35 2018 +0900 appveyor.yml: modify gemfile/Gemfile.windows after bundle install appveyor.yml | 1 + 1 file changed, 1 insertion(+) commit 09031bcae0a42fe93d07b46eb489ffbabc8c1319 Author: Naohisa Goto Date: Fri Dec 14 22:30:09 2018 +0900 appveyor.yml: give up using vendor/bundle; set BUNDLE_GEMFILE appveyor.yml | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) commit 44fb6c67aef1b0311d32ac806fc9a62f09d401d5 Author: Naohisa Goto Date: Fri Dec 14 21:59:44 2018 +0900 appveyor.yml: Specify gemfiles/Gemfile.windows * appveyor.yml: Specify gemfiles/Gemfile.windows in which xmlparser gem is excluded because of build failure of the xmlparser gem on Windows. * gemfiles/Gemfile.windows: Gemfile for Appveyor, running on Microsoft Windows. 
appveyor.yml | 2 +- gemfiles/Gemfile.windows | 8 ++++++++ 2 files changed, 9 insertions(+), 1 deletion(-) create mode 100644 gemfiles/Gemfile.windows commit fe55e52b42660dda1d21749bf714e989e7db754e Author: Naohisa Goto Date: Fri Dec 14 21:48:06 2018 +0900 appveyor.yml: update ruby versions and test procedure appveyor.yml | 22 +++++++++++++++------- 1 file changed, 15 insertions(+), 7 deletions(-) commit 739f5c9a512074a7de25d87e8104ed15bdb28b5d Author: Naohisa Goto Date: Fri Dec 14 11:57:43 2018 +0900 .travis.yml: change default Gemfile * Change default Gemfile to Gemfile * Move old Ruby versions to "include" matrix. * Change ruby version for gem-integration-test and tar-integration-test .travis.yml | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-) commit 2f54a9cbf8fb6d8580d488b20007d5ce4562e5e9 Author: Naohisa Goto Date: Fri Dec 14 11:42:58 2018 +0900 .travis.yml: No more limit to master branch. Instead, add blocklist. .travis.yml | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) commit 9ac3e44318c67fd4415a2118dd5631902e784e12 Author: Kozo Nishida Date: Thu Dec 13 22:47:54 2018 +0900 ci(travis): Add rvm versions .travis.yml | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) commit 258dd67c9d65f1247e56d5c5228cc6f9c019d133 Author: Naohisa Goto Date: Mon Dec 10 21:56:16 2018 +0900 regenerate bioruby.gemspec with rake regemspec bioruby.gemspec | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 3920483d6b5a3759e6c610d7ee9fb1a63dcc9ce4 Author: Naohisa Goto Date: Mon Dec 10 21:55:19 2018 +0900 Simplify version number processing bioruby.gemspec.erb | 14 +++++++------- lib/bio/version.rb | 12 +++++------- 2 files changed, 12 insertions(+), 14 deletions(-) commit 80949a10ea5e4f88d21d893905b720925f5a9e7b Author: Naohisa Goto Date: Mon Dec 10 18:54:00 2018 +0900 next bioruby version will be 1.6.0 lib/bio/version.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 2b542865a4d4af2684ace41f79e273ebceb51807 Merge: 
02a96424 d71e07a0 Author: Toshiaki Katayama Date: Fri Oct 19 06:45:49 2018 +0900 Merge pull request #125 from kojix2/master update TogoWS documentation. genbank -> ncbi-nucleotide commit d71e07a0cb1cc441241be91273bd44e3717b8773 Author: kojix2 <2xijok@gmail.com> Date: Thu Oct 18 19:10:29 2018 +0900 update TogoWS documentation. genbank -> ncbi-nucleotide lib/bio/io/togows.rb | 10 +++++----- sample/test_restriction_enzyme_long.rb | 2 +- 2 files changed, 6 insertions(+), 6 deletions(-) commit 02a964241b79e2307d0a00473427ea6bc2ea6932 Author: Naohisa Goto Date: Thu Sep 20 07:06:08 2018 +0900 Improvement documentation * Improve documentation. * Close https://github.com/bioruby/bioruby/pull/120 . lib/bio/db/aaindex.rb | 79 +++++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 74 insertions(+), 5 deletions(-) commit 6bfef40ae87099565371abf94cf2cc8bfac76b12 Author: Naohisa Goto Date: Thu Sep 20 05:01:12 2018 +0900 Bug fix: Bio::Command.new_https should support proxy lib/bio/command.rb | 18 +++++++----------- 1 file changed, 7 insertions(+), 11 deletions(-) commit 4e3251d2172f58239f103e7edf8f4c351140f378 Author: Naohisa Goto Date: Thu Sep 20 04:58:56 2018 +0900 https support for Bio::Blast::Remote::GenomeNet::Information lib/bio/appl/blast/genomenet.rb | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) commit 6dd1f9fb8c2b4ba95086eab7bffc01583feccf3a Author: ramadis Date: Sat Jul 7 15:12:33 2018 -0300 Add https requests in command. Fix genomenet query by allowing https requests. 
lib/bio/appl/blast/genomenet.rb | 2 +- lib/bio/command.rb | 14 ++++++++++++++ 2 files changed, 15 insertions(+), 1 deletion(-) commit 4b6f87c9fd2dc62418ddfc4b57bcc4b73287a603 Author: Tomoaki NISHIYAMA Date: Sat Mar 31 13:08:07 2018 +0900 directly refer to the given hash lib/bio/data/codontable.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 25636ffa08c6ea9a9e4d1b451a456bc1f482ad40 Author: Tomoaki NISHIYAMA Date: Sat Jun 2 15:10:12 2018 +0900 precalculated ambiguity codontable lib/bio/data/codontable.rb | 55 +++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 52 insertions(+), 3 deletions(-) commit b2d924045202ec3aa4e1b79341fd939a881d4c2e Author: Tomoaki NISHIYAMA Date: Sat Mar 31 11:55:57 2018 +0900 construct ambiguity nucleotide to amino acid table lib/bio/data/codontable.rb | 49 ++++++++++++++++++++++++++++++++++- test/unit/bio/data/test_codontable.rb | 3 +++ 2 files changed, 51 insertions(+), 1 deletion(-) commit a7378b6b269ea1c0391e259dd8e4868f03b064ea Author: markwilkinson Date: Tue Dec 12 14:13:51 2017 +0100 fixing Fasta Report parser for fasta36 -m10 lib/bio/appl/fasta/format10.rb | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) commit c89c40c29c3c92f8e548c79d2d04698123559007 Author: Naohisa Goto Date: Fri Sep 15 16:33:19 2017 +0900 Remove settings about executables * Definitions and settings about executables are removed because all files in bin/ have been moved to separate gem packages (bio-shell and bio-executables). bioruby.gemspec | 13 ------------- bioruby.gemspec.erb | 21 +-------------------- 2 files changed, 1 insertion(+), 33 deletions(-) commit b5a8d385da8f2c1b6e1caf77295e590f55595944 Author: Naohisa Goto Date: Fri Sep 15 16:20:03 2017 +0900 bin/br_*.rb is moved to bio-executables gem * The following executable files are moved to "bio-executables" gem. 
* bin/br_biofetch.rb * bin/br_bioflat.rb * bin/br_biogetseq.rb * bin/br_pmfetch.rb bin/br_biofetch.rb | 71 --------- bin/br_bioflat.rb | 293 ------------------------------------ bin/br_biogetseq.rb | 45 ------ bin/br_pmfetch.rb | 422 ---------------------------------------------------- 4 files changed, 831 deletions(-) delete mode 100755 bin/br_biofetch.rb delete mode 100755 bin/br_bioflat.rb delete mode 100755 bin/br_biogetseq.rb delete mode 100755 bin/br_pmfetch.rb commit eb61d89a366437570a0590a629cb75718866b236 Author: Naohisa Goto Date: Fri Sep 15 09:31:14 2017 +0900 regenerate bioruby.gemspec with rake regemspec bioruby.gemspec | 44 +------------------------------------------- 1 file changed, 1 insertion(+), 43 deletions(-) commit 6d40721d039fdb6b77af656f32ccabeabc427409 Author: Naohisa Goto Date: Fri Sep 15 09:29:33 2017 +0900 Remove BioRuby Shell files that are released as independent gem package bin/bioruby | 47 -- lib/bio/shell.rb | 44 -- lib/bio/shell/core.rb | 578 --------------------- lib/bio/shell/demo.rb | 146 ------ lib/bio/shell/interface.rb | 217 -------- lib/bio/shell/irb.rb | 94 ---- lib/bio/shell/object.rb | 71 --- lib/bio/shell/plugin/blast.rb | 42 -- lib/bio/shell/plugin/codon.rb | 218 -------- lib/bio/shell/plugin/das.rb | 58 --- lib/bio/shell/plugin/emboss.rb | 23 - lib/bio/shell/plugin/entry.rb | 137 ----- lib/bio/shell/plugin/flatfile.rb | 101 ---- lib/bio/shell/plugin/midi.rb | 430 --------------- lib/bio/shell/plugin/ncbirest.rb | 68 --- lib/bio/shell/plugin/obda.rb | 45 -- lib/bio/shell/plugin/psort.rb | 56 -- lib/bio/shell/plugin/seq.rb | 248 --------- lib/bio/shell/plugin/togows.rb | 40 -- .../generators/bioruby/bioruby_generator.rb | 29 -- .../generators/bioruby/templates/_classes.rhtml | 4 - .../generators/bioruby/templates/_log.rhtml | 27 - .../generators/bioruby/templates/_methods.rhtml | 11 - .../generators/bioruby/templates/_modules.rhtml | 4 - .../generators/bioruby/templates/_variables.rhtml | 7 - 
.../generators/bioruby/templates/bioruby-bg.gif | Bin 1431 -> 0 bytes .../generators/bioruby/templates/bioruby-gem.png | Bin 6951 -> 0 bytes .../generators/bioruby/templates/bioruby-link.gif | Bin 2758 -> 0 bytes .../generators/bioruby/templates/bioruby.css | 368 ------------- .../generators/bioruby/templates/bioruby.rhtml | 47 -- .../bioruby/templates/bioruby_controller.rb | 144 ----- .../generators/bioruby/templates/bioruby_helper.rb | 47 -- .../generators/bioruby/templates/commands.rhtml | 8 - .../generators/bioruby/templates/history.rhtml | 10 - .../generators/bioruby/templates/index.rhtml | 26 - .../generators/bioruby/templates/spinner.gif | Bin 1542 -> 0 bytes lib/bio/shell/script.rb | 25 - lib/bio/shell/setup.rb | 108 ---- lib/bio/shell/web.rb | 102 ---- test/unit/bio/shell/plugin/test_seq.rb | 187 ------- test/unit/bio/test_shell.rb | 20 - 41 files changed, 3837 deletions(-) delete mode 100755 bin/bioruby delete mode 100644 lib/bio/shell.rb delete mode 100644 lib/bio/shell/core.rb delete mode 100644 lib/bio/shell/demo.rb delete mode 100644 lib/bio/shell/interface.rb delete mode 100644 lib/bio/shell/irb.rb delete mode 100644 lib/bio/shell/object.rb delete mode 100644 lib/bio/shell/plugin/blast.rb delete mode 100644 lib/bio/shell/plugin/codon.rb delete mode 100644 lib/bio/shell/plugin/das.rb delete mode 100644 lib/bio/shell/plugin/emboss.rb delete mode 100644 lib/bio/shell/plugin/entry.rb delete mode 100644 lib/bio/shell/plugin/flatfile.rb delete mode 100644 lib/bio/shell/plugin/midi.rb delete mode 100644 lib/bio/shell/plugin/ncbirest.rb delete mode 100644 lib/bio/shell/plugin/obda.rb delete mode 100644 lib/bio/shell/plugin/psort.rb delete mode 100644 lib/bio/shell/plugin/seq.rb delete mode 100644 lib/bio/shell/plugin/togows.rb delete mode 100644 lib/bio/shell/rails/vendor/plugins/bioruby/generators/bioruby/bioruby_generator.rb delete mode 100644 lib/bio/shell/rails/vendor/plugins/bioruby/generators/bioruby/templates/_classes.rhtml delete mode 100644 
lib/bio/shell/rails/vendor/plugins/bioruby/generators/bioruby/templates/_log.rhtml delete mode 100644 lib/bio/shell/rails/vendor/plugins/bioruby/generators/bioruby/templates/_methods.rhtml delete mode 100644 lib/bio/shell/rails/vendor/plugins/bioruby/generators/bioruby/templates/_modules.rhtml delete mode 100644 lib/bio/shell/rails/vendor/plugins/bioruby/generators/bioruby/templates/_variables.rhtml delete mode 100644 lib/bio/shell/rails/vendor/plugins/bioruby/generators/bioruby/templates/bioruby-bg.gif delete mode 100644 lib/bio/shell/rails/vendor/plugins/bioruby/generators/bioruby/templates/bioruby-gem.png delete mode 100644 lib/bio/shell/rails/vendor/plugins/bioruby/generators/bioruby/templates/bioruby-link.gif delete mode 100644 lib/bio/shell/rails/vendor/plugins/bioruby/generators/bioruby/templates/bioruby.css delete mode 100644 lib/bio/shell/rails/vendor/plugins/bioruby/generators/bioruby/templates/bioruby.rhtml delete mode 100644 lib/bio/shell/rails/vendor/plugins/bioruby/generators/bioruby/templates/bioruby_controller.rb delete mode 100644 lib/bio/shell/rails/vendor/plugins/bioruby/generators/bioruby/templates/bioruby_helper.rb delete mode 100644 lib/bio/shell/rails/vendor/plugins/bioruby/generators/bioruby/templates/commands.rhtml delete mode 100644 lib/bio/shell/rails/vendor/plugins/bioruby/generators/bioruby/templates/history.rhtml delete mode 100644 lib/bio/shell/rails/vendor/plugins/bioruby/generators/bioruby/templates/index.rhtml delete mode 100644 lib/bio/shell/rails/vendor/plugins/bioruby/generators/bioruby/templates/spinner.gif delete mode 100644 lib/bio/shell/script.rb delete mode 100644 lib/bio/shell/setup.rb delete mode 100644 lib/bio/shell/web.rb delete mode 100644 test/unit/bio/shell/plugin/test_seq.rb delete mode 100644 test/unit/bio/test_shell.rb commit ab9feb6f1f495a2b3ca350005c6162c51178aecb Author: Naohisa Goto Date: Wed Sep 13 22:13:59 2017 +0900 Suppress warning "assigned but unused variable" lib/bio/io/flatfile/autodetection.rb | 5 
+++++ 1 file changed, 5 insertions(+) commit cf486e327c253482f54e59b2e18f73db27641135 Author: Naohisa Goto Date: Wed Sep 13 22:10:53 2017 +0900 Suppress warning: "instance variable @top_strand not initialized" * Suppress warning: "instance variable @top_strand not initialized". To do so, force to raise NoMethodError when @top_strand is not initialized or is nil. This should be changed to appropriate exception in the future. lib/bio/util/sirna.rb | 2 ++ 1 file changed, 2 insertions(+) commit 88477698f0e1b5a74f9682f26e97c5f90f6912b4 Author: Naohisa Goto Date: Wed Sep 13 21:31:38 2017 +0900 Suppress warning in Ruby 2.4: "constant ::Fixnum is deprecated" lib/bio/db/soft.rb | 4 ++-- .../util/restriction_enzyme/range/sequence_range/calculated_cuts.rb | 2 +- test/unit/bio/test_alignment.rb | 4 ++-- 3 files changed, 5 insertions(+), 5 deletions(-) commit f8cff14179cfeea0d685f4df756db71ceb6d5fab Author: Naohisa Goto Date: Wed Sep 13 21:19:12 2017 +0900 Suppress warning "parentheses after method name is interpreted as an argument list, not a decomposed argument" in Ruby 2.4 lib/bio/map.rb | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) commit ddb25c2bf3872c6306a91e407d95caa2e136cee9 Author: Jun Aruga Date: Fri Nov 18 11:14:38 2016 +0100 Gemfile for local development. .travis.yml | 8 ++++---- gemfiles/Gemfile.travis-ruby2.2 => Gemfile | 0 bioruby.gemspec | 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) rename gemfiles/Gemfile.travis-ruby2.2 => Gemfile (100%) commit 16faf6473b74eb172716b713ab757cb2ab2bcacc Author: Jun Aruga Date: Thu Nov 17 17:50:40 2016 +0100 Fixes ruby1.8 Travis failure that is because rdoc 4.3.0 requires Ruby >= 1.9.3. 
gemfiles/Gemfile.travis-jruby1.8 | 3 ++- gemfiles/Gemfile.travis-ruby1.8 | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) commit 146fd66b3a14972bcfd0e9bf8ec007d38c55ac39 Author: Naohisa Goto Date: Sat Aug 13 08:22:22 2016 +0900 Update URLs and use https for NCBI REST web services lib/bio/io/ncbirest.rb | 50 ++++++++++++++++++++++++++------------------------ 1 file changed, 26 insertions(+), 24 deletions(-) commit 7abd46f058a17ac34b263714449756383622012d Author: Naohisa Goto Date: Sat Aug 13 08:12:08 2016 +0900 New method Bio::Command#start_http_uri(uri) with tests * lib/bio/command.rb: New method Bio::Command#start_http_uri(uri) that supports HTTPS. Note that this method is intended to be called only from BioRuby internals. * lib/bio/command.rb: Bio::Command#post and #post_form are changed to use the start_http_uri(). * test/network/bio/test_command.rb: tests for start_http_uri(). lib/bio/command.rb | 42 ++++++++++++++++++++++++++++++++++++++-- test/network/bio/test_command.rb | 17 ++++++++++++++++ 2 files changed, 57 insertions(+), 2 deletions(-) commit 11c680f6d64a60bdc0f4248951bf2d2ebafbc433 Author: Naohisa Goto Date: Fri Jun 17 20:40:41 2016 +0900 gemfiles/Gemfile.*: remove dependency on libxml-ruby * gemfiles/Gemfile.*: remove dependency on libxml-ruby. Bio::PhyloXML required libxml-ruby but was already removed. 
gemfiles/Gemfile.travis-jruby1.8 | 3 --- gemfiles/Gemfile.travis-jruby1.9 | 3 --- gemfiles/Gemfile.travis-rbx | 1 - gemfiles/Gemfile.travis-ruby1.8 | 1 - gemfiles/Gemfile.travis-ruby1.9 | 1 - gemfiles/Gemfile.travis-ruby2.2 | 1 - 6 files changed, 10 deletions(-) commit 09fa57f987445e8654de6a0d0cf7c45f7625600c Author: Naohisa Goto Date: Fri Jun 17 16:16:40 2016 +0900 regenerate bioruby.gemspec with rake regemspec bioruby.gemspec | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-) commit 87812d119820bf66767c7767cfec7554d7a00f3b Author: Naohisa Goto Date: Fri Jun 17 15:45:46 2016 +0900 README.rdoc: about bioruby-phyloxml and bio-biosql README.rdoc | 10 ++++++++++ 1 file changed, 10 insertions(+) commit 2294f255f5f05f9f629a1e88c0e1f59bb74b32bc Author: Naohisa Goto Date: Fri Jun 17 15:42:46 2016 +0900 KNOWN_ISSUES.rdoc: remove descriptions about Bio::SQL KNOWN_ISSUES.rdoc | 5 ----- 1 file changed, 5 deletions(-) commit 35a6f761dc5fa493b8311747dde7f2a54d8aee75 Author: Naohisa Goto Date: Fri Jun 17 15:40:57 2016 +0900 README.rdoc: remove descriptions about Bio::SQL README.rdoc | 13 +------------ 1 file changed, 1 insertion(+), 12 deletions(-) commit 46a5bf7acdc803b7e75225c41b23396c4619f25d Author: Naohisa Goto Date: Fri Jun 17 14:59:41 2016 +0900 remove autoload of Bio::SQL lib/bio.rb | 1 - 1 file changed, 1 deletion(-) commit 57bf535da34715beafccb902404cf1bb35b18af4 Author: Naohisa Goto Date: Fri Jun 17 14:48:46 2016 +0900 Removed Bio::SQL that have been moved to separate repository * Bio::SQL is moved to https://github.com/bioruby/bioruby-biosql and removed from this repository. 
* List of deleted files: * deleted: lib/bio/db/biosql/biosql_to_biosequence.rb * deleted: lib/bio/db/biosql/sequence.rb * deleted: lib/bio/io/biosql/ar-biosql.rb * deleted: lib/bio/io/biosql/biosql.rb * deleted: lib/bio/io/biosql/config/database.yml * deleted: lib/bio/io/sql.rb * deleted: test/unit/bio/db/biosql/tc_biosql.rb * deleted: test/unit/bio/db/biosql/ts_suite_biosql.rb lib/bio/db/biosql/biosql_to_biosequence.rb | 78 ----- lib/bio/db/biosql/sequence.rb | 444 ----------------------------- lib/bio/io/biosql/ar-biosql.rb | 257 ----------------- lib/bio/io/biosql/biosql.rb | 39 --- lib/bio/io/biosql/config/database.yml | 21 -- lib/bio/io/sql.rb | 79 ----- test/unit/bio/db/biosql/tc_biosql.rb | 114 -------- test/unit/bio/db/biosql/ts_suite_biosql.rb | 8 - 8 files changed, 1040 deletions(-) delete mode 100644 lib/bio/db/biosql/biosql_to_biosequence.rb delete mode 100644 lib/bio/db/biosql/sequence.rb delete mode 100644 lib/bio/io/biosql/ar-biosql.rb delete mode 100644 lib/bio/io/biosql/biosql.rb delete mode 100644 lib/bio/io/biosql/config/database.yml delete mode 100644 lib/bio/io/sql.rb delete mode 100644 test/unit/bio/db/biosql/tc_biosql.rb delete mode 100644 test/unit/bio/db/biosql/ts_suite_biosql.rb commit 476dcdbe2b21cd5adb641952ee3da92c2d593121 Author: Naohisa Goto Date: Wed Jun 8 12:38:22 2016 +0900 appveyor.yml: eliminate old Ruby versions and add Ruby 2.3 appveyor.yml | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) commit c26e2b77b75b5505a274822f53c6c5a8f842f6c0 Author: Naohisa Goto Date: Wed Jun 8 01:50:19 2016 +0900 .travis.yml: fix to use rbx-3.29 .travis.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit b524abedac9c85d4f8259191b973bc38a9fc557c Author: Naohisa Goto Date: Wed Jun 8 01:45:48 2016 +0900 gemfiles/Gemfile.travis-jruby1.8: use old gem versions supporting Ruby 1.8 gemfiles/Gemfile.travis-jruby1.8 | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) commit c5df9268b77f1d4dc2b29e7cfb7baf3c528c1558 Author: Naohisa 
Goto Date: Wed Jun 8 01:42:40 2016 +0900 .travis.yml: use rbx-3.29 instead of rbx-3 .travis.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit b51b54894ca2d76d9c13680fd72b87951a10a1df Author: Naohisa Goto Date: Wed Jun 8 01:25:18 2016 +0900 Workaround to avoid bug in old versions of Bundler * gemfiles/prepare-gemspec.rb: execute "gem update bundler" to avoid "NoMethodError: undefined method `spec' for nil:NilClass" during "bundle install". This error may be due to a bug of Bundler and the bug seems to be fixed in the latest version of Budler. gemfiles/prepare-gemspec.rb | 4 ++++ 1 file changed, 4 insertions(+) commit a82424b4864e243ebf1f8cc7f181044798b34b5a Author: Naohisa Goto Date: Wed Jun 8 01:20:15 2016 +0900 .travis.yml: add Ruby 2.3.1; use Ruby 2.2.5 instead of 2.2 .travis.yml | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) commit ae927514a5c2853d3839750af86bfcc1fc53e4f1 Author: Naohisa Goto Date: Wed Jun 8 00:54:22 2016 +0900 .travis.yml: add "sudo: false" for faster testing .travis.yml | 1 + 1 file changed, 1 insertion(+) commit 832c4dd94a5602a9deadf599ce1778fac870ac81 Author: Naohisa Goto Date: Wed Jun 8 00:46:26 2016 +0900 gemfiles/Gemfile.travis-ruby1.8: use old gem versions supporting Ruby 1.8 gemfiles/Gemfile.travis-ruby1.8 | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) commit 6cf0ab84cd67aab0f6f4012438c1852a19f3ac7a Author: Naohisa Goto Date: Wed Jun 8 00:04:36 2016 +0900 regenerate bioruby.gemspec with rake regemspec bioruby.gemspec | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) commit 8e986984892d661b4f09a06158a634554d931718 Author: Naohisa Goto Date: Tue Jun 7 23:59:35 2016 +0900 .travis.yml: Update ruby versions and remove temporary workaround * Update Ruby versions to 2.2, 2.1.10, and rbx-3. * Remove temporary workaround about RubyGems introduced in e92e09edf5904f51d3e73e61d13fce4159a543c5. 
.travis.yml | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-) commit 90e678d6d74d86c45631128c0f16181679f0d599 Author: Naohisa Goto Date: Tue Jun 7 23:37:45 2016 +0900 Test bug: fix gem version mismatch error on Travis-CI * Rakefile: prefer to use spec read from existing bioruby.gemspec file instead of that of generated from bioruby.gemspec.erb. This fixes "can't activate bio (= 1.5.1.2016XXXX), already activated bio-1.5.1.2015NNNN" occurred on Travis-CI during gem integration tests. Rakefile | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) commit bdb33fe752b7dddcb35f57d826f85dbdd512c3c1 Author: Kozo Nishida Date: Wed Nov 4 12:08:24 2015 +0900 add appveyor.yml appveyor.yml | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) create mode 100644 appveyor.yml commit 8b0fa73c57232a6a86d2d6fd0711f51bc50aa333 Author: Naohisa Goto Date: Thu Sep 17 23:34:34 2015 +0900 regenerate bioruby.gemspec with rake regemspec bioruby.gemspec | 14 +------------- 1 file changed, 1 insertion(+), 13 deletions(-) commit 813fc808e9a235e03ed2d5bad2d15f74946bd65a Author: Naohisa Goto Date: Thu Sep 17 23:30:46 2015 +0900 Tutorial.rd.html is regenerated by rake retutorial2html doc/Tutorial.rd.html | 117 +++++++++------------------------------------------ 1 file changed, 19 insertions(+), 98 deletions(-) commit 756f14122a45973289172a88241490a1bcc0054a Author: Naohisa Goto Date: Thu Sep 17 23:25:07 2015 +0900 Delete Bio::PhyloXML tutorial * Tutorial for Bio::PhyloXML is deleted from BioRuby core. It is now moved to bio-phyloxml gem. 
New tutorial for Bio::PhyloXML is available at: https://github.com/bioruby/bioruby-phyloxml/blob/master/doc/Tutorial.rd doc/Tutorial.rd | 114 +++----------------------------------------------------- 1 file changed, 6 insertions(+), 108 deletions(-) commit bb42efdd2eec380c99cbd3e505577a550dda8ce7 Author: Naohisa Goto Date: Thu Sep 17 23:20:50 2015 +0900 Delete description of Bio::PhyloXML and its dependency libxml-ruby. README.rdoc | 6 ------ 1 file changed, 6 deletions(-) commit 4202ae936baf0f4c8a722af240a6613f4e8a8cee Author: Naohisa Goto Date: Thu Sep 17 22:48:23 2015 +0900 Remove PhyloXML (split out bio-phyloxml gem) * Bio::PhyloXML is removed from BioRuby core. It will soon be released as separate bio-phyloxml gem. The development repository of the new Bio::PhyloXML is https://github.com/bioruby/bioruby-phyloxml lib/bio/db/phyloxml/phyloxml.xsd | 582 ------ lib/bio/db/phyloxml/phyloxml_elements.rb | 1194 ----------- lib/bio/db/phyloxml/phyloxml_parser.rb | 1001 ---------- lib/bio/db/phyloxml/phyloxml_writer.rb | 227 --- sample/test_phyloxml_big.rb | 205 -- test/data/phyloxml/apaf.xml | 666 ------- test/data/phyloxml/bcl_2.xml | 2097 -------------------- test/data/phyloxml/made_up.xml | 144 -- .../data/phyloxml/ncbi_taxonomy_mollusca_short.xml | 65 - test/data/phyloxml/phyloxml_examples.xml | 415 ---- test/unit/bio/db/test_phyloxml.rb | 821 -------- test/unit/bio/db/test_phyloxml_writer.rb | 334 ---- 12 files changed, 7751 deletions(-) delete mode 100644 lib/bio/db/phyloxml/phyloxml.xsd delete mode 100644 lib/bio/db/phyloxml/phyloxml_elements.rb delete mode 100644 lib/bio/db/phyloxml/phyloxml_parser.rb delete mode 100644 lib/bio/db/phyloxml/phyloxml_writer.rb delete mode 100644 sample/test_phyloxml_big.rb delete mode 100644 test/data/phyloxml/apaf.xml delete mode 100644 test/data/phyloxml/bcl_2.xml delete mode 100644 test/data/phyloxml/made_up.xml delete mode 100644 test/data/phyloxml/ncbi_taxonomy_mollusca_short.xml delete mode 100644 
test/data/phyloxml/phyloxml_examples.xml delete mode 100644 test/unit/bio/db/test_phyloxml.rb delete mode 100644 test/unit/bio/db/test_phyloxml_writer.rb commit e3a85ad9eb6d258e79fdfbe600711a5296a20e8c Author: Naohisa Goto Date: Thu Sep 17 22:45:32 2015 +0900 Delete autoload of Bio::PhyloXML * Delete autoload of Bio::PhyloXML, for preparation of spliting out Bio::PhyloXML. lib/bio.rb | 7 ------- 1 file changed, 7 deletions(-) commit 422ffe6fedecf41d83327c01f7a55ebce4afd70d Author: Naohisa Goto Date: Tue Sep 15 22:33:14 2015 +0900 Incompatible change about deprecated Bio::Taxonomy is described. RELEASE_NOTES.rdoc | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) commit 3ea10d73340d8ad571ab6ca386cffca18ec725d1 Author: Naohisa Goto Date: Tue Sep 15 21:06:29 2015 +0900 Bio::Taxonomy is removed and merged to Bio::PhyloXML::Taxonomy * Bio::Taxonomy in lib/bio/db/phyloxml/phyloxml_elements.rb was written for PhyloXML, but it was intended to become general taxonomy data class in BioRuby. However, no efforts have been made to improve the Bio::Taxonomy class, and it still remains to be a PhyloXML specific class. As the first step to split out Bio::PhyloXML to a new Gem (Biogem) package, we now decide to remove Bio::Taxonomy and merge it to Bio::PhyloXML::Taxonomy. * Codes using Bio::Taxonomy should be modified. Changing Bio::Taxonomy to Bio::PhyloXML::Taxonomy, or adding the following monkey patch is needed. module Bio unless defined? 
Taxonomy Taxonomy = Bio::PhyloXML::Taxonomy end end lib/bio.rb | 2 -- lib/bio/db/phyloxml/phyloxml_elements.rb | 21 +++++++++------------ 2 files changed, 9 insertions(+), 14 deletions(-) commit f89f49223f7d6ed74a8fc50aa2355fb5912c885f Author: Naohisa Goto Date: Mon Sep 14 15:15:56 2015 +0900 regenerate bioruby.gemspec with rake regemspec bioruby.gemspec | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) commit 809e190d710caceee1c213da1aa067dee87e6ebd Author: Naohisa Goto Date: Mon Sep 14 15:14:05 2015 +0900 New RELEASE_NOTES.rdoc for the next release version RELEASE_NOTES.rdoc | 47 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+) create mode 100644 RELEASE_NOTES.rdoc commit a44257e933165509f3d2b164ea547ed8fba18ea5 Author: Naohisa Goto Date: Mon Sep 14 15:10:42 2015 +0900 move RELEASE_NOTES.rdoc to doc/RELEASE_NOTES-1.5.0.rdoc RELEASE_NOTES.rdoc => doc/RELEASE_NOTES-1.5.0.rdoc | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename RELEASE_NOTES.rdoc => doc/RELEASE_NOTES-1.5.0.rdoc (100%) commit 4d53755b0181255e2ee69193a5a3b064ef4f4b77 Author: Naohisa Goto Date: Thu Jul 2 22:19:03 2015 +0900 ChangeLog since 1.5.0 release ChangeLog | 38 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 38 insertions(+) create mode 100644 ChangeLog commit e066e3c8bcf0c6b7eadd3573576d4550aca77cc5 Author: Naohisa Goto Date: Thu Jul 2 22:17:06 2015 +0900 ChangeLog is moved to doc/ChangeLog-1.5.0 ChangeLog => doc/ChangeLog-1.5.0 | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename ChangeLog => doc/ChangeLog-1.5.0 (100%) commit dd53e885c1baa765bc094897d53309af7b15497b Author: Naohisa Goto Date: Thu Jul 2 22:09:26 2015 +0900 change version for generating ChangeLog to 1.5.0 Rakefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 7a5d897ebc45d9ec5357918a42eb2980decf01e4 Author: Naohisa Goto Date: Thu Jul 2 21:52:17 2015 +0900 version changed to 1.5.1-dev (pre-release version of 1.5.1) lib/bio/version.rb | 4 ++-- 1 file 
changed, 2 insertions(+), 2 deletions(-) commit 8fc4d6c64f6958a352c36b171b00d1f1ff2a2354 Author: Naohisa Goto Date: Thu Jul 2 21:47:28 2015 +0900 fix English syntax and unexpected word insertion RELEASE_NOTES.rdoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) bio-2.0.3/sample/0000755000175000017500000000000014141516614013077 5ustar nileshnileshbio-2.0.3/sample/demo_pathway.rb0000644000175000017500000001140214141516614016103 0ustar nileshnilesh# # = sample/demo_pathway.rb - demonstration of Bio::Pathway # # Copyright: Copyright (C) 2001 # Toshiaki Katayama , # Shuichi Kawashima # License:: The Ruby License # # # == Description # # Demonstration of Bio::Pathway, an implementation of the graph data structure # and graph algorithms. # # == Usage # # Simply run this script. # # $ ruby demo_pathway.rb # # == Development information # # The code was moved from lib/bio/pathway.rb. # require 'bio' #if __FILE__ == $0 puts "--- Test === method true/false" r1 = Bio::Relation.new('a', 'b', 1) r2 = Bio::Relation.new('b', 'a', 1) r3 = Bio::Relation.new('b', 'a', 2) r4 = Bio::Relation.new('a', 'b', 1) p r1 === r2 p r1 === r3 p r1 === r4 p [ r1, r2, r3, r4 ].uniq p r1.eql?(r2) p r3.eql?(r2) # Sample Graph : # +----------------+ # | | # v | # +---------(q)-->(t)------->(y)<----(r) # | | | ^ | # v | v | | # +--(s)<--+ | (x)<---+ (u)<-----+ # | | | | | # v | | v | # (v)----->(w)<---+ (z)----+ data = [ [ 'q', 's', 1, ], [ 'q', 't', 1, ], [ 'q', 'w', 1, ], [ 'r', 'u', 1, ], [ 'r', 'y', 1, ], [ 's', 'v', 1, ], [ 't', 'x', 1, ], [ 't', 'y', 1, ], [ 'u', 'y', 1, ], [ 'v', 'w', 1, ], [ 'w', 's', 1, ], [ 'x', 'z', 1, ], [ 'y', 'q', 1, ], [ 'z', 'x', 1, ], ] ary = [] puts "--- List of relations" data.each do |x| ary << Bio::Relation.new(*x) end p ary puts "--- Generate graph from list of relations" graph = Bio::Pathway.new(ary) p graph puts "--- Test to_matrix method" p graph.to_matrix puts "--- Test dump_matrix method" puts graph.dump_matrix(0) puts "--- Test dump_list method" 
puts graph.dump_list

puts "--- Labeling some nodes"
hash = { 'q' => "L1", 's' => "L2", 'v' => "L3", 'w' => "L4" }
graph.label = hash
p graph

puts "--- Extract subgraph by label"
p graph.subgraph

puts "--- Extract subgraph by list"
p graph.subgraph(['q', 't', 'x', 'y', 'z'])

puts "--- Test cliquishness of the node 'q'"
p graph.cliquishness('q')

puts "--- Test cliquishness of the node 'q' (undirected)"
u_graph = Bio::Pathway.new(ary, 'undirected')
p u_graph.cliquishness('q')

puts "--- Test small_world histogram"
p graph.small_world

puts "--- Test breadth_first_search method"
distance, predecessor = graph.breadth_first_search('q')
p distance
p predecessor

puts "--- Test bfs_shortest_path method"
step, path = graph.bfs_shortest_path('y', 'w')
p step
p path

puts "--- Test depth_first_search method"
timestamp, tree, back, cross, forward = graph.depth_first_search
p timestamp
print "tree edges : "; p tree
print "back edges : "; p back
print "cross edges : "; p cross
print "forward edges : "; p forward

puts "--- Test dfs_topological_sort method"
#
# Professor Bumstead topologically sorts his clothing when getting dressed.
#
#  "undershorts"       "socks"
#     |      |            |
#     v      |            v           "watch"
#  "pants" --+-------> "shoes"
#     |
#     v
#  "belt" <----- "shirt" ----> "tie" ----> "jacket"
#     |                                       ^
#     `---------------------------------------'
#
dag = Bio::Pathway.new([
  Bio::Relation.new("undershorts", "pants", true),
  Bio::Relation.new("undershorts", "shoes", true),
  Bio::Relation.new("socks", "shoes", true),
  Bio::Relation.new("watch", "watch", true),
  Bio::Relation.new("pants", "belt", true),
  Bio::Relation.new("pants", "shoes", true),
  Bio::Relation.new("shirt", "belt", true),
  Bio::Relation.new("shirt", "tie", true),
  Bio::Relation.new("tie", "jacket", true),
  Bio::Relation.new("belt", "jacket", true),
])
p dag.dfs_topological_sort

puts "--- Test dijkstra method"
distance, predecessor = graph.dijkstra('q')
p distance
p predecessor

puts "--- Test dijkstra method by weighted graph"
#
#  'a' --> 'b'
#   |   1   | 3
#   |5      v
#   `----> 'c'
#
r1 = Bio::Relation.new('a', 'b', 1)
r2 = Bio::Relation.new('a', 'c', 5)
r3 = Bio::Relation.new('b', 'c', 3)
w_graph = Bio::Pathway.new([r1, r2, r3])
p w_graph
p w_graph.dijkstra('a')

puts "--- Test bellman_ford method by negative weighted graph"
#
#  ,-- 'a' --> 'b'
#  |    |   1   | 3
#  |    |5      v
#  |    `----> 'c'
#  |            ^
#  |2           | -5
#  `--> 'd' ----'
#
r4 = Bio::Relation.new('a', 'd', 2)
r5 = Bio::Relation.new('d', 'c', -5)
w_graph.append(r4)
w_graph.append(r5)
p w_graph.bellman_ford('a')
p graph.bellman_ford('q')

#end
bio-2.0.3/sample/na2aa.testdata.yaml0000644000175000017500000000030414141516614016552 0ustar nileshnilesh
seqFile:
  - class: File
    location: ../test/data/fasta/example1.txt
  - class: File
    location: ../test/data/fasta/example2.txt
  - class: File
    location: ../test/data/genbank/SCU49845.gb
bio-2.0.3/sample/demo_aaindex.rb0000644000175000017500000000251614141516614016045 0ustar nileshnilesh
#
# = sample/demo_aaindex.rb - demonstration of Bio::AAindex1 and AAindex2
#
# Copyright:: Copyright (C) 2001
#             KAWASHIMA Shuichi
# Copyright:: Copyright (C) 2006
#             Mitsuteru C. Nakao
# License::   The Ruby License
#
#
# == Description
#
# Demonstration of Bio::AAindex1 and Bio::AAindex2.
#
# == Requirements
#
# Internet connection and/or OBDA (Open Bio Database Access) configuration.
#
# == Usage
#
# Simply run this script.
#
#  $ ruby demo_aaindex.rb
#
# == Development information
#
# The code was moved from lib/bio/db/aaindex.rb.
#

require 'bio'

#if __FILE__ == $0

puts "### AAindex1 (PRAM900102)"
aax1 = Bio::AAindex1.new(Bio::Fetch.query('aaindex', 'PRAM900102', 'raw'))
p aax1.entry_id
p aax1.definition
p aax1.dblinks
p aax1.author
p aax1.title
p aax1.journal
p aax1.comment
p aax1.correlation_coefficient
p aax1.index
p aax1

puts "### AAindex2 (DAYM780301)"
aax2 = Bio::AAindex2.new(Bio::Fetch.query('aaindex', 'DAYM780301', 'raw'))
p aax2.entry_id
p aax2.definition
p aax2.dblinks
p aax2.author
p aax2.title
p aax2.journal
p aax2.comment
p aax2.rows
p aax2.cols
p aax2.matrix
p aax2.matrix[2,2]
p aax2.matrix[2,3]
p aax2.matrix[4,3]
p aax2.matrix.determinant
p aax2.matrix.rank
p aax2.matrix.transpose
p aax2

#end
bio-2.0.3/sample/benchmark_clustalw_report.rb0000644000175000017500000000216714141516614020675 0ustar nileshnilesh
#
# = sample/benchmark_clustalw_report.rb - Benchmark tests for Bio::ClustalW::Report
#
# Copyright:: Copyright (C) 2013
#             Andrew Grimm
# License::   The Ruby License

require 'pathname'
load Pathname.new(File.join(File.dirname(__FILE__), ['..'] * 1,
                            "test", 'bioruby_test_helper.rb')).cleanpath.to_s

require 'benchmark'
require 'bio'

class BenchmarkClustalWReport

  DataDir = File.join(BioRubyTestDataPath, 'clustalw')
  Filenames = [ 'example1.aln', 'example1-seqnos.aln' ]

  def self.benchmark_clustalw_report
    Filenames.each do |fn|
      print "\n", fn, "\n"
      fullpath = File.join(DataDir, fn)
      self.new(fullpath).benchmark
    end
  end

  def initialize(aln_filename)
    @text = File.open(aln_filename, 'rb') { |f| f.read }
    @text.freeze
  end

  def benchmark
    GC.start
    Benchmark.bmbm do |x|
      x.report do
        for i in 1...10_000
          aln = Bio::ClustalW::Report.new(@text)
          aln.alignment
        end
      end
    end
  end

end #class BenchmarkClustalWReport

BenchmarkClustalWReport.benchmark_clustalw_report
bio-2.0.3/sample/demo_aminoacid.rb0000644000175000017500000000565114141516614016363 0ustar nileshnilesh
#
# = sample/demo_aminoacid.rb - demonstration of Bio::AminoAcid
#
# Copyright:: Copyright (C) 2001, 2005
#             Toshiaki Katayama
# License::   The Ruby License
#
#
# == Description
#
# Demonstration of Bio::AminoAcid, the class for amino acid data.
#
# == Usage
#
# Simply run this script.
#
#  $ ruby demo_aminoacid.rb
#
# == Development information
#
# The code was moved from lib/bio/data/aa.rb.
#

require 'bio'

#if __FILE__ == $0

puts "### aa = Bio::AminoAcid.new"
aa = Bio::AminoAcid.new

puts "# Bio::AminoAcid['A']"
p Bio::AminoAcid['A']
puts "# aa['A']"
p aa['A']

puts "# Bio::AminoAcid.name('A'), Bio::AminoAcid.name('Ala')"
p Bio::AminoAcid.name('A'), Bio::AminoAcid.name('Ala')
puts "# aa.name('A'), aa.name('Ala')"
p aa.name('A'), aa.name('Ala')

puts "# Bio::AminoAcid.to_1('alanine'), Bio::AminoAcid.one('alanine')"
p Bio::AminoAcid.to_1('alanine'), Bio::AminoAcid.one('alanine')
puts "# aa.to_1('alanine'), aa.one('alanine')"
p aa.to_1('alanine'), aa.one('alanine')
puts "# Bio::AminoAcid.to_1('Ala'), Bio::AminoAcid.one('Ala')"
p Bio::AminoAcid.to_1('Ala'), Bio::AminoAcid.one('Ala')
puts "# aa.to_1('Ala'), aa.one('Ala')"
p aa.to_1('Ala'), aa.one('Ala')
puts "# Bio::AminoAcid.to_1('A'), Bio::AminoAcid.one('A')"
p Bio::AminoAcid.to_1('A'), Bio::AminoAcid.one('A')
puts "# aa.to_1('A'), aa.one('A')"
p aa.to_1('A'), aa.one('A')

puts "# Bio::AminoAcid.to_3('alanine'), Bio::AminoAcid.three('alanine')"
p Bio::AminoAcid.to_3('alanine'), Bio::AminoAcid.three('alanine')
puts "# aa.to_3('alanine'), aa.three('alanine')"
p aa.to_3('alanine'), aa.three('alanine')
puts "# Bio::AminoAcid.to_3('Ala'), Bio::AminoAcid.three('Ala')"
p Bio::AminoAcid.to_3('Ala'), Bio::AminoAcid.three('Ala')
puts "# aa.to_3('Ala'), aa.three('Ala')"
p aa.to_3('Ala'), aa.three('Ala')
puts "# Bio::AminoAcid.to_3('A'), Bio::AminoAcid.three('A')"
p Bio::AminoAcid.to_3('A'), Bio::AminoAcid.three('A')
puts "# aa.to_3('A'), aa.three('A')"
p aa.to_3('A'), aa.three('A')

puts "# Bio::AminoAcid.one2three('A')"
p Bio::AminoAcid.one2three('A')
puts "# aa.one2three('A')"
p aa.one2three('A')

puts "# Bio::AminoAcid.three2one('Ala')"
p Bio::AminoAcid.three2one('Ala')
puts "# aa.three2one('Ala')"
p aa.three2one('Ala')

puts "# Bio::AminoAcid.one2name('A')"
p Bio::AminoAcid.one2name('A')
puts "# aa.one2name('A')"
p aa.one2name('A')

puts "# Bio::AminoAcid.name2one('alanine')"
p Bio::AminoAcid.name2one('alanine')
puts "# aa.name2one('alanine')"
p aa.name2one('alanine')

puts "# Bio::AminoAcid.three2name('Ala')"
p Bio::AminoAcid.three2name('Ala')
puts "# aa.three2name('Ala')"
p aa.three2name('Ala')

puts "# Bio::AminoAcid.name2three('alanine')"
p Bio::AminoAcid.name2three('alanine')
puts "# aa.name2three('alanine')"
p aa.name2three('alanine')

puts "# Bio::AminoAcid.to_re('BZACDEFGHIKLMNPQRSTVWYU')"
p Bio::AminoAcid.to_re('BZACDEFGHIKLMNPQRSTVWYU')

#end
bio-2.0.3/sample/test_restriction_enzyme_long.rb0000644000175000017500000062705714141516614021457 0ustar nileshnilesh
#
# = sample/test_restriction_enzyme_long.rb - Benchmark tests for Bio::RestrictionEnzyme::Analysis.cut for long sequences
#
# Copyright:: Copyright (C) 2011
#             Naohisa Goto
# License::   The Ruby License
#
# Acknowledgements: The idea of the test is based on the issue report
# https://github.com/bioruby/bioruby/issues/10
# posted by ray1729 (https://github.com/ray1729).
#

require 'test/unit'
require 'benchmark'
require 'bio'

entry = Bio::TogoWS::REST.entry('ncbi-nucleotide', 'BA000007.2')
EcoliO157H7Seq = Bio::GenBank.new(entry).naseq.freeze

module TestRestrictionEnzymeAnalysisCutLong

  # dummy benchmark class
  class DummyBench
    def report(str); yield; end
  end

  module HelperMethods
    def _truncate_cut_ranges(cut_ranges, len)
      limit = len - 1
      ret = cut_ranges.collect do |a|
        if a[0] > limit || a[2] > limit then
          nil
        else
          a.collect { |pos| pos > limit ? limit : pos }
        end
      end
      ret.compact!
      if last_a = ret[-1] then
        last_a[1] = limit
        last_a[3] = limit
      end
      ret
    end

    def _collect_cut_ranges(cuts)
      cuts.collect do |f|
        [ f.p_left, f.p_right, f.c_left, f.c_right ]
      end
    end

    def _test_by_size(len, bench = DummyBench.new)
      cuts = nil
      bench.report("#{self.class::TestLabel} #{len}") {
        cuts = _cut(self.class::SampleSequence[0, len])
      }
      cut_ranges = _collect_cut_ranges(cuts)
      expected = _truncate_cut_ranges(self.class::SampleCutRanges, len)
      assert_equal(expected, cut_ranges)
    end

    def test_10k_to_100k
      $stderr.print "\n"
      Benchmark.bm(26) do |bench|
        10_000.step(100_000, 10_000) do |len|
          _test_by_size(len, bench)
        end
      end
    end

    def test_100k_to_1M
      $stderr.print "\n"
      Benchmark.bm(26) do |bench|
        100_000.step(1_000_000, 100_000) do |len|
          _test_by_size(len, bench)
        end
      end
    end

    def test_1M_to_5M_and_whole
      $stderr.print "\n"
      Benchmark.bm(26) do |bench|
        1_000_000.step(5_000_000, 1_000_000) do |len|
          _test_by_size(len, bench)
        end
        _test_by_size(self.class::SampleSequence.length, bench)
      end
    end if defined? Bio::RestrictionEnzyme::SortedNumArray

    def disabled_test_whole
      cuts = _cut(self.class::SampleSequence)
      cut_ranges = _collect_cut_ranges(cuts)
      cut_ranges.each do |a|
        $stderr.print " [ ", a.join(", "), " ], \n"
      end
      assert_equal(self.class::SampleCutRanges, cut_ranges)
    end
  end #module HelperMethods

  class TestEcoliO157H7_BstEII < Test::Unit::TestCase
    include HelperMethods

    TestLabel = 'BstEII'
    SampleSequence = EcoliO157H7Seq
    SampleCutRanges = BstEII_WHOLE =
      [ [ 0, 79, 0, 84 ], [ 80, 4612, 85, 4617 ], [ 4613, 13483, 4618, 13488 ], [ 13484, 15984, 13489, 15989 ], [ 15985, 21462, 15990, 21467 ], [ 21463, 27326, 21468, 27331 ], [ 27327, 30943, 27332, 30948 ], [ 30944, 34888, 30949, 34893 ], [ 34889, 35077, 34894, 35082 ], [ 35078, 35310, 35083, 35315 ], [ 35311, 36254, 35316, 36259 ], [ 36255, 41885, 36260, 41890 ], [ 41886, 43070, 41891, 43075 ], [ 43071, 45689, 43076, 45694 ], [ 45690, 52325, 45695, 52330 ], [ 52326, 55703, 52331, 55708 ], [ 55704, 58828, 55709, 58833 ], [ 58829, 59178, 58834, 59183 ], [ 59179, 72610, 59184, 72615 ], [ 72611, 72739, 72616, 72744 ], [ 72740, 73099, 72745, 73104 ], [ 73100, 75123, 73105, 75128 ], [ 75124, 77366, 75129, 77371 ], [ 77367, 77810, 77372, 77815 ], [ 77811, 78740, 77816, 78745 ], [ 78741, 79717, 78746, 79722 ], [ 79718, 82250, 79723, 82255 ], [ 82251, 84604, 82256, 84609 ], [ 84605, 95491, 84610, 95496 ], [ 95492, 95785, 95497, 95790 ], [ 95786, 95794, 95791, 95799 ], [ 95795, 96335, 95800, 96340 ], [ 96336, 102044, 96341, 102049 ], [ 102045, 102541, 102050, 102546 ], [ 102542, 103192, 102547, 103197 ], [ 103193, 104722, 103198, 104727 ], [ 104723, 110883, 104728, 110888 ], [ 110884, 120090, 110889, 120095 ], [ 120091, 120657, 120096, 120662 ], [ 120658, 128308, 120663, 128313 ], [ 128309, 138305, 128314, 138310 ], [ 138306, 141147, 138311, 141152 ], [ 141148, 143724, 141153, 143729 ], [ 143725, 143838, 143730, 143843 ], [ 143839, 144303, 143844, 144308 ], [ 144304, 148199, 144309, 148204 ], [ 148200, 149577, 148205, 149582 ], [
149578, 149731, 149583, 149736 ], [ 149732, 156115, 149737, 156120 ], [ 156116, 161126, 156121, 161131 ], [ 161127, 162856, 161132, 162861 ], [ 162857, 170693, 162862, 170698 ], [ 170694, 170944, 170699, 170949 ], [ 170945, 171201, 170950, 171206 ], [ 171202, 173241, 171207, 173246 ], [ 173242, 177283, 173247, 177288 ], [ 177284, 178177, 177289, 178182 ], [ 178178, 178781, 178183, 178786 ], [ 178782, 181610, 178787, 181615 ], [ 181611, 181706, 181616, 181711 ], [ 181707, 185661, 181712, 185666 ], [ 185662, 193407, 185667, 193412 ], [ 193408, 195511, 193413, 195516 ], [ 195512, 195754, 195517, 195759 ], [ 195755, 197247, 195760, 197252 ], [ 197248, 200659, 197253, 200664 ], [ 200660, 201820, 200665, 201825 ], [ 201821, 202300, 201826, 202305 ], [ 202301, 202686, 202306, 202691 ], [ 202687, 206289, 202692, 206294 ], [ 206290, 206466, 206295, 206471 ], [ 206467, 207011, 206472, 207016 ], [ 207012, 208159, 207017, 208164 ], [ 208160, 209976, 208165, 209981 ], [ 209977, 210078, 209982, 210083 ], [ 210079, 211485, 210084, 211490 ], [ 211486, 212377, 211491, 212382 ], [ 212378, 213569, 212383, 213574 ], [ 213570, 216005, 213575, 216010 ], [ 216006, 220098, 216011, 220103 ], [ 220099, 224063, 220104, 224068 ], [ 224064, 228604, 224069, 228609 ], [ 228605, 239993, 228610, 239998 ], [ 239994, 247914, 239999, 247919 ], [ 247915, 251579, 247920, 251584 ], [ 251580, 257092, 251585, 257097 ], [ 257093, 261621, 257098, 261626 ], [ 261622, 263030, 261627, 263035 ], [ 263031, 265084, 263036, 265089 ], [ 265085, 265243, 265090, 265248 ], [ 265244, 265534, 265249, 265539 ], [ 265535, 266117, 265540, 266122 ], [ 266118, 274428, 266123, 274433 ], [ 274429, 282285, 274434, 282290 ], [ 282286, 286948, 282291, 286953 ], [ 286949, 292547, 286954, 292552 ], [ 292548, 297678, 292553, 297683 ], [ 297679, 308161, 297684, 308166 ], [ 308162, 308706, 308167, 308711 ], [ 308707, 313482, 308712, 313487 ], [ 313483, 337118, 313488, 337123 ], [ 337119, 337935, 337124, 337940 ], [ 337936, 338781, 
337941, 338786 ], [ 338782, 339493, 338787, 339498 ], [ 339494, 341025, 339499, 341030 ], [ 341026, 344424, 341031, 344429 ], [ 344425, 348384, 344430, 348389 ], [ 348385, 354781, 348390, 354786 ], [ 354782, 356692, 354787, 356697 ], [ 356693, 357008, 356698, 357013 ], [ 357009, 357305, 357014, 357310 ], [ 357306, 357328, 357311, 357333 ], [ 357329, 358126, 357334, 358131 ], [ 358127, 359472, 358132, 359477 ], [ 359473, 362160, 359478, 362165 ], [ 362161, 365395, 362166, 365400 ], [ 365396, 365704, 365401, 365709 ], [ 365705, 381746, 365710, 381751 ], [ 381747, 381994, 381752, 381999 ], [ 381995, 383335, 382000, 383340 ], [ 383336, 385141, 383341, 385146 ], [ 385142, 390171, 385147, 390176 ], [ 390172, 392764, 390177, 392769 ], [ 392765, 394338, 392770, 394343 ], [ 394339, 394686, 394344, 394691 ], [ 394687, 398703, 394692, 398708 ], [ 398704, 404095, 398709, 404100 ], [ 404096, 408361, 404101, 408366 ], [ 408362, 413032, 408367, 413037 ], [ 413033, 414563, 413038, 414568 ], [ 414564, 416901, 414569, 416906 ], [ 416902, 417419, 416907, 417424 ], [ 417420, 421777, 417425, 421782 ], [ 421778, 423748, 421783, 423753 ], [ 423749, 431903, 423754, 431908 ], [ 431904, 440000, 431909, 440005 ], [ 440001, 448040, 440006, 448045 ], [ 448041, 452994, 448046, 452999 ], [ 452995, 453075, 453000, 453080 ], [ 453076, 454950, 453081, 454955 ], [ 454951, 455888, 454956, 455893 ], [ 455889, 460160, 455894, 460165 ], [ 460161, 463076, 460166, 463081 ], [ 463077, 465003, 463082, 465008 ], [ 465004, 466828, 465009, 466833 ], [ 466829, 467686, 466834, 467691 ], [ 467687, 468596, 467692, 468601 ], [ 468597, 479953, 468602, 479958 ], [ 479954, 480538, 479959, 480543 ], [ 480539, 482869, 480544, 482874 ], [ 482870, 489378, 482875, 489383 ], [ 489379, 492241, 489384, 492246 ], [ 492242, 495406, 492247, 495411 ], [ 495407, 495712, 495412, 495717 ], [ 495713, 497829, 495718, 497834 ], [ 497830, 501698, 497835, 501703 ], [ 501699, 504565, 501704, 504570 ], [ 504566, 505105, 504571, 505110 ], [ 
505106, 508452, 505111, 508457 ], [ 508453, 515947, 508458, 515952 ], [ 515948, 519141, 515953, 519146 ], [ 519142, 519398, 519147, 519403 ], [ 519399, 521386, 519404, 521391 ], [ 521387, 526115, 521392, 526120 ], [ 526116, 526729, 526121, 526734 ], [ 526730, 527018, 526735, 527023 ], [ 527019, 528059, 527024, 528064 ], [ 528060, 532689, 528065, 532694 ], [ 532690, 534702, 532695, 534707 ], [ 534703, 535272, 534708, 535277 ], [ 535273, 538668, 535278, 538673 ], [ 538669, 543939, 538674, 543944 ], [ 543940, 547429, 543945, 547434 ], [ 547430, 553890, 547435, 553895 ], [ 553891, 554678, 553896, 554683 ], [ 554679, 555452, 554684, 555457 ], [ 555453, 556296, 555458, 556301 ], [ 556297, 559341, 556302, 559346 ], [ 559342, 559991, 559347, 559996 ], [ 559992, 563242, 559997, 563247 ], [ 563243, 576432, 563248, 576437 ], [ 576433, 582431, 576438, 582436 ], [ 582432, 582959, 582437, 582964 ], [ 582960, 583475, 582965, 583480 ], [ 583476, 583589, 583481, 583594 ], [ 583590, 583670, 583595, 583675 ], [ 583671, 583901, 583676, 583906 ], [ 583902, 584198, 583907, 584203 ], [ 584199, 584633, 584204, 584638 ], [ 584634, 585704, 584639, 585709 ], [ 585705, 585746, 585710, 585751 ], [ 585747, 586175, 585752, 586180 ], [ 586176, 586301, 586181, 586306 ], [ 586302, 586643, 586307, 586648 ], [ 586644, 586775, 586649, 586780 ], [ 586776, 587072, 586781, 587077 ], [ 587073, 587214, 587078, 587219 ], [ 587215, 587540, 587220, 587545 ], [ 587541, 587969, 587546, 587974 ], [ 587970, 588095, 587975, 588100 ], [ 588096, 588437, 588101, 588442 ], [ 588438, 588569, 588443, 588574 ], [ 588570, 589008, 588575, 589013 ], [ 589009, 589166, 589014, 589171 ], [ 589167, 590366, 589172, 590371 ], [ 590367, 590792, 590372, 590797 ], [ 590793, 591077, 590798, 591082 ], [ 591078, 591263, 591083, 591268 ], [ 591264, 591863, 591269, 591868 ], [ 591864, 592058, 591869, 592063 ], [ 592059, 592160, 592064, 592165 ], [ 592161, 592568, 592166, 592573 ], [ 592569, 592760, 592574, 592765 ], [ 592761, 593060, 
592766, 593065 ], [ 593061, 593186, 593066, 593191 ], [ 593187, 593366, 593192, 593371 ], [ 593367, 593957, 593372, 593962 ], [ 593958, 594827, 593963, 594832 ], [ 594828, 594980, 594833, 594985 ], [ 594981, 595649, 594986, 595654 ], [ 595650, 595893, 595655, 595898 ], [ 595894, 596057, 595899, 596062 ], [ 596058, 596159, 596063, 596164 ], [ 596160, 596351, 596165, 596356 ], [ 596352, 596660, 596357, 596665 ], [ 596661, 596960, 596666, 596965 ], [ 596961, 597102, 596966, 597107 ], [ 597103, 597155, 597108, 597160 ], [ 597156, 597257, 597161, 597262 ], [ 597258, 599957, 597263, 599962 ], [ 599958, 611038, 599963, 611043 ], [ 611039, 612202, 611044, 612207 ], [ 612203, 614051, 612208, 614056 ], [ 614052, 614134, 614057, 614139 ], [ 614135, 614787, 614140, 614792 ], [ 614788, 616272, 614793, 616277 ], [ 616273, 617737, 616278, 617742 ], [ 617738, 627339, 617743, 627344 ], [ 627340, 628902, 627345, 628907 ], [ 628903, 636523, 628908, 636528 ], [ 636524, 637529, 636529, 637534 ], [ 637530, 647713, 637535, 647718 ], [ 647714, 648684, 647719, 648689 ], [ 648685, 653543, 648690, 653548 ], [ 653544, 659030, 653549, 659035 ], [ 659031, 662241, 659036, 662246 ], [ 662242, 671781, 662247, 671786 ], [ 671782, 672048, 671787, 672053 ], [ 672049, 673788, 672054, 673793 ], [ 673789, 674707, 673794, 674712 ], [ 674708, 674998, 674713, 675003 ], [ 674999, 675157, 675004, 675162 ], [ 675158, 688595, 675163, 688600 ], [ 688596, 693309, 688601, 693314 ], [ 693310, 697406, 693315, 697411 ], [ 697407, 702676, 697412, 702681 ], [ 702677, 707382, 702682, 707387 ], [ 707383, 708604, 707388, 708609 ], [ 708605, 710046, 708610, 710051 ], [ 710047, 711630, 710052, 711635 ], [ 711631, 711696, 711636, 711701 ], [ 711697, 712329, 711702, 712334 ], [ 712330, 716461, 712335, 716466 ], [ 716462, 720238, 716467, 720243 ], [ 720239, 720374, 720244, 720379 ], [ 720375, 724200, 720380, 724205 ], [ 724201, 725687, 724206, 725692 ], [ 725688, 730067, 725693, 730072 ], [ 730068, 730574, 730073, 730579 ], [ 
730575, 730699, 730580, 730704 ], [ 730700, 732726, 730705, 732731 ], [ 732727, 738597, 732732, 738602 ], [ 738598, 743326, 738603, 743331 ], [ 743327, 744992, 743332, 744997 ], [ 744993, 745843, 744998, 745848 ], [ 745844, 751518, 745849, 751523 ], [ 751519, 752431, 751524, 752436 ], [ 752432, 752549, 752437, 752554 ], [ 752550, 766036, 752555, 766041 ], [ 766037, 768968, 766042, 768973 ], [ 768969, 770151, 768974, 770156 ], [ 770152, 771158, 770157, 771163 ], [ 771159, 771405, 771164, 771410 ], [ 771406, 781958, 771411, 781963 ], [ 781959, 784226, 781964, 784231 ], [ 784227, 786945, 784232, 786950 ], [ 786946, 787203, 786951, 787208 ], [ 787204, 789251, 787209, 789256 ], [ 789252, 791218, 789257, 791223 ], [ 791219, 793716, 791224, 793721 ], [ 793717, 795003, 793722, 795008 ], [ 795004, 795521, 795009, 795526 ], [ 795522, 804514, 795527, 804519 ], [ 804515, 805238, 804520, 805243 ], [ 805239, 805887, 805244, 805892 ], [ 805888, 808461, 805893, 808466 ], [ 808462, 809805, 808467, 809810 ], [ 809806, 810086, 809811, 810091 ], [ 810087, 810726, 810092, 810731 ], [ 810727, 820111, 810732, 820116 ], [ 820112, 821326, 820117, 821331 ], [ 821327, 821647, 821332, 821652 ], [ 821648, 824277, 821653, 824282 ], [ 824278, 825750, 824283, 825755 ], [ 825751, 828770, 825756, 828775 ], [ 828771, 828924, 828776, 828929 ], [ 828925, 830194, 828930, 830199 ], [ 830195, 830786, 830200, 830791 ], [ 830787, 832788, 830792, 832793 ], [ 832789, 833306, 832794, 833311 ], [ 833307, 835656, 833312, 835661 ], [ 835657, 841180, 835662, 841185 ], [ 841181, 842112, 841186, 842117 ], [ 842113, 843973, 842118, 843978 ], [ 843974, 843990, 843979, 843995 ], [ 843991, 852882, 843996, 852887 ], [ 852883, 854392, 852888, 854397 ], [ 854393, 857721, 854398, 857726 ], [ 857722, 857961, 857727, 857966 ], [ 857962, 862783, 857967, 862788 ], [ 862784, 878953, 862789, 878958 ], [ 878954, 885194, 878959, 885199 ], [ 885195, 886313, 885200, 886318 ], [ 886314, 886460, 886319, 886465 ], [ 886461, 890233, 
886466, 890238 ], [ 890234, 890346, 890239, 890351 ], [ 890347, 890379, 890352, 890384 ], [ 890380, 899676, 890385, 899681 ], [ 899677, 903962, 899682, 903967 ], [ 903963, 904236, 903968, 904241 ], [ 904237, 908130, 904242, 908135 ], [ 908131, 916611, 908136, 916616 ], [ 916612, 916803, 916617, 916808 ], [ 916804, 920531, 916809, 920536 ], [ 920532, 928505, 920537, 928510 ], [ 928506, 936947, 928511, 936952 ], [ 936948, 937240, 936953, 937245 ], [ 937241, 939698, 937246, 939703 ], [ 939699, 939711, 939704, 939716 ], [ 939712, 941642, 939717, 941647 ], [ 941643, 949052, 941648, 949057 ], [ 949053, 949800, 949058, 949805 ], [ 949801, 951412, 949806, 951417 ], [ 951413, 951810, 951418, 951815 ], [ 951811, 952386, 951816, 952391 ], [ 952387, 953295, 952392, 953300 ], [ 953296, 953894, 953301, 953899 ], [ 953895, 958753, 953900, 958758 ], [ 958754, 964476, 958759, 964481 ], [ 964477, 967468, 964482, 967473 ], [ 967469, 969631, 967474, 969636 ], [ 969632, 970966, 969637, 970971 ], [ 970967, 971138, 970972, 971143 ], [ 971139, 974185, 971144, 974190 ], [ 974186, 974365, 974191, 974370 ], [ 974366, 975256, 974371, 975261 ], [ 975257, 976794, 975262, 976799 ], [ 976795, 987406, 976800, 987411 ], [ 987407, 988132, 987412, 988137 ], [ 988133, 992809, 988138, 992814 ], [ 992810, 1000225, 992815, 1000230 ], [ 1000226, 1001626, 1000231, 1001631 ], [ 1001627, 1007354, 1001632, 1007359 ], [ 1007355, 1011910, 1007360, 1011915 ], [ 1011911, 1012377, 1011916, 1012382 ], [ 1012378, 1017328, 1012383, 1017333 ], [ 1017329, 1020891, 1017334, 1020896 ], [ 1020892, 1021340, 1020897, 1021345 ], [ 1021341, 1024845, 1021346, 1024850 ], [ 1024846, 1025853, 1024851, 1025858 ], [ 1025854, 1030691, 1025859, 1030696 ], [ 1030692, 1032676, 1030697, 1032681 ], [ 1032677, 1037847, 1032682, 1037852 ], [ 1037848, 1039473, 1037853, 1039478 ], [ 1039474, 1044241, 1039479, 1044246 ], [ 1044242, 1045920, 1044247, 1045925 ], [ 1045921, 1053286, 1045926, 1053291 ], [ 1053287, 1053309, 1053292, 1053314 ], [ 
1053310, 1054643, 1053315, 1054648 ], [ 1054644, 1056527, 1054649, 1056532 ], [ 1056528, 1058682, 1056533, 1058687 ], [ 1058683, 1059297, 1058688, 1059302 ], [ 1059298, 1060416, 1059303, 1060421 ], [ 1060417, 1064234, 1060422, 1064239 ], [ 1064235, 1064848, 1064240, 1064853 ], [ 1064849, 1065434, 1064854, 1065439 ], [ 1065435, 1075642, 1065440, 1075647 ], [ 1075643, 1076325, 1075648, 1076330 ], [ 1076326, 1076534, 1076331, 1076539 ], [ 1076535, 1078866, 1076540, 1078871 ], [ 1078867, 1080537, 1078872, 1080542 ], [ 1080538, 1082144, 1080543, 1082149 ], [ 1082145, 1085746, 1082150, 1085751 ], [ 1085747, 1087087, 1085752, 1087092 ], [ 1087088, 1088273, 1087093, 1088278 ], [ 1088274, 1093062, 1088279, 1093067 ], [ 1093063, 1096867, 1093068, 1096872 ], [ 1096868, 1102488, 1096873, 1102493 ], [ 1102489, 1106371, 1102494, 1106376 ], [ 1106372, 1108123, 1106377, 1108128 ], [ 1108124, 1113311, 1108129, 1113316 ], [ 1113312, 1114557, 1113317, 1114562 ], [ 1114558, 1120566, 1114563, 1120571 ], [ 1120567, 1121004, 1120572, 1121009 ], [ 1121005, 1122501, 1121010, 1122506 ], [ 1122502, 1130582, 1122507, 1130587 ], [ 1130583, 1132170, 1130588, 1132175 ], [ 1132171, 1140126, 1132176, 1140131 ], [ 1140127, 1143361, 1140132, 1143366 ], [ 1143362, 1149205, 1143367, 1149210 ], [ 1149206, 1149331, 1149211, 1149336 ], [ 1149332, 1156272, 1149337, 1156277 ], [ 1156273, 1161624, 1156278, 1161629 ], [ 1161625, 1171353, 1161630, 1171358 ], [ 1171354, 1171934, 1171359, 1171939 ], [ 1171935, 1172114, 1171940, 1172119 ], [ 1172115, 1185368, 1172120, 1185373 ], [ 1185369, 1193993, 1185374, 1193998 ], [ 1193994, 1194272, 1193999, 1194277 ], [ 1194273, 1197920, 1194278, 1197925 ], [ 1197921, 1200373, 1197926, 1200378 ], [ 1200374, 1200597, 1200379, 1200602 ], [ 1200598, 1200714, 1200603, 1200719 ], [ 1200715, 1203674, 1200720, 1203679 ], [ 1203675, 1204865, 1203680, 1204870 ], [ 1204866, 1205330, 1204871, 1205335 ], [ 1205331, 1210727, 1205336, 1210732 ], [ 1210728, 1211881, 1210733, 1211886 ], [ 
1211882, 1214283, 1211887, 1214288 ], [ 1214284, 1216981, 1214289, 1216986 ], [ 1216982, 1223522, 1216987, 1223527 ], [ 1223523, 1228205, 1223528, 1228210 ], [ 1228206, 1236067, 1228211, 1236072 ], [ 1236068, 1236265, 1236073, 1236270 ], [ 1236266, 1239969, 1236271, 1239974 ], [ 1239970, 1240641, 1239975, 1240646 ], [ 1240642, 1244738, 1240647, 1244743 ], [ 1244739, 1244821, 1244744, 1244826 ], [ 1244822, 1272971, 1244827, 1272976 ], [ 1272972, 1276524, 1272977, 1276529 ], [ 1276525, 1290344, 1276530, 1290349 ], [ 1290345, 1292253, 1290350, 1292258 ], [ 1292254, 1293482, 1292259, 1293487 ], [ 1293483, 1295919, 1293488, 1295924 ], [ 1295920, 1302834, 1295925, 1302839 ], [ 1302835, 1303464, 1302840, 1303469 ], [ 1303465, 1309308, 1303470, 1309313 ], [ 1309309, 1311482, 1309314, 1311487 ], [ 1311483, 1312493, 1311488, 1312498 ], [ 1312494, 1316488, 1312499, 1316493 ], [ 1316489, 1318127, 1316494, 1318132 ], [ 1318128, 1325643, 1318133, 1325648 ], [ 1325644, 1328313, 1325649, 1328318 ], [ 1328314, 1345348, 1328319, 1345353 ], [ 1345349, 1347480, 1345354, 1347485 ], [ 1347481, 1348458, 1347486, 1348463 ], [ 1348459, 1350595, 1348464, 1350600 ], [ 1350596, 1350770, 1350601, 1350775 ], [ 1350771, 1351954, 1350776, 1351959 ], [ 1351955, 1356474, 1351960, 1356479 ], [ 1356475, 1362756, 1356480, 1362761 ], [ 1362757, 1368544, 1362762, 1368549 ], [ 1368545, 1377993, 1368550, 1377998 ], [ 1377994, 1379610, 1377999, 1379615 ], [ 1379611, 1391551, 1379616, 1391556 ], [ 1391552, 1395841, 1391557, 1395846 ], [ 1395842, 1401721, 1395847, 1401726 ], [ 1401722, 1406871, 1401727, 1406876 ], [ 1406872, 1411041, 1406877, 1411046 ], [ 1411042, 1417851, 1411047, 1417856 ], [ 1417852, 1419058, 1417857, 1419063 ], [ 1419059, 1428120, 1419064, 1428125 ], [ 1428121, 1428584, 1428126, 1428589 ], [ 1428585, 1430700, 1428590, 1430705 ], [ 1430701, 1438278, 1430706, 1438283 ], [ 1438279, 1443084, 1438284, 1443089 ], [ 1443085, 1444668, 1443090, 1444673 ], [ 1444669, 1444866, 1444674, 1444871 ], [ 
1444867, 1444914, 1444872, 1444919 ], [ 1444915, 1445093, 1444920, 1445098 ], [ 1445094, 1446216, 1445099, 1446221 ], [ 1446217, 1448518, 1446222, 1448523 ], [ 1448519, 1452860, 1448524, 1452865 ], [ 1452861, 1454246, 1452866, 1454251 ], [ 1454247, 1455414, 1454252, 1455419 ], [ 1455415, 1460976, 1455420, 1460981 ], [ 1460977, 1461164, 1460982, 1461169 ], [ 1461165, 1463675, 1461170, 1463680 ], [ 1463676, 1465339, 1463681, 1465344 ], [ 1465340, 1469872, 1465345, 1469877 ], [ 1469873, 1471479, 1469878, 1471484 ], [ 1471480, 1472745, 1471485, 1472750 ], [ 1472746, 1479208, 1472751, 1479213 ], [ 1479209, 1480831, 1479214, 1480836 ], [ 1480832, 1485359, 1480837, 1485364 ], [ 1485360, 1485530, 1485365, 1485535 ], [ 1485531, 1486004, 1485536, 1486009 ], [ 1486005, 1487314, 1486010, 1487319 ], [ 1487315, 1491008, 1487320, 1491013 ], [ 1491009, 1492068, 1491014, 1492073 ], [ 1492069, 1493001, 1492074, 1493006 ], [ 1493002, 1495524, 1493007, 1495529 ], [ 1495525, 1498599, 1495530, 1498604 ], [ 1498600, 1499384, 1498605, 1499389 ], [ 1499385, 1500494, 1499390, 1500499 ], [ 1500495, 1504828, 1500500, 1504833 ], [ 1504829, 1509790, 1504834, 1509795 ], [ 1509791, 1512050, 1509796, 1512055 ], [ 1512051, 1514922, 1512056, 1514927 ], [ 1514923, 1515140, 1514928, 1515145 ], [ 1515141, 1515194, 1515146, 1515199 ], [ 1515195, 1515647, 1515200, 1515652 ], [ 1515648, 1516602, 1515653, 1516607 ], [ 1516603, 1517689, 1516608, 1517694 ], [ 1517690, 1519324, 1517695, 1519329 ], [ 1519325, 1524288, 1519330, 1524293 ], [ 1524289, 1524809, 1524294, 1524814 ], [ 1524810, 1525934, 1524815, 1525939 ], [ 1525935, 1526325, 1525940, 1526330 ], [ 1526326, 1527046, 1526331, 1527051 ], [ 1527047, 1528800, 1527052, 1528805 ], [ 1528801, 1529067, 1528806, 1529072 ], [ 1529068, 1529127, 1529073, 1529132 ], [ 1529128, 1536262, 1529133, 1536267 ], [ 1536263, 1543858, 1536268, 1543863 ], [ 1543859, 1554015, 1543864, 1554020 ], [ 1554016, 1555315, 1554021, 1555320 ], [ 1555316, 1558476, 1555321, 1558481 ], [ 
1558477, 1560403, 1558482, 1560408 ], [ 1560404, 1564152, 1560409, 1564157 ], [ 1564153, 1565868, 1564158, 1565873 ], [ 1565869, 1566075, 1565874, 1566080 ], [ 1566076, 1572715, 1566081, 1572720 ], [ 1572716, 1575566, 1572721, 1575571 ], [ 1575567, 1575840, 1575572, 1575845 ], [ 1575841, 1575957, 1575846, 1575962 ], [ 1575958, 1578588, 1575963, 1578593 ], [ 1578589, 1587557, 1578594, 1587562 ], [ 1587558, 1588891, 1587563, 1588896 ], [ 1588892, 1597227, 1588897, 1597232 ], [ 1597228, 1597262, 1597233, 1597267 ], [ 1597263, 1606974, 1597268, 1606979 ], [ 1606975, 1613512, 1606980, 1613517 ], [ 1613513, 1613900, 1613518, 1613905 ], [ 1613901, 1614931, 1613906, 1614936 ], [ 1614932, 1620971, 1614937, 1620976 ], [ 1620972, 1625931, 1620977, 1625936 ], [ 1625932, 1635578, 1625937, 1635583 ], [ 1635579, 1636949, 1635584, 1636954 ], [ 1636950, 1642076, 1636955, 1642081 ], [ 1642077, 1643227, 1642082, 1643232 ], [ 1643228, 1643451, 1643233, 1643456 ], [ 1643452, 1643568, 1643457, 1643573 ], [ 1643569, 1651406, 1643574, 1651411 ], [ 1651407, 1651474, 1651412, 1651479 ], [ 1651475, 1660688, 1651480, 1660693 ], [ 1660689, 1665846, 1660694, 1665851 ], [ 1665847, 1667026, 1665852, 1667031 ], [ 1667027, 1675465, 1667032, 1675470 ], [ 1675466, 1679164, 1675471, 1679169 ], [ 1679165, 1681962, 1679170, 1681967 ], [ 1681963, 1688016, 1681968, 1688021 ], [ 1688017, 1690659, 1688022, 1690664 ], [ 1690660, 1692872, 1690665, 1692877 ], [ 1692873, 1697102, 1692878, 1697107 ], [ 1697103, 1698132, 1697108, 1698137 ], [ 1698133, 1703429, 1698138, 1703434 ], [ 1703430, 1706057, 1703435, 1706062 ], [ 1706058, 1708683, 1706063, 1708688 ], [ 1708684, 1720884, 1708689, 1720889 ], [ 1720885, 1721218, 1720890, 1721223 ], [ 1721219, 1725289, 1721224, 1725294 ], [ 1725290, 1726495, 1725295, 1726500 ], [ 1726496, 1728646, 1726501, 1728651 ], [ 1728647, 1729060, 1728652, 1729065 ], [ 1729061, 1732801, 1729066, 1732806 ], [ 1732802, 1733308, 1732807, 1733313 ], [ 1733309, 1734471, 1733314, 1734476 ], [ 
1734472, 1740942, 1734477, 1740947 ], [ 1740943, 1744762, 1740948, 1744767 ], [ 1744763, 1746379, 1744768, 1746384 ], [ 1746380, 1747144, 1746385, 1747149 ], [ 1747145, 1753062, 1747150, 1753067 ], [ 1753063, 1754367, 1753068, 1754372 ], [ 1754368, 1763444, 1754373, 1763449 ], [ 1763445, 1777420, 1763450, 1777425 ], [ 1777421, 1782626, 1777426, 1782631 ], [ 1782627, 1784342, 1782632, 1784347 ], [ 1784343, 1784549, 1784348, 1784554 ], [ 1784550, 1791189, 1784555, 1791194 ], [ 1791190, 1793878, 1791195, 1793883 ], [ 1793879, 1794152, 1793884, 1794157 ], [ 1794153, 1794269, 1794158, 1794274 ], [ 1794270, 1794972, 1794275, 1794977 ], [ 1794973, 1796163, 1794978, 1796168 ], [ 1796164, 1802296, 1796169, 1802301 ], [ 1802297, 1805729, 1802302, 1805734 ], [ 1805730, 1806305, 1805735, 1806310 ], [ 1806306, 1810512, 1806311, 1810517 ], [ 1810513, 1816402, 1810518, 1816407 ], [ 1816403, 1826227, 1816408, 1826232 ], [ 1826228, 1826701, 1826233, 1826706 ], [ 1826702, 1827720, 1826707, 1827725 ], [ 1827721, 1836707, 1827726, 1836712 ], [ 1836708, 1836926, 1836713, 1836931 ], [ 1836927, 1838667, 1836932, 1838672 ], [ 1838668, 1843220, 1838673, 1843225 ], [ 1843221, 1843829, 1843226, 1843834 ], [ 1843830, 1846577, 1843835, 1846582 ], [ 1846578, 1849125, 1846583, 1849130 ], [ 1849126, 1850237, 1849131, 1850242 ], [ 1850238, 1851708, 1850243, 1851713 ], [ 1851709, 1853436, 1851714, 1853441 ], [ 1853437, 1853475, 1853442, 1853480 ], [ 1853476, 1853493, 1853481, 1853498 ], [ 1853494, 1854900, 1853499, 1854905 ], [ 1854901, 1861797, 1854906, 1861802 ], [ 1861798, 1862267, 1861803, 1862272 ], [ 1862268, 1866445, 1862273, 1866450 ], [ 1866446, 1866700, 1866451, 1866705 ], [ 1866701, 1870143, 1866706, 1870148 ], [ 1870144, 1870675, 1870149, 1870680 ], [ 1870676, 1881704, 1870681, 1881709 ], [ 1881705, 1882659, 1881710, 1882664 ], [ 1882660, 1884008, 1882665, 1884013 ], [ 1884009, 1885076, 1884014, 1885081 ], [ 1885077, 1897857, 1885082, 1897862 ], [ 1897858, 1931549, 1897863, 1931554 ], [ 
1931550, 1931660, 1931555, 1931665 ], [ 1931661, 1936680, 1931666, 1936685 ], [ 1936681, 1938835, 1936686, 1938840 ], [ 1938836, 1939367, 1938841, 1939372 ], [ 1939368, 1944718, 1939373, 1944723 ], [ 1944719, 1949924, 1944724, 1949929 ], [ 1949925, 1951640, 1949930, 1951645 ], [ 1951641, 1951847, 1951646, 1951852 ], [ 1951848, 1958495, 1951853, 1958500 ], [ 1958496, 1961184, 1958501, 1961189 ], [ 1961185, 1961458, 1961190, 1961463 ], [ 1961459, 1963223, 1961464, 1963228 ], [ 1963224, 1964535, 1963229, 1964540 ], [ 1964536, 1964578, 1964541, 1964583 ], [ 1964579, 1965726, 1964584, 1965731 ], [ 1965727, 1975723, 1965732, 1975728 ], [ 1975724, 1983495, 1975729, 1983500 ], [ 1983496, 1989041, 1983501, 1989046 ], [ 1989042, 1991939, 1989047, 1991944 ], [ 1991940, 1994134, 1991945, 1994139 ], [ 1994135, 2006390, 1994140, 2006395 ], [ 2006391, 2006681, 2006396, 2006686 ], [ 2006682, 2012753, 2006687, 2012758 ], [ 2012754, 2020299, 2012759, 2020304 ], [ 2020300, 2021594, 2020305, 2021599 ], [ 2021595, 2035653, 2021600, 2035658 ], [ 2035654, 2043961, 2035659, 2043966 ], [ 2043962, 2044411, 2043967, 2044416 ], [ 2044412, 2045320, 2044417, 2045325 ], [ 2045321, 2046593, 2045326, 2046598 ], [ 2046594, 2058014, 2046599, 2058019 ], [ 2058015, 2058262, 2058020, 2058267 ], [ 2058263, 2061616, 2058268, 2061621 ], [ 2061617, 2067334, 2061622, 2067339 ], [ 2067335, 2069059, 2067340, 2069064 ], [ 2069060, 2073142, 2069065, 2073147 ], [ 2073143, 2074555, 2073148, 2074560 ], [ 2074556, 2074634, 2074561, 2074639 ], [ 2074635, 2076422, 2074640, 2076427 ], [ 2076423, 2081937, 2076428, 2081942 ], [ 2081938, 2082042, 2081943, 2082047 ], [ 2082043, 2082408, 2082048, 2082413 ], [ 2082409, 2094661, 2082414, 2094666 ], [ 2094662, 2105556, 2094667, 2105561 ], [ 2105557, 2106153, 2105562, 2106158 ], [ 2106154, 2113282, 2106159, 2113287 ], [ 2113283, 2114197, 2113288, 2114202 ], [ 2114198, 2124245, 2114203, 2124250 ], [ 2124246, 2126629, 2124251, 2126634 ], [ 2126630, 2127367, 2126635, 2127372 ], [ 
2127368, 2131854, 2127373, 2131859 ], [ 2131855, 2138481, 2131860, 2138486 ], [ 2138482, 2140084, 2138487, 2140089 ], [ 2140085, 2151397, 2140090, 2151402 ], [ 2151398, 2154116, 2151403, 2154121 ], [ 2154117, 2164531, 2154122, 2164536 ], [ 2164532, 2164999, 2164537, 2165004 ], [ 2165000, 2166190, 2165005, 2166195 ], [ 2166191, 2168535, 2166196, 2168540 ], [ 2168536, 2168652, 2168541, 2168657 ], [ 2168653, 2168876, 2168658, 2168881 ], [ 2168877, 2175197, 2168882, 2175202 ], [ 2175198, 2176568, 2175203, 2176573 ], [ 2176569, 2185419, 2176574, 2185424 ], [ 2185420, 2198074, 2185425, 2198079 ], [ 2198075, 2205716, 2198080, 2205721 ], [ 2205717, 2206482, 2205722, 2206487 ], [ 2206483, 2214819, 2206488, 2214824 ], [ 2214820, 2215255, 2214825, 2215260 ], [ 2215256, 2216910, 2215261, 2216915 ], [ 2216911, 2219477, 2216916, 2219482 ], [ 2219478, 2219751, 2219483, 2219756 ], [ 2219752, 2222602, 2219757, 2222607 ], [ 2222603, 2224016, 2222608, 2224021 ], [ 2224017, 2229253, 2224022, 2229258 ], [ 2229254, 2229460, 2229259, 2229465 ], [ 2229461, 2231176, 2229466, 2231181 ], [ 2231177, 2236382, 2231182, 2236387 ], [ 2236383, 2245581, 2236388, 2245586 ], [ 2245582, 2245719, 2245587, 2245724 ], [ 2245720, 2245761, 2245725, 2245766 ], [ 2245762, 2249902, 2245767, 2249907 ], [ 2249903, 2254722, 2249908, 2254727 ], [ 2254723, 2262668, 2254728, 2262673 ], [ 2262669, 2276333, 2262674, 2276338 ], [ 2276334, 2278349, 2276339, 2278354 ], [ 2278350, 2278595, 2278355, 2278600 ], [ 2278596, 2282039, 2278601, 2282044 ], [ 2282040, 2309292, 2282045, 2309297 ], [ 2309293, 2309737, 2309298, 2309742 ], [ 2309738, 2314845, 2309743, 2314850 ], [ 2314846, 2315016, 2314851, 2315021 ], [ 2315017, 2320047, 2315022, 2320052 ], [ 2320048, 2320645, 2320053, 2320650 ], [ 2320646, 2330437, 2320651, 2330442 ], [ 2330438, 2338082, 2330443, 2338087 ], [ 2338083, 2345465, 2338088, 2345470 ], [ 2345466, 2347233, 2345471, 2347238 ], [ 2347234, 2348720, 2347239, 2348725 ], [ 2348721, 2351324, 2348726, 2351329 ], [ 
2351325, 2352448, 2351330, 2352453 ], [ 2352449, 2353999, 2352454, 2354004 ], [ 2354000, 2359046, 2354005, 2359051 ], [ 2359047, 2361149, 2359052, 2361154 ], [ 2361150, 2374039, 2361155, 2374044 ], [ 2374040, 2385349, 2374045, 2385354 ], [ 2385350, 2388585, 2385355, 2388590 ], [ 2388586, 2391734, 2388591, 2391739 ], [ 2391735, 2392141, 2391740, 2392146 ], [ 2392142, 2393939, 2392147, 2393944 ], [ 2393940, 2395026, 2393945, 2395031 ], [ 2395027, 2395860, 2395032, 2395865 ], [ 2395861, 2398211, 2395866, 2398216 ], [ 2398212, 2398326, 2398217, 2398331 ], [ 2398327, 2402303, 2398332, 2402308 ], [ 2402304, 2408154, 2402309, 2408159 ], [ 2408155, 2409936, 2408160, 2409941 ], [ 2409937, 2410353, 2409942, 2410358 ], [ 2410354, 2411021, 2410359, 2411026 ], [ 2411022, 2419571, 2411027, 2419576 ], [ 2419572, 2424488, 2419577, 2424493 ], [ 2424489, 2427895, 2424494, 2427900 ], [ 2427896, 2433794, 2427901, 2433799 ], [ 2433795, 2434280, 2433800, 2434285 ], [ 2434281, 2436129, 2434286, 2436134 ], [ 2436130, 2446339, 2436135, 2446344 ], [ 2446340, 2446355, 2446345, 2446360 ], [ 2446356, 2447550, 2446361, 2447555 ], [ 2447551, 2456375, 2447556, 2456380 ], [ 2456376, 2459685, 2456381, 2459690 ], [ 2459686, 2467707, 2459691, 2467712 ], [ 2467708, 2489626, 2467713, 2489631 ], [ 2489627, 2490030, 2489632, 2490035 ], [ 2490031, 2494181, 2490036, 2494186 ], [ 2494182, 2494578, 2494187, 2494583 ], [ 2494579, 2498330, 2494584, 2498335 ], [ 2498331, 2501619, 2498336, 2501624 ], [ 2501620, 2502774, 2501625, 2502779 ], [ 2502775, 2505440, 2502780, 2505445 ], [ 2505441, 2507840, 2505446, 2507845 ], [ 2507841, 2513953, 2507846, 2513958 ], [ 2513954, 2518482, 2513959, 2518487 ], [ 2518483, 2518510, 2518488, 2518515 ], [ 2518511, 2519154, 2518516, 2519159 ], [ 2519155, 2521663, 2519160, 2521668 ], [ 2521664, 2522690, 2521669, 2522695 ], [ 2522691, 2535156, 2522696, 2535161 ], [ 2535157, 2536302, 2535162, 2536307 ], [ 2536303, 2539683, 2536308, 2539688 ], [ 2539684, 2540838, 2539689, 2540843 ], [ 
2540839, 2542542, 2540844, 2542547 ], [ 2542543, 2549711, 2542548, 2549716 ], [ 2549712, 2549979, 2549717, 2549984 ], [ 2549980, 2550376, 2549985, 2550381 ], [ 2550377, 2550442, 2550382, 2550447 ], [ 2550443, 2552498, 2550448, 2552503 ], [ 2552499, 2556237, 2552504, 2556242 ], [ 2556238, 2561281, 2556243, 2561286 ], [ 2561282, 2562381, 2561287, 2562386 ], [ 2562382, 2571576, 2562387, 2571581 ], [ 2571577, 2573918, 2571582, 2573923 ], [ 2573919, 2575854, 2573924, 2575859 ], [ 2575855, 2579045, 2575860, 2579050 ], [ 2579046, 2588728, 2579051, 2588733 ], [ 2588729, 2591930, 2588734, 2591935 ], [ 2591931, 2601304, 2591936, 2601309 ], [ 2601305, 2614812, 2601310, 2614817 ], [ 2614813, 2614839, 2614818, 2614844 ], [ 2614840, 2622328, 2614845, 2622333 ], [ 2622329, 2627903, 2622334, 2627908 ], [ 2627904, 2648431, 2627909, 2648436 ], [ 2648432, 2651846, 2648437, 2651851 ], [ 2651847, 2660586, 2651852, 2660591 ], [ 2660587, 2663434, 2660592, 2663439 ], [ 2663435, 2674481, 2663440, 2674486 ], [ 2674482, 2674949, 2674487, 2674954 ], [ 2674950, 2676096, 2674955, 2676101 ], [ 2676097, 2676139, 2676102, 2676144 ], [ 2676140, 2678485, 2676145, 2678490 ], [ 2678486, 2678602, 2678491, 2678607 ], [ 2678603, 2678826, 2678608, 2678831 ], [ 2678827, 2681278, 2678832, 2681283 ], [ 2681279, 2684926, 2681284, 2684931 ], [ 2684927, 2685205, 2684932, 2685210 ], [ 2685206, 2692493, 2685211, 2692498 ], [ 2692494, 2714956, 2692499, 2714961 ], [ 2714957, 2716135, 2714962, 2716140 ], [ 2716136, 2716930, 2716141, 2716935 ], [ 2716931, 2717350, 2716936, 2717355 ], [ 2717351, 2717774, 2717356, 2717779 ], [ 2717775, 2718590, 2717780, 2718595 ], [ 2718591, 2720379, 2718596, 2720384 ], [ 2720380, 2723960, 2720385, 2723965 ], [ 2723961, 2725658, 2723966, 2725663 ], [ 2725659, 2731980, 2725664, 2731985 ], [ 2731981, 2738172, 2731986, 2738177 ], [ 2738173, 2748433, 2738178, 2748438 ], [ 2748434, 2748596, 2748439, 2748601 ], [ 2748597, 2749729, 2748602, 2749734 ], [ 2749730, 2752944, 2749735, 2752949 ], [ 
2752945, 2753953, 2752950, 2753958 ], [ 2753954, 2760599, 2753959, 2760604 ], [ 2760600, 2761236, 2760605, 2761241 ], [ 2761237, 2763346, 2761242, 2763351 ], [ 2763347, 2764218, 2763352, 2764223 ], [ 2764219, 2773348, 2764224, 2773353 ], [ 2773349, 2779707, 2773354, 2779712 ], [ 2779708, 2792111, 2779713, 2792116 ], [ 2792112, 2794418, 2792117, 2794423 ], [ 2794419, 2795818, 2794424, 2795823 ], [ 2795819, 2796261, 2795824, 2796266 ], [ 2796262, 2798929, 2796267, 2798934 ], [ 2798930, 2799454, 2798935, 2799459 ], [ 2799455, 2813416, 2799460, 2813421 ], [ 2813417, 2813479, 2813422, 2813484 ], [ 2813480, 2814107, 2813485, 2814112 ], [ 2814108, 2825357, 2814113, 2825362 ], [ 2825358, 2826820, 2825363, 2826825 ], [ 2826821, 2828096, 2826826, 2828101 ], [ 2828097, 2830088, 2828102, 2830093 ], [ 2830089, 2834601, 2830094, 2834606 ], [ 2834602, 2836621, 2834607, 2836626 ], [ 2836622, 2836884, 2836627, 2836889 ], [ 2836885, 2837913, 2836890, 2837918 ], [ 2837914, 2840393, 2837919, 2840398 ], [ 2840394, 2843391, 2840399, 2843396 ], [ 2843392, 2843629, 2843397, 2843634 ], [ 2843630, 2844665, 2843635, 2844670 ], [ 2844666, 2847821, 2844671, 2847826 ], [ 2847822, 2849601, 2847827, 2849606 ], [ 2849602, 2853189, 2849607, 2853194 ], [ 2853190, 2860428, 2853195, 2860433 ], [ 2860429, 2862152, 2860434, 2862157 ], [ 2862153, 2862729, 2862158, 2862734 ], [ 2862730, 2869033, 2862735, 2869038 ], [ 2869034, 2869157, 2869039, 2869162 ], [ 2869158, 2882082, 2869163, 2882087 ], [ 2882083, 2894091, 2882088, 2894096 ], [ 2894092, 2895090, 2894097, 2895095 ], [ 2895091, 2900119, 2895096, 2900124 ], [ 2900120, 2900555, 2900125, 2900560 ], [ 2900556, 2902167, 2900561, 2902172 ], [ 2902168, 2902210, 2902173, 2902215 ], [ 2902211, 2904556, 2902216, 2904561 ], [ 2904557, 2904673, 2904562, 2904678 ], [ 2904674, 2904897, 2904679, 2904902 ], [ 2904898, 2907797, 2904903, 2907802 ], [ 2907798, 2910456, 2907803, 2910461 ], [ 2910457, 2911608, 2910462, 2911613 ], [ 2911609, 2914744, 2911614, 2914749 ], [ 
2914745, 2914779, 2914750, 2914784 ], [ 2914780, 2918124, 2914785, 2918129 ], [ 2918125, 2921020, 2918130, 2921025 ], [ 2921021, 2921458, 2921026, 2921463 ], [ 2921459, 2926580, 2921464, 2926585 ], [ 2926581, 2930570, 2926586, 2930575 ], [ 2930571, 2934873, 2930576, 2934878 ], [ 2934874, 2942883, 2934879, 2942888 ], [ 2942884, 2950823, 2942889, 2950828 ], [ 2950824, 2952204, 2950829, 2952209 ], [ 2952205, 2954108, 2952210, 2954113 ], [ 2954109, 2958788, 2954114, 2958793 ], [ 2958789, 2962338, 2958794, 2962343 ], [ 2962339, 2962634, 2962344, 2962639 ], [ 2962635, 2963567, 2962640, 2963572 ], [ 2963568, 2965512, 2963573, 2965517 ], [ 2965513, 2965715, 2965518, 2965720 ], [ 2965716, 2969537, 2965721, 2969542 ], [ 2969538, 2969667, 2969543, 2969672 ], [ 2969668, 2971097, 2969673, 2971102 ], [ 2971098, 2971329, 2971103, 2971334 ], [ 2971330, 2971874, 2971335, 2971879 ], [ 2971875, 2972098, 2971880, 2972103 ], [ 2972099, 2978342, 2972104, 2978347 ], [ 2978343, 2984060, 2978348, 2984065 ], [ 2984061, 2988924, 2984066, 2988929 ], [ 2988925, 2994739, 2988930, 2994744 ], [ 2994740, 3002088, 2994745, 3002093 ], [ 3002089, 3009887, 3002094, 3009892 ], [ 3009888, 3014827, 3009893, 3014832 ], [ 3014828, 3020885, 3014833, 3020890 ], [ 3020886, 3022261, 3020891, 3022266 ], [ 3022262, 3029543, 3022267, 3029548 ], [ 3029544, 3030265, 3029549, 3030270 ], [ 3030266, 3032363, 3030271, 3032368 ], [ 3032364, 3033161, 3032369, 3033166 ], [ 3033162, 3042175, 3033167, 3042180 ], [ 3042176, 3042389, 3042181, 3042394 ], [ 3042390, 3049663, 3042395, 3049668 ], [ 3049664, 3050210, 3049669, 3050215 ], [ 3050211, 3051389, 3050216, 3051394 ], [ 3051390, 3052128, 3051395, 3052133 ], [ 3052129, 3052883, 3052134, 3052888 ], [ 3052884, 3054679, 3052889, 3054684 ], [ 3054680, 3055955, 3054685, 3055960 ], [ 3055956, 3056024, 3055961, 3056029 ], [ 3056025, 3062859, 3056030, 3062864 ], [ 3062860, 3063276, 3062865, 3063281 ], [ 3063277, 3064101, 3063282, 3064106 ], [ 3064102, 3065575, 3064107, 3065580 ], [ 
3065576, 3065710, 3065581, 3065715 ], [ 3065711, 3066590, 3065716, 3066595 ], [ 3066591, 3075292, 3066596, 3075297 ], [ 3075293, 3076853, 3075298, 3076858 ], [ 3076854, 3080566, 3076859, 3080571 ], [ 3080567, 3080707, 3080572, 3080712 ], [ 3080708, 3080747, 3080713, 3080752 ], [ 3080748, 3080916, 3080753, 3080921 ], [ 3080917, 3082200, 3080922, 3082205 ], [ 3082201, 3085986, 3082206, 3085991 ], [ 3085987, 3086111, 3085992, 3086116 ], [ 3086112, 3088509, 3086117, 3088514 ], [ 3088510, 3091108, 3088515, 3091113 ], [ 3091109, 3094298, 3091114, 3094303 ], [ 3094299, 3098254, 3094304, 3098259 ], [ 3098255, 3100958, 3098260, 3100963 ], [ 3100959, 3101111, 3100964, 3101116 ], [ 3101112, 3101710, 3101117, 3101715 ], [ 3101711, 3107070, 3101716, 3107075 ], [ 3107071, 3108550, 3107076, 3108555 ], [ 3108551, 3116752, 3108556, 3116757 ], [ 3116753, 3119656, 3116758, 3119661 ], [ 3119657, 3120436, 3119662, 3120441 ], [ 3120437, 3122936, 3120442, 3122941 ], [ 3122937, 3125746, 3122942, 3125751 ], [ 3125747, 3126385, 3125752, 3126390 ], [ 3126386, 3129525, 3126391, 3129530 ], [ 3129526, 3129570, 3129531, 3129575 ], [ 3129571, 3129752, 3129576, 3129757 ], [ 3129753, 3132349, 3129758, 3132354 ], [ 3132350, 3139904, 3132355, 3139909 ], [ 3139905, 3143832, 3139910, 3143837 ], [ 3143833, 3144573, 3143838, 3144578 ], [ 3144574, 3148252, 3144579, 3148257 ], [ 3148253, 3150053, 3148258, 3150058 ], [ 3150054, 3152136, 3150059, 3152141 ], [ 3152137, 3161578, 3152142, 3161583 ], [ 3161579, 3164826, 3161584, 3164831 ], [ 3164827, 3164943, 3164832, 3164948 ], [ 3164944, 3167563, 3164949, 3167568 ], [ 3167564, 3168258, 3167569, 3168263 ], [ 3168259, 3170295, 3168264, 3170300 ], [ 3170296, 3172533, 3170301, 3172538 ], [ 3172534, 3176792, 3172539, 3176797 ], [ 3176793, 3177637, 3176798, 3177642 ], [ 3177638, 3178306, 3177643, 3178311 ], [ 3178307, 3186857, 3178312, 3186862 ], [ 3186858, 3196386, 3186863, 3196391 ], [ 3196387, 3197840, 3196392, 3197845 ], [ 3197841, 3210828, 3197846, 3210833 ], [ 
3210829, 3212953, 3210834, 3212958 ], [ 3212954, 3226292, 3212959, 3226297 ], [ 3226293, 3226762, 3226298, 3226767 ], [ 3226763, 3230485, 3226768, 3230490 ], [ 3230486, 3233743, 3230491, 3233748 ], [ 3233744, 3234567, 3233749, 3234572 ], [ 3234568, 3235194, 3234573, 3235199 ], [ 3235195, 3235499, 3235200, 3235504 ], [ 3235500, 3243889, 3235505, 3243894 ], [ 3243890, 3259251, 3243895, 3259256 ], [ 3259252, 3262432, 3259257, 3262437 ], [ 3262433, 3262672, 3262438, 3262677 ], [ 3262673, 3266686, 3262678, 3266691 ], [ 3266687, 3269046, 3266692, 3269051 ], [ 3269047, 3269620, 3269052, 3269625 ], [ 3269621, 3272036, 3269626, 3272041 ], [ 3272037, 3272516, 3272042, 3272521 ], [ 3272517, 3272796, 3272522, 3272801 ], [ 3272797, 3278978, 3272802, 3278983 ], [ 3278979, 3279467, 3278984, 3279472 ], [ 3279468, 3280283, 3279473, 3280288 ], [ 3280284, 3282531, 3280289, 3282536 ], [ 3282532, 3286305, 3282537, 3286310 ], [ 3286306, 3286606, 3286311, 3286611 ], [ 3286607, 3287302, 3286612, 3287307 ], [ 3287303, 3290776, 3287308, 3290781 ], [ 3290777, 3294305, 3290782, 3294310 ], [ 3294306, 3298010, 3294311, 3298015 ], [ 3298011, 3298799, 3298016, 3298804 ], [ 3298800, 3299744, 3298805, 3299749 ], [ 3299745, 3302402, 3299750, 3302407 ], [ 3302403, 3314750, 3302408, 3314755 ], [ 3314751, 3314777, 3314756, 3314782 ], [ 3314778, 3315543, 3314783, 3315548 ], [ 3315544, 3319495, 3315549, 3319500 ], [ 3319496, 3321744, 3319501, 3321749 ], [ 3321745, 3324685, 3321750, 3324690 ], [ 3324686, 3331241, 3324691, 3331246 ], [ 3331242, 3345624, 3331247, 3345629 ], [ 3345625, 3349729, 3345630, 3349734 ], [ 3349730, 3350579, 3349735, 3350584 ], [ 3350580, 3353994, 3350585, 3353999 ], [ 3353995, 3355238, 3354000, 3355243 ], [ 3355239, 3360120, 3355244, 3360125 ], [ 3360121, 3368828, 3360126, 3368833 ], [ 3368829, 3376335, 3368834, 3376340 ], [ 3376336, 3380722, 3376341, 3380727 ], [ 3380723, 3381309, 3380728, 3381314 ], [ 3381310, 3382080, 3381315, 3382085 ], [ 3382081, 3384674, 3382086, 3384679 ], [ 
3384675, 3385188, 3384680, 3385193 ], [ 3385189, 3388713, 3385194, 3388718 ], [ 3388714, 3392079, 3388719, 3392084 ], [ 3392080, 3394131, 3392085, 3394136 ], [ 3394132, 3394381, 3394137, 3394386 ], [ 3394382, 3395362, 3394387, 3395367 ], [ 3395363, 3396615, 3395368, 3396620 ], [ 3396616, 3399453, 3396621, 3399458 ], [ 3399454, 3400585, 3399459, 3400590 ], [ 3400586, 3405494, 3400591, 3405499 ], [ 3405495, 3413040, 3405500, 3413045 ], [ 3413041, 3413955, 3413046, 3413960 ], [ 3413956, 3413965, 3413961, 3413970 ], [ 3413966, 3415450, 3413971, 3415455 ], [ 3415451, 3415543, 3415456, 3415548 ], [ 3415544, 3415693, 3415549, 3415698 ], [ 3415694, 3416866, 3415699, 3416871 ], [ 3416867, 3421885, 3416872, 3421890 ], [ 3421886, 3425142, 3421891, 3425147 ], [ 3425143, 3437815, 3425148, 3437820 ], [ 3437816, 3440231, 3437821, 3440236 ], [ 3440232, 3442385, 3440237, 3442390 ], [ 3442386, 3449766, 3442391, 3449771 ], [ 3449767, 3455799, 3449772, 3455804 ], [ 3455800, 3473047, 3455805, 3473052 ], [ 3473048, 3483375, 3473053, 3483380 ], [ 3483376, 3505500, 3483381, 3505505 ], [ 3505501, 3514082, 3505506, 3514087 ], [ 3514083, 3522092, 3514088, 3522097 ], [ 3522093, 3522111, 3522098, 3522116 ], [ 3522112, 3522767, 3522117, 3522772 ], [ 3522768, 3531627, 3522773, 3531632 ], [ 3531628, 3537780, 3531633, 3537785 ], [ 3537781, 3540644, 3537786, 3540649 ], [ 3540645, 3543017, 3540650, 3543022 ], [ 3543018, 3543842, 3543023, 3543847 ], [ 3543843, 3545546, 3543848, 3545551 ], [ 3545547, 3547077, 3545552, 3547082 ], [ 3547078, 3547149, 3547083, 3547154 ], [ 3547150, 3548533, 3547155, 3548538 ], [ 3548534, 3549829, 3548539, 3549834 ], [ 3549830, 3549935, 3549835, 3549940 ], [ 3549936, 3550203, 3549941, 3550208 ], [ 3550204, 3550227, 3550209, 3550232 ], [ 3550228, 3551480, 3550233, 3551485 ], [ 3551481, 3552210, 3551486, 3552215 ], [ 3552211, 3562847, 3552216, 3562852 ], [ 3562848, 3563068, 3562853, 3563073 ], [ 3563069, 3563973, 3563074, 3563978 ], [ 3563974, 3565130, 3563979, 3565135 ], [ 
3565131, 3565565, 3565136, 3565570 ], [ 3565566, 3566132, 3565571, 3566137 ], [ 3566133, 3567018, 3566138, 3567023 ], [ 3567019, 3567627, 3567024, 3567632 ], [ 3567628, 3570838, 3567633, 3570843 ], [ 3570839, 3574124, 3570844, 3574129 ], [ 3574125, 3575098, 3574130, 3575103 ], [ 3575099, 3575586, 3575104, 3575591 ], [ 3575587, 3576602, 3575592, 3576607 ], [ 3576603, 3590383, 3576608, 3590388 ], [ 3590384, 3594697, 3590389, 3594702 ], [ 3594698, 3597307, 3594703, 3597312 ], [ 3597308, 3598837, 3597313, 3598842 ], [ 3598838, 3599839, 3598843, 3599844 ], [ 3599840, 3603578, 3599845, 3603583 ], [ 3603579, 3609322, 3603584, 3609327 ], [ 3609323, 3614549, 3609328, 3614554 ], [ 3614550, 3618664, 3614555, 3618669 ], [ 3618665, 3619899, 3618670, 3619904 ], [ 3619900, 3624824, 3619905, 3624829 ], [ 3624825, 3628159, 3624830, 3628164 ], [ 3628160, 3628423, 3628165, 3628428 ], [ 3628424, 3630285, 3628429, 3630290 ], [ 3630286, 3630486, 3630291, 3630491 ], [ 3630487, 3634237, 3630492, 3634242 ], [ 3634238, 3639209, 3634243, 3639214 ], [ 3639210, 3652188, 3639215, 3652193 ], [ 3652189, 3659027, 3652194, 3659032 ], [ 3659028, 3659079, 3659033, 3659084 ], [ 3659080, 3659207, 3659085, 3659212 ], [ 3659208, 3667507, 3659213, 3667512 ], [ 3667508, 3667741, 3667513, 3667746 ], [ 3667742, 3668429, 3667747, 3668434 ], [ 3668430, 3670470, 3668435, 3670475 ], [ 3670471, 3673148, 3670476, 3673153 ], [ 3673149, 3674083, 3673154, 3674088 ], [ 3674084, 3678214, 3674089, 3678219 ], [ 3678215, 3680677, 3678220, 3680682 ], [ 3680678, 3684205, 3680683, 3684210 ], [ 3684206, 3690753, 3684211, 3690758 ], [ 3690754, 3698350, 3690759, 3698355 ], [ 3698351, 3699287, 3698356, 3699292 ], [ 3699288, 3702298, 3699293, 3702303 ], [ 3702299, 3706224, 3702304, 3706229 ], [ 3706225, 3706948, 3706230, 3706953 ], [ 3706949, 3709050, 3706954, 3709055 ], [ 3709051, 3725109, 3709056, 3725114 ], [ 3725110, 3731514, 3725115, 3731519 ], [ 3731515, 3736239, 3731520, 3736244 ], [ 3736240, 3736361, 3736245, 3736366 ], [ 
3736362, 3738698, 3736367, 3738703 ], [ 3738699, 3745350, 3738704, 3745355 ], [ 3745351, 3747516, 3745356, 3747521 ], [ 3747517, 3748248, 3747522, 3748253 ], [ 3748249, 3750382, 3748254, 3750387 ], [ 3750383, 3761090, 3750388, 3761095 ], [ 3761091, 3762084, 3761096, 3762089 ], [ 3762085, 3762616, 3762090, 3762621 ], [ 3762617, 3769672, 3762622, 3769677 ], [ 3769673, 3769809, 3769678, 3769814 ], [ 3769810, 3769962, 3769815, 3769967 ], [ 3769963, 3774893, 3769968, 3774898 ], [ 3774894, 3776322, 3774899, 3776327 ], [ 3776323, 3778089, 3776328, 3778094 ], [ 3778090, 3784214, 3778095, 3784219 ], [ 3784215, 3799015, 3784220, 3799020 ], [ 3799016, 3806065, 3799021, 3806070 ], [ 3806066, 3807768, 3806071, 3807773 ], [ 3807769, 3816682, 3807774, 3816687 ], [ 3816683, 3817044, 3816688, 3817049 ], [ 3817045, 3817768, 3817050, 3817773 ], [ 3817769, 3822167, 3817774, 3822172 ], [ 3822168, 3824032, 3822173, 3824037 ], [ 3824033, 3828542, 3824038, 3828547 ], [ 3828543, 3832118, 3828548, 3832123 ], [ 3832119, 3832453, 3832124, 3832458 ], [ 3832454, 3837573, 3832459, 3837578 ], [ 3837574, 3842658, 3837579, 3842663 ], [ 3842659, 3844599, 3842664, 3844604 ], [ 3844600, 3844788, 3844605, 3844793 ], [ 3844789, 3850826, 3844794, 3850831 ], [ 3850827, 3855395, 3850832, 3855400 ], [ 3855396, 3860530, 3855401, 3860535 ], [ 3860531, 3873056, 3860536, 3873061 ], [ 3873057, 3880197, 3873062, 3880202 ], [ 3880198, 3882820, 3880203, 3882825 ], [ 3882821, 3883310, 3882826, 3883315 ], [ 3883311, 3885744, 3883316, 3885749 ], [ 3885745, 3891218, 3885750, 3891223 ], [ 3891219, 3891354, 3891224, 3891359 ], [ 3891355, 3894419, 3891360, 3894424 ], [ 3894420, 3900500, 3894425, 3900505 ], [ 3900501, 3907313, 3900506, 3907318 ], [ 3907314, 3908919, 3907319, 3908924 ], [ 3908920, 3909807, 3908925, 3909812 ], [ 3909808, 3909977, 3909813, 3909982 ], [ 3909978, 3930585, 3909983, 3930590 ], [ 3930586, 3933363, 3930591, 3933368 ], [ 3933364, 3935176, 3933369, 3935181 ], [ 3935177, 3936871, 3935182, 3936876 ], [ 
3936872, 3945198, 3936877, 3945203 ], [ 3945199, 3946390, 3945204, 3946395 ], [ 3946391, 3946986, 3946396, 3946991 ], [ 3946987, 3956408, 3946992, 3956413 ], [ 3956409, 3958333, 3956414, 3958338 ], [ 3958334, 3959031, 3958339, 3959036 ], [ 3959032, 3960932, 3959037, 3960937 ], [ 3960933, 3964190, 3960938, 3964195 ], [ 3964191, 3969413, 3964196, 3969418 ], [ 3969414, 3972146, 3969419, 3972151 ], [ 3972147, 3972344, 3972152, 3972349 ], [ 3972345, 3978065, 3972350, 3978070 ], [ 3978066, 3981977, 3978071, 3981982 ], [ 3981978, 3984768, 3981983, 3984773 ], [ 3984769, 3984918, 3984774, 3984923 ], [ 3984919, 3985704, 3984924, 3985709 ], [ 3985705, 3995454, 3985710, 3995459 ], [ 3995455, 3997410, 3995460, 3997415 ], [ 3997411, 4000982, 3997416, 4000987 ], [ 4000983, 4002514, 4000988, 4002519 ], [ 4002515, 4005689, 4002520, 4005694 ], [ 4005690, 4016078, 4005695, 4016083 ], [ 4016079, 4017237, 4016084, 4017242 ], [ 4017238, 4018412, 4017243, 4018417 ], [ 4018413, 4018717, 4018418, 4018722 ], [ 4018718, 4023281, 4018723, 4023286 ], [ 4023282, 4032076, 4023287, 4032081 ], [ 4032077, 4041164, 4032082, 4041169 ], [ 4041165, 4041295, 4041170, 4041300 ], [ 4041296, 4041920, 4041301, 4041925 ], [ 4041921, 4046747, 4041926, 4046752 ], [ 4046748, 4052606, 4046753, 4052611 ], [ 4052607, 4054469, 4052612, 4054474 ], [ 4054470, 4054629, 4054475, 4054634 ], [ 4054630, 4054968, 4054635, 4054973 ], [ 4054969, 4055333, 4054974, 4055338 ], [ 4055334, 4056058, 4055339, 4056063 ], [ 4056059, 4059084, 4056064, 4059089 ], [ 4059085, 4062508, 4059090, 4062513 ], [ 4062509, 4065383, 4062514, 4065388 ], [ 4065384, 4065643, 4065389, 4065648 ], [ 4065644, 4069949, 4065649, 4069954 ], [ 4069950, 4078343, 4069955, 4078348 ], [ 4078344, 4083896, 4078349, 4083901 ], [ 4083897, 4085884, 4083902, 4085889 ], [ 4085885, 4090152, 4085890, 4090157 ], [ 4090153, 4093908, 4090158, 4093913 ], [ 4093909, 4094118, 4093914, 4094123 ], [ 4094119, 4095249, 4094124, 4095254 ], [ 4095250, 4097811, 4095255, 4097816 ], [ 
4097812, 4101896, 4097817, 4101901 ], [ 4101897, 4107088, 4101902, 4107093 ], [ 4107089, 4107551, 4107094, 4107556 ], [ 4107552, 4107580, 4107557, 4107585 ], [ 4107581, 4109180, 4107586, 4109185 ], [ 4109181, 4110537, 4109186, 4110542 ], [ 4110538, 4116491, 4110543, 4116496 ], [ 4116492, 4117909, 4116497, 4117914 ], [ 4117910, 4118469, 4117915, 4118474 ], [ 4118470, 4123661, 4118475, 4123666 ], [ 4123662, 4123865, 4123667, 4123870 ], [ 4123866, 4125127, 4123871, 4125132 ], [ 4125128, 4129187, 4125133, 4129192 ], [ 4129188, 4132972, 4129193, 4132977 ], [ 4132973, 4134272, 4132978, 4134277 ], [ 4134273, 4135470, 4134278, 4135475 ], [ 4135471, 4136837, 4135476, 4136842 ], [ 4136838, 4146292, 4136843, 4146297 ], [ 4146293, 4146443, 4146298, 4146448 ], [ 4146444, 4148935, 4146449, 4148940 ], [ 4148936, 4162265, 4148941, 4162270 ], [ 4162266, 4164781, 4162271, 4164786 ], [ 4164782, 4170898, 4164787, 4170903 ], [ 4170899, 4175870, 4170904, 4175875 ], [ 4175871, 4175960, 4175876, 4175965 ], [ 4175961, 4180869, 4175966, 4180874 ], [ 4180870, 4181479, 4180875, 4181484 ], [ 4181480, 4187785, 4181485, 4187790 ], [ 4187786, 4192423, 4187791, 4192428 ], [ 4192424, 4192625, 4192429, 4192630 ], [ 4192626, 4194747, 4192631, 4194752 ], [ 4194748, 4195600, 4194753, 4195605 ], [ 4195601, 4195842, 4195606, 4195847 ], [ 4195843, 4197634, 4195848, 4197639 ], [ 4197635, 4198036, 4197640, 4198041 ], [ 4198037, 4205660, 4198042, 4205665 ], [ 4205661, 4213510, 4205666, 4213515 ], [ 4213511, 4217238, 4213516, 4217243 ], [ 4217239, 4221217, 4217244, 4221222 ], [ 4221218, 4225159, 4221223, 4225164 ], [ 4225160, 4226183, 4225165, 4226188 ], [ 4226184, 4228901, 4226189, 4228906 ], [ 4228902, 4229774, 4228907, 4229779 ], [ 4229775, 4230559, 4229780, 4230564 ], [ 4230560, 4235572, 4230565, 4235577 ], [ 4235573, 4242029, 4235578, 4242034 ], [ 4242030, 4242765, 4242035, 4242770 ], [ 4242766, 4244054, 4242771, 4244059 ], [ 4244055, 4245334, 4244060, 4245339 ], [ 4245335, 4249019, 4245340, 4249024 ], [ 
4249020, 4249100, 4249025, 4249105 ], [ 4249101, 4256705, 4249106, 4256710 ], [ 4256706, 4259509, 4256711, 4259514 ], [ 4259510, 4261410, 4259515, 4261415 ], [ 4261411, 4270525, 4261416, 4270530 ], [ 4270526, 4274109, 4270531, 4274114 ], [ 4274110, 4281849, 4274115, 4281854 ], [ 4281850, 4284762, 4281855, 4284767 ], [ 4284763, 4303042, 4284768, 4303047 ], [ 4303043, 4303258, 4303048, 4303263 ], [ 4303259, 4306632, 4303264, 4306637 ], [ 4306633, 4312216, 4306638, 4312221 ], [ 4312217, 4314890, 4312222, 4314895 ], [ 4314891, 4316460, 4314896, 4316465 ], [ 4316461, 4316626, 4316466, 4316631 ], [ 4316627, 4318092, 4316632, 4318097 ], [ 4318093, 4318604, 4318098, 4318609 ], [ 4318605, 4318772, 4318610, 4318777 ], [ 4318773, 4321424, 4318778, 4321429 ], [ 4321425, 4321489, 4321430, 4321494 ], [ 4321490, 4327535, 4321495, 4327540 ], [ 4327536, 4329548, 4327541, 4329553 ], [ 4329549, 4331509, 4329554, 4331514 ], [ 4331510, 4332147, 4331515, 4332152 ], [ 4332148, 4334445, 4332153, 4334450 ], [ 4334446, 4338820, 4334451, 4338825 ], [ 4338821, 4339739, 4338826, 4339744 ], [ 4339740, 4343682, 4339745, 4343687 ], [ 4343683, 4348264, 4343688, 4348269 ], [ 4348265, 4352206, 4348270, 4352211 ], [ 4352207, 4356621, 4352212, 4356626 ], [ 4356622, 4366184, 4356627, 4366189 ], [ 4366185, 4374134, 4366190, 4374139 ], [ 4374135, 4374749, 4374140, 4374754 ], [ 4374750, 4388049, 4374755, 4388054 ], [ 4388050, 4388548, 4388055, 4388553 ], [ 4388549, 4395214, 4388554, 4395219 ], [ 4395215, 4397725, 4395220, 4397730 ], [ 4397726, 4401917, 4397731, 4401922 ], [ 4401918, 4401962, 4401923, 4401967 ], [ 4401963, 4404470, 4401968, 4404475 ], [ 4404471, 4404603, 4404476, 4404608 ], [ 4404604, 4404677, 4404609, 4404682 ], [ 4404678, 4407845, 4404683, 4407850 ], [ 4407846, 4415693, 4407851, 4415698 ], [ 4415694, 4415957, 4415699, 4415962 ], [ 4415958, 4426334, 4415963, 4426339 ], [ 4426335, 4427143, 4426340, 4427148 ], [ 4427144, 4432989, 4427149, 4432994 ], [ 4432990, 4433640, 4432995, 4433645 ], [ 
4433641, 4435236, 4433646, 4435241 ], [ 4435237, 4450981, 4435242, 4450986 ], [ 4450982, 4452121, 4450987, 4452126 ], [ 4452122, 4454417, 4452127, 4454422 ], [ 4454418, 4455164, 4454423, 4455169 ], [ 4455165, 4459226, 4455170, 4459231 ], [ 4459227, 4462863, 4459232, 4462868 ], [ 4462864, 4469714, 4462869, 4469719 ], [ 4469715, 4471008, 4469720, 4471013 ], [ 4471009, 4473114, 4471014, 4473119 ], [ 4473115, 4477852, 4473120, 4477857 ], [ 4477853, 4477874, 4477858, 4477879 ], [ 4477875, 4482921, 4477880, 4482926 ], [ 4482922, 4489809, 4482927, 4489814 ], [ 4489810, 4490912, 4489815, 4490917 ], [ 4490913, 4491974, 4490918, 4491979 ], [ 4491975, 4492157, 4491980, 4492162 ], [ 4492158, 4493614, 4492163, 4493619 ], [ 4493615, 4496829, 4493620, 4496834 ], [ 4496830, 4497697, 4496835, 4497702 ], [ 4497698, 4499157, 4497703, 4499162 ], [ 4499158, 4502248, 4499163, 4502253 ], [ 4502249, 4504493, 4502254, 4504498 ], [ 4504494, 4505336, 4504499, 4505341 ], [ 4505337, 4506680, 4505342, 4506685 ], [ 4506681, 4506961, 4506686, 4506966 ], [ 4506962, 4507601, 4506967, 4507606 ], [ 4507602, 4513351, 4507607, 4513356 ], [ 4513352, 4516356, 4513357, 4516361 ], [ 4516357, 4520650, 4516362, 4520655 ], [ 4520651, 4528820, 4520656, 4528825 ], [ 4528821, 4535971, 4528826, 4535976 ], [ 4535972, 4540172, 4535977, 4540177 ], [ 4540173, 4551230, 4540178, 4551235 ], [ 4551231, 4552997, 4551236, 4553002 ], [ 4552998, 4555491, 4553003, 4555496 ], [ 4555492, 4558033, 4555497, 4558038 ], [ 4558034, 4562123, 4558039, 4562128 ], [ 4562124, 4563100, 4562129, 4563105 ], [ 4563101, 4564639, 4563106, 4564644 ], [ 4564640, 4566006, 4564645, 4566011 ], [ 4566007, 4575457, 4566012, 4575462 ], [ 4575458, 4575809, 4575463, 4575814 ], [ 4575810, 4576253, 4575815, 4576258 ], [ 4576254, 4579647, 4576259, 4579652 ], [ 4579648, 4588823, 4579653, 4588828 ], [ 4588824, 4589254, 4588829, 4589259 ], [ 4589255, 4598670, 4589260, 4598675 ], [ 4598671, 4601105, 4598676, 4601110 ], [ 4601106, 4602566, 4601111, 4602571 ], [ 
4602567, 4612068, 4602572, 4612073 ], [ 4612069, 4615603, 4612074, 4615608 ], [ 4615604, 4627887, 4615609, 4627892 ], [ 4627888, 4631394, 4627893, 4631399 ], [ 4631395, 4631631, 4631400, 4631636 ], [ 4631632, 4635963, 4631637, 4635968 ], [ 4635964, 4641129, 4635969, 4641134 ], [ 4641130, 4642980, 4641135, 4642985 ], [ 4642981, 4643635, 4642986, 4643640 ], [ 4643636, 4644147, 4643641, 4644152 ], [ 4644148, 4649332, 4644153, 4649337 ], [ 4649333, 4649464, 4649338, 4649469 ], [ 4649465, 4656366, 4649470, 4656371 ], [ 4656367, 4656864, 4656372, 4656869 ], [ 4656865, 4656933, 4656870, 4656938 ], [ 4656934, 4660056, 4656939, 4660061 ], [ 4660057, 4665881, 4660062, 4665886 ], [ 4665882, 4668837, 4665887, 4668842 ], [ 4668838, 4672873, 4668843, 4672878 ], [ 4672874, 4681462, 4672879, 4681467 ], [ 4681463, 4696368, 4681468, 4696373 ], [ 4696369, 4699474, 4696374, 4699479 ], [ 4699475, 4704523, 4699480, 4704528 ], [ 4704524, 4706008, 4704529, 4706013 ], [ 4706009, 4706510, 4706014, 4706515 ], [ 4706511, 4711295, 4706516, 4711300 ], [ 4711296, 4711543, 4711301, 4711548 ], [ 4711544, 4711935, 4711549, 4711940 ], [ 4711936, 4712790, 4711941, 4712795 ], [ 4712791, 4713126, 4712796, 4713131 ], [ 4713127, 4713730, 4713132, 4713735 ], [ 4713731, 4717619, 4713736, 4717624 ], [ 4717620, 4724224, 4717625, 4724229 ], [ 4724225, 4725868, 4724230, 4725873 ], [ 4725869, 4727653, 4725874, 4727658 ], [ 4727654, 4729069, 4727659, 4729074 ], [ 4729070, 4730833, 4729075, 4730838 ], [ 4730834, 4733099, 4730839, 4733104 ], [ 4733100, 4733576, 4733105, 4733581 ], [ 4733577, 4736754, 4733582, 4736759 ], [ 4736755, 4741684, 4736760, 4741689 ], [ 4741685, 4744830, 4741690, 4744835 ], [ 4744831, 4746768, 4744836, 4746773 ], [ 4746769, 4749037, 4746774, 4749042 ], [ 4749038, 4749801, 4749043, 4749806 ], [ 4749802, 4749864, 4749807, 4749869 ], [ 4749865, 4750966, 4749870, 4750971 ], [ 4750967, 4752965, 4750972, 4752970 ], [ 4752966, 4754237, 4752971, 4754242 ], [ 4754238, 4757191, 4754243, 4757196 ], [ 
4757192, 4762052, 4757197, 4762057 ], [ 4762053, 4764164, 4762058, 4764169 ], [ 4764165, 4766341, 4764170, 4766346 ], [ 4766342, 4767519, 4766347, 4767524 ], [ 4767520, 4769451, 4767525, 4769456 ], [ 4769452, 4770366, 4769457, 4770371 ], [ 4770367, 4774504, 4770372, 4774509 ], [ 4774505, 4779310, 4774510, 4779315 ], [ 4779311, 4784713, 4779316, 4784718 ], [ 4784714, 4784960, 4784719, 4784965 ], [ 4784961, 4789181, 4784966, 4789186 ], [ 4789182, 4792894, 4789187, 4792899 ], [ 4792895, 4804321, 4792900, 4804326 ], [ 4804322, 4807780, 4804327, 4807785 ], [ 4807781, 4808367, 4807786, 4808372 ], [ 4808368, 4811025, 4808373, 4811030 ], [ 4811026, 4813062, 4811031, 4813067 ], [ 4813063, 4822160, 4813068, 4822165 ], [ 4822161, 4833156, 4822166, 4833161 ], [ 4833157, 4839621, 4833162, 4839626 ], [ 4839622, 4853316, 4839627, 4853321 ], [ 4853317, 4862268, 4853322, 4862273 ], [ 4862269, 4862689, 4862274, 4862694 ], [ 4862690, 4863453, 4862695, 4863458 ], [ 4863454, 4863657, 4863459, 4863662 ], [ 4863658, 4867215, 4863663, 4867220 ], [ 4867216, 4867943, 4867221, 4867948 ], [ 4867944, 4870367, 4867949, 4870372 ], [ 4870368, 4871260, 4870373, 4871265 ], [ 4871261, 4871925, 4871266, 4871930 ], [ 4871926, 4872824, 4871931, 4872829 ], [ 4872825, 4879935, 4872830, 4879940 ], [ 4879936, 4881593, 4879941, 4881598 ], [ 4881594, 4882087, 4881599, 4882092 ], [ 4882088, 4889351, 4882093, 4889356 ], [ 4889352, 4890443, 4889357, 4890448 ], [ 4890444, 4898485, 4890449, 4898490 ], [ 4898486, 4901057, 4898491, 4901062 ], [ 4901058, 4904245, 4901063, 4904250 ], [ 4904246, 4904668, 4904251, 4904673 ], [ 4904669, 4904984, 4904674, 4904989 ], [ 4904985, 4914224, 4904990, 4914229 ], [ 4914225, 4916537, 4914230, 4916542 ], [ 4916538, 4919908, 4916543, 4919913 ], [ 4919909, 4926663, 4919914, 4926668 ], [ 4926664, 4929329, 4926669, 4929334 ], [ 4929330, 4930673, 4929335, 4930678 ], [ 4930674, 4930954, 4930679, 4930959 ], [ 4930955, 4931594, 4930960, 4931599 ], [ 4931595, 4937282, 4931600, 4937287 ], [ 
4937283, 4939667, 4937288, 4939672 ], [ 4939668, 4941837, 4939673, 4941842 ], [ 4941838, 4947030, 4941843, 4947035 ], [ 4947031, 4951071, 4947036, 4951076 ], [ 4951072, 4953999, 4951077, 4954004 ], [ 4954000, 4955481, 4954005, 4955486 ], [ 4955482, 4959224, 4955487, 4959229 ], [ 4959225, 4974624, 4959230, 4974629 ], [ 4974625, 4977429, 4974630, 4977434 ], [ 4977430, 4984448, 4977435, 4984453 ], [ 4984449, 4986670, 4984454, 4986675 ], [ 4986671, 4992038, 4986676, 4992043 ], [ 4992039, 4993811, 4992044, 4993816 ], [ 4993812, 4995631, 4993817, 4995636 ], [ 4995632, 4996624, 4995637, 4996629 ], [ 4996625, 4996668, 4996630, 4996673 ], [ 4996669, 4998818, 4996674, 4998823 ], [ 4998819, 5004186, 4998824, 5004191 ], [ 5004187, 5013598, 5004192, 5013603 ], [ 5013599, 5016180, 5013604, 5016185 ], [ 5016181, 5018455, 5016186, 5018460 ], [ 5018456, 5026770, 5018461, 5026775 ], [ 5026771, 5028841, 5026776, 5028846 ], [ 5028842, 5031862, 5028847, 5031867 ], [ 5031863, 5036331, 5031868, 5036336 ], [ 5036332, 5037861, 5036337, 5037866 ], [ 5037862, 5038887, 5037867, 5038892 ], [ 5038888, 5040440, 5038893, 5040445 ], [ 5040441, 5042902, 5040446, 5042907 ], [ 5042903, 5044827, 5042908, 5044832 ], [ 5044828, 5050524, 5044833, 5050529 ], [ 5050525, 5053866, 5050530, 5053871 ], [ 5053867, 5054707, 5053872, 5054712 ], [ 5054708, 5055021, 5054713, 5055026 ], [ 5055022, 5057873, 5055027, 5057878 ], [ 5057874, 5059734, 5057879, 5059739 ], [ 5059735, 5061548, 5059740, 5061553 ], [ 5061549, 5063342, 5061554, 5063347 ], [ 5063343, 5064119, 5063348, 5064124 ], [ 5064120, 5064638, 5064125, 5064643 ], [ 5064639, 5068774, 5064644, 5068779 ], [ 5068775, 5069157, 5068780, 5069162 ], [ 5069158, 5069375, 5069163, 5069380 ], [ 5069376, 5071533, 5069381, 5071538 ], [ 5071534, 5072259, 5071539, 5072264 ], [ 5072260, 5072332, 5072265, 5072337 ], [ 5072333, 5074288, 5072338, 5074293 ], [ 5074289, 5087508, 5074294, 5087513 ], [ 5087509, 5088409, 5087514, 5088414 ], [ 5088410, 5093963, 5088415, 5093968 ], [ 
5093964, 5098261, 5093969, 5098266 ], [ 5098262, 5116037, 5098267, 5116042 ], [ 5116038, 5116647, 5116043, 5116652 ], [ 5116648, 5119282, 5116653, 5119287 ], [ 5119283, 5132940, 5119288, 5132945 ], [ 5132941, 5133405, 5132946, 5133410 ], [ 5133406, 5134558, 5133411, 5134563 ], [ 5134559, 5138432, 5134564, 5138437 ], [ 5138433, 5138944, 5138438, 5138949 ], [ 5138945, 5139157, 5138950, 5139162 ], [ 5139158, 5139587, 5139163, 5139592 ], [ 5139588, 5142617, 5139593, 5142622 ], [ 5142618, 5148183, 5142623, 5148188 ], [ 5148184, 5148672, 5148189, 5148677 ], [ 5148673, 5150053, 5148678, 5150058 ], [ 5150054, 5151087, 5150059, 5151092 ], [ 5151088, 5153217, 5151093, 5153222 ], [ 5153218, 5154383, 5153223, 5154388 ], [ 5154384, 5155016, 5154389, 5155021 ], [ 5155017, 5156599, 5155022, 5156604 ], [ 5156600, 5157802, 5156605, 5157807 ], [ 5157803, 5157970, 5157808, 5157975 ], [ 5157971, 5160625, 5157976, 5160630 ], [ 5160626, 5162852, 5160631, 5162857 ], [ 5162853, 5164824, 5162858, 5164829 ], [ 5164825, 5171077, 5164830, 5171082 ], [ 5171078, 5180591, 5171083, 5180596 ], [ 5180592, 5188235, 5180597, 5188240 ], [ 5188236, 5194013, 5188241, 5194018 ], [ 5194014, 5200363, 5194019, 5200368 ], [ 5200364, 5200380, 5200369, 5200385 ], [ 5200381, 5212416, 5200386, 5212421 ], [ 5212417, 5214189, 5212422, 5214194 ], [ 5214190, 5218448, 5214195, 5218453 ], [ 5218449, 5221514, 5218454, 5221519 ], [ 5221515, 5222405, 5221520, 5222410 ], [ 5222406, 5223121, 5222411, 5223126 ], [ 5223122, 5225062, 5223127, 5225067 ], [ 5225063, 5227034, 5225068, 5227039 ], [ 5227035, 5237062, 5227040, 5237067 ], [ 5237063, 5238549, 5237068, 5238554 ], [ 5238550, 5239941, 5238555, 5239946 ], [ 5239942, 5241160, 5239947, 5241165 ], [ 5241161, 5245009, 5241166, 5245014 ], [ 5245010, 5245420, 5245015, 5245425 ], [ 5245421, 5246665, 5245426, 5246670 ], [ 5246666, 5246882, 5246671, 5246887 ], [ 5246883, 5252321, 5246888, 5252326 ], [ 5252322, 5261182, 5252327, 5261187 ], [ 5261183, 5273111, 5261188, 5273116 ], [ 
5273112, 5273132, 5273117, 5273137 ], [ 5273133, 5273680, 5273138, 5273685 ], [ 5273681, 5274282, 5273686, 5274287 ], [ 5274283, 5277485, 5274288, 5277490 ], [ 5277486, 5278602, 5277491, 5278607 ], [ 5278603, 5286668, 5278608, 5286673 ], [ 5286669, 5288844, 5286674, 5288849 ], [ 5288845, 5295426, 5288850, 5295431 ], [ 5295427, 5299331, 5295432, 5299336 ], [ 5299332, 5299740, 5299337, 5299745 ], [ 5299741, 5302074, 5299746, 5302079 ], [ 5302075, 5304927, 5302080, 5304932 ], [ 5304928, 5316196, 5304933, 5316201 ], [ 5316197, 5321884, 5316202, 5321889 ], [ 5321885, 5327443, 5321890, 5327448 ], [ 5327444, 5333009, 5327449, 5333014 ], [ 5333010, 5335558, 5333015, 5335563 ], [ 5335559, 5337934, 5335564, 5337939 ], [ 5337935, 5340296, 5337940, 5340301 ], [ 5340297, 5342510, 5340302, 5342515 ], [ 5342511, 5343834, 5342516, 5343839 ], [ 5343835, 5359283, 5343840, 5359288 ], [ 5359284, 5362544, 5359289, 5362549 ], [ 5362545, 5375729, 5362550, 5375734 ], [ 5375730, 5375790, 5375735, 5375795 ], [ 5375791, 5377253, 5375796, 5377258 ], [ 5377254, 5378696, 5377259, 5378701 ], [ 5378697, 5382060, 5378702, 5382065 ], [ 5382061, 5388224, 5382066, 5388229 ], [ 5388225, 5391633, 5388230, 5391638 ], [ 5391634, 5401417, 5391639, 5401422 ], [ 5401418, 5406537, 5401423, 5406542 ], [ 5406538, 5408637, 5406543, 5408642 ], [ 5408638, 5417270, 5408643, 5417275 ], [ 5417271, 5419353, 5417276, 5419358 ], [ 5419354, 5420144, 5419359, 5420149 ], [ 5420145, 5420216, 5420150, 5420221 ], [ 5420217, 5420740, 5420222, 5420745 ], [ 5420741, 5427477, 5420746, 5427482 ], [ 5427478, 5429323, 5427483, 5429328 ], [ 5429324, 5441634, 5429329, 5441639 ], [ 5441635, 5448663, 5441640, 5448668 ], [ 5448664, 5452231, 5448669, 5452236 ], [ 5452232, 5458274, 5452237, 5458279 ], [ 5458275, 5459524, 5458280, 5459529 ], [ 5459525, 5468509, 5459530, 5468514 ], [ 5468510, 5469773, 5468515, 5469778 ], [ 5469774, 5475379, 5469779, 5475384 ], [ 5475380, 5476063, 5475385, 5476068 ], [ 5476064, 5477860, 5476069, 5477865 ], [ 
5477861, 5478124, 5477866, 5478129 ], [ 5478125, 5478577, 5478130, 5478582 ], [ 5478578, 5479176, 5478583, 5479181 ], [ 5479177, 5483012, 5479182, 5483017 ], [ 5483013, 5483809, 5483018, 5483814 ], [ 5483810, 5495234, 5483815, 5495239 ], [ 5495235, 5498449, 5495240, 5498449 ] ]

  def _cut(seq)
    cuts = Bio::RestrictionEnzyme::Analysis.cut(seq, "BstEII", {:view_ranges => true})
  end

  def test_BstEII_edge_cases
    (13481..13492).each do |len|
      _test_by_size(len)
    end
  end
end # class TestEcoliO157H7_BstEII

class TestEcoliO157H7_3enzymes < Test::Unit::TestCase
  include HelperMethods

  TestLabel = 'SacI+EcoRI+BstEII'
  SampleSequence = EcoliO157H7Seq
  SampleCutRanges = The3Enzymes_WHOLE =
    [ [ 0, 79, 0, 84 ], [ 80, 3858, 85, 3862 ], [ 3859, 4612, 3863, 4617 ], [ 4613, 5619, 4618, 5623 ], [ 5620, 7472, 5624, 7468 ], [ 7473, 12905, 7469, 12909 ], [ 12906, 13483, 12910, 13488 ], [ 13484, 14551, 13489, 14547 ], [ 14552, 15984, 14548, 15989 ], [ 15985, 20045, 15990, 20049 ], [ 20046, 21462, 20050, 21467 ], [ 21463, 27326, 21468, 27331 ], [ 27327, 30943, 27332, 30948 ], [ 30944, 34888, 30949, 34893 ], [ 34889, 35077, 34894, 35082 ], [ 35078, 35310, 35083, 35315 ], [ 35311, 36254, 35316, 36259 ], [ 36255, 36648, 36260, 36652 ], [ 36649, 36918, 36653, 36922 ], [ 36919, 41885, 36923, 41890 ], [ 41886, 43070, 41891, 43075 ], [ 43071, 45689, 43076, 45694 ], [ 45690, 48588, 45695, 48584 ], [ 48589, 52325, 48585, 52330 ], [ 52326, 54650, 52331, 54654 ], [ 54651, 54728, 54655, 54732 ], [ 54729, 55703, 54733, 55708 ], [ 55704, 58828, 55709, 58833 ], [ 58829, 59178, 58834, 59183 ], [ 59179, 59800, 59184, 59796 ], [ 59801, 61256, 59797, 61260 ], [ 61257, 72610, 61261, 72615 ], [ 72611, 72739, 72616, 72744 ], [ 72740, 73099, 72745, 73104 ], [ 73100, 75123, 73105, 75128 ], [ 75124, 77366, 75129, 77371 ], [ 77367, 77810, 77372, 77815 ], [ 77811, 78740, 77816, 78745 ], [ 78741, 79717, 78746, 79722 ], [ 79718, 82250, 79723, 82255 ], [ 82251, 84604, 82256, 84609 ], [ 84605, 95491, 84610, 95496 ], [ 95492, 
95785, 95497, 95790 ], [ 95786, 95794, 95791, 95799 ], [ 95795, 96335, 95800, 96340 ], [ 96336, 96489, 96341, 96493 ], [ 96490, 101464, 96494, 101468 ], [ 101465, 102044, 101469, 102049 ], [ 102045, 102541, 102050, 102546 ], [ 102542, 103192, 102547, 103197 ], [ 103193, 103397, 103198, 103393 ], [ 103398, 104722, 103394, 104727 ], [ 104723, 106365, 104728, 106369 ], [ 106366, 106896, 106370, 106900 ], [ 106897, 107735, 106901, 107739 ], [ 107736, 110020, 107740, 110024 ], [ 110021, 110883, 110025, 110888 ], [ 110884, 112524, 110889, 112528 ], [ 112525, 113324, 112529, 113328 ], [ 113325, 115867, 113329, 115871 ], [ 115868, 117723, 115872, 117727 ], [ 117724, 118742, 117728, 118738 ], [ 118743, 120090, 118739, 120095 ], [ 120091, 120657, 120096, 120662 ], [ 120658, 128060, 120663, 128064 ], [ 128061, 128308, 128065, 128313 ], [ 128309, 136112, 128314, 136116 ], [ 136113, 138305, 136117, 138310 ], [ 138306, 138996, 138311, 139000 ], [ 138997, 139146, 139001, 139142 ], [ 139147, 141147, 139143, 141152 ], [ 141148, 143724, 141153, 143729 ], [ 143725, 143838, 143730, 143843 ], [ 143839, 144303, 143844, 144308 ], [ 144304, 148199, 144309, 148204 ], [ 148200, 149577, 148205, 149582 ], [ 149578, 149731, 149583, 149736 ], [ 149732, 152137, 149737, 152141 ], [ 152138, 156115, 152142, 156120 ], [ 156116, 161126, 156121, 161131 ], [ 161127, 162856, 161132, 162861 ], [ 162857, 168965, 162862, 168961 ], [ 168966, 170693, 168962, 170698 ], [ 170694, 170944, 170699, 170949 ], [ 170945, 171201, 170950, 171206 ], [ 171202, 173241, 171207, 173246 ], [ 173242, 177283, 173247, 177288 ], [ 177284, 178048, 177289, 178052 ], [ 178049, 178177, 178053, 178182 ], [ 178178, 178781, 178183, 178786 ], [ 178782, 181610, 178787, 181615 ], [ 181611, 181706, 181616, 181711 ], [ 181707, 185355, 181712, 185351 ], [ 185356, 185661, 185352, 185666 ], [ 185662, 193407, 185667, 193412 ], [ 193408, 194141, 193413, 194145 ], [ 194142, 194876, 194146, 194880 ], [ 194877, 195511, 194881, 195516 ], [ 195512, 
195754, 195517, 195759 ], [ 195755, 197005, 195760, 197009 ], [ 197006, 197247, 197010, 197252 ], [ 197248, 200659, 197253, 200664 ], [ 200660, 201820, 200665, 201825 ], [ 201821, 202300, 201826, 202305 ], [ 202301, 202686, 202306, 202691 ], [ 202687, 206289, 202692, 206294 ], [ 206290, 206466, 206295, 206471 ], [ 206467, 207011, 206472, 207016 ], [ 207012, 208159, 207017, 208164 ], [ 208160, 209976, 208165, 209981 ], [ 209977, 210078, 209982, 210083 ], [ 210079, 211485, 210084, 211490 ], [ 211486, 212377, 211491, 212382 ], [ 212378, 213569, 212383, 213574 ], [ 213570, 214316, 213575, 214312 ], [ 214317, 216005, 214313, 216010 ], [ 216006, 217226, 216011, 217222 ], [ 217227, 220098, 217223, 220103 ], [ 220099, 221476, 220104, 221480 ], [ 221477, 221641, 221481, 221645 ], [ 221642, 224063, 221646, 224068 ], [ 224064, 227774, 224069, 227778 ], [ 227775, 228604, 227779, 228609 ], [ 228605, 229453, 228610, 229449 ], [ 229454, 229931, 229450, 229935 ], [ 229932, 232247, 229936, 232251 ], [ 232248, 235221, 232252, 235225 ], [ 235222, 237291, 235226, 237295 ], [ 237292, 239035, 237296, 239039 ], [ 239036, 239993, 239040, 239998 ], [ 239994, 240624, 239999, 240628 ], [ 240625, 240887, 240629, 240891 ], [ 240888, 242089, 240892, 242093 ], [ 242090, 243880, 242094, 243884 ], [ 243881, 245321, 243885, 245325 ], [ 245322, 247914, 245326, 247919 ], [ 247915, 251579, 247920, 251584 ], [ 251580, 257092, 251585, 257097 ], [ 257093, 259887, 257098, 259891 ], [ 259888, 260535, 259892, 260539 ], [ 260536, 261621, 260540, 261626 ], [ 261622, 263030, 261627, 263035 ], [ 263031, 264258, 263036, 264262 ], [ 264259, 265004, 264263, 265008 ], [ 265005, 265084, 265009, 265089 ], [ 265085, 265243, 265090, 265248 ], [ 265244, 265534, 265249, 265539 ], [ 265535, 266117, 265540, 266122 ], [ 266118, 274428, 266123, 274433 ], [ 274429, 275235, 274434, 275231 ], [ 275236, 276946, 275232, 276950 ], [ 276947, 277457, 276951, 277461 ], [ 277458, 279137, 277462, 279133 ], [ 279138, 282285, 279134, 
282290 ], [ 282286, 286948, 282291, 286953 ], [ 286949, 288342, 286954, 288338 ], [ 288343, 289897, 288339, 289901 ], [ 289898, 292547, 289902, 292552 ], [ 292548, 297678, 292553, 297683 ], [ 297679, 303902, 297684, 303906 ], [ 303903, 304580, 303907, 304584 ], [ 304581, 307362, 304585, 307366 ], [ 307363, 307931, 307367, 307935 ], [ 307932, 308161, 307936, 308166 ], [ 308162, 308706, 308167, 308711 ], [ 308707, 313482, 308712, 313487 ], [ 313483, 316025, 313488, 316021 ], [ 316026, 324159, 316022, 324163 ], [ 324160, 326130, 324164, 326134 ], [ 326131, 331620, 326135, 331624 ], [ 331621, 336338, 331625, 336342 ], [ 336339, 336873, 336343, 336877 ], [ 336874, 337118, 336878, 337123 ], [ 337119, 337935, 337124, 337940 ], [ 337936, 338781, 337941, 338786 ], [ 338782, 339493, 338787, 339498 ], [ 339494, 341025, 339499, 341030 ], [ 341026, 343919, 341031, 343923 ], [ 343920, 344424, 343924, 344429 ], [ 344425, 348384, 344430, 348389 ], [ 348385, 348408, 348390, 348404 ], [ 348409, 353417, 348405, 353413 ], [ 353418, 354781, 353414, 354786 ], [ 354782, 356692, 354787, 356697 ], [ 356693, 357008, 356698, 357013 ], [ 357009, 357305, 357014, 357310 ], [ 357306, 357328, 357311, 357333 ], [ 357329, 358126, 357334, 358131 ], [ 358127, 359472, 358132, 359477 ], [ 359473, 362160, 359478, 362165 ], [ 362161, 365395, 362166, 365400 ], [ 365396, 365704, 365401, 365709 ], [ 365705, 368663, 365710, 368667 ], [ 368664, 368841, 368668, 368845 ], [ 368842, 370589, 368846, 370593 ], [ 370590, 371148, 370594, 371152 ], [ 371149, 373639, 371153, 373643 ], [ 373640, 377393, 373644, 377397 ], [ 377394, 381068, 377398, 381072 ], [ 381069, 381692, 381073, 381688 ], [ 381693, 381746, 381689, 381751 ], [ 381747, 381994, 381752, 381999 ], [ 381995, 383335, 382000, 383340 ], [ 383336, 385141, 383341, 385146 ], [ 385142, 389399, 385147, 389403 ], [ 389400, 390171, 389404, 390176 ], [ 390172, 392340, 390177, 392344 ], [ 392341, 392764, 392345, 392769 ], [ 392765, 394338, 392770, 394343 ], [ 394339, 
394686, 394344, 394691 ], [ 394687, 397592, 394692, 397596 ], [ 397593, 398703, 397597, 398708 ], [ 398704, 404095, 398709, 404100 ], [ 404096, 408361, 404101, 408366 ], [ 408362, 409029, 408367, 409025 ], [ 409030, 413032, 409026, 413037 ], [ 413033, 414563, 413038, 414568 ], [ 414564, 416901, 414569, 416906 ], [ 416902, 417419, 416907, 417424 ], [ 417420, 420057, 417425, 420061 ], [ 420058, 421129, 420062, 421125 ], [ 421130, 421777, 421126, 421782 ], [ 421778, 423748, 421783, 423753 ], [ 423749, 431903, 423754, 431908 ], [ 431904, 432852, 431909, 432848 ], [ 432853, 440000, 432849, 440005 ], [ 440001, 440754, 440006, 440750 ], [ 440755, 444226, 440751, 444222 ], [ 444227, 448040, 444223, 448045 ], [ 448041, 452994, 448046, 452999 ], [ 452995, 453075, 453000, 453080 ], [ 453076, 454654, 453081, 454658 ], [ 454655, 454950, 454659, 454955 ], [ 454951, 455888, 454956, 455893 ], [ 455889, 460160, 455894, 460165 ], [ 460161, 462319, 460166, 462323 ], [ 462320, 462650, 462324, 462654 ], [ 462651, 463076, 462655, 463081 ], [ 463077, 465003, 463082, 465008 ], [ 465004, 466828, 465009, 466833 ], [ 466829, 467686, 466834, 467691 ], [ 467687, 468596, 467692, 468601 ], [ 468597, 475083, 468602, 475087 ], [ 475084, 479953, 475088, 479958 ], [ 479954, 480538, 479959, 480543 ], [ 480539, 482480, 480544, 482484 ], [ 482481, 482869, 482485, 482874 ], [ 482870, 483410, 482875, 483414 ], [ 483411, 489378, 483415, 489383 ], [ 489379, 492112, 489384, 492116 ], [ 492113, 492241, 492117, 492246 ], [ 492242, 493791, 492247, 493795 ], [ 493792, 495406, 493796, 495411 ], [ 495407, 495712, 495412, 495717 ], [ 495713, 497829, 495718, 497834 ], [ 497830, 501698, 497835, 501703 ], [ 501699, 503304, 501704, 503308 ], [ 503305, 504565, 503309, 504570 ], [ 504566, 505105, 504571, 505110 ], [ 505106, 508452, 505111, 508457 ], [ 508453, 514353, 508458, 514357 ], [ 514354, 515947, 514358, 515952 ], [ 515948, 519141, 515953, 519146 ], [ 519142, 519398, 519147, 519403 ], [ 519399, 519662, 519404, 
519666 ], [ 519663, 521386, 519667, 521391 ], [ 521387, 521935, 521392, 521939 ], [ 521936, 523114, 521940, 523118 ], [ 523115, 524176, 523119, 524180 ], [ 524177, 524521, 524181, 524525 ], [ 524522, 524936, 524526, 524932 ], [ 524937, 526115, 524933, 526120 ], [ 526116, 526729, 526121, 526734 ], [ 526730, 527018, 526735, 527023 ], [ 527019, 528059, 527024, 528064 ], [ 528060, 532689, 528065, 532694 ], [ 532690, 534193, 532695, 534189 ], [ 534194, 534702, 534190, 534707 ], [ 534703, 535272, 534708, 535277 ], [ 535273, 538638, 535278, 538642 ], [ 538639, 538668, 538643, 538673 ], [ 538669, 543939, 538674, 543944 ], [ 543940, 547429, 543945, 547434 ], [ 547430, 547624, 547435, 547628 ], [ 547625, 550898, 547629, 550902 ], [ 550899, 553890, 550903, 553895 ], [ 553891, 554678, 553896, 554683 ], [ 554679, 555452, 554684, 555457 ], [ 555453, 556296, 555458, 556301 ], [ 556297, 557116, 556302, 557120 ], [ 557117, 559341, 557121, 559346 ], [ 559342, 559991, 559347, 559996 ], [ 559992, 563242, 559997, 563247 ], [ 563243, 563390, 563248, 563394 ], [ 563391, 566071, 563395, 566075 ], [ 566072, 566857, 566076, 566861 ], [ 566858, 571925, 566862, 571929 ], [ 571926, 576432, 571930, 576437 ], [ 576433, 582431, 576438, 582436 ], [ 582432, 582959, 582437, 582964 ], [ 582960, 583475, 582965, 583480 ], [ 583476, 583589, 583481, 583594 ], [ 583590, 583670, 583595, 583675 ], [ 583671, 583901, 583676, 583906 ], [ 583902, 584198, 583907, 584203 ], [ 584199, 584633, 584204, 584638 ], [ 584634, 585704, 584639, 585709 ], [ 585705, 585746, 585710, 585751 ], [ 585747, 586175, 585752, 586180 ], [ 586176, 586301, 586181, 586306 ], [ 586302, 586643, 586307, 586648 ], [ 586644, 586775, 586649, 586780 ], [ 586776, 587072, 586781, 587077 ], [ 587073, 587214, 587078, 587219 ], [ 587215, 587540, 587220, 587545 ], [ 587541, 587969, 587546, 587974 ], [ 587970, 588095, 587975, 588100 ], [ 588096, 588437, 588101, 588442 ], [ 588438, 588569, 588443, 588574 ], [ 588570, 589008, 588575, 589013 ], [ 589009, 
589166, 589014, 589171 ], [ 589167, 590366, 589172, 590371 ], [ 590367, 590792, 590372, 590797 ], [ 590793, 591077, 590798, 591082 ], [ 591078, 591263, 591083, 591268 ], [ 591264, 591863, 591269, 591868 ], [ 591864, 592058, 591869, 592063 ], [ 592059, 592160, 592064, 592165 ], [ 592161, 592568, 592166, 592573 ], [ 592569, 592760, 592574, 592765 ], [ 592761, 593060, 592766, 593065 ], [ 593061, 593186, 593066, 593191 ], [ 593187, 593366, 593192, 593371 ], [ 593367, 593957, 593372, 593962 ], [ 593958, 594827, 593963, 594832 ], [ 594828, 594980, 594833, 594985 ], [ 594981, 595649, 594986, 595654 ], [ 595650, 595893, 595655, 595898 ], [ 595894, 596057, 595899, 596062 ], [ 596058, 596159, 596063, 596164 ], [ 596160, 596351, 596165, 596356 ], [ 596352, 596660, 596357, 596665 ], [ 596661, 596960, 596666, 596965 ], [ 596961, 597102, 596966, 597107 ], [ 597103, 597155, 597108, 597160 ], [ 597156, 597257, 597161, 597262 ], [ 597258, 599957, 597263, 599962 ], [ 599958, 604182, 599963, 604186 ], [ 604183, 611038, 604187, 611043 ], [ 611039, 612202, 611044, 612207 ], [ 612203, 614051, 612208, 614056 ], [ 614052, 614134, 614057, 614139 ], [ 614135, 614787, 614140, 614792 ], [ 614788, 616272, 614793, 616277 ], [ 616273, 616867, 616278, 616871 ], [ 616868, 617737, 616872, 617742 ], [ 617738, 627339, 617743, 627344 ], [ 627340, 628902, 627345, 628907 ], [ 628903, 629142, 628908, 629146 ], [ 629143, 629458, 629147, 629454 ], [ 629459, 636523, 629455, 636528 ], [ 636524, 637529, 636529, 637534 ], [ 637530, 639061, 637535, 639065 ], [ 639062, 647713, 639066, 647718 ], [ 647714, 648684, 647719, 648689 ], [ 648685, 652752, 648690, 652748 ], [ 652753, 653543, 652749, 653548 ], [ 653544, 654406, 653549, 654410 ], [ 654407, 658188, 654411, 658192 ], [ 658189, 659030, 658193, 659035 ], [ 659031, 662241, 659036, 662246 ], [ 662242, 670896, 662247, 670900 ], [ 670897, 671781, 670901, 671786 ], [ 671782, 672048, 671787, 672053 ], [ 672049, 673788, 672054, 673793 ], [ 673789, 674707, 673794, 
674712 ], [ 674708, 674998, 674713, 675003 ], [ 674999, 675157, 675004, 675162 ], [ 675158, 688595, 675163, 688600 ], [ 688596, 693309, 688601, 693314 ], [ 693310, 693523, 693315, 693527 ], [ 693524, 696514, 693528, 696518 ], [ 696515, 697406, 696519, 697411 ], [ 697407, 702676, 697412, 702681 ], [ 702677, 707208, 702682, 707212 ], [ 707209, 707382, 707213, 707387 ], [ 707383, 708604, 707388, 708609 ], [ 708605, 710046, 708610, 710051 ], [ 710047, 711630, 710052, 711635 ], [ 711631, 711696, 711636, 711701 ], [ 711697, 712329, 711702, 712334 ], [ 712330, 714099, 712335, 714103 ], [ 714100, 716461, 714104, 716466 ], [ 716462, 720238, 716467, 720243 ], [ 720239, 720374, 720244, 720379 ], [ 720375, 720471, 720380, 720475 ], [ 720472, 721463, 720476, 721467 ], [ 721464, 723143, 721468, 723147 ], [ 723144, 723348, 723148, 723352 ], [ 723349, 724200, 723353, 724205 ], [ 724201, 725464, 724206, 725468 ], [ 725465, 725687, 725469, 725692 ], [ 725688, 729467, 725693, 729471 ], [ 729468, 730067, 729472, 730072 ], [ 730068, 730406, 730073, 730410 ], [ 730407, 730574, 730411, 730579 ], [ 730575, 730699, 730580, 730704 ], [ 730700, 732726, 730705, 732731 ], [ 732727, 734363, 732732, 734359 ], [ 734364, 738597, 734360, 738602 ], [ 738598, 738869, 738603, 738873 ], [ 738870, 739571, 738874, 739575 ], [ 739572, 742040, 739576, 742044 ], [ 742041, 742350, 742045, 742346 ], [ 742351, 743326, 742347, 743331 ], [ 743327, 743557, 743332, 743561 ], [ 743558, 743966, 743562, 743970 ], [ 743967, 744992, 743971, 744997 ], [ 744993, 745843, 744998, 745848 ], [ 745844, 751518, 745849, 751523 ], [ 751519, 752431, 751524, 752436 ], [ 752432, 752549, 752437, 752554 ], [ 752550, 758622, 752555, 758626 ], [ 758623, 760978, 758627, 760982 ], [ 760979, 761319, 760983, 761315 ], [ 761320, 766036, 761316, 766041 ], [ 766037, 768968, 766042, 768973 ], [ 768969, 770151, 768974, 770156 ], [ 770152, 771158, 770157, 771163 ], [ 771159, 771405, 771164, 771410 ], [ 771406, 771429, 771411, 771433 ], [ 771430, 
774384, 771434, 774388 ], [ 774385, 781958, 774389, 781963 ], [ 781959, 784226, 781964, 784231 ], [ 784227, 784572, 784232, 784576 ], [ 784573, 784806, 784577, 784810 ], [ 784807, 786886, 784811, 786890 ], [ 786887, 786945, 786891, 786950 ], [ 786946, 787203, 786951, 787208 ], [ 787204, 789251, 787209, 789256 ], [ 789252, 791218, 789257, 791223 ], [ 791219, 793716, 791224, 793721 ], [ 793717, 795003, 793722, 795008 ], [ 795004, 795521, 795009, 795526 ], [ 795522, 800659, 795527, 800663 ], [ 800660, 802360, 800664, 802364 ], [ 802361, 804514, 802365, 804519 ], [ 804515, 805238, 804520, 805243 ], [ 805239, 805887, 805244, 805892 ], [ 805888, 807288, 805893, 807292 ], [ 807289, 808461, 807293, 808466 ], [ 808462, 808692, 808467, 808696 ], [ 808693, 809805, 808697, 809810 ], [ 809806, 810086, 809811, 810091 ], [ 810087, 810726, 810092, 810731 ], [ 810727, 813170, 810732, 813174 ], [ 813171, 813863, 813175, 813867 ], [ 813864, 820111, 813868, 820116 ], [ 820112, 821326, 820117, 821331 ], [ 821327, 821647, 821332, 821652 ], [ 821648, 824277, 821653, 824282 ], [ 824278, 825750, 824283, 825755 ], [ 825751, 828770, 825756, 828775 ], [ 828771, 828924, 828776, 828929 ], [ 828925, 830194, 828930, 830199 ], [ 830195, 830786, 830200, 830791 ], [ 830787, 831245, 830792, 831241 ], [ 831246, 832788, 831242, 832793 ], [ 832789, 833306, 832794, 833311 ], [ 833307, 835264, 833312, 835260 ], [ 835265, 835656, 835261, 835661 ], [ 835657, 841180, 835662, 841185 ], [ 841181, 842112, 841186, 842117 ], [ 842113, 842524, 842118, 842528 ], [ 842525, 843973, 842529, 843978 ], [ 843974, 843990, 843979, 843995 ], [ 843991, 851267, 843996, 851271 ], [ 851268, 852882, 851272, 852887 ], [ 852883, 854392, 852888, 854397 ], [ 854393, 857721, 854398, 857726 ], [ 857722, 857961, 857727, 857966 ], [ 857962, 859112, 857967, 859116 ], [ 859113, 862783, 859117, 862788 ], [ 862784, 869922, 862789, 869926 ], [ 869923, 878953, 869927, 878958 ], [ 878954, 885194, 878959, 885199 ], [ 885195, 886313, 885200, 
886318 ], [ 886314, 886460, 886319, 886465 ], [ 886461, 888041, 886466, 888045 ], [ 888042, 890161, 888046, 890165 ], [ 890162, 890233, 890166, 890238 ], [ 890234, 890346, 890239, 890351 ], [ 890347, 890379, 890352, 890384 ], [ 890380, 895074, 890385, 895078 ], [ 895075, 897876, 895079, 897880 ], [ 897877, 897943, 897881, 897947 ], [ 897944, 899572, 897948, 899576 ], [ 899573, 899676, 899577, 899681 ], [ 899677, 903962, 899682, 903967 ], [ 903963, 904236, 903968, 904241 ], [ 904237, 908130, 904242, 908135 ], [ 908131, 912131, 908136, 912127 ], [ 912132, 916611, 912128, 916616 ], [ 916612, 916803, 916617, 916808 ], [ 916804, 920531, 916809, 920536 ], [ 920532, 923592, 920537, 923596 ], [ 923593, 927519, 923597, 927523 ], [ 927520, 928181, 927524, 928185 ], [ 928182, 928505, 928186, 928510 ], [ 928506, 928644, 928511, 928640 ], [ 928645, 934935, 928641, 934931 ], [ 934936, 935911, 934932, 935915 ], [ 935912, 936947, 935916, 936952 ], [ 936948, 937240, 936953, 937245 ], [ 937241, 939698, 937246, 939703 ], [ 939699, 939711, 939704, 939716 ], [ 939712, 941642, 939717, 941647 ], [ 941643, 949052, 941648, 949057 ], [ 949053, 949800, 949058, 949805 ], [ 949801, 949851, 949806, 949855 ], [ 949852, 951412, 949856, 951417 ], [ 951413, 951810, 951418, 951815 ], [ 951811, 952386, 951816, 952391 ], [ 952387, 953295, 952392, 953300 ], [ 953296, 953894, 953301, 953899 ], [ 953895, 955768, 953900, 955772 ], [ 955769, 955953, 955773, 955957 ], [ 955954, 956045, 955958, 956041 ], [ 956046, 958753, 956042, 958758 ], [ 958754, 964476, 958759, 964481 ], [ 964477, 967468, 964482, 967473 ], [ 967469, 969625, 967474, 969629 ], [ 969626, 969631, 969630, 969636 ], [ 969632, 970966, 969637, 970971 ], [ 970967, 971138, 970972, 971143 ], [ 971139, 974185, 971144, 974190 ], [ 974186, 974365, 974191, 974370 ], [ 974366, 975256, 974371, 975261 ], [ 975257, 976794, 975262, 976799 ], [ 976795, 980694, 976800, 980698 ], [ 980695, 987406, 980699, 987411 ], [ 987407, 988132, 987412, 988137 ], [ 988133, 
992809, 988138, 992814 ], [ 992810, 996307, 992815, 996311 ], [ 996308, 999121, 996312, 999125 ], [ 999122, 1000225, 999126, 1000230 ], [ 1000226, 1001626, 1000231, 1001631 ], [ 1001627, 1005050, 1001632, 1005054 ], [ 1005051, 1007354, 1005055, 1007359 ], [ 1007355, 1011910, 1007360, 1011915 ], [ 1011911, 1012377, 1011916, 1012382 ], [ 1012378, 1015175, 1012383, 1015179 ], [ 1015176, 1017328, 1015180, 1017333 ], [ 1017329, 1020891, 1017334, 1020896 ], [ 1020892, 1021340, 1020897, 1021345 ], [ 1021341, 1024845, 1021346, 1024850 ], [ 1024846, 1025853, 1024851, 1025858 ], [ 1025854, 1030691, 1025859, 1030696 ], [ 1030692, 1032676, 1030697, 1032681 ], [ 1032677, 1037847, 1032682, 1037852 ], [ 1037848, 1039473, 1037853, 1039478 ], [ 1039474, 1039858, 1039479, 1039854 ], [ 1039859, 1044241, 1039855, 1044246 ], [ 1044242, 1045920, 1044247, 1045925 ], [ 1045921, 1049805, 1045926, 1049809 ], [ 1049806, 1050388, 1049810, 1050392 ], [ 1050389, 1053286, 1050393, 1053291 ], [ 1053287, 1053309, 1053292, 1053314 ], [ 1053310, 1054643, 1053315, 1054648 ], [ 1054644, 1056252, 1054649, 1056256 ], [ 1056253, 1056527, 1056257, 1056532 ], [ 1056528, 1056947, 1056533, 1056943 ], [ 1056948, 1058682, 1056944, 1058687 ], [ 1058683, 1059297, 1058688, 1059302 ], [ 1059298, 1060416, 1059303, 1060421 ], [ 1060417, 1061894, 1060422, 1061898 ], [ 1061895, 1064234, 1061899, 1064239 ], [ 1064235, 1064848, 1064240, 1064853 ], [ 1064849, 1065434, 1064854, 1065439 ], [ 1065435, 1075046, 1065440, 1075042 ], [ 1075047, 1075642, 1075043, 1075647 ], [ 1075643, 1076325, 1075648, 1076330 ], [ 1076326, 1076534, 1076331, 1076539 ], [ 1076535, 1078135, 1076540, 1078139 ], [ 1078136, 1078866, 1078140, 1078871 ], [ 1078867, 1079914, 1078872, 1079910 ], [ 1079915, 1080537, 1079911, 1080542 ], [ 1080538, 1082144, 1080543, 1082149 ], [ 1082145, 1083416, 1082150, 1083420 ], [ 1083417, 1085746, 1083421, 1085751 ], [ 1085747, 1087087, 1085752, 1087092 ], [ 1087088, 1088273, 1087093, 1088278 ], [ 1088274, 1092093, 
1088279, 1092097 ], [ 1092094, 1093062, 1092098, 1093067 ], [ 1093063, 1096867, 1093068, 1096872 ], [ 1096868, 1097288, 1096873, 1097292 ], [ 1097289, 1102488, 1097293, 1102493 ], [ 1102489, 1102751, 1102494, 1102747 ], [ 1102752, 1104366, 1102748, 1104362 ], [ 1104367, 1106371, 1104363, 1106376 ], [ 1106372, 1108123, 1106377, 1108128 ], [ 1108124, 1112263, 1108129, 1112259 ], [ 1112264, 1113311, 1112260, 1113316 ], [ 1113312, 1114557, 1113317, 1114562 ], [ 1114558, 1117715, 1114563, 1117719 ], [ 1117716, 1118552, 1117720, 1118556 ], [ 1118553, 1120566, 1118557, 1120571 ], [ 1120567, 1121004, 1120572, 1121009 ], [ 1121005, 1121076, 1121010, 1121080 ], [ 1121077, 1121609, 1121081, 1121613 ], [ 1121610, 1121694, 1121614, 1121698 ], [ 1121695, 1122501, 1121699, 1122506 ], [ 1122502, 1130582, 1122507, 1130587 ], [ 1130583, 1132170, 1130588, 1132175 ], [ 1132171, 1135259, 1132176, 1135263 ], [ 1135260, 1136119, 1135264, 1136123 ], [ 1136120, 1137316, 1136124, 1137320 ], [ 1137317, 1140126, 1137321, 1140131 ], [ 1140127, 1142998, 1140132, 1143002 ], [ 1142999, 1143361, 1143003, 1143366 ], [ 1143362, 1143637, 1143367, 1143633 ], [ 1143638, 1143644, 1143634, 1143648 ], [ 1143645, 1146618, 1143649, 1146622 ], [ 1146619, 1149205, 1146623, 1149210 ], [ 1149206, 1149331, 1149211, 1149336 ], [ 1149332, 1152263, 1149337, 1152259 ], [ 1152264, 1152809, 1152260, 1152805 ], [ 1152810, 1154382, 1152806, 1154386 ], [ 1154383, 1156272, 1154387, 1156277 ], [ 1156273, 1159968, 1156278, 1159972 ], [ 1159969, 1161624, 1159973, 1161629 ], [ 1161625, 1163044, 1161630, 1163048 ], [ 1163045, 1164030, 1163049, 1164026 ], [ 1164031, 1166628, 1164027, 1166632 ], [ 1166629, 1167897, 1166633, 1167901 ], [ 1167898, 1171353, 1167902, 1171358 ], [ 1171354, 1171934, 1171359, 1171939 ], [ 1171935, 1172114, 1171940, 1172119 ], [ 1172115, 1173373, 1172120, 1173369 ], [ 1173374, 1175421, 1173370, 1175417 ], [ 1175422, 1179850, 1175418, 1179854 ], [ 1179851, 1181462, 1179855, 1181466 ], [ 1181463, 1185368, 
1181467, 1185373 ], [ 1185369, 1193993, 1185374, 1193998 ], [ 1193994, 1194272, 1193999, 1194277 ], [ 1194273, 1195941, 1194278, 1195945 ], [ 1195942, 1197920, 1195946, 1197925 ], [ 1197921, 1199011, 1197926, 1199007 ], [ 1199012, 1200079, 1199008, 1200075 ], [ 1200080, 1200373, 1200076, 1200378 ], [ 1200374, 1200419, 1200379, 1200423 ], [ 1200420, 1200597, 1200424, 1200602 ], [ 1200598, 1200714, 1200603, 1200719 ], [ 1200715, 1203674, 1200720, 1203679 ], [ 1203675, 1204865, 1203680, 1204870 ], [ 1204866, 1205330, 1204871, 1205335 ], [ 1205331, 1206317, 1205336, 1206321 ], [ 1206318, 1210727, 1206322, 1210732 ], [ 1210728, 1211881, 1210733, 1211886 ], [ 1211882, 1214283, 1211887, 1214288 ], [ 1214284, 1215542, 1214289, 1215546 ], [ 1215543, 1216981, 1215547, 1216986 ], [ 1216982, 1220754, 1216987, 1220750 ], [ 1220755, 1221554, 1220751, 1221550 ], [ 1221555, 1222528, 1221551, 1222532 ], [ 1222529, 1223522, 1222533, 1223527 ], [ 1223523, 1223938, 1223528, 1223942 ], [ 1223939, 1226908, 1223943, 1226912 ], [ 1226909, 1227062, 1226913, 1227066 ], [ 1227063, 1228205, 1227067, 1228210 ], [ 1228206, 1229934, 1228211, 1229938 ], [ 1229935, 1236067, 1229939, 1236072 ], [ 1236068, 1236265, 1236073, 1236270 ], [ 1236266, 1239904, 1236271, 1239908 ], [ 1239905, 1239969, 1239909, 1239974 ], [ 1239970, 1240641, 1239975, 1240646 ], [ 1240642, 1244451, 1240647, 1244447 ], [ 1244452, 1244738, 1244448, 1244743 ], [ 1244739, 1244821, 1244744, 1244826 ], [ 1244822, 1247339, 1244827, 1247343 ], [ 1247340, 1248889, 1247344, 1248885 ], [ 1248890, 1250671, 1248886, 1250675 ], [ 1250672, 1254643, 1250676, 1254647 ], [ 1254644, 1255112, 1254648, 1255108 ], [ 1255113, 1257527, 1255109, 1257523 ], [ 1257528, 1264487, 1257524, 1264491 ], [ 1264488, 1269317, 1264492, 1269321 ], [ 1269318, 1272971, 1269322, 1272976 ], [ 1272972, 1276524, 1272977, 1276529 ], [ 1276525, 1281881, 1276530, 1281877 ], [ 1281882, 1281933, 1281878, 1281937 ], [ 1281934, 1287297, 1281938, 1287301 ], [ 1287298, 1287557, 
1287302, 1287561 ], [ 1287558, 1290344, 1287562, 1290349 ], [ 1290345, 1292253, 1290350, 1292258 ], [ 1292254, 1293482, 1292259, 1293487 ], [ 1293483, 1295919, 1293488, 1295924 ], [ 1295920, 1302576, 1295925, 1302580 ], [ 1302577, 1302834, 1302581, 1302839 ], [ 1302835, 1302920, 1302840, 1302924 ], [ 1302921, 1303091, 1302925, 1303095 ], [ 1303092, 1303464, 1303096, 1303469 ], [ 1303465, 1306801, 1303470, 1306805 ], [ 1306802, 1307555, 1306806, 1307559 ], [ 1307556, 1309308, 1307560, 1309313 ], [ 1309309, 1311482, 1309314, 1311487 ], [ 1311483, 1312493, 1311488, 1312498 ], [ 1312494, 1316488, 1312499, 1316493 ], [ 1316489, 1318127, 1316494, 1318132 ], [ 1318128, 1325643, 1318133, 1325648 ], [ 1325644, 1328313, 1325649, 1328318 ], [ 1328314, 1329159, 1328319, 1329155 ], [ 1329160, 1332129, 1329156, 1332133 ], [ 1332130, 1332245, 1332134, 1332249 ], [ 1332246, 1332269, 1332250, 1332265 ], [ 1332270, 1332837, 1332266, 1332841 ], [ 1332838, 1334227, 1332842, 1334223 ], [ 1334228, 1345348, 1334224, 1345353 ], [ 1345349, 1346662, 1345354, 1346666 ], [ 1346663, 1347480, 1346667, 1347485 ], [ 1347481, 1348458, 1347486, 1348463 ], [ 1348459, 1350595, 1348464, 1350600 ], [ 1350596, 1350770, 1350601, 1350775 ], [ 1350771, 1351954, 1350776, 1351959 ], [ 1351955, 1356474, 1351960, 1356479 ], [ 1356475, 1359703, 1356480, 1359699 ], [ 1359704, 1361176, 1359700, 1361180 ], [ 1361177, 1362756, 1361181, 1362761 ], [ 1362757, 1368544, 1362762, 1368549 ], [ 1368545, 1370442, 1368550, 1370438 ], [ 1370443, 1373317, 1370439, 1373321 ], [ 1373318, 1374484, 1373322, 1374488 ], [ 1374485, 1377993, 1374489, 1377998 ], [ 1377994, 1379610, 1377999, 1379615 ], [ 1379611, 1380792, 1379616, 1380796 ], [ 1380793, 1381209, 1380797, 1381205 ], [ 1381210, 1386893, 1381206, 1386897 ], [ 1386894, 1391551, 1386898, 1391556 ], [ 1391552, 1393799, 1391557, 1393803 ], [ 1393800, 1395841, 1393804, 1395846 ], [ 1395842, 1397239, 1395847, 1397243 ], [ 1397240, 1401721, 1397244, 1401726 ], [ 1401722, 1403822, 
1401727, 1403818 ], [ 1403823, 1406871, 1403819, 1406876 ], [ 1406872, 1407926, 1406877, 1407922 ], [ 1407927, 1408482, 1407923, 1408486 ], [ 1408483, 1409484, 1408487, 1409480 ], [ 1409485, 1410252, 1409481, 1410256 ], [ 1410253, 1411041, 1410257, 1411046 ], [ 1411042, 1417851, 1411047, 1417856 ], [ 1417852, 1419058, 1417857, 1419063 ], [ 1419059, 1419370, 1419064, 1419366 ], [ 1419371, 1419429, 1419367, 1419433 ], [ 1419430, 1426518, 1419434, 1426522 ], [ 1426519, 1428120, 1426523, 1428125 ], [ 1428121, 1428584, 1428126, 1428589 ], [ 1428585, 1430135, 1428590, 1430139 ], [ 1430136, 1430700, 1430140, 1430705 ], [ 1430701, 1436904, 1430706, 1436908 ], [ 1436905, 1438278, 1436909, 1438283 ], [ 1438279, 1441717, 1438284, 1441721 ], [ 1441718, 1443084, 1441722, 1443089 ], [ 1443085, 1444668, 1443090, 1444673 ], [ 1444669, 1444866, 1444674, 1444871 ], [ 1444867, 1444914, 1444872, 1444919 ], [ 1444915, 1445093, 1444920, 1445098 ], [ 1445094, 1446216, 1445099, 1446221 ], [ 1446217, 1448333, 1446222, 1448329 ], [ 1448334, 1448518, 1448330, 1448523 ], [ 1448519, 1449362, 1448524, 1449358 ], [ 1449363, 1449444, 1449359, 1449440 ], [ 1449445, 1452860, 1449441, 1452865 ], [ 1452861, 1454246, 1452866, 1454251 ], [ 1454247, 1455021, 1454252, 1455017 ], [ 1455022, 1455414, 1455018, 1455419 ], [ 1455415, 1460976, 1455420, 1460981 ], [ 1460977, 1461164, 1460982, 1461169 ], [ 1461165, 1461294, 1461170, 1461298 ], [ 1461295, 1463675, 1461299, 1463680 ], [ 1463676, 1463710, 1463681, 1463714 ], [ 1463711, 1465339, 1463715, 1465344 ], [ 1465340, 1469872, 1465345, 1469877 ], [ 1469873, 1471479, 1469878, 1471484 ], [ 1471480, 1471922, 1471485, 1471926 ], [ 1471923, 1472450, 1471927, 1472454 ], [ 1472451, 1472745, 1472455, 1472750 ], [ 1472746, 1479208, 1472751, 1479213 ], [ 1479209, 1480831, 1479214, 1480836 ], [ 1480832, 1483483, 1480837, 1483479 ], [ 1483484, 1485359, 1483480, 1485364 ], [ 1485360, 1485530, 1485365, 1485535 ], [ 1485531, 1485675, 1485536, 1485679 ], [ 1485676, 1486004, 
1485680, 1486009 ], [ 1486005, 1487314, 1486010, 1487319 ], [ 1487315, 1491008, 1487320, 1491013 ], [ 1491009, 1492068, 1491014, 1492073 ], [ 1492069, 1492190, 1492074, 1492194 ], [ 1492191, 1493001, 1492195, 1493006 ], [ 1493002, 1495524, 1493007, 1495529 ], [ 1495525, 1498599, 1495530, 1498604 ], [ 1498600, 1499384, 1498605, 1499389 ], [ 1499385, 1500494, 1499390, 1500499 ], [ 1500495, 1504828, 1500500, 1504833 ], [ 1504829, 1506224, 1504834, 1506228 ], [ 1506225, 1506798, 1506229, 1506802 ], [ 1506799, 1508452, 1506803, 1508456 ], [ 1508453, 1509790, 1508457, 1509795 ], [ 1509791, 1512050, 1509796, 1512055 ], [ 1512051, 1514922, 1512056, 1514927 ], [ 1514923, 1515140, 1514928, 1515145 ], [ 1515141, 1515194, 1515146, 1515199 ], [ 1515195, 1515647, 1515200, 1515652 ], [ 1515648, 1516602, 1515653, 1516607 ], [ 1516603, 1517689, 1516608, 1517694 ], [ 1517690, 1519324, 1517695, 1519329 ], [ 1519325, 1524288, 1519330, 1524293 ], [ 1524289, 1524809, 1524294, 1524814 ], [ 1524810, 1524838, 1524815, 1524842 ], [ 1524839, 1525934, 1524843, 1525939 ], [ 1525935, 1526325, 1525940, 1526330 ], [ 1526326, 1527046, 1526331, 1527051 ], [ 1527047, 1527437, 1527052, 1527441 ], [ 1527438, 1528800, 1527442, 1528805 ], [ 1528801, 1529067, 1528806, 1529072 ], [ 1529068, 1529127, 1529073, 1529132 ], [ 1529128, 1532112, 1529133, 1532108 ], [ 1532113, 1533431, 1532109, 1533435 ], [ 1533432, 1536262, 1533436, 1536267 ], [ 1536263, 1543858, 1536268, 1543863 ], [ 1543859, 1547883, 1543864, 1547887 ], [ 1547884, 1550923, 1547888, 1550927 ], [ 1550924, 1550941, 1550928, 1550945 ], [ 1550942, 1551904, 1550946, 1551900 ], [ 1551905, 1554015, 1551901, 1554020 ], [ 1554016, 1554903, 1554021, 1554907 ], [ 1554904, 1555315, 1554908, 1555320 ], [ 1555316, 1558476, 1555321, 1558481 ], [ 1558477, 1560403, 1558482, 1560408 ], [ 1560404, 1564152, 1560409, 1564157 ], [ 1564153, 1565868, 1564158, 1565873 ], [ 1565869, 1566075, 1565874, 1566080 ], [ 1566076, 1572396, 1566081, 1572392 ], [ 1572397, 1572715, 
1572393, 1572720 ], [ 1572716, 1573897, 1572721, 1573901 ], [ 1573898, 1575566, 1573902, 1575571 ], [ 1575567, 1575840, 1575572, 1575845 ], [ 1575841, 1575957, 1575846, 1575962 ], [ 1575958, 1577744, 1575963, 1577748 ], [ 1577745, 1578588, 1577749, 1578593 ], [ 1578589, 1580138, 1578594, 1580134 ], [ 1580139, 1582760, 1580135, 1582764 ], [ 1582761, 1587557, 1582765, 1587562 ], [ 1587558, 1588891, 1587563, 1588896 ], [ 1588892, 1590824, 1588897, 1590828 ], [ 1590825, 1591786, 1590829, 1591790 ], [ 1591787, 1597227, 1591791, 1597232 ], [ 1597228, 1597262, 1597233, 1597267 ], [ 1597263, 1606974, 1597268, 1606979 ], [ 1606975, 1608871, 1606980, 1608875 ], [ 1608872, 1613363, 1608876, 1613367 ], [ 1613364, 1613512, 1613368, 1613517 ], [ 1613513, 1613900, 1613518, 1613905 ], [ 1613901, 1614931, 1613906, 1614936 ], [ 1614932, 1616478, 1614937, 1616482 ], [ 1616479, 1620090, 1616483, 1620094 ], [ 1620091, 1620971, 1620095, 1620976 ], [ 1620972, 1625931, 1620977, 1625936 ], [ 1625932, 1635578, 1625937, 1635583 ], [ 1635579, 1636949, 1635584, 1636954 ], [ 1636950, 1642076, 1636955, 1642081 ], [ 1642077, 1643227, 1642082, 1643232 ], [ 1643228, 1643451, 1643233, 1643456 ], [ 1643452, 1643568, 1643457, 1643573 ], [ 1643569, 1651406, 1643574, 1651411 ], [ 1651407, 1651474, 1651412, 1651479 ], [ 1651475, 1656665, 1651480, 1656669 ], [ 1656666, 1660688, 1656670, 1660693 ], [ 1660689, 1662867, 1660694, 1662871 ], [ 1662868, 1665846, 1662872, 1665851 ], [ 1665847, 1667026, 1665852, 1667031 ], [ 1667027, 1669021, 1667032, 1669025 ], [ 1669022, 1669975, 1669026, 1669979 ], [ 1669976, 1675465, 1669980, 1675470 ], [ 1675466, 1679164, 1675471, 1679169 ], [ 1679165, 1681962, 1679170, 1681967 ], [ 1681963, 1688016, 1681968, 1688021 ], [ 1688017, 1690659, 1688022, 1690664 ], [ 1690660, 1692872, 1690665, 1692877 ], [ 1692873, 1697102, 1692878, 1697107 ], [ 1697103, 1698132, 1697108, 1698137 ], [ 1698133, 1698208, 1698138, 1698212 ], [ 1698209, 1703429, 1698213, 1703434 ], [ 1703430, 1705101, 
1703435, 1705105 ], [ 1705102, 1705881, 1705106, 1705885 ], [ 1705882, 1706057, 1705886, 1706062 ], [ 1706058, 1708138, 1706063, 1708142 ], [ 1708139, 1708683, 1708143, 1708688 ], [ 1708684, 1712200, 1708689, 1712204 ], [ 1712201, 1720884, 1712205, 1720889 ], [ 1720885, 1721218, 1720890, 1721223 ], [ 1721219, 1725289, 1721224, 1725294 ], [ 1725290, 1726034, 1725295, 1726038 ], [ 1726035, 1726495, 1726039, 1726500 ], [ 1726496, 1728646, 1726501, 1728651 ], [ 1728647, 1729060, 1728652, 1729065 ], [ 1729061, 1732801, 1729066, 1732806 ], [ 1732802, 1733308, 1732807, 1733313 ], [ 1733309, 1734471, 1733314, 1734476 ], [ 1734472, 1738054, 1734477, 1738058 ], [ 1738055, 1738256, 1738059, 1738260 ], [ 1738257, 1740942, 1738261, 1740947 ], [ 1740943, 1744762, 1740948, 1744767 ], [ 1744763, 1746379, 1744768, 1746384 ], [ 1746380, 1746672, 1746385, 1746668 ], [ 1746673, 1747144, 1746669, 1747149 ], [ 1747145, 1751310, 1747150, 1751306 ], [ 1751311, 1753062, 1751307, 1753067 ], [ 1753063, 1754367, 1753068, 1754372 ], [ 1754368, 1758860, 1754373, 1758864 ], [ 1758861, 1763444, 1758865, 1763449 ], [ 1763445, 1769087, 1763450, 1769091 ], [ 1769088, 1769105, 1769092, 1769109 ], [ 1769106, 1769947, 1769110, 1769951 ], [ 1769948, 1772285, 1769952, 1772289 ], [ 1772286, 1774404, 1772290, 1774408 ], [ 1774405, 1776003, 1774409, 1776007 ], [ 1776004, 1777420, 1776008, 1777425 ], [ 1777421, 1782626, 1777426, 1782631 ], [ 1782627, 1784342, 1782632, 1784347 ], [ 1784343, 1784549, 1784348, 1784554 ], [ 1784550, 1785362, 1784555, 1785366 ], [ 1785363, 1791189, 1785367, 1791194 ], [ 1791190, 1792209, 1791195, 1792213 ], [ 1792210, 1793878, 1792214, 1793883 ], [ 1793879, 1794152, 1793884, 1794157 ], [ 1794153, 1794269, 1794158, 1794274 ], [ 1794270, 1794972, 1794275, 1794977 ], [ 1794973, 1796163, 1794978, 1796168 ], [ 1796164, 1800687, 1796169, 1800683 ], [ 1800688, 1802296, 1800684, 1802301 ], [ 1802297, 1804730, 1802302, 1804734 ], [ 1804731, 1805422, 1804735, 1805426 ], [ 1805423, 1805729, 
1805427, 1805734 ], [ 1805730, 1806305, 1805735, 1806310 ], [ 1806306, 1806755, 1806311, 1806759 ], [ 1806756, 1806879, 1806760, 1806875 ], [ 1806880, 1810512, 1806876, 1810517 ], [ 1810513, 1816402, 1810518, 1816407 ], [ 1816403, 1826227, 1816408, 1826232 ], [ 1826228, 1826701, 1826233, 1826706 ], [ 1826702, 1827720, 1826707, 1827725 ], [ 1827721, 1833845, 1827726, 1833849 ], [ 1833846, 1836707, 1833850, 1836712 ], [ 1836708, 1836926, 1836713, 1836931 ], [ 1836927, 1838667, 1836932, 1838672 ], [ 1838668, 1838935, 1838673, 1838939 ], [ 1838936, 1842071, 1838940, 1842075 ], [ 1842072, 1842664, 1842076, 1842668 ], [ 1842665, 1843220, 1842669, 1843225 ], [ 1843221, 1843829, 1843226, 1843834 ], [ 1843830, 1845044, 1843835, 1845048 ], [ 1845045, 1846577, 1845049, 1846582 ], [ 1846578, 1848717, 1846583, 1848721 ], [ 1848718, 1849125, 1848722, 1849130 ], [ 1849126, 1850237, 1849131, 1850242 ], [ 1850238, 1851708, 1850243, 1851713 ], [ 1851709, 1853436, 1851714, 1853441 ], [ 1853437, 1853475, 1853442, 1853480 ], [ 1853476, 1853493, 1853481, 1853498 ], [ 1853494, 1854900, 1853499, 1854905 ], [ 1854901, 1854987, 1854906, 1854991 ], [ 1854988, 1861797, 1854992, 1861802 ], [ 1861798, 1862267, 1861803, 1862272 ], [ 1862268, 1866445, 1862273, 1866450 ], [ 1866446, 1866700, 1866451, 1866705 ], [ 1866701, 1870035, 1866706, 1870039 ], [ 1870036, 1870143, 1870040, 1870148 ], [ 1870144, 1870675, 1870149, 1870680 ], [ 1870676, 1871498, 1870681, 1871502 ], [ 1871499, 1873369, 1871503, 1873373 ], [ 1873370, 1877824, 1873374, 1877828 ], [ 1877825, 1880920, 1877829, 1880916 ], [ 1880921, 1881704, 1880917, 1881709 ], [ 1881705, 1882659, 1881710, 1882664 ], [ 1882660, 1884008, 1882665, 1884013 ], [ 1884009, 1885076, 1884014, 1885081 ], [ 1885077, 1888274, 1885082, 1888270 ], [ 1888275, 1897857, 1888271, 1897862 ], [ 1897858, 1907112, 1897863, 1907116 ], [ 1907113, 1907770, 1907117, 1907774 ], [ 1907771, 1912119, 1907775, 1912123 ], [ 1912120, 1913812, 1912124, 1913816 ], [ 1913813, 1913938, 
1913817, 1913942 ], [ 1913939, 1922086, 1913943, 1922090 ], [ 1922087, 1924874, 1922091, 1924878 ], [ 1924875, 1929326, 1924879, 1929322 ], [ 1929327, 1931549, 1929323, 1931554 ], [ 1931550, 1931660, 1931555, 1931665 ], [ 1931661, 1936680, 1931666, 1936685 ], [ 1936681, 1938633, 1936686, 1938637 ], [ 1938634, 1938835, 1938638, 1938840 ], [ 1938836, 1939367, 1938841, 1939372 ], [ 1939368, 1941696, 1939373, 1941700 ], [ 1941697, 1943301, 1941701, 1943305 ], [ 1943302, 1944718, 1943306, 1944723 ], [ 1944719, 1949924, 1944724, 1949929 ], [ 1949925, 1951640, 1949930, 1951645 ], [ 1951641, 1951847, 1951646, 1951852 ], [ 1951848, 1952660, 1951853, 1952664 ], [ 1952661, 1953731, 1952665, 1953735 ], [ 1953732, 1958495, 1953736, 1958500 ], [ 1958496, 1959515, 1958501, 1959519 ], [ 1959516, 1961184, 1959520, 1961189 ], [ 1961185, 1961458, 1961190, 1961463 ], [ 1961459, 1963223, 1961464, 1963228 ], [ 1963224, 1964535, 1963229, 1964540 ], [ 1964536, 1964578, 1964541, 1964583 ], [ 1964579, 1965726, 1964584, 1965731 ], [ 1965727, 1967936, 1965732, 1967940 ], [ 1967937, 1970524, 1967941, 1970528 ], [ 1970525, 1971307, 1970529, 1971303 ], [ 1971308, 1971952, 1971304, 1971956 ], [ 1971953, 1971961, 1971957, 1971965 ], [ 1971962, 1974876, 1971966, 1974880 ], [ 1974877, 1975723, 1974881, 1975728 ], [ 1975724, 1977141, 1975729, 1977137 ], [ 1977142, 1983495, 1977138, 1983500 ], [ 1983496, 1985266, 1983501, 1985270 ], [ 1985267, 1989041, 1985271, 1989046 ], [ 1989042, 1991504, 1989047, 1991508 ], [ 1991505, 1991939, 1991509, 1991944 ], [ 1991940, 1994134, 1991945, 1994139 ], [ 1994135, 1997719, 1994140, 1997723 ], [ 1997720, 1997804, 1997724, 1997800 ], [ 1997805, 2002856, 1997801, 2002860 ], [ 2002857, 2006390, 2002861, 2006395 ], [ 2006391, 2006681, 2006396, 2006686 ], [ 2006682, 2012753, 2006687, 2012758 ], [ 2012754, 2020299, 2012759, 2020304 ], [ 2020300, 2021594, 2020305, 2021599 ], [ 2021595, 2023023, 2021600, 2023027 ], [ 2023024, 2026978, 2023028, 2026974 ], [ 2026979, 2035653, 
2026975, 2035658 ], [ 2035654, 2035969, 2035659, 2035973 ], [ 2035970, 2038524, 2035974, 2038528 ], [ 2038525, 2040783, 2038529, 2040787 ], [ 2040784, 2042430, 2040788, 2042434 ], [ 2042431, 2043961, 2042435, 2043966 ], [ 2043962, 2044411, 2043967, 2044416 ], [ 2044412, 2045320, 2044417, 2045325 ], [ 2045321, 2045652, 2045326, 2045648 ], [ 2045653, 2046593, 2045649, 2046598 ], [ 2046594, 2058014, 2046599, 2058019 ], [ 2058015, 2058163, 2058020, 2058167 ], [ 2058164, 2058262, 2058168, 2058267 ], [ 2058263, 2059250, 2058268, 2059246 ], [ 2059251, 2061616, 2059247, 2061621 ], [ 2061617, 2067334, 2061622, 2067339 ], [ 2067335, 2069059, 2067340, 2069064 ], [ 2069060, 2073142, 2069065, 2073147 ], [ 2073143, 2074555, 2073148, 2074560 ], [ 2074556, 2074634, 2074561, 2074639 ], [ 2074635, 2076234, 2074640, 2076238 ], [ 2076235, 2076422, 2076239, 2076427 ], [ 2076423, 2080648, 2076428, 2080652 ], [ 2080649, 2081937, 2080653, 2081942 ], [ 2081938, 2082042, 2081943, 2082047 ], [ 2082043, 2082408, 2082048, 2082413 ], [ 2082409, 2086395, 2082414, 2086399 ], [ 2086396, 2088062, 2086400, 2088066 ], [ 2088063, 2088207, 2088067, 2088211 ], [ 2088208, 2092297, 2088212, 2092301 ], [ 2092298, 2093453, 2092302, 2093449 ], [ 2093454, 2094661, 2093450, 2094666 ], [ 2094662, 2098386, 2094667, 2098390 ], [ 2098387, 2105404, 2098391, 2105408 ], [ 2105405, 2105556, 2105409, 2105561 ], [ 2105557, 2106153, 2105562, 2106158 ], [ 2106154, 2107284, 2106159, 2107288 ], [ 2107285, 2110444, 2107289, 2110448 ], [ 2110445, 2110523, 2110449, 2110527 ], [ 2110524, 2113282, 2110528, 2113287 ], [ 2113283, 2114197, 2113288, 2114202 ], [ 2114198, 2119617, 2114203, 2119621 ], [ 2119618, 2124245, 2119622, 2124250 ], [ 2124246, 2126629, 2124251, 2126634 ], [ 2126630, 2127367, 2126635, 2127372 ], [ 2127368, 2131057, 2127373, 2131061 ], [ 2131058, 2131854, 2131062, 2131859 ], [ 2131855, 2134278, 2131860, 2134282 ], [ 2134279, 2134456, 2134283, 2134460 ], [ 2134457, 2137761, 2134461, 2137765 ], [ 2137762, 2138481, 
2137766, 2138486 ], [ 2138482, 2139541, 2138487, 2139545 ], [ 2139542, 2140084, 2139546, 2140089 ], [ 2140085, 2151397, 2140090, 2151402 ], [ 2151398, 2154116, 2151403, 2154121 ], [ 2154117, 2158877, 2154122, 2158881 ], [ 2158878, 2158886, 2158882, 2158890 ], [ 2158887, 2160314, 2158891, 2160318 ], [ 2160315, 2164531, 2160319, 2164536 ], [ 2164532, 2164999, 2164537, 2165004 ], [ 2165000, 2166190, 2165005, 2166195 ], [ 2166191, 2168535, 2166196, 2168540 ], [ 2168536, 2168652, 2168541, 2168657 ], [ 2168653, 2168876, 2168658, 2168881 ], [ 2168877, 2169179, 2168882, 2169175 ], [ 2169180, 2170247, 2169176, 2170243 ], [ 2170248, 2175197, 2170244, 2175202 ], [ 2175198, 2176568, 2175203, 2176573 ], [ 2176569, 2185419, 2176574, 2185424 ], [ 2185420, 2187124, 2185425, 2187128 ], [ 2187125, 2188632, 2187129, 2188636 ], [ 2188633, 2194633, 2188637, 2194637 ], [ 2194634, 2196521, 2194638, 2196525 ], [ 2196522, 2197400, 2196526, 2197396 ], [ 2197401, 2198074, 2197397, 2198079 ], [ 2198075, 2200597, 2198080, 2200601 ], [ 2200598, 2202376, 2200602, 2202380 ], [ 2202377, 2203542, 2202381, 2203546 ], [ 2203543, 2205716, 2203547, 2205721 ], [ 2205717, 2206482, 2205722, 2206487 ], [ 2206483, 2209774, 2206488, 2209778 ], [ 2209775, 2214819, 2209779, 2214824 ], [ 2214820, 2215255, 2214825, 2215260 ], [ 2215256, 2216910, 2215261, 2216915 ], [ 2216911, 2219477, 2216916, 2219482 ], [ 2219478, 2219751, 2219483, 2219756 ], [ 2219752, 2221421, 2219757, 2221425 ], [ 2221422, 2222602, 2221426, 2222607 ], [ 2222603, 2222929, 2222608, 2222925 ], [ 2222930, 2224016, 2222926, 2224021 ], [ 2224017, 2228441, 2224022, 2228445 ], [ 2228442, 2229253, 2228446, 2229258 ], [ 2229254, 2229460, 2229259, 2229465 ], [ 2229461, 2231176, 2229466, 2231181 ], [ 2231177, 2236382, 2231182, 2236387 ], [ 2236383, 2238087, 2236388, 2238091 ], [ 2238088, 2239596, 2238092, 2239600 ], [ 2239597, 2242924, 2239601, 2242920 ], [ 2242925, 2243879, 2242921, 2243883 ], [ 2243880, 2243897, 2243884, 2243901 ], [ 2243898, 2245581, 
2243902, 2245586 ], [ 2245582, 2245719, 2245587, 2245724 ], [ 2245720, 2245761, 2245725, 2245766 ], [ 2245762, 2249902, 2245767, 2249907 ], [ 2249903, 2254722, 2249908, 2254727 ], [ 2254723, 2254803, 2254728, 2254807 ], [ 2254804, 2257149, 2254808, 2257153 ], [ 2257150, 2262668, 2257154, 2262673 ], [ 2262669, 2267507, 2262674, 2267503 ], [ 2267508, 2270370, 2267504, 2270366 ], [ 2270371, 2276333, 2270367, 2276338 ], [ 2276334, 2277071, 2276339, 2277075 ], [ 2277072, 2278349, 2277076, 2278354 ], [ 2278350, 2278595, 2278355, 2278600 ], [ 2278596, 2281303, 2278601, 2281307 ], [ 2281304, 2282039, 2281308, 2282044 ], [ 2282040, 2286122, 2282045, 2286126 ], [ 2286123, 2295986, 2286127, 2295990 ], [ 2295987, 2296911, 2295991, 2296907 ], [ 2296912, 2300006, 2296908, 2300010 ], [ 2300007, 2309292, 2300011, 2309297 ], [ 2309293, 2309737, 2309298, 2309742 ], [ 2309738, 2312320, 2309743, 2312324 ], [ 2312321, 2314845, 2312325, 2314850 ], [ 2314846, 2315016, 2314851, 2315021 ], [ 2315017, 2320047, 2315022, 2320052 ], [ 2320048, 2320645, 2320053, 2320650 ], [ 2320646, 2326925, 2320651, 2326921 ], [ 2326926, 2330437, 2326922, 2330442 ], [ 2330438, 2335656, 2330443, 2335660 ], [ 2335657, 2338082, 2335661, 2338087 ], [ 2338083, 2343729, 2338088, 2343725 ], [ 2343730, 2345465, 2343726, 2345470 ], [ 2345466, 2345517, 2345471, 2345521 ], [ 2345518, 2347136, 2345522, 2347140 ], [ 2347137, 2347233, 2347141, 2347238 ], [ 2347234, 2348720, 2347239, 2348725 ], [ 2348721, 2351324, 2348726, 2351329 ], [ 2351325, 2352448, 2351330, 2352453 ], [ 2352449, 2353999, 2352454, 2354004 ], [ 2354000, 2354134, 2354005, 2354138 ], [ 2354135, 2359046, 2354139, 2359051 ], [ 2359047, 2361149, 2359052, 2361154 ], [ 2361150, 2374039, 2361155, 2374044 ], [ 2374040, 2382502, 2374045, 2382506 ], [ 2382503, 2385349, 2382507, 2385354 ], [ 2385350, 2388585, 2385355, 2388590 ], [ 2388586, 2391734, 2388591, 2391739 ], [ 2391735, 2392141, 2391740, 2392146 ], [ 2392142, 2393939, 2392147, 2393944 ], [ 2393940, 2395026, 
2393945, 2395031 ], [ 2395027, 2395860, 2395032, 2395865 ], [ 2395861, 2398211, 2395866, 2398216 ], [ 2398212, 2398326, 2398217, 2398331 ], [ 2398327, 2400696, 2398332, 2400700 ], [ 2400697, 2402303, 2400701, 2402308 ], [ 2402304, 2405335, 2402309, 2405339 ], [ 2405336, 2408154, 2405340, 2408159 ], [ 2408155, 2409936, 2408160, 2409941 ], [ 2409937, 2410353, 2409942, 2410358 ], [ 2410354, 2411021, 2410359, 2411026 ], [ 2411022, 2414614, 2411027, 2414618 ], [ 2414615, 2419571, 2414619, 2419576 ], [ 2419572, 2421744, 2419577, 2421748 ], [ 2421745, 2424488, 2421749, 2424493 ], [ 2424489, 2427895, 2424494, 2427900 ], [ 2427896, 2430604, 2427901, 2430600 ], [ 2430605, 2433794, 2430601, 2433799 ], [ 2433795, 2434280, 2433800, 2434285 ], [ 2434281, 2435465, 2434286, 2435461 ], [ 2435466, 2436129, 2435462, 2436134 ], [ 2436130, 2446339, 2436135, 2446344 ], [ 2446340, 2446355, 2446345, 2446360 ], [ 2446356, 2447550, 2446361, 2447555 ], [ 2447551, 2447589, 2447556, 2447593 ], [ 2447590, 2451279, 2447594, 2451283 ], [ 2451280, 2456375, 2451284, 2456380 ], [ 2456376, 2458676, 2456381, 2458672 ], [ 2458677, 2459075, 2458673, 2459079 ], [ 2459076, 2459685, 2459080, 2459690 ], [ 2459686, 2467707, 2459691, 2467712 ], [ 2467708, 2474920, 2467713, 2474924 ], [ 2474921, 2483809, 2474925, 2483813 ], [ 2483810, 2487345, 2483814, 2487349 ], [ 2487346, 2489626, 2487350, 2489631 ], [ 2489627, 2490030, 2489632, 2490035 ], [ 2490031, 2493086, 2490036, 2493090 ], [ 2493087, 2494181, 2493091, 2494186 ], [ 2494182, 2494578, 2494187, 2494583 ], [ 2494579, 2498330, 2494584, 2498335 ], [ 2498331, 2501619, 2498336, 2501624 ], [ 2501620, 2502148, 2501625, 2502152 ], [ 2502149, 2502774, 2502153, 2502779 ], [ 2502775, 2503405, 2502780, 2503409 ], [ 2503406, 2505440, 2503410, 2505445 ], [ 2505441, 2507840, 2505446, 2507845 ], [ 2507841, 2513737, 2507846, 2513741 ], [ 2513738, 2513953, 2513742, 2513958 ], [ 2513954, 2516708, 2513959, 2516712 ], [ 2516709, 2518482, 2516713, 2518487 ], [ 2518483, 2518510, 
2518488, 2518515 ], [ 2518511, 2519154, 2518516, 2519159 ], [ 2519155, 2521663, 2519160, 2521668 ], [ 2521664, 2522690, 2521669, 2522695 ], [ 2522691, 2533188, 2522696, 2533184 ], [ 2533189, 2535156, 2533185, 2535161 ], [ 2535157, 2536302, 2535162, 2536307 ], [ 2536303, 2536525, 2536308, 2536521 ], [ 2536526, 2539683, 2536522, 2539688 ], [ 2539684, 2540838, 2539689, 2540843 ], [ 2540839, 2542188, 2540844, 2542192 ], [ 2542189, 2542542, 2542193, 2542547 ], [ 2542543, 2543529, 2542548, 2543533 ], [ 2543530, 2543833, 2543534, 2543829 ], [ 2543834, 2549711, 2543830, 2549716 ], [ 2549712, 2549979, 2549717, 2549984 ], [ 2549980, 2550376, 2549985, 2550381 ], [ 2550377, 2550442, 2550382, 2550447 ], [ 2550443, 2552498, 2550448, 2552503 ], [ 2552499, 2556237, 2552504, 2556242 ], [ 2556238, 2557871, 2556243, 2557875 ], [ 2557872, 2561281, 2557876, 2561286 ], [ 2561282, 2562381, 2561287, 2562386 ], [ 2562382, 2571576, 2562387, 2571581 ], [ 2571577, 2573918, 2571582, 2573923 ], [ 2573919, 2575854, 2573924, 2575859 ], [ 2575855, 2575961, 2575860, 2575965 ], [ 2575962, 2576170, 2575966, 2576166 ], [ 2576171, 2579045, 2576167, 2579050 ], [ 2579046, 2587793, 2579051, 2587797 ], [ 2587794, 2588693, 2587798, 2588689 ], [ 2588694, 2588728, 2588690, 2588733 ], [ 2588729, 2589643, 2588734, 2589647 ], [ 2589644, 2591930, 2589648, 2591935 ], [ 2591931, 2592492, 2591936, 2592496 ], [ 2592493, 2595383, 2592497, 2595387 ], [ 2595384, 2601304, 2595388, 2601309 ], [ 2601305, 2610396, 2601310, 2610400 ], [ 2610397, 2613914, 2610401, 2613918 ], [ 2613915, 2614812, 2613919, 2614817 ], [ 2614813, 2614839, 2614818, 2614844 ], [ 2614840, 2622328, 2614845, 2622333 ], [ 2622329, 2624051, 2622334, 2624055 ], [ 2624052, 2627903, 2624056, 2627908 ], [ 2627904, 2633758, 2627909, 2633762 ], [ 2633759, 2640020, 2633763, 2640024 ], [ 2640021, 2648431, 2640025, 2648436 ], [ 2648432, 2651846, 2648437, 2651851 ], [ 2651847, 2658412, 2651852, 2658416 ], [ 2658413, 2660586, 2658417, 2660591 ], [ 2660587, 2660643, 
2660592, 2660647 ], [ 2660644, 2662994, 2660648, 2662998 ], [ 2662995, 2663434, 2662999, 2663439 ], [ 2663435, 2664288, 2663440, 2664292 ], [ 2664289, 2666257, 2664293, 2666261 ], [ 2666258, 2668305, 2666262, 2668301 ], [ 2668306, 2668367, 2668302, 2668363 ], [ 2668368, 2669373, 2668364, 2669377 ], [ 2669374, 2674481, 2669378, 2674486 ], [ 2674482, 2674949, 2674487, 2674954 ], [ 2674950, 2676096, 2674955, 2676101 ], [ 2676097, 2676139, 2676102, 2676144 ], [ 2676140, 2678485, 2676145, 2678490 ], [ 2678486, 2678602, 2678491, 2678607 ], [ 2678603, 2678826, 2678608, 2678831 ], [ 2678827, 2679129, 2678832, 2679125 ], [ 2679130, 2680196, 2679126, 2680192 ], [ 2680197, 2681278, 2680193, 2681283 ], [ 2681279, 2683258, 2681284, 2683262 ], [ 2683259, 2684926, 2683263, 2684931 ], [ 2684927, 2685205, 2684932, 2685210 ], [ 2685206, 2691894, 2685211, 2691898 ], [ 2691895, 2692493, 2691899, 2692498 ], [ 2692494, 2695364, 2692499, 2695368 ], [ 2695365, 2696872, 2695369, 2696876 ], [ 2696873, 2701148, 2696877, 2701152 ], [ 2701149, 2701165, 2701153, 2701169 ], [ 2701166, 2703577, 2701170, 2703581 ], [ 2703578, 2709273, 2703582, 2709269 ], [ 2709274, 2710251, 2709270, 2710255 ], [ 2710252, 2714956, 2710256, 2714961 ], [ 2714957, 2716135, 2714962, 2716140 ], [ 2716136, 2716265, 2716141, 2716261 ], [ 2716266, 2716930, 2716262, 2716935 ], [ 2716931, 2717350, 2716936, 2717355 ], [ 2717351, 2717774, 2717356, 2717779 ], [ 2717775, 2718590, 2717780, 2718595 ], [ 2718591, 2720379, 2718596, 2720384 ], [ 2720380, 2721786, 2720385, 2721790 ], [ 2721787, 2723960, 2721791, 2723965 ], [ 2723961, 2725032, 2723966, 2725036 ], [ 2725033, 2725167, 2725037, 2725171 ], [ 2725168, 2725658, 2725172, 2725663 ], [ 2725659, 2731050, 2725664, 2731054 ], [ 2731051, 2731980, 2731055, 2731985 ], [ 2731981, 2738172, 2731986, 2738177 ], [ 2738173, 2738501, 2738178, 2738505 ], [ 2738502, 2741368, 2738506, 2741372 ], [ 2741369, 2741523, 2741373, 2741527 ], [ 2741524, 2743821, 2741528, 2743817 ], [ 2743822, 2743903, 
2743818, 2743899 ], [ 2743904, 2746696, 2743900, 2746700 ], [ 2746697, 2748433, 2746701, 2748438 ], [ 2748434, 2748596, 2748439, 2748601 ], [ 2748597, 2749549, 2748602, 2749553 ], [ 2749550, 2749729, 2749554, 2749734 ], [ 2749730, 2750963, 2749735, 2750967 ], [ 2750964, 2751613, 2750968, 2751609 ], [ 2751614, 2752944, 2751610, 2752949 ], [ 2752945, 2753953, 2752950, 2753958 ], [ 2753954, 2754044, 2753959, 2754040 ], [ 2754045, 2755376, 2754041, 2755380 ], [ 2755377, 2760599, 2755381, 2760604 ], [ 2760600, 2761236, 2760605, 2761241 ], [ 2761237, 2763346, 2761242, 2763351 ], [ 2763347, 2764218, 2763352, 2764223 ], [ 2764219, 2767629, 2764224, 2767633 ], [ 2767630, 2771002, 2767634, 2771006 ], [ 2771003, 2773348, 2771007, 2773353 ], [ 2773349, 2778556, 2773354, 2778560 ], [ 2778557, 2779108, 2778561, 2779112 ], [ 2779109, 2779707, 2779113, 2779712 ], [ 2779708, 2780639, 2779713, 2780643 ], [ 2780640, 2781795, 2780644, 2781791 ], [ 2781796, 2782076, 2781792, 2782080 ], [ 2782077, 2788770, 2782081, 2788774 ], [ 2788771, 2792111, 2788775, 2792116 ], [ 2792112, 2794418, 2792117, 2794423 ], [ 2794419, 2795818, 2794424, 2795823 ], [ 2795819, 2796261, 2795824, 2796266 ], [ 2796262, 2798929, 2796267, 2798934 ], [ 2798930, 2799454, 2798935, 2799459 ], [ 2799455, 2811616, 2799460, 2811620 ], [ 2811617, 2813416, 2811621, 2813421 ], [ 2813417, 2813479, 2813422, 2813484 ], [ 2813480, 2814107, 2813485, 2814112 ], [ 2814108, 2818929, 2814113, 2818933 ], [ 2818930, 2825357, 2818934, 2825362 ], [ 2825358, 2826820, 2825363, 2826825 ], [ 2826821, 2828096, 2826826, 2828101 ], [ 2828097, 2828932, 2828102, 2828936 ], [ 2828933, 2830088, 2828937, 2830093 ], [ 2830089, 2830687, 2830094, 2830683 ], [ 2830688, 2834601, 2830684, 2834606 ], [ 2834602, 2836621, 2834607, 2836626 ], [ 2836622, 2836884, 2836627, 2836889 ], [ 2836885, 2837913, 2836890, 2837918 ], [ 2837914, 2840393, 2837919, 2840398 ], [ 2840394, 2841225, 2840399, 2841229 ], [ 2841226, 2843391, 2841230, 2843396 ], [ 2843392, 2843629, 
2843397, 2843634 ], [ 2843630, 2844665, 2843635, 2844670 ], [ 2844666, 2845263, 2844671, 2845267 ], [ 2845264, 2847821, 2845268, 2847826 ], [ 2847822, 2848450, 2847827, 2848454 ], [ 2848451, 2849601, 2848455, 2849606 ], [ 2849602, 2853189, 2849607, 2853194 ], [ 2853190, 2853798, 2853195, 2853802 ], [ 2853799, 2860428, 2853803, 2860433 ], [ 2860429, 2862152, 2860434, 2862157 ], [ 2862153, 2862729, 2862158, 2862734 ], [ 2862730, 2869033, 2862735, 2869038 ], [ 2869034, 2869157, 2869039, 2869162 ], [ 2869158, 2872699, 2869163, 2872703 ], [ 2872700, 2882082, 2872704, 2882087 ], [ 2882083, 2888775, 2882088, 2888779 ], [ 2888776, 2894091, 2888780, 2894096 ], [ 2894092, 2895090, 2894097, 2895095 ], [ 2895091, 2895718, 2895096, 2895722 ], [ 2895719, 2896669, 2895723, 2896665 ], [ 2896670, 2900119, 2896666, 2900124 ], [ 2900120, 2900555, 2900125, 2900560 ], [ 2900556, 2902167, 2900561, 2902172 ], [ 2902168, 2902210, 2902173, 2902215 ], [ 2902211, 2904556, 2902216, 2904561 ], [ 2904557, 2904673, 2904562, 2904678 ], [ 2904674, 2904897, 2904679, 2904902 ], [ 2904898, 2905200, 2904903, 2905196 ], [ 2905201, 2906268, 2905197, 2906264 ], [ 2906269, 2907797, 2906265, 2907802 ], [ 2907798, 2910456, 2907803, 2910461 ], [ 2910457, 2911608, 2910462, 2911613 ], [ 2911609, 2914587, 2911614, 2914591 ], [ 2914588, 2914744, 2914592, 2914749 ], [ 2914745, 2914779, 2914750, 2914784 ], [ 2914780, 2918124, 2914785, 2918129 ], [ 2918125, 2921020, 2918130, 2921025 ], [ 2921021, 2921458, 2921026, 2921463 ], [ 2921459, 2923338, 2921464, 2923342 ], [ 2923339, 2926580, 2923343, 2926585 ], [ 2926581, 2930570, 2926586, 2930575 ], [ 2930571, 2931452, 2930576, 2931456 ], [ 2931453, 2934873, 2931457, 2934878 ], [ 2934874, 2940294, 2934879, 2940298 ], [ 2940295, 2942883, 2940299, 2942888 ], [ 2942884, 2943246, 2942889, 2943250 ], [ 2943247, 2946109, 2943251, 2946105 ], [ 2946110, 2950823, 2946106, 2950828 ], [ 2950824, 2952204, 2950829, 2952209 ], [ 2952205, 2952842, 2952210, 2952846 ], [ 2952843, 2954108, 
2952847, 2954113 ], [ 2954109, 2958788, 2954114, 2958793 ], [ 2958789, 2959652, 2958794, 2959656 ], [ 2959653, 2962338, 2959657, 2962343 ], [ 2962339, 2962634, 2962344, 2962639 ], [ 2962635, 2963567, 2962640, 2963572 ], [ 2963568, 2965512, 2963573, 2965517 ], [ 2965513, 2965542, 2965518, 2965546 ], [ 2965543, 2965715, 2965547, 2965720 ], [ 2965716, 2969537, 2965721, 2969542 ], [ 2969538, 2969667, 2969543, 2969672 ], [ 2969668, 2971097, 2969673, 2971102 ], [ 2971098, 2971329, 2971103, 2971334 ], [ 2971330, 2971441, 2971335, 2971437 ], [ 2971442, 2971588, 2971438, 2971584 ], [ 2971589, 2971874, 2971585, 2971879 ], [ 2971875, 2972098, 2971880, 2972103 ], [ 2972099, 2974355, 2972104, 2974359 ], [ 2974356, 2978342, 2974360, 2978347 ], [ 2978343, 2984060, 2978348, 2984065 ], [ 2984061, 2985968, 2984066, 2985972 ], [ 2985969, 2987338, 2985973, 2987342 ], [ 2987339, 2988924, 2987343, 2988929 ], [ 2988925, 2991267, 2988930, 2991271 ], [ 2991268, 2994739, 2991272, 2994744 ], [ 2994740, 3002088, 2994745, 3002093 ], [ 3002089, 3009887, 3002094, 3009892 ], [ 3009888, 3014827, 3009893, 3014832 ], [ 3014828, 3017608, 3014833, 3017612 ], [ 3017609, 3020196, 3017613, 3020192 ], [ 3020197, 3020885, 3020193, 3020890 ], [ 3020886, 3022261, 3020891, 3022266 ], [ 3022262, 3029543, 3022267, 3029548 ], [ 3029544, 3030265, 3029549, 3030270 ], [ 3030266, 3032363, 3030271, 3032368 ], [ 3032364, 3033161, 3032369, 3033166 ], [ 3033162, 3042175, 3033167, 3042180 ], [ 3042176, 3042389, 3042181, 3042394 ], [ 3042390, 3043311, 3042395, 3043315 ], [ 3043312, 3045343, 3043316, 3045347 ], [ 3045344, 3049663, 3045348, 3049668 ], [ 3049664, 3050210, 3049669, 3050215 ], [ 3050211, 3051389, 3050216, 3051394 ], [ 3051390, 3052128, 3051395, 3052133 ], [ 3052129, 3052883, 3052134, 3052888 ], [ 3052884, 3054679, 3052889, 3054684 ], [ 3054680, 3055955, 3054685, 3055960 ], [ 3055956, 3056024, 3055961, 3056029 ], [ 3056025, 3062859, 3056030, 3062864 ], [ 3062860, 3063276, 3062865, 3063281 ], [ 3063277, 3064101, 
3063282, 3064106 ], [ 3064102, 3065135, 3064107, 3065131 ], [ 3065136, 3065575, 3065132, 3065580 ], [ 3065576, 3065710, 3065581, 3065715 ], [ 3065711, 3066590, 3065716, 3066595 ], [ 3066591, 3070621, 3066596, 3070625 ], [ 3070622, 3075292, 3070626, 3075297 ], [ 3075293, 3076853, 3075298, 3076858 ], [ 3076854, 3078695, 3076859, 3078699 ], [ 3078696, 3080566, 3078700, 3080571 ], [ 3080567, 3080707, 3080572, 3080712 ], [ 3080708, 3080747, 3080713, 3080752 ], [ 3080748, 3080916, 3080753, 3080921 ], [ 3080917, 3082200, 3080922, 3082205 ], [ 3082201, 3085986, 3082206, 3085991 ], [ 3085987, 3086111, 3085992, 3086116 ], [ 3086112, 3088509, 3086117, 3088514 ], [ 3088510, 3090889, 3088515, 3090893 ], [ 3090890, 3091108, 3090894, 3091113 ], [ 3091109, 3094298, 3091114, 3094303 ], [ 3094299, 3095542, 3094304, 3095546 ], [ 3095543, 3098254, 3095547, 3098259 ], [ 3098255, 3100958, 3098260, 3100963 ], [ 3100959, 3101111, 3100964, 3101116 ], [ 3101112, 3101710, 3101117, 3101715 ], [ 3101711, 3102184, 3101716, 3102180 ], [ 3102185, 3107070, 3102181, 3107075 ], [ 3107071, 3108550, 3107076, 3108555 ], [ 3108551, 3109638, 3108556, 3109642 ], [ 3109639, 3110010, 3109643, 3110014 ], [ 3110011, 3110596, 3110015, 3110592 ], [ 3110597, 3114705, 3110593, 3114709 ], [ 3114706, 3116752, 3114710, 3116757 ], [ 3116753, 3118943, 3116758, 3118947 ], [ 3118944, 3118963, 3118948, 3118967 ], [ 3118964, 3119656, 3118968, 3119661 ], [ 3119657, 3120436, 3119662, 3120441 ], [ 3120437, 3122936, 3120442, 3122941 ], [ 3122937, 3125746, 3122942, 3125751 ], [ 3125747, 3126385, 3125752, 3126390 ], [ 3126386, 3126792, 3126391, 3126796 ], [ 3126793, 3129487, 3126797, 3129491 ], [ 3129488, 3129525, 3129492, 3129530 ], [ 3129526, 3129570, 3129531, 3129575 ], [ 3129571, 3129752, 3129576, 3129757 ], [ 3129753, 3132349, 3129758, 3132354 ], [ 3132350, 3133454, 3132355, 3133458 ], [ 3133455, 3135493, 3133459, 3135497 ], [ 3135494, 3139904, 3135498, 3139909 ], [ 3139905, 3143832, 3139910, 3143837 ], [ 3143833, 3144573, 
3143838, 3144578 ], [ 3144574, 3146787, 3144579, 3146783 ], [ 3146788, 3148252, 3146784, 3148257 ], [ 3148253, 3149799, 3148258, 3149803 ], [ 3149800, 3150053, 3149804, 3150058 ], [ 3150054, 3152136, 3150059, 3152141 ], [ 3152137, 3159529, 3152142, 3159533 ], [ 3159530, 3161578, 3159534, 3161583 ], [ 3161579, 3162165, 3161584, 3162169 ], [ 3162166, 3162816, 3162170, 3162820 ], [ 3162817, 3164826, 3162821, 3164831 ], [ 3164827, 3164943, 3164832, 3164948 ], [ 3164944, 3167365, 3164949, 3167369 ], [ 3167366, 3167563, 3167370, 3167568 ], [ 3167564, 3168258, 3167569, 3168263 ], [ 3168259, 3170295, 3168264, 3170300 ], [ 3170296, 3172533, 3170301, 3172538 ], [ 3172534, 3176792, 3172539, 3176797 ], [ 3176793, 3177637, 3176798, 3177642 ], [ 3177638, 3178306, 3177643, 3178311 ], [ 3178307, 3180657, 3178312, 3180661 ], [ 3180658, 3181485, 3180662, 3181489 ], [ 3181486, 3182629, 3181490, 3182633 ], [ 3182630, 3186857, 3182634, 3186862 ], [ 3186858, 3190084, 3186863, 3190088 ], [ 3190085, 3194127, 3190089, 3194131 ], [ 3194128, 3196386, 3194132, 3196391 ], [ 3196387, 3196711, 3196392, 3196715 ], [ 3196712, 3197840, 3196716, 3197845 ], [ 3197841, 3201756, 3197846, 3201760 ], [ 3201757, 3208054, 3201761, 3208058 ], [ 3208055, 3210828, 3208059, 3210833 ], [ 3210829, 3212953, 3210834, 3212958 ], [ 3212954, 3224927, 3212959, 3224931 ], [ 3224928, 3226292, 3224932, 3226297 ], [ 3226293, 3226762, 3226298, 3226767 ], [ 3226763, 3228540, 3226768, 3228544 ], [ 3228541, 3229224, 3228545, 3229228 ], [ 3229225, 3230485, 3229229, 3230490 ], [ 3230486, 3233260, 3230491, 3233264 ], [ 3233261, 3233743, 3233265, 3233748 ], [ 3233744, 3234567, 3233749, 3234572 ], [ 3234568, 3235194, 3234573, 3235199 ], [ 3235195, 3235499, 3235200, 3235504 ], [ 3235500, 3243889, 3235505, 3243894 ], [ 3243890, 3245053, 3243895, 3245057 ], [ 3245054, 3245941, 3245058, 3245945 ], [ 3245942, 3255842, 3245946, 3255838 ], [ 3255843, 3259251, 3255839, 3259256 ], [ 3259252, 3260019, 3259257, 3260023 ], [ 3260020, 3262432, 
3260024, 3262437 ], [ 3262433, 3262672, 3262438, 3262677 ], [ 3262673, 3266221, 3262678, 3266225 ], [ 3266222, 3266686, 3266226, 3266691 ], [ 3266687, 3269046, 3266692, 3269051 ], [ 3269047, 3269620, 3269052, 3269625 ], [ 3269621, 3272036, 3269626, 3272041 ], [ 3272037, 3272516, 3272042, 3272521 ], [ 3272517, 3272796, 3272522, 3272801 ], [ 3272797, 3278978, 3272802, 3278983 ], [ 3278979, 3279467, 3278984, 3279472 ], [ 3279468, 3280283, 3279473, 3280288 ], [ 3280284, 3282531, 3280289, 3282536 ], [ 3282532, 3283074, 3282537, 3283078 ], [ 3283075, 3283850, 3283079, 3283854 ], [ 3283851, 3286305, 3283855, 3286310 ], [ 3286306, 3286606, 3286311, 3286611 ], [ 3286607, 3287302, 3286612, 3287307 ], [ 3287303, 3290573, 3287308, 3290569 ], [ 3290574, 3290776, 3290570, 3290781 ], [ 3290777, 3294305, 3290782, 3294310 ], [ 3294306, 3295201, 3294311, 3295205 ], [ 3295202, 3298010, 3295206, 3298015 ], [ 3298011, 3298799, 3298016, 3298804 ], [ 3298800, 3299744, 3298805, 3299749 ], [ 3299745, 3302402, 3299750, 3302407 ], [ 3302403, 3308399, 3302408, 3308403 ], [ 3308400, 3313629, 3308404, 3313633 ], [ 3313630, 3314750, 3313634, 3314755 ], [ 3314751, 3314777, 3314756, 3314782 ], [ 3314778, 3315543, 3314783, 3315548 ], [ 3315544, 3317191, 3315549, 3317195 ], [ 3317192, 3319495, 3317196, 3319500 ], [ 3319496, 3321744, 3319501, 3321749 ], [ 3321745, 3324685, 3321750, 3324690 ], [ 3324686, 3331190, 3324691, 3331194 ], [ 3331191, 3331241, 3331195, 3331246 ], [ 3331242, 3334147, 3331247, 3334151 ], [ 3334148, 3334546, 3334152, 3334550 ], [ 3334547, 3339219, 3334551, 3339215 ], [ 3339220, 3339452, 3339216, 3339456 ], [ 3339453, 3345624, 3339457, 3345629 ], [ 3345625, 3349642, 3345630, 3349646 ], [ 3349643, 3349729, 3349647, 3349734 ], [ 3349730, 3350579, 3349735, 3350584 ], [ 3350580, 3351752, 3350585, 3351756 ], [ 3351753, 3351958, 3351757, 3351962 ], [ 3351959, 3353994, 3351963, 3353999 ], [ 3353995, 3355238, 3354000, 3355243 ], [ 3355239, 3356216, 3355244, 3356220 ], [ 3356217, 3360120, 
3356221, 3360125 ], [ 3360121, 3360807, 3360126, 3360811 ], [ 3360808, 3362361, 3360812, 3362365 ], [ 3362362, 3363476, 3362366, 3363480 ], [ 3363477, 3364333, 3363481, 3364337 ], [ 3364334, 3368828, 3364338, 3368833 ], [ 3368829, 3370235, 3368834, 3370239 ], [ 3370236, 3371627, 3370240, 3371631 ], [ 3371628, 3375963, 3371632, 3375967 ], [ 3375964, 3376335, 3375968, 3376340 ], [ 3376336, 3380722, 3376341, 3380727 ], [ 3380723, 3381309, 3380728, 3381314 ], [ 3381310, 3382080, 3381315, 3382085 ], [ 3382081, 3384674, 3382086, 3384679 ], [ 3384675, 3385188, 3384680, 3385193 ], [ 3385189, 3388713, 3385194, 3388718 ], [ 3388714, 3392079, 3388719, 3392084 ], [ 3392080, 3394131, 3392085, 3394136 ], [ 3394132, 3394381, 3394137, 3394386 ], [ 3394382, 3395362, 3394387, 3395367 ], [ 3395363, 3395419, 3395368, 3395423 ], [ 3395420, 3396615, 3395424, 3396620 ], [ 3396616, 3399453, 3396621, 3399458 ], [ 3399454, 3400585, 3399459, 3400590 ], [ 3400586, 3401981, 3400591, 3401977 ], [ 3401982, 3405494, 3401978, 3405499 ], [ 3405495, 3408302, 3405500, 3408306 ], [ 3408303, 3413040, 3408307, 3413045 ], [ 3413041, 3413955, 3413046, 3413960 ], [ 3413956, 3413965, 3413961, 3413970 ], [ 3413966, 3414900, 3413971, 3414904 ], [ 3414901, 3415450, 3414905, 3415455 ], [ 3415451, 3415543, 3415456, 3415548 ], [ 3415544, 3415693, 3415549, 3415698 ], [ 3415694, 3416832, 3415699, 3416828 ], [ 3416833, 3416866, 3416829, 3416871 ], [ 3416867, 3420715, 3416872, 3420719 ], [ 3420716, 3421885, 3420720, 3421890 ], [ 3421886, 3425027, 3421891, 3425031 ], [ 3425028, 3425142, 3425032, 3425147 ], [ 3425143, 3429164, 3425148, 3429168 ], [ 3429165, 3431958, 3429169, 3431962 ], [ 3431959, 3437815, 3431963, 3437820 ], [ 3437816, 3439239, 3437821, 3439243 ], [ 3439240, 3440231, 3439244, 3440236 ], [ 3440232, 3440415, 3440237, 3440411 ], [ 3440416, 3442385, 3440412, 3442390 ], [ 3442386, 3448532, 3442391, 3448536 ], [ 3448533, 3449018, 3448537, 3449014 ], [ 3449019, 3449766, 3449015, 3449771 ], [ 3449767, 3450597, 
3449772, 3450601 ], [ 3450598, 3451510, 3450602, 3451514 ], [ 3451511, 3455799, 3451515, 3455804 ], [ 3455800, 3457236, 3455805, 3457240 ], [ 3457237, 3463488, 3457241, 3463492 ], [ 3463489, 3473047, 3463493, 3473052 ], [ 3473048, 3479777, 3473053, 3479773 ], [ 3479778, 3483375, 3479774, 3483380 ], [ 3483376, 3484793, 3483381, 3484797 ], [ 3484794, 3486395, 3484798, 3486399 ], [ 3486396, 3490246, 3486400, 3490242 ], [ 3490247, 3494295, 3490243, 3494299 ], [ 3494296, 3495597, 3494300, 3495601 ], [ 3495598, 3496721, 3495602, 3496725 ], [ 3496722, 3499522, 3496726, 3499526 ], [ 3499523, 3503195, 3499527, 3503199 ], [ 3503196, 3505500, 3503200, 3505505 ], [ 3505501, 3510463, 3505506, 3510467 ], [ 3510464, 3514082, 3510468, 3514087 ], [ 3514083, 3520619, 3514088, 3520623 ], [ 3520620, 3521049, 3520624, 3521053 ], [ 3521050, 3522092, 3521054, 3522097 ], [ 3522093, 3522111, 3522098, 3522116 ], [ 3522112, 3522311, 3522117, 3522315 ], [ 3522312, 3522767, 3522316, 3522772 ], [ 3522768, 3531627, 3522773, 3531632 ], [ 3531628, 3537780, 3531633, 3537785 ], [ 3537781, 3538989, 3537786, 3538993 ], [ 3538990, 3540393, 3538994, 3540397 ], [ 3540394, 3540644, 3540398, 3540649 ], [ 3540645, 3543017, 3540650, 3543022 ], [ 3543018, 3543842, 3543023, 3543847 ], [ 3543843, 3545546, 3543848, 3545551 ], [ 3545547, 3546913, 3545552, 3546917 ], [ 3546914, 3547077, 3546918, 3547082 ], [ 3547078, 3547149, 3547083, 3547154 ], [ 3547150, 3548533, 3547155, 3548538 ], [ 3548534, 3549829, 3548539, 3549834 ], [ 3549830, 3549935, 3549835, 3549940 ], [ 3549936, 3550203, 3549941, 3550208 ], [ 3550204, 3550227, 3550209, 3550232 ], [ 3550228, 3551480, 3550233, 3551485 ], [ 3551481, 3552210, 3551486, 3552215 ], [ 3552211, 3561260, 3552216, 3561264 ], [ 3561261, 3562847, 3561265, 3562852 ], [ 3562848, 3563068, 3562853, 3563073 ], [ 3563069, 3563973, 3563074, 3563978 ], [ 3563974, 3565130, 3563979, 3565135 ], [ 3565131, 3565565, 3565136, 3565570 ], [ 3565566, 3566132, 3565571, 3566137 ], [ 3566133, 3567018, 
3566138, 3567023 ], [ 3567019, 3567627, 3567024, 3567632 ], [ 3567628, 3570838, 3567633, 3570843 ], [ 3570839, 3574124, 3570844, 3574129 ], [ 3574125, 3575098, 3574130, 3575103 ], [ 3575099, 3575586, 3575104, 3575591 ], [ 3575587, 3576602, 3575592, 3576607 ], [ 3576603, 3590383, 3576608, 3590388 ], [ 3590384, 3594697, 3590389, 3594702 ], [ 3594698, 3596771, 3594703, 3596775 ], [ 3596772, 3597307, 3596776, 3597312 ], [ 3597308, 3598184, 3597313, 3598188 ], [ 3598185, 3598837, 3598189, 3598842 ], [ 3598838, 3599839, 3598843, 3599844 ], [ 3599840, 3603578, 3599845, 3603583 ], [ 3603579, 3605362, 3603584, 3605366 ], [ 3605363, 3609322, 3605367, 3609327 ], [ 3609323, 3614549, 3609328, 3614554 ], [ 3614550, 3618664, 3614555, 3618669 ], [ 3618665, 3619899, 3618670, 3619904 ], [ 3619900, 3624824, 3619905, 3624829 ], [ 3624825, 3627309, 3624830, 3627313 ], [ 3627310, 3628159, 3627314, 3628164 ], [ 3628160, 3628423, 3628165, 3628428 ], [ 3628424, 3629521, 3628429, 3629525 ], [ 3629522, 3629584, 3629526, 3629588 ], [ 3629585, 3629833, 3629589, 3629837 ], [ 3629834, 3630285, 3629838, 3630290 ], [ 3630286, 3630486, 3630291, 3630491 ], [ 3630487, 3634237, 3630492, 3634242 ], [ 3634238, 3639209, 3634243, 3639214 ], [ 3639210, 3639649, 3639215, 3639645 ], [ 3639650, 3647161, 3639646, 3647165 ], [ 3647162, 3648227, 3647166, 3648231 ], [ 3648228, 3652188, 3648232, 3652193 ], [ 3652189, 3659027, 3652194, 3659032 ], [ 3659028, 3659079, 3659033, 3659084 ], [ 3659080, 3659207, 3659085, 3659212 ], [ 3659208, 3660377, 3659213, 3660381 ], [ 3660378, 3667507, 3660382, 3667512 ], [ 3667508, 3667741, 3667513, 3667746 ], [ 3667742, 3668429, 3667747, 3668434 ], [ 3668430, 3670470, 3668435, 3670475 ], [ 3670471, 3673148, 3670476, 3673153 ], [ 3673149, 3674083, 3673154, 3674088 ], [ 3674084, 3678214, 3674089, 3678219 ], [ 3678215, 3680677, 3678220, 3680682 ], [ 3680678, 3684205, 3680683, 3684210 ], [ 3684206, 3688887, 3684211, 3688883 ], [ 3688888, 3690753, 3688884, 3690758 ], [ 3690754, 3698350, 
3690759, 3698355 ], [ 3698351, 3699287, 3698356, 3699292 ], [ 3699288, 3700655, 3699293, 3700659 ], [ 3700656, 3702298, 3700660, 3702303 ], [ 3702299, 3706224, 3702304, 3706229 ], [ 3706225, 3706948, 3706230, 3706953 ], [ 3706949, 3709050, 3706954, 3709055 ], [ 3709051, 3710401, 3709056, 3710405 ], [ 3710402, 3713151, 3710406, 3713155 ], [ 3713152, 3714587, 3713156, 3714591 ], [ 3714588, 3714794, 3714592, 3714798 ], [ 3714795, 3725109, 3714799, 3725114 ], [ 3725110, 3725822, 3725115, 3725818 ], [ 3725823, 3726413, 3725819, 3726409 ], [ 3726414, 3731356, 3726410, 3731352 ], [ 3731357, 3731514, 3731353, 3731519 ], [ 3731515, 3736239, 3731520, 3736244 ], [ 3736240, 3736361, 3736245, 3736366 ], [ 3736362, 3738698, 3736367, 3738703 ], [ 3738699, 3742314, 3738704, 3742318 ], [ 3742315, 3745350, 3742319, 3745355 ], [ 3745351, 3747516, 3745356, 3747521 ], [ 3747517, 3747787, 3747522, 3747791 ], [ 3747788, 3748248, 3747792, 3748253 ], [ 3748249, 3750382, 3748254, 3750387 ], [ 3750383, 3752531, 3750388, 3752527 ], [ 3752532, 3753792, 3752528, 3753788 ], [ 3753793, 3755999, 3753789, 3756003 ], [ 3756000, 3761090, 3756004, 3761095 ], [ 3761091, 3762084, 3761096, 3762089 ], [ 3762085, 3762616, 3762090, 3762621 ], [ 3762617, 3769672, 3762622, 3769677 ], [ 3769673, 3769809, 3769678, 3769814 ], [ 3769810, 3769962, 3769815, 3769967 ], [ 3769963, 3770651, 3769968, 3770655 ], [ 3770652, 3773164, 3770656, 3773168 ], [ 3773165, 3774893, 3773169, 3774898 ], [ 3774894, 3776296, 3774899, 3776300 ], [ 3776297, 3776322, 3776301, 3776327 ], [ 3776323, 3778089, 3776328, 3778094 ], [ 3778090, 3783192, 3778095, 3783188 ], [ 3783193, 3784214, 3783189, 3784219 ], [ 3784215, 3795080, 3784220, 3795084 ], [ 3795081, 3799015, 3795085, 3799020 ], [ 3799016, 3805176, 3799021, 3805180 ], [ 3805177, 3806065, 3805181, 3806070 ], [ 3806066, 3807768, 3806071, 3807773 ], [ 3807769, 3808991, 3807774, 3808995 ], [ 3808992, 3809153, 3808996, 3809157 ], [ 3809154, 3809882, 3809158, 3809886 ], [ 3809883, 3813249, 
3809887, 3813253 ], [ 3813250, 3816682, 3813254, 3816687 ], [ 3816683, 3817044, 3816688, 3817049 ], [ 3817045, 3817768, 3817050, 3817773 ], [ 3817769, 3821121, 3817774, 3821125 ], [ 3821122, 3822167, 3821126, 3822172 ], [ 3822168, 3824032, 3822173, 3824037 ], [ 3824033, 3827315, 3824038, 3827311 ], [ 3827316, 3828542, 3827312, 3828547 ], [ 3828543, 3829917, 3828548, 3829921 ], [ 3829918, 3832118, 3829922, 3832123 ], [ 3832119, 3832453, 3832124, 3832458 ], [ 3832454, 3837573, 3832459, 3837578 ], [ 3837574, 3842658, 3837579, 3842663 ], [ 3842659, 3843673, 3842664, 3843677 ], [ 3843674, 3844593, 3843678, 3844597 ], [ 3844594, 3844599, 3844598, 3844604 ], [ 3844600, 3844788, 3844605, 3844793 ], [ 3844789, 3850826, 3844794, 3850831 ], [ 3850827, 3854490, 3850832, 3854494 ], [ 3854491, 3855395, 3854495, 3855400 ], [ 3855396, 3855755, 3855401, 3855751 ], [ 3855756, 3860530, 3855752, 3860535 ], [ 3860531, 3860887, 3860536, 3860883 ], [ 3860888, 3862913, 3860884, 3862917 ], [ 3862914, 3872850, 3862918, 3872854 ], [ 3872851, 3873056, 3872855, 3873061 ], [ 3873057, 3875254, 3873062, 3875258 ], [ 3875255, 3880197, 3875259, 3880202 ], [ 3880198, 3882479, 3880203, 3882483 ], [ 3882480, 3882820, 3882484, 3882825 ], [ 3882821, 3883310, 3882826, 3883315 ], [ 3883311, 3885744, 3883316, 3885749 ], [ 3885745, 3890942, 3885750, 3890946 ], [ 3890943, 3891218, 3890947, 3891223 ], [ 3891219, 3891354, 3891224, 3891359 ], [ 3891355, 3893483, 3891360, 3893487 ], [ 3893484, 3894419, 3893488, 3894424 ], [ 3894420, 3900500, 3894425, 3900505 ], [ 3900501, 3901493, 3900506, 3901489 ], [ 3901494, 3903140, 3901490, 3903136 ], [ 3903141, 3907313, 3903137, 3907318 ], [ 3907314, 3908755, 3907319, 3908759 ], [ 3908756, 3908919, 3908760, 3908924 ], [ 3908920, 3909807, 3908925, 3909812 ], [ 3909808, 3909977, 3909813, 3909982 ], [ 3909978, 3911959, 3909983, 3911955 ], [ 3911960, 3914009, 3911956, 3914005 ], [ 3914010, 3922759, 3914006, 3922763 ], [ 3922760, 3930585, 3922764, 3930590 ], [ 3930586, 3931083, 
3930591, 3931087 ], [ 3931084, 3933248, 3931088, 3933252 ], [ 3933249, 3933363, 3933253, 3933368 ], [ 3933364, 3935176, 3933369, 3935181 ], [ 3935177, 3936871, 3935182, 3936876 ], [ 3936872, 3937525, 3936877, 3937529 ], [ 3937526, 3945198, 3937530, 3945203 ], [ 3945199, 3946390, 3945204, 3946395 ], [ 3946391, 3946986, 3946396, 3946991 ], [ 3946987, 3952348, 3946992, 3952344 ], [ 3952349, 3956408, 3952345, 3956413 ], [ 3956409, 3958333, 3956414, 3958338 ], [ 3958334, 3959031, 3958339, 3959036 ], [ 3959032, 3960932, 3959037, 3960937 ], [ 3960933, 3964190, 3960938, 3964195 ], [ 3964191, 3969413, 3964196, 3969418 ], [ 3969414, 3972146, 3969419, 3972151 ], [ 3972147, 3972344, 3972152, 3972349 ], [ 3972345, 3978065, 3972350, 3978070 ], [ 3978066, 3981977, 3978071, 3981982 ], [ 3981978, 3984768, 3981983, 3984773 ], [ 3984769, 3984918, 3984774, 3984923 ], [ 3984919, 3985704, 3984924, 3985709 ], [ 3985705, 3986544, 3985710, 3986548 ], [ 3986545, 3995454, 3986549, 3995459 ], [ 3995455, 3997410, 3995460, 3997415 ], [ 3997411, 4000982, 3997416, 4000987 ], [ 4000983, 4002297, 4000988, 4002301 ], [ 4002298, 4002514, 4002302, 4002519 ], [ 4002515, 4005689, 4002520, 4005694 ], [ 4005690, 4008445, 4005695, 4008449 ], [ 4008446, 4008591, 4008450, 4008595 ], [ 4008592, 4016078, 4008596, 4016083 ], [ 4016079, 4017237, 4016084, 4017242 ], [ 4017238, 4018412, 4017243, 4018417 ], [ 4018413, 4018717, 4018418, 4018722 ], [ 4018718, 4018835, 4018723, 4018839 ], [ 4018836, 4021930, 4018840, 4021934 ], [ 4021931, 4023281, 4021935, 4023286 ], [ 4023282, 4032076, 4023287, 4032081 ], [ 4032077, 4041164, 4032082, 4041169 ], [ 4041165, 4041295, 4041170, 4041300 ], [ 4041296, 4041920, 4041301, 4041925 ], [ 4041921, 4046747, 4041926, 4046752 ], [ 4046748, 4049001, 4046753, 4049005 ], [ 4049002, 4050345, 4049006, 4050349 ], [ 4050346, 4052606, 4050350, 4052611 ], [ 4052607, 4054469, 4052612, 4054474 ], [ 4054470, 4054629, 4054475, 4054634 ], [ 4054630, 4054968, 4054635, 4054973 ], [ 4054969, 4055333, 
4054974, 4055338 ], [ 4055334, 4056058, 4055339, 4056063 ], [ 4056059, 4057967, 4056064, 4057963 ], [ 4057968, 4059084, 4057964, 4059089 ], [ 4059085, 4062508, 4059090, 4062513 ], [ 4062509, 4065383, 4062514, 4065388 ], [ 4065384, 4065588, 4065389, 4065592 ], [ 4065589, 4065643, 4065593, 4065648 ], [ 4065644, 4065670, 4065649, 4065674 ], [ 4065671, 4069949, 4065675, 4069954 ], [ 4069950, 4073053, 4069955, 4073057 ], [ 4073054, 4076608, 4073058, 4076612 ], [ 4076609, 4078343, 4076613, 4078348 ], [ 4078344, 4083896, 4078349, 4083901 ], [ 4083897, 4085884, 4083902, 4085889 ], [ 4085885, 4090152, 4085890, 4090157 ], [ 4090153, 4093908, 4090158, 4093913 ], [ 4093909, 4094118, 4093914, 4094123 ], [ 4094119, 4095249, 4094124, 4095254 ], [ 4095250, 4095382, 4095255, 4095386 ], [ 4095383, 4095456, 4095387, 4095460 ], [ 4095457, 4097811, 4095461, 4097816 ], [ 4097812, 4101896, 4097817, 4101901 ], [ 4101897, 4102182, 4101902, 4102186 ], [ 4102183, 4104214, 4102187, 4104218 ], [ 4104215, 4107088, 4104219, 4107093 ], [ 4107089, 4107551, 4107094, 4107556 ], [ 4107552, 4107580, 4107557, 4107585 ], [ 4107581, 4107671, 4107586, 4107675 ], [ 4107672, 4109180, 4107676, 4109185 ], [ 4109181, 4110537, 4109186, 4110542 ], [ 4110538, 4116188, 4110543, 4116192 ], [ 4116189, 4116491, 4116193, 4116496 ], [ 4116492, 4117909, 4116497, 4117914 ], [ 4117910, 4118469, 4117915, 4118474 ], [ 4118470, 4123661, 4118475, 4123666 ], [ 4123662, 4123865, 4123667, 4123870 ], [ 4123866, 4125127, 4123871, 4125132 ], [ 4125128, 4125135, 4125133, 4125139 ], [ 4125136, 4129187, 4125140, 4129192 ], [ 4129188, 4132972, 4129193, 4132977 ], [ 4132973, 4134272, 4132978, 4134277 ], [ 4134273, 4135470, 4134278, 4135475 ], [ 4135471, 4136837, 4135476, 4136842 ], [ 4136838, 4146292, 4136843, 4146297 ], [ 4146293, 4146443, 4146298, 4146448 ], [ 4146444, 4148935, 4146449, 4148940 ], [ 4148936, 4150723, 4148941, 4150727 ], [ 4150724, 4156926, 4150728, 4156930 ], [ 4156927, 4160939, 4156931, 4160943 ], [ 4160940, 4161425, 
4160944, 4161421 ], [ 4161426, 4162265, 4161422, 4162270 ], [ 4162266, 4163096, 4162271, 4163100 ], [ 4163097, 4163991, 4163101, 4163995 ], [ 4163992, 4164781, 4163996, 4164786 ], [ 4164782, 4167637, 4164787, 4167633 ], [ 4167638, 4170898, 4167634, 4170903 ], [ 4170899, 4175178, 4170904, 4175174 ], [ 4175179, 4175563, 4175175, 4175567 ], [ 4175564, 4175870, 4175568, 4175875 ], [ 4175871, 4175960, 4175876, 4175965 ], [ 4175961, 4178040, 4175966, 4178044 ], [ 4178041, 4179795, 4178045, 4179799 ], [ 4179796, 4180869, 4179800, 4180874 ], [ 4180870, 4181479, 4180875, 4181484 ], [ 4181480, 4184808, 4181485, 4184812 ], [ 4184809, 4185316, 4184813, 4185320 ], [ 4185317, 4187692, 4185321, 4187696 ], [ 4187693, 4187785, 4187697, 4187790 ], [ 4187786, 4188293, 4187791, 4188289 ], [ 4188294, 4189984, 4188290, 4189980 ], [ 4189985, 4191368, 4189981, 4191372 ], [ 4191369, 4191989, 4191373, 4191993 ], [ 4191990, 4192187, 4191994, 4192191 ], [ 4192188, 4192423, 4192192, 4192428 ], [ 4192424, 4192625, 4192429, 4192630 ], [ 4192626, 4194747, 4192631, 4194752 ], [ 4194748, 4195600, 4194753, 4195605 ], [ 4195601, 4195842, 4195606, 4195847 ], [ 4195843, 4196615, 4195848, 4196619 ], [ 4196616, 4197634, 4196620, 4197639 ], [ 4197635, 4198036, 4197640, 4198041 ], [ 4198037, 4198447, 4198042, 4198451 ], [ 4198448, 4198840, 4198452, 4198836 ], [ 4198841, 4203137, 4198837, 4203141 ], [ 4203138, 4203155, 4203142, 4203159 ], [ 4203156, 4203840, 4203160, 4203844 ], [ 4203841, 4205660, 4203845, 4205665 ], [ 4205661, 4213510, 4205666, 4213515 ], [ 4213511, 4217238, 4213516, 4217243 ], [ 4217239, 4219123, 4217244, 4219127 ], [ 4219124, 4221217, 4219128, 4221222 ], [ 4221218, 4225159, 4221223, 4225164 ], [ 4225160, 4226183, 4225165, 4226188 ], [ 4226184, 4228901, 4226189, 4228906 ], [ 4228902, 4229490, 4228907, 4229494 ], [ 4229491, 4229774, 4229495, 4229779 ], [ 4229775, 4230559, 4229780, 4230564 ], [ 4230560, 4231316, 4230565, 4231312 ], [ 4231317, 4235572, 4231313, 4235577 ], [ 4235573, 4242029, 
4235578, 4242034 ], [ 4242030, 4242765, 4242035, 4242770 ], [ 4242766, 4242869, 4242771, 4242865 ], [ 4242870, 4244054, 4242866, 4244059 ], [ 4244055, 4245120, 4244060, 4245116 ], [ 4245121, 4245334, 4245117, 4245339 ], [ 4245335, 4245565, 4245340, 4245569 ], [ 4245566, 4247741, 4245570, 4247745 ], [ 4247742, 4249019, 4247746, 4249024 ], [ 4249020, 4249100, 4249025, 4249105 ], [ 4249101, 4256705, 4249106, 4256710 ], [ 4256706, 4258303, 4256711, 4258307 ], [ 4258304, 4259509, 4258308, 4259514 ], [ 4259510, 4261410, 4259515, 4261415 ], [ 4261411, 4269352, 4261416, 4269356 ], [ 4269353, 4270525, 4269357, 4270530 ], [ 4270526, 4274109, 4270531, 4274114 ], [ 4274110, 4275137, 4274115, 4275141 ], [ 4275138, 4281838, 4275142, 4281842 ], [ 4281839, 4281849, 4281843, 4281854 ], [ 4281850, 4283260, 4281855, 4283264 ], [ 4283261, 4284055, 4283265, 4284059 ], [ 4284056, 4284552, 4284060, 4284556 ], [ 4284553, 4284762, 4284557, 4284767 ], [ 4284763, 4287988, 4284768, 4287992 ], [ 4287989, 4294545, 4287993, 4294549 ], [ 4294546, 4303042, 4294550, 4303047 ], [ 4303043, 4303258, 4303048, 4303263 ], [ 4303259, 4303753, 4303264, 4303757 ], [ 4303754, 4306632, 4303758, 4306637 ], [ 4306633, 4310014, 4306638, 4310018 ], [ 4310015, 4312216, 4310019, 4312221 ], [ 4312217, 4314890, 4312222, 4314895 ], [ 4314891, 4316460, 4314896, 4316465 ], [ 4316461, 4316626, 4316466, 4316631 ], [ 4316627, 4318092, 4316632, 4318097 ], [ 4318093, 4318604, 4318098, 4318609 ], [ 4318605, 4318772, 4318610, 4318777 ], [ 4318773, 4321424, 4318778, 4321429 ], [ 4321425, 4321489, 4321430, 4321494 ], [ 4321490, 4324980, 4321495, 4324984 ], [ 4324981, 4327535, 4324985, 4327540 ], [ 4327536, 4329548, 4327541, 4329553 ], [ 4329549, 4331509, 4329554, 4331514 ], [ 4331510, 4332147, 4331515, 4332152 ], [ 4332148, 4334445, 4332153, 4334450 ], [ 4334446, 4338820, 4334451, 4338825 ], [ 4338821, 4339739, 4338826, 4339744 ], [ 4339740, 4343682, 4339745, 4343687 ], [ 4343683, 4348264, 4343688, 4348269 ], [ 4348265, 4351770, 
4348270, 4351774 ], [ 4351771, 4352206, 4351775, 4352211 ], [ 4352207, 4356621, 4352212, 4356626 ], [ 4356622, 4362927, 4356627, 4362923 ], [ 4362928, 4366184, 4362924, 4366189 ], [ 4366185, 4374134, 4366190, 4374139 ], [ 4374135, 4374749, 4374140, 4374754 ], [ 4374750, 4386564, 4374755, 4386568 ], [ 4386565, 4386786, 4386569, 4386790 ], [ 4386787, 4388049, 4386791, 4388054 ], [ 4388050, 4388548, 4388055, 4388553 ], [ 4388549, 4395145, 4388554, 4395141 ], [ 4395146, 4395214, 4395142, 4395219 ], [ 4395215, 4396301, 4395220, 4396297 ], [ 4396302, 4397725, 4396298, 4397730 ], [ 4397726, 4400829, 4397731, 4400833 ], [ 4400830, 4401917, 4400834, 4401922 ], [ 4401918, 4401962, 4401923, 4401967 ], [ 4401963, 4404470, 4401968, 4404475 ], [ 4404471, 4404603, 4404476, 4404608 ], [ 4404604, 4404677, 4404609, 4404682 ], [ 4404678, 4407845, 4404683, 4407850 ], [ 4407846, 4408723, 4407851, 4408727 ], [ 4408724, 4414102, 4408728, 4414106 ], [ 4414103, 4415693, 4414107, 4415698 ], [ 4415694, 4415957, 4415699, 4415962 ], [ 4415958, 4418305, 4415963, 4418309 ], [ 4418306, 4426334, 4418310, 4426339 ], [ 4426335, 4427143, 4426340, 4427148 ], [ 4427144, 4432989, 4427149, 4432994 ], [ 4432990, 4433640, 4432995, 4433645 ], [ 4433641, 4435236, 4433646, 4435241 ], [ 4435237, 4436966, 4435242, 4436962 ], [ 4436967, 4447677, 4436963, 4447681 ], [ 4447678, 4449373, 4447682, 4449377 ], [ 4449374, 4450981, 4449378, 4450986 ], [ 4450982, 4452121, 4450987, 4452126 ], [ 4452122, 4453471, 4452127, 4453475 ], [ 4453472, 4454417, 4453476, 4454422 ], [ 4454418, 4455164, 4454423, 4455169 ], [ 4455165, 4459226, 4455170, 4459231 ], [ 4459227, 4460531, 4459232, 4460535 ], [ 4460532, 4462863, 4460536, 4462868 ], [ 4462864, 4469714, 4462869, 4469719 ], [ 4469715, 4469976, 4469720, 4469980 ], [ 4469977, 4471008, 4469981, 4471013 ], [ 4471009, 4473114, 4471014, 4473119 ], [ 4473115, 4477852, 4473120, 4477857 ], [ 4477853, 4477874, 4477858, 4477879 ], [ 4477875, 4482921, 4477880, 4482926 ], [ 4482922, 4489809, 
4482927, 4489814 ], [ 4489810, 4490912, 4489815, 4490917 ], [ 4490913, 4491974, 4490918, 4491979 ], [ 4491975, 4492157, 4491980, 4492162 ], [ 4492158, 4493614, 4492163, 4493619 ], [ 4493615, 4496829, 4493620, 4496834 ], [ 4496830, 4497697, 4496835, 4497702 ], [ 4497698, 4499157, 4497703, 4499162 ], [ 4499158, 4502248, 4499163, 4502253 ], [ 4502249, 4504493, 4502254, 4504498 ], [ 4504494, 4505336, 4504499, 4505341 ], [ 4505337, 4505567, 4505342, 4505571 ], [ 4505568, 4506680, 4505572, 4506685 ], [ 4506681, 4506961, 4506686, 4506966 ], [ 4506962, 4507601, 4506967, 4507606 ], [ 4507602, 4509484, 4507607, 4509488 ], [ 4509485, 4509536, 4509489, 4509532 ], [ 4509537, 4513351, 4509533, 4513356 ], [ 4513352, 4516356, 4513357, 4516361 ], [ 4516357, 4520650, 4516362, 4520655 ], [ 4520651, 4522948, 4520656, 4522952 ], [ 4522949, 4527592, 4522953, 4527596 ], [ 4527593, 4528820, 4527597, 4528825 ], [ 4528821, 4533216, 4528826, 4533212 ], [ 4533217, 4535971, 4533213, 4535976 ], [ 4535972, 4536320, 4535977, 4536324 ], [ 4536321, 4539874, 4536325, 4539878 ], [ 4539875, 4540172, 4539879, 4540177 ], [ 4540173, 4541756, 4540178, 4541752 ], [ 4541757, 4543862, 4541753, 4543858 ], [ 4543863, 4545833, 4543859, 4545829 ], [ 4545834, 4551230, 4545830, 4551235 ], [ 4551231, 4552997, 4551236, 4553002 ], [ 4552998, 4555491, 4553003, 4555496 ], [ 4555492, 4557748, 4555497, 4557744 ], [ 4557749, 4558033, 4557745, 4558038 ], [ 4558034, 4561276, 4558039, 4561280 ], [ 4561277, 4562123, 4561281, 4562128 ], [ 4562124, 4562284, 4562129, 4562288 ], [ 4562285, 4563100, 4562289, 4563105 ], [ 4563101, 4564382, 4563106, 4564386 ], [ 4564383, 4564639, 4564387, 4564644 ], [ 4564640, 4566006, 4564645, 4566011 ], [ 4566007, 4575457, 4566012, 4575462 ], [ 4575458, 4575809, 4575463, 4575814 ], [ 4575810, 4576253, 4575815, 4576258 ], [ 4576254, 4579647, 4576259, 4579652 ], [ 4579648, 4582407, 4579653, 4582411 ], [ 4582408, 4588823, 4582412, 4588828 ], [ 4588824, 4589254, 4588829, 4589259 ], [ 4589255, 4589933, 
4589260, 4589929 ], [ 4589934, 4591334, 4589930, 4591338 ], [ 4591335, 4598670, 4591339, 4598675 ], [ 4598671, 4599152, 4598676, 4599156 ], [ 4599153, 4600707, 4599157, 4600703 ], [ 4600708, 4601105, 4600704, 4601110 ], [ 4601106, 4601445, 4601111, 4601449 ], [ 4601446, 4602566, 4601450, 4602571 ], [ 4602567, 4606392, 4602572, 4606396 ], [ 4606393, 4607895, 4606397, 4607899 ], [ 4607896, 4612068, 4607900, 4612073 ], [ 4612069, 4615603, 4612074, 4615608 ], [ 4615604, 4618026, 4615609, 4618030 ], [ 4618027, 4621061, 4618031, 4621065 ], [ 4621062, 4627887, 4621066, 4627892 ], [ 4627888, 4631394, 4627893, 4631399 ], [ 4631395, 4631631, 4631400, 4631636 ], [ 4631632, 4635963, 4631637, 4635968 ], [ 4635964, 4641129, 4635969, 4641134 ], [ 4641130, 4642188, 4641135, 4642192 ], [ 4642189, 4642980, 4642193, 4642985 ], [ 4642981, 4643635, 4642986, 4643640 ], [ 4643636, 4644147, 4643641, 4644152 ], [ 4644148, 4644814, 4644153, 4644818 ], [ 4644815, 4649017, 4644819, 4649021 ], [ 4649018, 4649332, 4649022, 4649337 ], [ 4649333, 4649464, 4649338, 4649469 ], [ 4649465, 4650584, 4649470, 4650588 ], [ 4650585, 4653836, 4650589, 4653840 ], [ 4653837, 4654913, 4653841, 4654917 ], [ 4654914, 4656366, 4654918, 4656371 ], [ 4656367, 4656864, 4656372, 4656869 ], [ 4656865, 4656933, 4656870, 4656938 ], [ 4656934, 4660056, 4656939, 4660061 ], [ 4660057, 4665881, 4660062, 4665886 ], [ 4665882, 4666341, 4665887, 4666345 ], [ 4666342, 4668837, 4666346, 4668842 ], [ 4668838, 4669896, 4668843, 4669900 ], [ 4669897, 4670841, 4669901, 4670845 ], [ 4670842, 4672873, 4670846, 4672878 ], [ 4672874, 4681462, 4672879, 4681467 ], [ 4681463, 4682181, 4681468, 4682185 ], [ 4682182, 4691924, 4682186, 4691928 ], [ 4691925, 4696368, 4691929, 4696373 ], [ 4696369, 4699474, 4696374, 4699479 ], [ 4699475, 4702530, 4699480, 4702534 ], [ 4702531, 4704523, 4702535, 4704528 ], [ 4704524, 4704899, 4704529, 4704903 ], [ 4704900, 4706008, 4704904, 4706013 ], [ 4706009, 4706120, 4706014, 4706124 ], [ 4706121, 4706510, 
4706125, 4706515 ], [ 4706511, 4710334, 4706516, 4710330 ], [ 4710335, 4710765, 4710331, 4710769 ], [ 4710766, 4711295, 4710770, 4711300 ], [ 4711296, 4711543, 4711301, 4711548 ], [ 4711544, 4711935, 4711549, 4711940 ], [ 4711936, 4712790, 4711941, 4712795 ], [ 4712791, 4713126, 4712796, 4713131 ], [ 4713127, 4713287, 4713132, 4713291 ], [ 4713288, 4713730, 4713292, 4713735 ], [ 4713731, 4717619, 4713736, 4717624 ], [ 4717620, 4718789, 4717625, 4718785 ], [ 4718790, 4722736, 4718786, 4722740 ], [ 4722737, 4724224, 4722741, 4724229 ], [ 4724225, 4725868, 4724230, 4725873 ], [ 4725869, 4727653, 4725874, 4727658 ], [ 4727654, 4729069, 4727659, 4729074 ], [ 4729070, 4730833, 4729075, 4730838 ], [ 4730834, 4733099, 4730839, 4733104 ], [ 4733100, 4733576, 4733105, 4733581 ], [ 4733577, 4735011, 4733582, 4735015 ], [ 4735012, 4735924, 4735016, 4735928 ], [ 4735925, 4736754, 4735929, 4736759 ], [ 4736755, 4737511, 4736760, 4737507 ], [ 4737512, 4737991, 4737508, 4737995 ], [ 4737992, 4740307, 4737996, 4740311 ], [ 4740308, 4741684, 4740312, 4741689 ], [ 4741685, 4744830, 4741690, 4744835 ], [ 4744831, 4746768, 4744836, 4746773 ], [ 4746769, 4749037, 4746774, 4749042 ], [ 4749038, 4749801, 4749043, 4749806 ], [ 4749802, 4749864, 4749807, 4749869 ], [ 4749865, 4750090, 4749870, 4750094 ], [ 4750091, 4750966, 4750095, 4750971 ], [ 4750967, 4751355, 4750972, 4751359 ], [ 4751356, 4751493, 4751360, 4751497 ], [ 4751494, 4752345, 4751498, 4752349 ], [ 4752346, 4752965, 4752350, 4752970 ], [ 4752966, 4753059, 4752971, 4753063 ], [ 4753060, 4754115, 4753064, 4754119 ], [ 4754116, 4754237, 4754120, 4754242 ], [ 4754238, 4757191, 4754243, 4757196 ], [ 4757192, 4761010, 4757197, 4761014 ], [ 4761011, 4762052, 4761015, 4762057 ], [ 4762053, 4762604, 4762058, 4762600 ], [ 4762605, 4764164, 4762601, 4764169 ], [ 4764165, 4766341, 4764170, 4766346 ], [ 4766342, 4767519, 4766347, 4767524 ], [ 4767520, 4769451, 4767525, 4769456 ], [ 4769452, 4770366, 4769457, 4770371 ], [ 4770367, 4774504, 
4770372, 4774509 ], [ 4774505, 4777167, 4774510, 4777171 ], [ 4777168, 4778234, 4777172, 4778238 ], [ 4778235, 4779310, 4778239, 4779315 ], [ 4779311, 4784157, 4779316, 4784161 ], [ 4784158, 4784713, 4784162, 4784718 ], [ 4784714, 4784960, 4784719, 4784965 ], [ 4784961, 4789181, 4784966, 4789186 ], [ 4789182, 4792300, 4789187, 4792304 ], [ 4792301, 4792894, 4792305, 4792899 ], [ 4792895, 4804321, 4792900, 4804326 ], [ 4804322, 4807780, 4804327, 4807785 ], [ 4807781, 4808367, 4807786, 4808372 ], [ 4808368, 4811025, 4808373, 4811030 ], [ 4811026, 4811936, 4811031, 4811940 ], [ 4811937, 4812883, 4811941, 4812887 ], [ 4812884, 4813062, 4812888, 4813067 ], [ 4813063, 4813117, 4813068, 4813121 ], [ 4813118, 4822160, 4813122, 4822165 ], [ 4822161, 4830957, 4822166, 4830961 ], [ 4830958, 4831410, 4830962, 4831414 ], [ 4831411, 4832326, 4831415, 4832330 ], [ 4832327, 4833156, 4832331, 4833161 ], [ 4833157, 4834005, 4833162, 4834001 ], [ 4834006, 4834485, 4834002, 4834489 ], [ 4834486, 4839621, 4834490, 4839626 ], [ 4839622, 4844592, 4839627, 4844588 ], [ 4844593, 4853316, 4844589, 4853321 ], [ 4853317, 4853434, 4853322, 4853438 ], [ 4853435, 4853704, 4853439, 4853708 ], [ 4853705, 4862268, 4853709, 4862273 ], [ 4862269, 4862689, 4862274, 4862694 ], [ 4862690, 4863453, 4862695, 4863458 ], [ 4863454, 4863657, 4863459, 4863662 ], [ 4863658, 4865285, 4863663, 4865289 ], [ 4865286, 4867215, 4865290, 4867220 ], [ 4867216, 4867943, 4867221, 4867948 ], [ 4867944, 4870367, 4867949, 4870372 ], [ 4870368, 4871260, 4870373, 4871265 ], [ 4871261, 4871925, 4871266, 4871930 ], [ 4871926, 4872824, 4871931, 4872829 ], [ 4872825, 4877259, 4872830, 4877263 ], [ 4877260, 4877520, 4877264, 4877524 ], [ 4877521, 4879935, 4877525, 4879940 ], [ 4879936, 4880330, 4879941, 4880334 ], [ 4880331, 4881593, 4880335, 4881598 ], [ 4881594, 4882087, 4881599, 4882092 ], [ 4882088, 4889351, 4882093, 4889356 ], [ 4889352, 4890443, 4889357, 4890448 ], [ 4890444, 4892812, 4890449, 4892816 ], [ 4892813, 4893056, 
4892817, 4893052 ], [ 4893057, 4893088, 4893053, 4893092 ], [ 4893089, 4897454, 4893093, 4897458 ], [ 4897455, 4898485, 4897459, 4898490 ], [ 4898486, 4901057, 4898491, 4901062 ], [ 4901058, 4904245, 4901063, 4904250 ], [ 4904246, 4904281, 4904251, 4904285 ], [ 4904282, 4904668, 4904286, 4904673 ], [ 4904669, 4904984, 4904674, 4904989 ], [ 4904985, 4907736, 4904990, 4907740 ], [ 4907737, 4908072, 4907741, 4908076 ], [ 4908073, 4908096, 4908077, 4908100 ], [ 4908097, 4908561, 4908101, 4908565 ], [ 4908562, 4913327, 4908566, 4913323 ], [ 4913328, 4914224, 4913324, 4914229 ], [ 4914225, 4916537, 4914230, 4916542 ], [ 4916538, 4918099, 4916543, 4918103 ], [ 4918100, 4919908, 4918104, 4919913 ], [ 4919909, 4926663, 4919914, 4926668 ], [ 4926664, 4929329, 4926669, 4929334 ], [ 4929330, 4929560, 4929335, 4929564 ], [ 4929561, 4930673, 4929565, 4930678 ], [ 4930674, 4930954, 4930679, 4930959 ], [ 4930955, 4931594, 4930960, 4931599 ], [ 4931595, 4932928, 4931600, 4932932 ], [ 4932929, 4937282, 4932933, 4937287 ], [ 4937283, 4939667, 4937288, 4939672 ], [ 4939668, 4940551, 4939673, 4940555 ], [ 4940552, 4940908, 4940556, 4940912 ], [ 4940909, 4941837, 4940913, 4941842 ], [ 4941838, 4946673, 4941843, 4946677 ], [ 4946674, 4947030, 4946678, 4947035 ], [ 4947031, 4948839, 4947036, 4948843 ], [ 4948840, 4951071, 4948844, 4951076 ], [ 4951072, 4953999, 4951077, 4954004 ], [ 4954000, 4955481, 4954005, 4955486 ], [ 4955482, 4959224, 4955487, 4959229 ], [ 4959225, 4967229, 4959230, 4967233 ], [ 4967230, 4972603, 4967234, 4972607 ], [ 4972604, 4974094, 4972608, 4974098 ], [ 4974095, 4974624, 4974099, 4974629 ], [ 4974625, 4975683, 4974630, 4975687 ], [ 4975684, 4976599, 4975688, 4976603 ], [ 4976600, 4977429, 4976604, 4977434 ], [ 4977430, 4978186, 4977435, 4978182 ], [ 4978187, 4978666, 4978183, 4978670 ], [ 4978667, 4984448, 4978671, 4984453 ], [ 4984449, 4986670, 4984454, 4986675 ], [ 4986671, 4987571, 4986676, 4987575 ], [ 4987572, 4989614, 4987576, 4989618 ], [ 4989615, 4990698, 
4989619, 4990702 ], [ 4990699, 4992038, 4990703, 4992043 ], [ 4992039, 4993566, 4992044, 4993570 ], [ 4993567, 4993811, 4993571, 4993816 ], [ 4993812, 4994767, 4993817, 4994771 ], [ 4994768, 4995631, 4994772, 4995636 ], [ 4995632, 4996624, 4995637, 4996629 ], [ 4996625, 4996668, 4996630, 4996673 ], [ 4996669, 4997203, 4996674, 4997207 ], [ 4997204, 4998818, 4997208, 4998823 ], [ 4998819, 5004186, 4998824, 5004191 ], [ 5004187, 5004432, 5004192, 5004436 ], [ 5004433, 5008758, 5004437, 5008754 ], [ 5008759, 5013598, 5008755, 5013603 ], [ 5013599, 5016180, 5013604, 5016185 ], [ 5016181, 5016267, 5016186, 5016271 ], [ 5016268, 5017625, 5016272, 5017629 ], [ 5017626, 5018455, 5017630, 5018460 ], [ 5018456, 5019212, 5018461, 5019208 ], [ 5019213, 5019692, 5019209, 5019696 ], [ 5019693, 5026663, 5019697, 5026667 ], [ 5026664, 5026770, 5026668, 5026775 ], [ 5026771, 5028841, 5026776, 5028846 ], [ 5028842, 5030486, 5028847, 5030490 ], [ 5030487, 5031862, 5030491, 5031867 ], [ 5031863, 5036331, 5031868, 5036336 ], [ 5036332, 5037861, 5036337, 5037866 ], [ 5037862, 5038887, 5037867, 5038892 ], [ 5038888, 5040440, 5038893, 5040445 ], [ 5040441, 5042902, 5040446, 5042907 ], [ 5042903, 5044827, 5042908, 5044832 ], [ 5044828, 5050524, 5044833, 5050529 ], [ 5050525, 5053866, 5050530, 5053871 ], [ 5053867, 5054707, 5053872, 5054712 ], [ 5054708, 5055021, 5054713, 5055026 ], [ 5055022, 5057873, 5055027, 5057878 ], [ 5057874, 5059653, 5057879, 5059657 ], [ 5059654, 5059734, 5059658, 5059739 ], [ 5059735, 5061548, 5059740, 5061553 ], [ 5061549, 5063342, 5061554, 5063347 ], [ 5063343, 5064119, 5063348, 5064124 ], [ 5064120, 5064638, 5064125, 5064643 ], [ 5064639, 5066316, 5064644, 5066320 ], [ 5066317, 5068774, 5066321, 5068779 ], [ 5068775, 5069157, 5068780, 5069162 ], [ 5069158, 5069375, 5069163, 5069380 ], [ 5069376, 5071533, 5069381, 5071538 ], [ 5071534, 5072259, 5071539, 5072264 ], [ 5072260, 5072332, 5072265, 5072337 ], [ 5072333, 5074288, 5072338, 5074293 ], [ 5074289, 5078241, 
5074294, 5078237 ], [ 5078242, 5084929, 5078238, 5084933 ], [ 5084930, 5087508, 5084934, 5087513 ], [ 5087509, 5088409, 5087514, 5088414 ], [ 5088410, 5093963, 5088415, 5093968 ], [ 5093964, 5097754, 5093969, 5097750 ], [ 5097755, 5098261, 5097751, 5098266 ], [ 5098262, 5100144, 5098267, 5100148 ], [ 5100145, 5102713, 5100149, 5102709 ], [ 5102714, 5105308, 5102710, 5105312 ], [ 5105309, 5110195, 5105313, 5110199 ], [ 5110196, 5116037, 5110200, 5116042 ], [ 5116038, 5116647, 5116043, 5116652 ], [ 5116648, 5119282, 5116653, 5119287 ], [ 5119283, 5121619, 5119288, 5121623 ], [ 5121620, 5122889, 5121624, 5122893 ], [ 5122890, 5125691, 5122894, 5125695 ], [ 5125692, 5125943, 5125696, 5125947 ], [ 5125944, 5132940, 5125948, 5132945 ], [ 5132941, 5133068, 5132946, 5133072 ], [ 5133069, 5133405, 5133073, 5133410 ], [ 5133406, 5134558, 5133411, 5134563 ], [ 5134559, 5138091, 5134564, 5138095 ], [ 5138092, 5138432, 5138096, 5138437 ], [ 5138433, 5138944, 5138438, 5138949 ], [ 5138945, 5139157, 5138950, 5139162 ], [ 5139158, 5139587, 5139163, 5139592 ], [ 5139588, 5142617, 5139593, 5142622 ], [ 5142618, 5148183, 5142623, 5148188 ], [ 5148184, 5148672, 5148189, 5148677 ], [ 5148673, 5150053, 5148678, 5150058 ], [ 5150054, 5151087, 5150059, 5151092 ], [ 5151088, 5153217, 5151093, 5153222 ], [ 5153218, 5154383, 5153223, 5154388 ], [ 5154384, 5154947, 5154389, 5154951 ], [ 5154948, 5155016, 5154952, 5155021 ], [ 5155017, 5156599, 5155022, 5156604 ], [ 5156600, 5157802, 5156605, 5157807 ], [ 5157803, 5157970, 5157808, 5157975 ], [ 5157971, 5160625, 5157976, 5160630 ], [ 5160626, 5162852, 5160631, 5162857 ], [ 5162853, 5164824, 5162858, 5164829 ], [ 5164825, 5171077, 5164830, 5171082 ], [ 5171078, 5176566, 5171083, 5176570 ], [ 5176567, 5180104, 5176571, 5180108 ], [ 5180105, 5180591, 5180109, 5180596 ], [ 5180592, 5184142, 5180597, 5184138 ], [ 5184143, 5188235, 5184139, 5188240 ], [ 5188236, 5190430, 5188241, 5190434 ], [ 5190431, 5194013, 5190435, 5194018 ], [ 5194014, 5194634, 
5194019, 5194638 ], [ 5194635, 5199338, 5194639, 5199342 ], [ 5199339, 5200363, 5199343, 5200368 ], [ 5200364, 5200380, 5200369, 5200385 ], [ 5200381, 5207919, 5200386, 5207923 ], [ 5207920, 5210484, 5207924, 5210480 ], [ 5210485, 5211327, 5210481, 5211331 ], [ 5211328, 5212227, 5211332, 5212231 ], [ 5212228, 5212416, 5212232, 5212421 ], [ 5212417, 5214189, 5212422, 5214194 ], [ 5214190, 5218448, 5214195, 5218453 ], [ 5218449, 5221514, 5218454, 5221519 ], [ 5221515, 5222166, 5221520, 5222170 ], [ 5222167, 5222235, 5222171, 5222239 ], [ 5222236, 5222405, 5222240, 5222410 ], [ 5222406, 5223121, 5222411, 5223126 ], [ 5223122, 5225062, 5223127, 5225067 ], [ 5225063, 5227034, 5225068, 5227039 ], [ 5227035, 5230060, 5227040, 5230064 ], [ 5230061, 5235911, 5230065, 5235915 ], [ 5235912, 5237062, 5235916, 5237067 ], [ 5237063, 5238549, 5237068, 5238554 ], [ 5238550, 5239941, 5238555, 5239946 ], [ 5239942, 5241160, 5239947, 5241165 ], [ 5241161, 5243887, 5241166, 5243891 ], [ 5243888, 5244364, 5243892, 5244368 ], [ 5244365, 5245009, 5244369, 5245014 ], [ 5245010, 5245420, 5245015, 5245425 ], [ 5245421, 5246665, 5245426, 5246670 ], [ 5246666, 5246882, 5246671, 5246887 ], [ 5246883, 5252321, 5246888, 5252326 ], [ 5252322, 5258113, 5252327, 5258117 ], [ 5258114, 5259300, 5258118, 5259304 ], [ 5259301, 5261182, 5259305, 5261187 ], [ 5261183, 5261507, 5261188, 5261503 ], [ 5261508, 5268860, 5261504, 5268864 ], [ 5268861, 5269941, 5268865, 5269945 ], [ 5269942, 5270945, 5269946, 5270949 ], [ 5270946, 5271167, 5270950, 5271171 ], [ 5271168, 5273111, 5271172, 5273116 ], [ 5273112, 5273132, 5273117, 5273137 ], [ 5273133, 5273680, 5273138, 5273685 ], [ 5273681, 5274282, 5273686, 5274287 ], [ 5274283, 5277485, 5274288, 5277490 ], [ 5277486, 5278602, 5277491, 5278607 ], [ 5278603, 5279312, 5278608, 5279316 ], [ 5279313, 5279965, 5279317, 5279969 ], [ 5279966, 5286668, 5279970, 5286673 ], [ 5286669, 5288844, 5286674, 5288849 ], [ 5288845, 5292505, 5288850, 5292509 ], [ 5292506, 5295145, 
5292510, 5295149 ], [ 5295146, 5295426, 5295150, 5295431 ], [ 5295427, 5296411, 5295432, 5296407 ], [ 5296412, 5299331, 5296408, 5299336 ], [ 5299332, 5299740, 5299337, 5299745 ], [ 5299741, 5302074, 5299746, 5302079 ], [ 5302075, 5304927, 5302080, 5304932 ], [ 5304928, 5309929, 5304933, 5309925 ], [ 5309930, 5311027, 5309926, 5311023 ], [ 5311028, 5313142, 5311024, 5313138 ], [ 5313143, 5313612, 5313139, 5313616 ], [ 5313613, 5316196, 5313617, 5316201 ], [ 5316197, 5320424, 5316202, 5320428 ], [ 5320425, 5321884, 5320429, 5321889 ], [ 5321885, 5327443, 5321890, 5327448 ], [ 5327444, 5333009, 5327449, 5333014 ], [ 5333010, 5335558, 5333015, 5335563 ], [ 5335559, 5337934, 5335564, 5337939 ], [ 5337935, 5340296, 5337940, 5340301 ], [ 5340297, 5341322, 5340302, 5341326 ], [ 5341323, 5342510, 5341327, 5342515 ], [ 5342511, 5343834, 5342516, 5343839 ], [ 5343835, 5351733, 5343840, 5351737 ], [ 5351734, 5355838, 5351738, 5355842 ], [ 5355839, 5359283, 5355843, 5359288 ], [ 5359284, 5362430, 5359289, 5362434 ], [ 5362431, 5362544, 5362435, 5362549 ], [ 5362545, 5364468, 5362550, 5364472 ], [ 5364469, 5369245, 5364473, 5369241 ], [ 5369246, 5375729, 5369242, 5375734 ], [ 5375730, 5375790, 5375735, 5375795 ], [ 5375791, 5377253, 5375796, 5377258 ], [ 5377254, 5378696, 5377259, 5378701 ], [ 5378697, 5382060, 5378702, 5382065 ], [ 5382061, 5388003, 5382066, 5388007 ], [ 5388004, 5388224, 5388008, 5388229 ], [ 5388225, 5389155, 5388230, 5389159 ], [ 5389156, 5391633, 5389160, 5391638 ], [ 5391634, 5399712, 5391639, 5399716 ], [ 5399713, 5401417, 5399717, 5401422 ], [ 5401418, 5406297, 5401423, 5406301 ], [ 5406298, 5406537, 5406302, 5406542 ], [ 5406538, 5406850, 5406543, 5406854 ], [ 5406851, 5408637, 5406855, 5408642 ], [ 5408638, 5410093, 5408643, 5410097 ], [ 5410094, 5417270, 5410098, 5417275 ], [ 5417271, 5419353, 5417276, 5419358 ], [ 5419354, 5420144, 5419359, 5420149 ], [ 5420145, 5420216, 5420150, 5420221 ], [ 5420217, 5420740, 5420222, 5420745 ], [ 5420741, 5423081, 
5420746, 5423085 ], [ 5423082, 5424075, 5423086, 5424079 ], [ 5424076, 5425137, 5424080, 5425141 ], [ 5425138, 5427477, 5425142, 5427482 ], [ 5427478, 5429323, 5427483, 5429328 ], [ 5429324, 5429631, 5429329, 5429635 ], [ 5429632, 5432849, 5429636, 5432853 ], [ 5432850, 5432874, 5432854, 5432878 ], [ 5432875, 5440584, 5432879, 5440588 ], [ 5440585, 5440833, 5440589, 5440837 ], [ 5440834, 5441634, 5440838, 5441639 ], [ 5441635, 5446602, 5441640, 5446606 ], [ 5446603, 5448663, 5446607, 5448668 ], [ 5448664, 5452231, 5448669, 5452236 ], [ 5452232, 5453350, 5452237, 5453354 ], [ 5453351, 5455164, 5453355, 5455168 ], [ 5455165, 5458274, 5455169, 5458279 ], [ 5458275, 5459524, 5458280, 5459529 ], [ 5459525, 5468509, 5459530, 5468514 ], [ 5468510, 5469773, 5468515, 5469778 ], [ 5469774, 5474040, 5469779, 5474044 ], [ 5474041, 5475379, 5474045, 5475384 ], [ 5475380, 5476063, 5475385, 5476068 ], [ 5476064, 5477860, 5476069, 5477865 ], [ 5477861, 5478124, 5477866, 5478129 ], [ 5478125, 5478577, 5478130, 5478582 ], [ 5478578, 5479176, 5478583, 5479181 ], [ 5479177, 5483012, 5479182, 5483017 ], [ 5483013, 5483809, 5483018, 5483814 ], [ 5483810, 5490967, 5483815, 5490963 ], [ 5490968, 5491739, 5490964, 5491743 ], [ 5491740, 5495234, 5491744, 5495239 ], [ 5495235, 5498449, 5495240, 5498449 ] ] def _cut(seq) cuts = Bio::RestrictionEnzyme::Analysis.cut(seq, "SacI", "EcoRI", "BstEII", {:view_ranges => true}) end end #TestEcoliO157H7_3enzymes end #module TestRestrictionEnzymeAnalysisCutLong bio-2.0.3/sample/demo_kegg_glycan.rb0000644000175000017500000000277614141516614016716 0ustar nileshnilesh# # = sample/demo_kegg_glycan.rb - demonstration of Bio::KEGG::GLYCAN # # Copyright:: Copyright (C) 2004 Toshiaki Katayama # License:: The Ruby License # # # == Description # # Demonstration of Bio::KEGG::GLYCAN, a parser class for the KEGG GLYCAN # glycome informatics database. # # == Usage # # Specify files containing KEGG GLYCAN data. # # $ ruby demo_kegg_glycan.rb files... 
#
# == Example of running this script
#
# Download test data.
#
#  $ ruby -Ilib bin/br_biofetch.rb glycan G00001 > G00001.glycan
#  $ ruby -Ilib bin/br_biofetch.rb glycan G00024 > G00024.glycan
#
# Run this script.
#
#  $ ruby -Ilib sample/demo_kegg_glycan.rb G00001.glycan G00024.glycan
#
# == Development information
#
# The code was moved from lib/bio/db/kegg/glycan.rb and modified.
#

require 'bio'

Bio::FlatFile.foreach(Bio::KEGG::GLYCAN, ARGF) do |gl|
  #entry = ARGF.read # gl:G00024
  #gl = Bio::KEGG::GLYCAN.new(entry)

  puts "### gl = Bio::KEGG::GLYCAN.new(str)"
  puts "# gl.entry_id"
  p gl.entry_id
  puts "# gl.name"
  p gl.name
  puts "# gl.composition"
  p gl.composition
  puts "# gl.mass"
  p gl.mass
  puts "# gl.keggclass"
  p gl.keggclass
  #puts "# gl.bindings"
  #p gl.bindings
  puts "# gl.compounds"
  p gl.compounds
  puts "# gl.reactions"
  p gl.reactions
  puts "# gl.pathways"
  p gl.pathways
  puts "# gl.enzymes"
  p gl.enzymes
  puts "# gl.orthologs"
  p gl.orthologs
  puts "# gl.references"
  p gl.references
  puts "# gl.dblinks"
  p gl.dblinks
  puts "# gl.kcf"
  p gl.kcf
  puts "=" * 78
end
bio-2.0.3/sample/genes2pep.rb0000755000175000017500000000167314141516614015326 0ustar nileshnilesh
#!/usr/bin/env ruby
#
# genes2pep.rb - convert KEGG/GENES entry into FASTA format (pep)
#
# Copyright (C) 2001 KATAYAMA Toshiaki
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
# # $Id: genes2pep.rb,v 0.4 2002/06/23 20:21:56 k Exp $ # require 'bio/db/kegg/genes' require 'bio/extend' include Bio while gets(KEGG::GENES::DELIMITER) genes = KEGG::GENES.new($_) next if genes.aalen == 0 puts ">#{genes.entry_id} #{genes.definition}" puts genes.aaseq.fold(60+12, 12) end bio-2.0.3/sample/gbtab2mysql.rb0000755000175000017500000000776114141516614015671 0ustar nileshnilesh#!/usr/bin/env ruby # # gbtab2mysql.rb - load tab delimited GenBank data files into MySQL # # Copyright (C) 2002 KATAYAMA Toshiaki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # # $Id: gbtab2mysql.rb,v 1.3 2002/06/25 19:30:26 k Exp $ # require 'dbi' $schema_ent = < 20 unless dbh.tables.include?(table) create_table(dbh, table) tables.push(table) end load_tab(dbh, base, table) end merge_table(dbh, tables) end $stderr.puts Time.now bio-2.0.3/sample/fastq2html.rb0000644000175000017500000000410514141516614015511 0ustar nileshnilesh#!/usr/bin/env ruby # # fastq2html.rb - HTML visualization of FASTQ sequences # # Usage: # # % ruby fastq2html.rb seq00.fastq > seq00.html # # # Copyright:: Copyright (C) 2019 BioRuby Project # Copyright (C) 2005 Mitsuteru C. 
Nakao # License:: The Ruby License # # require 'bio' # thickness to color def thickness2color(t) c = "%02X" % ((t * 255.0).to_i) c * 3 end # Creates def create_score2color_hashes h_bg = {} h_char = {} cutoff_low = 0 cutoff_high = 50 range = cutoff_high - cutoff_low sc_min = -5 sc_max = 100 (sc_min..sc_max).each do |i| t = if i <= cutoff_low then 0.0 elsif i >= cutoff_high then 1.0 else (i - cutoff_low).to_f / range end h_bg[i] = thickness2color(t) h_char[i] = thickness2color((t > 0.3) ? 0.0 : 0.55) end h_bg.default = h_bg[cutoff_low] h_char.default = h_char[cutoff_low] [h_bg, h_char] end # Color code from quality score SCORE2COLOR_BG, SCORE2COLOR_CHAR = create_score2color_hashes # returns folded sequence with
. def br(i, width = 80) return "" if i % width == 0 "" end # returns sequence html doc def display(naseq, scores) html = '

' postfix = '' i = 0 naseq.each_char.with_index do |c, i| sc = scores[i] bgcol = SCORE2COLOR_BG[sc] col = SCORE2COLOR_CHAR[sc] prefix = %Q() html += prefix + c + postfix html += br(i += 1) end html + '

' end # returns colorized html doc def fastq2html(definition, naseq, scores) html = display(naseq, scores) return ['
', "
>#{CGI.escapeHTML(definition)}
", html, '
'] end title = 'Sequences with quality scores' puts ['', '
', '', title, '', '
', '', '

', title, '

'] #main loop ARGV.each do |filename| Bio::FlatFile.open(filename) do |ff| ff.each do |e| puts fastq2html(e.definition, e.naseq, e.quality_scores) end end end puts ['',''] bio-2.0.3/sample/demo_nucleicacid.rb0000644000175000017500000000167214141516614016701 0ustar nileshnilesh# # = sample/demo_nucleicacid.rb - demonstration of Bio::NucleicAcid # # Copyright:: Copyright (C) 2001, 2005 # Toshiaki Katayama # License:: The Ruby License # # # == Description # # Demonstration of Bio::NucleicAcid, data related to nucleic acids. # # == Usage # # Simply run this script. # # $ ruby demo_nucleicacid.rb # # == Development information # # The code was moved from lib/bio/data/na.rb. # require 'bio' #if __FILE__ == $0 puts "### na = Bio::NucleicAcid.new" na = Bio::NucleicAcid.new puts "# na.to_re('yrwskmbdhvnatgc')" p na.to_re('yrwskmbdhvnatgc') puts "# Bio::NucleicAcid.to_re('yrwskmbdhvnatgc')" p Bio::NucleicAcid.to_re('yrwskmbdhvnatgc') puts "# na.weight('A')" p na.weight('A') puts "# Bio::NucleicAcid.weight('A')" p Bio::NucleicAcid.weight('A') puts "# na.weight('atgc')" p na.weight('atgc') puts "# Bio::NucleicAcid.weight('atgc')" p Bio::NucleicAcid.weight('atgc') #end bio-2.0.3/sample/demo_kegg_orthology.rb0000644000175000017500000000246014141516614017455 0ustar nileshnilesh# # = sample/demo_kegg_orthology.rb - demonstration of Bio::KEGG::ORTHOLOGY # # Copyright:: Copyright (C) 2003-2007 Toshiaki Katayama # Copyright:: Copyright (C) 2003 Masumi Itoh # License:: The Ruby License # # # == Description # # Demonstration of Bio::KEGG::ORTHOLOGY, the parser class for the KEGG # ORTHOLOGY database entry. # # == Usage # # Specify files containing KEGG ORTHOLOGY data. # # $ ruby demo_kegg_orthology.rb files... # # == Example of running this script # # Download test data. # # $ ruby -Ilib bin/br_biofetch.rb ko K00001 > K00001.ko # $ ruby -Ilib bin/br_biofetch.rb ko K00161 > K00161.ko # # Run this script. 
# # $ ruby -Ilib sample/demo_kegg_orthology.rb K00001.ko K00161.ko # # == Development information # # The code was moved from lib/bio/db/kegg/orthology.rb and modified. # require 'bio' Bio::FlatFile.foreach(Bio::KEGG::ORTHOLOGY, ARGF) do |ko| puts "### ko = Bio::KEGG::ORTHOLOGY.new(str)" puts "# ko.ko_id" p ko.entry_id puts "# ko.name" p ko.name puts "# ko.names" p ko.names puts "# ko.definition" p ko.definition puts "# ko.keggclass" p ko.keggclass puts "# ko.keggclasses" p ko.keggclasses puts "# ko.pathways" p ko.pathways puts "# ko.dblinks" p ko.dblinks puts "# ko.genes" p ko.genes puts "=" * 78 end bio-2.0.3/sample/demo_blast_report.rb0000644000175000017500000002412514141516614017134 0ustar nileshnilesh# # = sample/demo_blast_report.rb - demonstration of Bio::Blast::Report, Bio::Blast::Default::Report, and Bio::Blast::WU::Report # # Copyright:: Copyright (C) 2003 Toshiaki Katayama # Copyright:: Copyright (C) 2003-2006,2008-2009 Naohisa Goto # License:: The Ruby License # # # == Description # # Demonstration of Bio::Blast::Report (NCBI BLAST XML format parser), # Bio::Blast::Default::Report (NCBI BLAST default (-m 0) format parser), # and Bio::Blast::WU::Report (WU-BLAST default format parser). # # == Usage # # Specify files containing BLAST results. # # $ ruby demo_blast_report.rb files... # # Example usage using test data: # # $ ruby -Ilib sample/demo_blast_report.rb test/data/blast/b0002.faa.m7 # $ ruby -Ilib sample/demo_blast_report.rb test/data/blast/b0002.faa.m0 # # == Development information # # The code was moved from lib/bio/appl/blast/report.rb, # lib/bio/appl/blast/format0.rb, and lib/bio/appl/blast/wublast.rb, # and modified. 
#

require 'bio'

# dummy class to return specific object
class Dummy
  def initialize(obj)
    @obj = obj
  end

  def size
    @obj
  end

  def inspect
    @obj.inspect
  end
end #class Dummy

# wrapper class to ignore error
class Wrapper
  def initialize(obj)
    @obj = obj
  end

  def class
    @obj.class
  end

  def respond_to?(*arg)
    @obj.respond_to?(*arg)
  end

  def method_missing(meth, *arg, &block)
    begin
      @obj.__send__(meth, *arg, &block)
    rescue NoMethodError => evar
      Dummy.new(evar)
    end
  end
end #class Wrapper

def wrap(obj)
  Wrapper.new(obj)
end

# -m0: not defined in Bio::Blast::Default::Report ???
# +m0: newly added in Bio::Blast::Default::Report ???
# -WU: not defined in Bio::Blast::WU::Report ???
# +WU: newly added in Bio::Blast::WU::Report ???

Bio::FlatFile.open(ARGF) do |ff|
  puts "Detected file format: #{ff.dbclass}"
  unless ff.dbclass then
    ff.dbclass = Bio::Blast::Report
    puts "Input data may be tab-delimited format (-m 8)."
  end

  ff.each do |rep|
    rep = wrap(rep)

    #print "# === Bio::Blast::Default::Report\n"
    print "# === #{rep.class}\n"
    puts
    print " rep.program #=> "; p rep.program
    print " rep.version #=> "; p rep.version
    print " rep.reference #=> "; p rep.reference
    print " rep.notice [WU] #=> "; p rep.notice #+WU
    print " rep.db #=> "; p rep.db
    print " rep.query_id #=> "; p rep.query_id #-m0,-WU
    print " rep.query_def #=> "; p rep.query_def
    print " rep.query_len #=> "; p rep.query_len
    #puts
    print " rep.version_number #=> "; p rep.version_number #+m0,+WU
    print " rep.version_date #=> "; p rep.version_date #+m0,+WU
    puts

    print "# === Parameters\n"
    #puts
    print " rep.parameters #=> "; p rep.parameters #-m0
    puts
    print " rep.matrix #=> "; p rep.matrix #-WU
    print " rep.expect #=> "; p rep.expect
    print " rep.inclusion #=> "; p rep.inclusion #-m0,-WU
    print " rep.sc_match #=> "; p rep.sc_match #-WU
    print " rep.sc_mismatch #=> "; p rep.sc_mismatch #-WU
    print " rep.gap_open #=> "; p rep.gap_open #-WU
    print " rep.gap_extend #=> "; p rep.gap_extend #-WU
    print " rep.filter #=> "; p rep.filter #-m0,-WU
    print " rep.pattern
#=> "; p rep.pattern #-WU
    print " rep.entrez_query #=> "; p rep.entrez_query #-m0
    #puts
    print " rep.pattern_positions #=> "; p rep.pattern_positions #+m0
    puts

    print "# === Statistics (last iteration's)\n"
    #puts
    print " rep.statistics #=> "; p rep.statistics #-m0,-WU
    puts
    print " rep.db_num #=> "; p rep.db_num
    print " rep.db_len #=> "; p rep.db_len
    print " rep.hsp_len #=> "; p rep.hsp_len #-m0,-WU
    print " rep.eff_space #=> "; p rep.eff_space #-WU
    print " rep.kappa #=> "; p rep.kappa #-WU
    print " rep.lambda #=> "; p rep.lambda #-WU
    print " rep.entropy #=> "; p rep.entropy #-WU
    puts
    print " rep.num_hits #=> "; p rep.num_hits #+m0
    print " rep.gapped_kappa #=> "; p rep.gapped_kappa #+m0
    print " rep.gapped_lambda #=> "; p rep.gapped_lambda #+m0
    print " rep.gapped_entropy #=> "; p rep.gapped_entropy #+m0
    print " rep.posted_date #=> "; p rep.posted_date #+m0
    puts

    print "# === Message (last iteration's)\n"
    puts
    print " rep.message #=> "; p rep.message #-WU
    #puts
    print " rep.converged? #=> "; p rep.converged? #+m0
    puts

    print "# === Warning messages\n"
    print " rep.warnings [WU] #=> "; p rep.warnings #+WU

    print "# === Iterations\n"
    puts
    print " rep.iterations.each do |itr|\n"
    puts

    rep.iterations.each do |itr|
      itr = wrap(itr)

      #print "# --- Bio::Blast::Default::Report::Iteration\n"
      print "# --- #{itr.class}\n"
      puts
      print " itr.num #=> "; p itr.num
      print " itr.statistics #=> "; p itr.statistics #-m0,-WU
      print " itr.warnings [WU] #=> "; p itr.warnings #+WU
      print " itr.message #=> "; p itr.message
      print " itr.hits.size #=> "; p itr.hits.size
      #puts
      print " itr.hits_newly_found.size #=> "; p((itr.hits_newly_found.size rescue nil)); #+m0
      print " itr.hits_found_again.size #=> "; p((itr.hits_found_again.size rescue nil)); #+m0
      if itr.respond_to?(:hits_for_pattern) and itr.hits_for_pattern then #+m0
        itr.hits_for_pattern.each_with_index do |hp, hpi|
          print " itr.hits_for_pattern[#{hpi}].size #=> "; p hp.size;
        end
      end
      print " itr.converged? #=> "; p itr.converged?
#+m0,+WU
      puts
      print " itr.hits.each do |hit|\n"
      puts

      itr.hits.each_with_index do |hit, i|
        hit = wrap(hit)

        #print "# --- Bio::Blast::Default::Report::Hit"
        print "# --- #{hit.class}"
        print " ([#{i}])\n"
        puts
        print " hit.num #=> "; p hit.num #-m0,-WU
        print " hit.hit_id #=> "; p hit.hit_id #-m0,-WU
        print " hit.len #=> "; p hit.len
        print " hit.definition #=> "; p hit.definition
        print " hit.accession #=> "; p hit.accession #-m0,-WU
        #puts
        print " hit.found_again? #=> "; p hit.found_again? #+m0,+WU
        print " hit.score [WU] #=> "; p hit.score #+WU
        print " hit.pvalue [WU] #=> "; p hit.pvalue #+WU
        print " hit.n_number [WU] #=> "; p hit.n_number #+WU

        print " --- compatible/shortcut ---\n"
        print " hit.query_id #=> "; p hit.query_id #-m0,-WU
        print " hit.query_def #=> "; p hit.query_def #-m0,-WU
        print " hit.query_len #=> "; p hit.query_len #-m0,-WU
        print " hit.target_id #=> "; p hit.target_id #-m0,-WU
        print " hit.target_def #=> "; p hit.target_def
        print " hit.target_len #=> "; p hit.target_len

        print " --- first HSP's values (shortcut) ---\n"
        print " hit.evalue #=> "; p hit.evalue
        print " hit.bit_score #=> "; p hit.bit_score
        print " hit.identity #=> "; p hit.identity
        print " hit.overlap #=> "; p hit.overlap #-m0,-WU

        print " hit.query_seq #=> "; p hit.query_seq
        print " hit.midline #=> "; p hit.midline
        print " hit.target_seq #=> "; p hit.target_seq

        print " hit.query_start #=> "; p hit.query_start
        print " hit.query_end #=> "; p hit.query_end
        print " hit.target_start #=> "; p hit.target_start
        print " hit.target_end #=> "; p hit.target_end
        print " hit.lap_at #=> "; p hit.lap_at
        print " --- first HSP's values (shortcut) ---\n"
        print " --- compatible/shortcut ---\n"
        puts
        print " hit.hsps.size #=> "; p hit.hsps.size
        if hit.hsps.size == 0 then
          puts " (HSP not found: please see blastall's -b and -v options)"
          puts
        else
          puts
          print " hit.hsps.each do |hsp|\n"
          puts

          hit.hsps.each_with_index do |hsp, j|
            hsp = wrap(hsp)

            #print "# --- Bio::Blast::Default::Report::Hsp"
            print "# --- #{hsp.class}"
            print
" ([#{j}])\n" puts print " hsp.num #=> "; p hsp.num #-m0,-WU print " hsp.bit_score #=> "; p hsp.bit_score print " hsp.score #=> "; p hsp.score print " hsp.evalue #=> "; p hsp.evalue print " hsp.identity #=> "; p hsp.identity print " hsp.gaps #=> "; p hsp.gaps print " hsp.positive #=> "; p hsp.positive print " hsp.align_len #=> "; p hsp.align_len print " hsp.density #=> "; p hsp.density #-m0,-WU print " hsp.pvalue [WU]#=> "; p hsp.pvalue #+WU print " hsp.p_sum_n [WU]#=> "; p hsp.p_sum_n #+WU print " hsp.query_frame #=> "; p hsp.query_frame print " hsp.query_from #=> "; p hsp.query_from print " hsp.query_to #=> "; p hsp.query_to print " hsp.hit_frame #=> "; p hsp.hit_frame print " hsp.hit_from #=> "; p hsp.hit_from print " hsp.hit_to #=> "; p hsp.hit_to print " hsp.pattern_from#=> "; p hsp.pattern_from #-m0,-WU print " hsp.pattern_to #=> "; p hsp.pattern_to #-m0,-WU print " hsp.qseq #=> "; p hsp.qseq print " hsp.midline #=> "; p hsp.midline print " hsp.hseq #=> "; p hsp.hseq puts print " hsp.percent_identity #=> "; p hsp.percent_identity print " hsp.mismatch_count #=> "; p hsp.mismatch_count #-m0,-WU # print " hsp.query_strand #=> "; p hsp.query_strand #+m0,+WU print " hsp.hit_strand #=> "; p hsp.hit_strand #+m0,+WU print " hsp.percent_positive #=> "; p hsp.percent_positive #+m0,+WU print " hsp.percent_gaps #=> "; p hsp.percent_gaps #+m0,+WU puts end #each end #if hit.hsps.size == 0 end end end #ff.each end #Bio::FlatFile.open bio-2.0.3/sample/gb2fasta.rb0000755000175000017500000000163414141516614015124 0ustar nileshnilesh#!/usr/bin/env ruby # # gb2fasta.rb - convert GenBank entry into FASTA format (nuc) # # Copyright (C) 2001 KATAYAMA Toshiaki # Copyright (C) 2002 Yoshinori K. Okuji # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. 
# # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # $Id: gb2fasta.rb,v 0.6 2008/02/05 12:11:11 pjotr Exp $ # require 'bio' include Bio ff = FlatFile.new(GenBank, ARGF) while gb = ff.next_entry print gb.seq.to_fasta("gb:#{gb.entry_id} #{gb.definition}", 70) end bio-2.0.3/sample/demo_kegg_drug.rb0000644000175000017500000000242414141516614016370 0ustar nileshnilesh# # = sample/demo_kegg_drug.rb - demonstration of Bio::KEGG::DRUG # # Copyright:: Copyright (C) 2007 Toshiaki Katayama # License:: The Ruby License # # # == Description # # Demonstration of Bio::KEGG::DRUG, a parser class for the KEGG DRUG # drug database entry. # # == Usage # # Specify files containing KEGG DRUG data. # # $ ruby demo_kegg_drug.rb files... # # == Example of running this script # # Download test data. # # $ ruby -Ilib bin/br_biofetch.rb dr D00001 > D00001.drug # $ ruby -Ilib bin/br_biofetch.rb dr D00002 > D00002.drug # # Run this script. # # $ ruby -Ilib sample/demo_kegg_drug.rb D00001.drug D00002.drug # # == Development information # # The code was moved from lib/bio/db/kegg/drug.rb and modified. 
# require 'bio' Bio::FlatFile.foreach(Bio::KEGG::DRUG, ARGF) do |dr| #entry = ARGF.read # dr:D00001 #dr = Bio::KEGG::DRUG.new(entry) puts "### dr = Bio::KEGG::DRUG.new(str)" puts "# dr.entry_id" p dr.entry_id puts "# dr.names" p dr.names puts "# dr.name" p dr.name puts "# dr.formula" p dr.formula puts "# dr.mass" p dr.mass puts "# dr.activity" p dr.activity puts "# dr.remark" p dr.remark puts "# dr.comment" p dr.comment puts "# dr.dblinks" p dr.dblinks puts "# dr.kcf" p dr.kcf puts "=" * 78 end bio-2.0.3/sample/demo_targetp_report.rb0000644000175000017500000000642414141516614017477 0ustar nileshnilesh# # = sample/demo_targetp_report.rb - demonstration of Bio::TargetP::Report # # Copyright:: Copyright (C) 2003 # Mitsuteru C. Nakao # License:: The Ruby License # # # == Description # # Demonstration of Bio::TargetP::Report, TargetP output parser. # # == Usage # # Usage 1: Without arguments, runs demo using preset example data. # # $ ruby demo_targetp_report.rb # # Usage 2: Specify files containing TargetP reports. # # $ ruby demo_targetp_report.rb files... # # == References # # * http://www.cbs.dtu.dk/services/TargetP/ # # == Development information # # The code was moved from lib/bio/appl/targetp/report.rb, and modified # as below: # * Disables internal sample data when arguments are specified. # * Method name is changed. # require 'bio' begin require 'pp' alias p pp rescue LoadError end plant = < # License:: The Ruby License # # # == Description # # Demonstration of Bio::KEGG::GENOME, a parser class for the KEGG/GENOME # genome database. # # == Usage # # Specify files containing KEGG GENOME data. # # $ ruby demo_kegg_genome.rb files... # # == Example of running this script # # Download test data. # # $ ruby -Ilib bin/br_biofetch.rb genome eco > eco.genome # $ ruby -Ilib bin/br_biofetch.rb genome hsa > hsa.genome # # Run this script. 
# # $ ruby -Ilib sample/demo_kegg_genome.rb eco.genome hsa.genome # # == Development information # # The code was moved from lib/bio/db/kegg/genome.rb and modified. # require 'bio' #if __FILE__ == $0 begin require 'pp' def p(arg); pp(arg); end rescue LoadError end #require 'bio/io/flatfile' ff = Bio::FlatFile.new(Bio::KEGG::GENOME, ARGF) ff.each do |genome| puts "### Tags" p genome.tags [ %w( ENTRY entry_id ), %w( NAME name ), %w( DEFINITION definition ), %w( TAXONOMY taxonomy taxid lineage ), %w( REFERENCE references ), %w( CHROMOSOME chromosomes ), %w( PLASMID plasmids ), %w( STATISTICS statistics nalen num_gene num_rna ), ].each do |x| puts "### " + x.shift x.each do |m| p genome.__send__(m) end end puts "=" * 78 end #end bio-2.0.3/sample/enzymes.rb0000755000175000017500000000400514141516614015120 0ustar nileshnilesh#!/usr/bin/env ruby # # enzymes.rb - cut input file using enzyme on command line # # Copyright (C) 2006 Pjotr Prins and Trevor Wennblom # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
# # $Id: enzymes.rb,v 1.1 2006/03/03 15:31:06 pjotr Exp $ # require 'bio/io/flatfile' require 'bio/util/restriction_enzyme' include Bio usage = < '+entry.definition+"\n" print frag.primary,"\n" end end end bio-2.0.3/sample/rev_comp.rb0000644000175000017500000000071314141516614015237 0ustar nileshnilesh#!/usr/bin/env ruby # # rev_comp.rb - Reverse complement DNA sequences # # Copyright:: Copyright (C) 2019 BioRuby Project # License:: The Ruby License # require 'bio' ARGV.each do |fn| Bio::FlatFile.open(fn) do |ff| ff.each do |entry| next if /\A\s*\z/ =~ ff.entry_raw.to_s na = entry.naseq revcomp = na.reverse_complement print revcomp.to_fasta("complement(#{entry.entry_id}) " + entry.definition, 70) end end end bio-2.0.3/sample/genes2tab.rb0000755000175000017500000000335214141516614015304 0ustar nileshnilesh#!/usr/bin/env ruby # # genes2tab.rb - convert KEGG/GENES into tab delimited data for MySQL # # Usage: # # % genes2tab.rb /bio/db/kegg/genes/e.coli > genes_eco.tab # # Copyright (C) 2001 KATAYAMA Toshiaki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
# # $Id: genes2tab.rb,v 0.5 2002/06/23 20:21:56 k Exp $ # require 'bio/db/kegg/genes' include Bio while entry = gets(KEGG::GENES::DELIMITER) genes = KEGG::GENES.new(entry) db = genes.dblinks.inspect if genes.codon_usage.length == 64 cu = genes.codon_usage.join(' ') else cu = '\N' end ary = [ genes.entry_id, genes.division, genes.organism, genes.name, genes.definition, genes.keggclass, genes.position, db, cu, genes.aalen, genes.aaseq, genes.nalen, genes.naseq ] puts ary.join("\t") end =begin CREATE DATABASE IF NOT EXISTS db_name; CREATE TABLE IF NOT EXISTS db_name.genes ( id varchar(30) not NULL, # ENTRY ID division varchar(30), # CDS, tRNA etc. organism varchar(255), gene varchar(255), definition varchar(255), keggclass varchar(255), position varchar(255), dblinks varchar(255), codon_usage text, aalen integer, aaseq text, nalen integer, naseq text ); LOAD DATA LOCAL INFILE 'genes.tab' INTO TABLE db_name.genes; =end bio-2.0.3/sample/color_scheme_na.rb0000644000175000017500000000370514141516614016551 0ustar nileshnilesh#!/usr/bin/env ruby # # color_scheme_na.rb - A Bio::ColorScheme demo script for Nucleic Acids # sequences. # # Usage: # # % ruby color_scheme_na.rb > cs-seq-fna.html # # % cat seq.fna # >DNA_sequence # acgtgtgtcatgctagtcgatcgtactagtcgtagctagtca # % ruby color_scheme_na.rb seq.fna > colored-seq-fna.html # # # Copyright:: Copyright (C) 2005 # Mitsuteru C. Nakao # License:: The Ruby License # # require 'bio' # returns folded sequence with
. def br(i, width = 80) return "" if i % width == 0 "" end # returns sequence html doc def display(seq, cs) html = '

' postfix = '' i = 0 seq.each_char do |c| color = cs[c] prefix = %Q() html += prefix + c + postfix html += br(i += 1) end html + '

' end # returns scheme wise html doc def display_scheme(scheme, naseq, aaseq) html = '' cs = Bio::ColorScheme.const_get(scheme.intern) [naseq, aaseq].each do |seq| html += display(seq, cs) end return ['
', "

#{cs}

", html, '
'] end if fna = ARGV.shift naseq = Bio::FlatFile.open(fna) { |ff| ff.next_entry.naseq } aaseq = naseq.translate else naseq = Bio::Sequence::NA.new('acgtu' * 20).randomize aaseq = naseq.translate end title = 'Bio::ColorScheme for DNA sequences' doc = ['', '
', '', title, '', '
', '', '

', title, '

'] doc << ['
', '

', 'Simple colors', '

'] ['Nucleotide'].each do |scheme| doc << display_scheme(scheme, naseq, "") end doc << ['
'] ['Zappo', 'Taylor' ].each do |scheme| doc << display_scheme(scheme, "", aaseq) end doc << ['
'] doc << ['
', '

', 'Score colors', '

'] ['Buried', 'Helix', 'Hydropathy', 'Strand', 'Turn'].each do |score| doc << display_scheme(score, "", aaseq) end doc << ['
']

puts doc + ['','']
bio-2.0.3/sample/demo_das.rb0000644000175000017500000000435014141516614015201 0ustar nileshnilesh
#
# = sample/demo_das.rb - demonstration of Bio::DAS, BioDAS access module
#
# Copyright:: Copyright (C) 2003, 2004, 2007
#             Shuichi Kawashima,
#             Toshiaki Katayama
# License:: The Ruby License
#
#
# == Description
#
# Demonstration of Bio::DAS, BioDAS access module.
#
# == Requirements
#
# Internet connection is needed.
#
# == Usage
#
# Simply run this script.
#
#  $ ruby demo_das.rb
#
# == Notes
#
# The demo using the WormBase DAS server is temporarily disabled because
# it does not work well, possibly because of server trouble.
#
# == Development information
#
# The code was moved from lib/bio/io/das.rb and modified as below:
#
# * Demo code using the UCSC DAS server was added.
#

require 'bio'

# begin
#   require 'pp'
#   alias p pp
# rescue LoadError
# end

if false #disabled
  puts "### WormBase"
  wormbase = Bio::DAS.new('http://www.wormbase.org/db/')

  puts ">>> test get_dsn"
  p wormbase.get_dsn

  puts ">>> create segment obj Bio::DAS::SEGMENT.region('I', 1, 1000)"
  seg = Bio::DAS::SEGMENT.region('I', 1, 1000)
  p seg

  puts ">>> test get_dna"
  p wormbase.get_dna('elegans', seg)

  puts "### test get_features"
  p wormbase.get_features('elegans', seg)
end #if false #disabled

if true #enabled
  puts "### UCSC"
  ucsc = Bio::DAS.new('http://genome.ucsc.edu/cgi-bin/')

  puts ">>> test get_dsn"
  p ucsc.get_dsn

  puts ">>> test get_entry_points('hg19')"
  p ucsc.get_entry_points('hg19')

  puts ">>> test get_types('hg19')"
  p ucsc.get_types('hg19')

  len = rand(50) * 10 + 100
  pos = rand(243199373 - len)
  puts ">>> create segment obj Bio::DAS::SEGMENT.region('2', #{pos}, #{pos + len - 1})"
  seg2 = Bio::DAS::SEGMENT.region('2', pos, pos + len - 1)
  p seg2

  puts ">>> test get_dna"
  p ucsc.get_dna('hg19', seg2)

  puts "### test get_features"
  p ucsc.get_features('hg19', seg2)
end #if true #enabled

if true #enabled
  puts "### KEGG DAS"
  kegg_das = Bio::DAS.new("http://das.hgc.jp/cgi-bin/")

  dsn_list =
kegg_das.get_dsn org_list = dsn_list.collect {|x| x.source} puts ">>> dsn : entry_points" org_list.each do |org| print "#{org} : " list = kegg_das.get_entry_points(org) list.segments.each do |seg| print " #{seg.entry_id}" end puts end end #if true #enabled bio-2.0.3/sample/pmfetch.rb0000755000175000017500000000254314141516614015061 0ustar nileshnilesh#!/usr/bin/env ruby # # pmfetch.rb - generate BibTeX format reference list by PubMed ID list # # Copyright (C) 2002 KATAYAMA Toshiaki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # # $Id:$ # require 'bio' Bio::NCBI.default_email = 'staff@bioruby.org' if ARGV[0] =~ /\A\-f/ ARGV.shift form = ARGV.shift else form = 'bibtex' end ARGV.each do |id| entries = Bio::PubMed.efetch(id) if entries and entries.size == 1 then entry = entries[0] else # dummy entry if not found or possibly incorrect result entry = 'PMID- ' end case form when 'medline' puts entry else puts Bio::MEDLINE.new(entry).reference.__send__(form.intern) end print "\n" end bio-2.0.3/sample/demo_tmhmm_report.rb0000644000175000017500000000256214141516614017152 0ustar nileshnilesh# # = sample/demo_tmhmm_report.rb - demonstration of Bio::TMHMM::Report # # Copyright:: Copyright (C) 2003 # Mitsuteru C. Nakao # License:: The Ruby License # # # == Description # # Demonstration of Bio::TMHMM::Report, TMHMM output parser. 
#
# == Usage
#
# Specify files containing TMHMM reports.
#
#  $ ruby demo_tmhmm_report.rb files...
#
# Example usage using test data:
#
#  $ ruby -Ilib sample/demo_tmhmm_report.rb test/data/TMHMM/sample.report
#
# == References
#
# * http://www.cbs.dtu.dk/services/TMHMM/
#
# == Development information
#
# The code was moved from lib/bio/appl/tmhmm/report.rb.
#

require 'bio'

#if __FILE__ == $0

  begin
    require 'pp'
    alias p pp
  rescue LoadError
  end

  Bio::TMHMM.reports(ARGF.read) do |ent|
    puts '==>'
    puts ent.to_s
    pp ent

    p [:entry_id, ent.entry_id]
    p [:query_len, ent.query_len]
    p [:predicted_tmhs, ent.predicted_tmhs]
    p [:tmhs_size, ent.tmhs.size]
    p [:exp_aas_in_tmhs, ent.exp_aas_in_tmhs]
    p [:exp_first_60aa, ent.exp_first_60aa]
    p [:total_prob_of_N_in, ent.total_prob_of_N_in]

    ent.tmhs.each do |t|
      p t
      p [:entry_id, t.entry_id]
      p [:version, t.version]
      p [:status, t.status]
      p [:range, t.range]
      p [:pos, t.pos]
    end

    p [:helix, ent.helix]
    p ent.tmhs.map {|t| t if t.status == 'TMhelix' }.compact
  end

#end
bio-2.0.3/sample/demo_locations.rb0000644000175000017500000000563414141516614016433 0ustar nileshnilesh
#
# = sample/demo_locations.rb - demonstration of Bio::Locations
#
# Copyright:: Copyright (C) 2001, 2005 Toshiaki Katayama
#                           2006 Jan Aerts
#                           2008 Naohisa Goto
# License::   The Ruby License
#
# == Description
#
# Demonstration of Bio::Locations, a parser class for the location string
# used in the INSDC Feature Table.
#
# == Usage
#
# Simply run this script.
#
#  $ ruby demo_locations.rb
#
# == Development information
#
# The code was moved from lib/bio/location.rb.
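#
# As a rough illustration of what an INSDC location string encodes, here is
# a minimal sketch in plain Ruby. It is not how Bio::Locations is
# implemented. The helper names parse_simple_location and overall_span are
# hypothetical, and the sketch only handles single positions, plain ranges,
# and join(...)/complement(...) wrappers whose comma-separated parts do not
# themselves contain commas.
#

```ruby
# Hypothetical helper (not part of BioRuby): turn a simple INSDC-style
# location string into an array of [from, to, strand] triples.
def parse_simple_location(str, strand = 1)
  str = str.strip
  case str
  when /\Acomplement\((.*)\)\z/
    # complement(...) flips the strand of everything inside it
    parse_simple_location($1, -strand)
  when /\Ajoin\((.*)\)\z/
    # join(...) lists several sub-locations separated by commas
    $1.split(',').flat_map { |part| parse_simple_location(part, strand) }
  when /\A(\d+)\.\.(\d+)\z/
    [[$1.to_i, $2.to_i, strand]]
  when /\A(\d+)\z/
    [[$1.to_i, $1.to_i, strand]]
  else
    raise ArgumentError, "unsupported location: #{str}"
  end
end

# Hypothetical helper: smallest start and largest end over all parts.
def overall_span(triples)
  [triples.map { |from, _to, _strand| from }.min,
   triples.map { |_from, to, _strand| to }.max]
end

parts = parse_simple_location('join(complement(500..550), 600..625)')
p parts                # => [[500, 550, -1], [600, 625, 1]]
p overall_span(parts)  # => [500, 625]
```

#
# The fuzzy forms exercised by this demo, such as '(67.68)..(699.703)',
# '754^755' or 'one-of(...)', are deliberately left out of the sketch and
# would raise ArgumentError there.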
# require 'bio' #if __FILE__ == $0 puts "Test new & span methods" [ '450', '500..600', 'join(500..550, 600..625)', 'complement(join(500..550, 600..625))', 'join(complement(500..550), 600..625)', '754^755', 'complement(53^54)', 'replace(4792^4793,"a")', 'replace(1905^1906,"acaaagacaccgccctacgcc")', '157..(800.806)', '(67.68)..(699.703)', '(45934.45974)..46135', '<180..(731.761)', '(88.89)..>1122', 'complement((1700.1708)..(1715.1721))', 'complement(<22..(255.275))', 'complement((64.74)..1525)', 'join((8298.8300)..10206,1..855)', 'replace((651.655)..(651.655),"")', 'one-of(898,900)..983', 'one-of(5971..6308,5971..6309)', '8050..one-of(10731,10758,10905,11242)', 'one-of(623,627,632)..one-of(628,633,637)', 'one-of(845,953,963,1078,1104)..1354', 'join(2035..2050,complement(1775..1818),13..345,414..992,1232..1253,1024..1157)', 'join(complement(1..61),complement(AP000007.1:252907..253505))', 'complement(join(71606..71829,75327..75446,76039..76203))', 'order(3..26,complement(964..987))', 'order(L44135.1:(454.445)..>538,<1..181)', '<200001..<318389', ].each do |pos| p pos # p Bio::Locations.new(pos) # p Bio::Locations.new(pos).span # p Bio::Locations.new(pos).range Bio::Locations.new(pos).each do |location| puts "class=" + location.class.to_s puts "start=" + location.from.to_s + "\tend=" + location.to.to_s + "\tstrand=" + location.strand.to_s end end puts "Test rel2abs/abs2rel method" [ '6..15', 'join(6..10,16..30)', 'complement(join(6..10,16..30))', 'join(complement(6..10),complement(16..30))', 'join(6..10,complement(16..30))', ].each do |pos| loc = Bio::Locations.new(pos) p pos # p loc (1..21).each do |x| print "absolute(#{x}) #=> ", y = loc.absolute(x), "\n" print "relative(#{y}) #=> ", y ? loc.relative(y) : y, "\n" print "absolute(#{x}, :aa) #=> ", y = loc.absolute(x, :aa), "\n" print "relative(#{y}, :aa) #=> ", y ? 
loc.relative(y, :aa) : y, "\n" end end pos = 'join(complement(6..10),complement(16..30))' loc = Bio::Locations.new(pos) print "pos : "; p pos print "`- loc[1] : "; p loc[1] print " `- range : "; p loc[1].range puts Bio::Location.new('5').<=>(Bio::Location.new('3')) #end bio-2.0.3/sample/gt2fasta.rb0000755000175000017500000000227714141516614015152 0ustar nileshnilesh#!/usr/bin/env ruby # # gt2fasta.rb - convert GenBank translations into FASTA format (pep) # # Copyright (C) 2001 KATAYAMA Toshiaki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # $Id: gt2fasta.rb,v 0.3 2002/04/15 03:06:17 k Exp $ # require 'bio/io/flatfile' require 'bio/feature' require 'bio/db/genbank' include Bio ff = FlatFile.new(GenBank, ARGF) while gb = ff.next_entry orf = 0 gb.features.each do |f| f = f.assoc if aaseq = f['translation'] orf += 1 gene = [ f['gene'], f['product'], f['note'], f['function'] ].compact.join(', ') definition = "gp:#{gb.entry_id}_#{orf} #{gene} [#{gb.organism}]" print aaseq.to_fasta(definition, 70) end end end bio-2.0.3/sample/demo_sirna.rb0000644000175000017500000000257314141516614015553 0ustar nileshnilesh# # = sample/demo_sirna.rb - demonstration of Bio::SiRNA # # Copyright:: Copyright (C) 2004, 2005 # Itoshi NIKAIDO # License:: The Ruby License # # # == Description # # Demonstration of Bio::SiRNA, class for designing small inhibitory RNAs. # # == Usage # # Specify files containing nucleotide sequences. # # $ ruby demo_sirna.rb files... 
# # Example usage using test data: # # $ ruby -Ilib sample/demo_sirna.rb test/data/fasta/example1.txt # # == Development information # # The code was moved from lib/bio/util/sirna.rb, and modified for reading # normal sequence files. # require 'bio' if ARGV.size <= 0 then puts "Demonstration of designing SiRNA for each sequence." puts "Usage: #{$0} files..." exit(0) end ARGV.each do |filename| Bio::FlatFile.foreach(filename) do |entry| puts "##entry.entry_id: #{entry.entry_id}" puts "##entry.definition: #{entry.definition}" seq = entry.naseq puts "##entry.naseq.length: #{seq.length}" sirna = Bio::SiRNA.new(seq) pairs = sirna.design # or .design('uitei') or .uitei or .reynolds pairs.each do |pair| puts pair.report shrna = Bio::SiRNA::ShRNA.new(pair) shrna.design # or .design('BLOCK-iT') or .block_it puts shrna.report puts "# as DNA" puts shrna.top_strand.dna puts shrna.bottom_strand.dna end puts "=" * 78 end #Bio::FlatFile.foreach end #ARGV.each bio-2.0.3/sample/genome2rb.rb0000755000175000017500000000161614141516614015313 0ustar nileshnilesh#!/usr/bin/env ruby # # genome2rb.rb - used to generate contents of the bio/data/keggorg.rb # # Usage: # # % genome2rb.rb genome | sort # # Copyright (C) 2002 KATAYAMA Toshiaki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
#
# $Id: genome2rb.rb,v 1.1 2002/03/04 08:14:45 katayama Exp $
#

require 'bio'

Bio::FlatFile.new(Bio::KEGG::GENOME,ARGF).each do |x|
  puts " '#{x.entry_id}' => [ '#{x.name}', '#{x.definition}' ],"
end
bio-2.0.3/sample/goslim.rb0000755000175000017500000001516714141516614014733 0ustar nileshnilesh
#!/usr/bin/env ruby
#
# goslim.rb - make a GO slim histogram
#
# Usage:
#
#   % goslim.rb -p process.ontology -f function.ontology \
#       -c component.ontology -s goslim_goa.2002 -g gene_association.mgi \
#       -o mgi -r
#   % R < mgi.R
#   % gv mgi.pdf
#
# Copyright:: Copyright (C) 2003
#             Mitsuteru C. Nakao
# License::   The Ruby License
#
# $Id: goslim.rb,v 1.5 2007/04/05 23:35:42 trevor Exp $
#

SCRIPT_VERSION = '$Id: goslim.rb,v 1.5 2007/04/05 23:35:42 trevor Exp $'

USAGE = "#{__FILE__} - GO slim
Usage:
  #{__FILE__} -p process.ontology -f function.ontology \
    -c component.ontology -g gene_association.mgi -s goslim_goa.2002 \
    -o goslim.uniqued.out -r

  #{__FILE__} -p process.ontology -f function.ontology \
    -c component.ontology -l gene_association.list -s goslim_goa.2002 \
    -o mgi.out -r

  #{__FILE__} -p process.ontology -f function.ontology \
    -c component.ontology -g gene_association.mgi -s goslim_goa.2002 >\
    go_goslit.paired.list

Options:
  -p,--process
  -f,--function
  -c,--component
  -g,--ga
  -l,--galist
  -s,--goslim
  -o,--output -- output file name.
  -r,--r_script -- write an R script (output file name + .R) to plot a barplot.
  -h,--help
  -v,--version

Format:
  GO ID list: /^GO:\\d{7}/ for each line

Mitsuteru C.
Nakao " require 'getoptlong' parser = GetoptLong.new parser.set_options( ['--process', '-p', GetoptLong::REQUIRED_ARGUMENT], ['--function', '-f', GetoptLong::REQUIRED_ARGUMENT], ['--component', '-c', GetoptLong::REQUIRED_ARGUMENT], ['--ga', '-g', GetoptLong::REQUIRED_ARGUMENT], ['--galist', '-l', GetoptLong::REQUIRED_ARGUMENT], ['--goslim', '-s', GetoptLong::REQUIRED_ARGUMENT], ['--output', '-o', GetoptLong::REQUIRED_ARGUMENT], ['--r_script', '-r', GetoptLong::NO_ARGUMENT], ['--help', '-h', GetoptLong::NO_ARGUMENT], ['--version', '-v', GetoptLong::NO_ARGUMENT]) begin parser.each_option do |name, arg| eval "$OPT_#{name.sub(/^--/, '').gsub(/-/, '_').upcase} = '#{arg}'" end rescue exit(1) end if $OPT_VERSION puts SCRIPT_VERSION exit(0) end if $OPT_HELP or !($OPT_PROCESS or $OPT_FUNCTION or $OPT_COMPONENT or ($OPT_GA or $OPT_GALIST)) puts USAGE exit(0) end # subroutines def slim2r(datname) tmp = "# usage: % R --vanilla < #{datname}.R data <- read.delim2('#{datname}') dat <- data$count names(dat) <- paste(data$GO.Term, dat) # set graphc format pdf('#{datname}.pdf') #postscript('#{datname}.ps') # outside margins par(mai = c(1,2.8,1,0.7)) barplot(dat, cex.names = 0.6, # row names font size las = 2, # set horizontal row names horiz = T, # set horizontal main = 'GO slim', # main title # set color schema, proc, blue(3); func, red(2); comp, green(4) col = cbind(c(data$aspect == 'process'), c(data$aspect == 'function'), c(data$aspect == 'component')) %*% c(4,2,3)) # color dev.off() " end # build GOslim uniqued list def slim(ontology, slim_ids, tmp, ga, aspect) tmp[aspect] = Hash.new(0) slim_ids.each {|slim_id| term = ontology.goid2term(slim_id) if term tmp[aspect][term] = 0 else next end ga.each {|gaid| begin res = ontology.bfs_shortest_path(slim_id, gaid) tmp[aspect][term] += 1 if res[0] rescue NameError $stderr.puts "Warnning: GO:#{slim_id} (#{term}) doesn't exist in the #{aspect}.ontology." 
tmp[aspect].delete(term) break end } } end # build GO-GOslim uniqued list def slim2(ontology, slim_ids, tmp, ga, aspect) tmp[aspect] = Hash.new slim_ids.each {|slim_id| term = ontology.goid2term(slim_id) if term begin unless tmp[aspect][term]['GOslim'].index(slim_id) tmp[aspect][term]['GOslim'] << slim_id end rescue NameError tmp[aspect][term] = {'GOslim'=>[slim_id], 'GO'=>[]} end else next end ga.each {|gaid| begin res = ontology.bfs_shortest_path(slim_id, gaid) tmp[aspect][term]['GO'] << gaid if res[0] rescue NameError break end } } end # # main # require 'bio/db/go' aspects = ['process', 'function', 'component'] rootids = { 'process' => '0008150', 'function' => '0003674', 'component' => '0005575'} # files open ios = {} files = { 'process' => $OPT_PROCESS, 'function' => $OPT_FUNCTION, 'component' => $OPT_COMPONENT, 'ga' => $OPT_GA, # gene-association 'list' => $OPT_GALIST, # gene-association list 'slim' => $OPT_GOSLIM} # GO slim files.each {|k, file_name| next if file_name == nil ios[k] = File.open(file_name) } if $OPT_OUTPUT ios['output'] = File.new($OPT_OUTPUT, "w+") ios['r_script'] = File.new("#{$OPT_OUTPUT}.R", "w+") else ios['r_script'] = ios['output'] = $stdout end # start # ontology ontology = {} aspects.each {|aspect| ontology[aspect] = Bio::GO::Ontology.new(ios[aspect].read) } # GO slim goslim = Bio::GO::Ontology.new(ios['slim'].read) # assign a aspect to terms in the GO slim. 
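#
# The per-aspect ID lists built in this step depend on Hash default values.
# The following minimal sketch (plain Ruby, variable names chosen only for
# illustration) shows how Hash.new(obj) and the block form of Hash.new
# differ when an element is appended under a missing key. The GO IDs are
# the root IDs listed in rootids above.
#

```ruby
# Hash.new(obj): every missing key returns the *same* default object,
# and << mutates that object without ever storing a key in the hash.
shared = Hash.new([])
shared['process']  << '0008150'
shared['function'] << '0003674'

p shared.keys          # => []
p shared['component']  # => ["0008150", "0003674"]

# Hash.new { ... }: the block stores a fresh array under each missing key.
separate = Hash.new { |hash, key| hash[key] = [] }
separate['process']  << '0008150'
separate['function'] << '0003674'

p separate.keys        # => ["process", "function"]
p separate['process']  # => ["0008150"]
```

#
# With the first form, a lookup for any aspect sees every ID that was
# appended, regardless of which aspect it was appended under.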
slim_ids = Hash.new([]) goslim.to_list.map {|ent| ent.node }.flatten.uniq.each {|goid| rootids.each {|aspect, rootid| begin a,b = ontology[aspect].bfs_shortest_path(rootid, goid) slim_ids[aspect] << goid rescue NameError $stderr.puts "Error: (#{rootid}, #{goid})" end } } # gene-associations ga_ids = [] if $OPT_GA ga = Bio::GO::GeneAssociation.parser(ios['ga'].read) ga_ids = ga.map {|ent| ent.goid } elsif $OPT_GALIST while line = ios['list'].gets if /^GO:(\d{7})/ =~ line goid = $1 ga_ids << goid end end else puts "Error: -l or -g options" exit end # count number count = Hash.new(0) aspects.each {|aspect| slim2(ontology[aspect], slim_ids[aspect], count, ga_ids, aspect) } # output if $OPT_R_SCRIPT and $OPT_OUTPUT tmp = [['aspect', 'count', 'GO Term'].join("\t")] else tmp = [['aspect', 'GO ID', 'GOslim Term', 'GOslim ID'].join("\t")] end ['component','function','process'].each {|aspect| count[aspect].sort {|a, b| b[1]['GO'].size <=> a[1]['GO'].size }.each {|term, value| next if term == "" if $OPT_R_SCRIPT and $OPT_OUTPUT tmp << [aspect, value['GO'].size, term].join("\t") else value['GO'].each {|goid| tmp << [aspect, "GO:#{goid}", term, value['GOslim'].map {|e| "GO:#{e}" }.join(' ')].join("\t") } end } } ios['output'].puts tmp.join("\n") if $OPT_R_SCRIPT and $OPT_OUTPUT ios['r_script'].puts slim2r($OPT_OUTPUT) end # bio-2.0.3/sample/demo_pubmed.rb0000644000175000017500000000576614141516614015722 0ustar nileshnilesh# # = sample/demo_pubmed.rb - demonstration of Bio::PubMed # # Copyright:: Copyright (C) 2001, 2007, 2008 Toshiaki Katayama # Copyright:: Copyright (C) 2006 Jan Aerts # License:: The Ruby License # # # == Description # # Demonstration of Bio::PubMed, NCBI Entrez/PubMed client module. # # == Requirements # # Internet connection is needed. # # == Usage # # Simply run this script. # # $ ruby demo_pubmed.rb # # == Development information # # The code was moved from lib/bio/io/pubmed.rb and modified as below: # * Codes using Entrez CGI are disabled. 
require 'bio' Bio::NCBI.default_email = 'staff@bioruby.org' #if __FILE__ == $0 puts "=== instance methods ===" pubmed = Bio::PubMed.new puts "--- Search PubMed by E-Utils ---" opts = {"rettype" => "count"} puts Time.now puts pubmed.esearch("(genome AND analysis) OR bioinformatics", opts) puts Time.now puts pubmed.esearch("(genome AND analysis) OR bioinformatics", opts) puts Time.now puts pubmed.esearch("(genome AND analysis) OR bioinformatics", opts) puts Time.now pubmed.esearch("(genome AND analysis) OR bioinformatics").each do |x| puts x end puts "--- Retrieve PubMed entry by E-Utils ---" puts Time.now puts pubmed.efetch(16381885) puts Time.now puts pubmed.efetch("16381885") puts Time.now puts pubmed.efetch("16381885") puts Time.now opts = {"retmode" => "xml"} puts pubmed.efetch([10592173, 14693808], opts) puts Time.now puts pubmed.efetch(["10592173", "14693808"], opts) #puts "--- Search PubMed by Entrez CGI ---" #pubmed.search("(genome AND analysis) OR bioinformatics").each do |x| # p x #end #puts "--- Retrieve PubMed entry by Entrez CGI ---" #puts pubmed.query("16381885") puts "--- Retrieve PubMed entry by PMfetch ---" puts pubmed.pmfetch("16381885") puts "=== class methods ===" puts "--- Search PubMed by E-Utils ---" opts = {"rettype" => "count"} puts Time.now puts Bio::PubMed.esearch("(genome AND analysis) OR bioinformatics", opts) puts Time.now puts Bio::PubMed.esearch("(genome AND analysis) OR bioinformatics", opts) puts Time.now puts Bio::PubMed.esearch("(genome AND analysis) OR bioinformatics", opts) puts Time.now Bio::PubMed.esearch("(genome AND analysis) OR bioinformatics").each do |x| puts x end puts "--- Retrieve PubMed entry by E-Utils ---" puts Time.now puts Bio::PubMed.efetch(16381885) puts Time.now puts Bio::PubMed.efetch("16381885") puts Time.now puts Bio::PubMed.efetch("16381885") puts Time.now opts = {"retmode" => "xml"} puts Bio::PubMed.efetch([10592173, 14693808], opts) puts Time.now puts Bio::PubMed.efetch(["10592173", "14693808"], opts) 
#puts "--- Search PubMed by Entrez CGI ---" #Bio::PubMed.search("(genome AND analysis) OR bioinformatics").each do |x| # p x #end #puts "--- Retrieve PubMed entry by Entrez CGI ---" #puts Bio::PubMed.query("16381885") puts "--- Retrieve PubMed entry by PMfetch ---" puts Bio::PubMed.pmfetch("16381885") #end bio-2.0.3/sample/tdiary.rb0000644000175000017500000001120714141516614014721 0ustar nileshnilesh# # tDiary : plugin/bio.rb # # Copyright (C) 2003 KATAYAMA Toshiaki # Mitsuteru C. Nakao # Itoshi NIKAIDO # Takeya KASUKAWA # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # # $Id: tdiary.rb,v 1.3 2003/03/17 04:24:47 k Exp $ # =begin == What's this? This is a plugin for the (()) to create various links for biological resources from your diary. tDiary is an extensible web diary application written in Ruby. == How to install Just copy this file under the tDiary's plugin directory as bio.rb. == Usage --- pubmed(pmid, comment = nil) Create a link to NCBI Entrez reference database by using PubMed ID. See (()) for more information. * tDiary style * <%= pubmed 12345 %> * <%= pubmed 12345, 'hogehoge' %> * RD style * ((% pubmed 12345 %)) * ((% pubmed 12345, 'hogehoge' %)) --- biofetch(db, entry_id) Create a link to the BioFetch detabase entry retrieval system. See (()) for more information. 
* tDiary style * <%= biofetch 'genbank', 'AA2CG' %> * RD style * ((% biofetch 'genbank', 'AA2CG' %)) --- amigo(go_id, comment = nil) Create a link to the AmiGO GO term browser by using GO ID. See (()) for more information. * tDiary style * <%= amigo '0003673' %> * <%= amigo '0003673', 'The root of GO' %> * RD style * ((% amigo 0003673 %)) * ((% amigo 0003673, 'The root of GO' %)) --- fantom(id, comment = nil) Create a link to FANTOM database by using Clone ID. You can use RIKEN clone ID, Rearray ID, Seq ID and Accession Number. See (()) for more information. * tDiary style * <%= fantom 12345 %> * <%= fantom 12345, 'hogehoge' %> * RD style * ((% fantom 12345 %)) * ((% fantom 12345, 'hogehoge' %)) --- rtps(id, comment = nil) Create a link to FANTOM RTPS database by using Clone ID. You can use only RTPS ID. See (()) for more information. * tDiary style * <%= rtps 12345 %> * <%= rtps 12345, 'hogehoge' %> * RD style * ((% rtps 12345 %)) * ((% rtps 12345, 'hogehoge' %)) == References * Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs, The FANTOM Consortium and the RIKEN Genome Exploration Research Group Phase I & II Team, Nature 420:563-573, 2002 * Functional annotation of a full-length mouse cDNA collection, The RIKEN Genome Exploration Research Group Phase II Team and the FANTOM Consortium, Nature 409:685-690, 2001 =end def pubmed(pmid, comment = nil) pmid = pmid.to_s.strip url = "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi" url << "?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=#{pmid}" if comment %Q[#{comment.to_s.strip}] else %Q[PMID:#{pmid}] end end def biofetch(db, entry_id) url = "http://biofetch.bioruby.org/" %Q[#{db}:#{entry_id}] end def amigo(go_id = '0003673', comment = nil) go_id = go_id.to_s.strip url = "http://www.godatabase.org/cgi-bin/go.cgi?query=#{go_id};view=query;action=query;search_constraint=terms" comment = "AmiGO:#{go_id}" unless comment %Q[#{comment}] end def fantom(id, comment = nil) id = 
id.to_s.strip url = "http://fantom2.gsc.riken.go.jp/db/link/id.cgi" url << "?id=#{id}" if comment %Q[#{comment.to_s.strip}] else %Q[FANTOM DB:#{id}] end end def rtps(id, comment = nil) id = id.to_s.strip url = "http://fantom2.gsc.riken.go.jp/RTPS/link/id.cgi" url << "?id=#{id}" if comment %Q[#{comment.to_s.strip}] else %Q[FANTOM RTPS DB:#{id}] end end bio-2.0.3/sample/fsplit.rb0000755000175000017500000000216214141516614014731 0ustar nileshnilesh#!/usr/bin/env ruby # # fsplit.rb - split FASTA file by each n entries # # Copyright (C) 2001 KATAYAMA Toshiaki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # $Id: fsplit.rb,v 0.1 2001/06/21 08:22:29 katayama Exp $ # if ARGV.length != 2 print <<-USAGE fsplit.rb - split FASTA file by each n entries Usage : % ./fsplit.rb 2000 seq.f This will produce seq.f.1, seq.f.2, ... with containing 2000 sequences in each file. 
USAGE exit 1 end count = ARGV.shift.to_i i = -1 while gets if /^>/ i += 1 if i % count == 0 n = i / count out = File.new("#{$FILENAME}.#{n+1}", "w+") end end out.print end bio-2.0.3/sample/ssearch2tab.rb0000755000175000017500000000436314141516614015636 0ustar nileshnilesh#!/usr/bin/env ruby # # ssearch2tab.rb - convert SSEARCH output into tab delimited data for MySQL # # Usage: # # % ssearch2tab.rb SSEARCH-output-file[s] > ssearch_results.tab # % mysql < ssearch_results.sql (use sample at the end of this file) # # Format accepted: # # % ssearch3[3][_t] -Q -H -m 6 query.f target.f > SSEARCH-output-file # # Copyright (C) 2001 KATAYAMA Toshiaki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
# # $Id: ssearch2tab.rb,v 0.1 2001/06/21 08:25:58 katayama Exp $ # while gets # query if /^\S+: (\d+) aa$/ q_len = $1 end # each hit if /^>>([^>]\S+).*\((\d+) aa\)$/ target = $1 t_len = $2 # d = dummy variable d, d, d, swopt, d, zscore, d, bits, d, evalue = gets.split(/\s+/) d, d, sw, ident, d, ugident, d, d, overlap, d, d, lap = gets.split(/\s+/) # query-hit pair print "#{$FILENAME}\t#{q_len}\t#{target}\t#{t_len}" # pick up values ary = [ swopt, zscore, bits, evalue, sw, ident, ugident, overlap, lap ] # print values for i in ary i.tr!('^0-9.:e\-','') print "\t#{i}" end print "\n" end end =begin MySQL ssearch_results.sql sample CREATE DATABASE IF NOT EXISTS db_name; CREATE TABLE IF NOT EXISTS db_name.table_name ( query varchar(25) not NULL, q_len integer unsigned default 0, target varchar(25) not NULL, t_len integer unsigned default 0, swopt integer unsigned default 0, zscore float default 0.0, bits float default 0.0, evalue float default 0.0, sw integer unsigned default 0, ident float default 0.0, ugident float default 0.0, overlap integer unsigned default 0, lap_at varchar(25) default NULL ); LOAD DATA LOCAL INFILE 'ssearch_results.tab' INTO TABLE db_name.table_name; =end bio-2.0.3/sample/demo_go.rb0000644000175000017500000000373414141516614015044 0ustar nileshnilesh# # = sample/demo_go.rb - demonstration of Bio::GO, classes for Gene Ontology # # Copyright:: Copyright (C) 2003 # Mitsuteru C. Nakao # License:: The Ruby License # # # == Description # # Demonstration of Bio::GO, classes for Gene Ontology. # # == Requirement # # Internet connection is needed. # # == Usage # # Simply run this script. # # $ ruby demo_go.rb # # == Note # # The code was originally written in 2003, and it can only parse GO format # that is deprecated and no new data is available after August 2009. # # == Development information # # The code was moved from lib/bio/db/go.rb. 
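#
# This demo calls bfs_shortest_path on the parsed ontology. As a rough
# illustration of what a breadth-first shortest-path search does, here is a
# plain-Ruby sketch over a small hand-made graph. The graph, the node names,
# and the helper bfs_shortest_path_sketch are all hypothetical; the sketch
# says nothing about BioRuby's own implementation or its return value.
#

```ruby
# Hypothetical helper: breadth-first search over an adjacency-list Hash.
# Returns the nodes on one shortest path from start to goal, or nil if
# goal cannot be reached.
def bfs_shortest_path_sketch(graph, start, goal)
  previous = { start => nil }   # doubles as the set of visited nodes
  queue = [start]
  until queue.empty?
    node = queue.shift
    if node == goal
      # walk back through the recorded predecessors to rebuild the path
      path = []
      while node
        path.unshift(node)
        node = previous[node]
      end
      return path
    end
    (graph[node] || []).each do |child|
      next if previous.key?(child)
      previous[child] = node
      queue << child
    end
  end
  nil
end

# Made-up parent => children graph, for illustration only.
graph = {
  'root' => ['a', 'b'],
  'a'    => ['c'],
  'b'    => ['c', 'd'],
  'c'    => ['e'],
  'd'    => ['e']
}

p bfs_shortest_path_sketch(graph, 'root', 'e')  # => ["root", "a", "c", "e"]
p bfs_shortest_path_sketch(graph, 'e', 'root')  # => nil
```

#
# Because nodes are visited in order of distance from the start, the first
# time the goal is taken off the queue the recorded path is a shortest one.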
#

require 'bio'

#if __FILE__ == $0

  def wget(url)
    Bio::Command.read_uri(url)
  end

  go_c_url = 'http://www.geneontology.org/ontology/component.ontology'
  ga_url = 'http://www.geneontology.org/gene-associations/gene_association.sgd.gz'
  e2g_url = 'http://www.geneontology.org/external2go/spkw2go'

  puts "\n #==> Bio::GO::Ontology"
  p go_c_url
  component_ontology = wget(go_c_url)
  comp = Bio::GO::Ontology.new(component_ontology)

  [['0003673', '0005632'],
   ['0003673', '0005619'],
   ['0003673', '0004649']].each {|pair|
    puts
    p pair
    p [:pair, pair.map {|i| [comp.id2term[i], comp.goid2term(i)] }]
    puts "\n #==> comp.bfs_shortest_path(pair[0], pair[1])"
    p comp.bfs_shortest_path(pair[0], pair[1])
  }

  puts "\n #==> Bio::GO::External2go"
  p e2g_url
  spkw2go = Bio::GO::External2go.parser(wget(e2g_url))

  puts "\n #==> spkw2go.dbs"
  p spkw2go.dbs

  puts "\n #==> spkw2go[1]"
  p spkw2go[1]

  require 'zlib'
  puts "\n #==> Bio::GO::GeneAssociation"
  p ga_url
  #
  # The workaround (Zlib::MAX_WBITS + 32) is taken from:
  # http://d.hatena.ne.jp/ksef-3go/20070924/1190563143
  #
  ga = Zlib::Inflate.new(Zlib::MAX_WBITS + 32).inflate(wget(ga_url))
  #ga = Zlib::Inflate.inflate(wget(ga_url))
  ga = Bio::GO::GeneAssociation.parser(ga)

  puts "\n #==> ga.size"
  p ga.size

  puts "\n #==> ga[100]"
  p ga[100]

#end
bio-2.0.3/sample/demo_litdb.rb0000644000175000017500000000150014141516614015522 0ustar nileshnilesh
#
# = sample/demo_litdb.rb - demonstration of Bio::LITDB
#
# Copyright:: Copyright (C) 2001 Toshiaki Katayama
# License::   The Ruby License
#
#
# == Description
#
# Demonstration of Bio::LITDB, LITDB literature database parser class.
#
# == Requirements
#
# Internet connection and/or OBDA (Open Bio Database Access) configuration.
#
# == Usage
#
# Simply run this script.
#
#  $ ruby demo_litdb.rb
#
# == Development information
#
# The code was moved from lib/bio/db/litdb.rb.
# require 'bio' #if __FILE__ == $0 entry = Bio::Fetch.query('litdb', '0308004') puts entry p Bio::LITDB.new(entry).reference entry = Bio::Fetch.query('litdb', '0309094') puts entry p Bio::LITDB.new(entry).reference entry = Bio::Fetch.query('litdb', '0309093') puts entry p Bio::LITDB.new(entry).reference #end bio-2.0.3/sample/genes2nuc.rb0000755000175000017500000000167314141516614015327 0ustar nileshnilesh#!/usr/bin/env ruby # # genes2nuc.rb - convert KEGG/GENES entry into FASTA format (nuc) # # Copyright (C) 2001 KATAYAMA Toshiaki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # $Id: genes2nuc.rb,v 0.4 2002/06/23 20:21:56 k Exp $ # require 'bio/db/kegg/genes' require 'bio/extend' include Bio while gets(KEGG::GENES::DELIMITER) genes = KEGG::GENES.new($_) next if genes.nalen == 0 puts ">#{genes.entry_id} #{genes.definition}" puts genes.naseq.fold(60+12, 12) end bio-2.0.3/sample/pmsearch.rb0000755000175000017500000000251114141516614015230 0ustar nileshnilesh#!/usr/bin/env ruby # # pmsearch.rb - generate BibTeX format reference list by PubMed keyword search # # Copyright (C) 2002 KATAYAMA Toshiaki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # # $Id:$ # require 'bio' Bio::NCBI.default_email = 'staff@bioruby.org' if ARGV[0] =~ /\A\-f/ ARGV.shift form = ARGV.shift else form = 'bibtex' end keywords = ARGV.join(' ') uids = Bio::PubMed.esearch(keywords) if uids and !uids.empty? then entries = Bio::PubMed.efetch(uids) else entries = [] end entries.each do |entry| case form when 'medline' puts entry else puts Bio::MEDLINE.new(entry).reference.__send__(form.intern) end print "\n" end bio-2.0.3/sample/demo_sosui_report.rb0000644000175000017500000000357214141516614017174 0ustar nileshnilesh# # = sample/demo_sosui_report.rb - demonstration of Bio::SOSUI::Report # # Copyright:: Copyright (C) 2003 # Mitsuteru C. Nakao # License:: The Ruby License # # # == Description # # Demonstration of Bio::SOSUI::Report, SOSUI output parser. # # SOSUI performs classification and secondary structures prediction # of membrane proteins. # # == Usage # # Usage 1: Without arguments, runs demo using preset example data. # # $ ruby demo_sosui_report.rb # # Usage 2: Specify files containing SOSUI reports. # # $ ruby demo_sosui_report.rb files... # # Example usage using test data: # # $ ruby -Ilib sample/demo_sosui_report.rb test/data/SOSUI/sample.report # # == References # # * http://bp.nuap.nagoya-u.ac.jp/sosui/ # # == Development information # # The code was moved from lib/bio/appl/sosui/report.rb, and modified as below: # * Disables internal sample data when arguments are specified. # * Method name is changed. # * Bug fix about tmhs demo. 
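#
# The preset sample data in this script lists one transmembrane helix per
# line. A small plain-Ruby sketch of pulling the fields out of such a line
# follows. It is only an illustration: the regular expression, the TMHelix
# struct, and its field names are assumptions made for this sketch, not the
# parser used by Bio::SOSUI::Report.
#

```ruby
# Hypothetical record type for this sketch.
TMHelix = Struct.new(:number, :range, :grade, :sequence)

# First helix line of the preset sample data in this script.
line = ' TM 1 12- 34 SECONDARY LLVPILLPEKCYDQLFVQWDLLH'

# Fields: helix number, start, end, grade (PRIMARY/SECONDARY), sequence.
if /TM\s+(\d+)\s+(\d+)-\s*(\d+)\s+(\S+)\s+(\S+)/ =~ line
  helix = TMHelix.new($1.to_i, ($2.to_i)..($3.to_i), $4, $5)
  p helix.range            # => 12..34
  p helix.range.size       # => 23
  p helix.sequence.length  # => 23
end
```

#
# The length of the range and the length of the sequence agree, which is a
# cheap consistency check on the extracted fields.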
require 'bio'

begin
  require 'pp'
  alias p pp
rescue LoadError
end

sample = <<HOGE
>HOGE1
MEMBRANE PROTEIN
NUMBER OF TM HELIX = 6
TM 1 12- 34 SECONDARY LLVPILLPEKCYDQLFVQWDLLH
TM 2 36- 58 PRIMARY PCLKILLSKGLGLGIVAGSLLVK
TM 3 102- 124 SECONDARY SWGEALFLMLQTITICFLVMHYR
TM 4 126- 148 PRIMARY QTVKGVAFLACYGLVLLVLLSPL
TM 5 152- 174 SECONDARY TVVTLLQASNVPAVVVGRLLQAA
TM 6 214- 236 SECONDARY AGTFVVSSLCNGLIAAQLLFYWN
>HOGE2
SOLUBLE PROTEIN
HOGE

def demo_sosui_report(ent)
  puts '==='
  puts ent
  puts '==='
  sosui = Bio::SOSUI::Report.new(ent)
  p [:entry_id, sosui.entry_id]
  p [:prediction, sosui.prediction]
  p [:tmhs, sosui.tmhs]
end

if ARGV.empty? then
  sample.split(/#{Bio::SOSUI::Report::DELIMITER}/).each {|ent|
    demo_sosui_report(ent)
  }
else
  while ent = $<.gets(Bio::SOSUI::Report::DELIMITER)
    demo_sosui_report(ent)
  end
end
bio-2.0.3/sample/demo_kegg_reaction.rb0000644000175000017500000000274414141516614017240 0ustar nileshnilesh
#
# = sample/demo_kegg_reaction.rb - demonstration of Bio::KEGG::REACTION
#
# Copyright:: Copyright (C) 2004 Toshiaki Katayama
# Copyright:: Copyright (C) 2009 Kozo Nishida
# License::   The Ruby License
#
#
# == Description
#
# Demonstration of Bio::KEGG::REACTION, the parser class for the KEGG
# REACTION biochemical reaction database.
#
# == Usage
#
# Specify files containing KEGG REACTION data.
#
#  $ ruby demo_kegg_reaction.rb files...
#
# Example usage using test data:
#
#  $ ruby -Ilib sample/demo_kegg_reaction.rb test/data/KEGG/R00006.reaction
#
# == Example of running this script
#
# Download test data.
#
#  $ ruby -Ilib bin/br_biofetch.rb reaction R00259 > R00259.reaction
#  $ ruby -Ilib bin/br_biofetch.rb reaction R02282 > R02282.reaction
#
# Run this script.
#
#  $ ruby -Ilib sample/demo_kegg_reaction.rb R00259.reaction R02282.reaction
#
# == Development information
#
# The code was moved from lib/bio/db/kegg/reaction.rb and modified.
#

require 'bio'

Bio::FlatFile.foreach(Bio::KEGG::REACTION, ARGF) do |rn|
  puts "### rn = Bio::KEGG::REACTION.new(str)"
  puts "# rn.entry_id"
  p rn.entry_id
  puts "# rn.name"
  p rn.name
  puts "# rn.definition"
  p rn.definition
  puts "# rn.equation"
  p rn.equation
  puts "# rn.rpairs"
  p rn.rpairs
  puts "# rn.pathways"
  p rn.pathways
  puts "# rn.enzymes"
  p rn.enzymes
  puts "# rn.orthologs"
  p rn.orthologs
  puts "# rn.orthologs_as_hash"
  p rn.orthologs_as_hash
  puts "=" * 78
end
bio-2.0.3/sample/na2aa.cwl0000644000175000017500000000057614141516614014600 0ustar nileshnilesh
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
baseCommand: [ruby]
inputs:
  - id: script
    type: File
    default:
      class: File
      location: na2aa.rb
    inputBinding:
      position: -1
  - id: seqFile
    type: File[]
    inputBinding:
      position: 1
outputs:
  - id: out
    type: stdout
stdout: $(inputs.script.nameroot)-$(inputs.seqFile[0].nameroot).fst
bio-2.0.3/sample/demo_genscan_report.rb0000644000175000017500000001406214141516614017444 0ustar nileshnilesh
#
# = sample/demo_genscan_report.rb - demonstration of Bio::Genscan::Report
#
# Copyright:: Copyright (C) 2003
#             Mitsuteru C. Nakao
# License:: The Ruby License
#
#
# == Description
#
# Demonstration of Bio::Genscan::Report, parser class for Genscan output.
#
# == Usage
#
# Usage 1: Without arguments, demonstrates using preset sample data.
#
#  $ ruby demo_genscan_report.rb
#
# Usage 2: When a "-" is specified as the argument, read data from stdin.
#
#  $ cat testdata | ruby demo_genscan_report.rb -
#
# Usage 3: Specify a file containing a Genscan output.
#
#  $ ruby demo_genscan_report.rb file
#
# Example usage using test data:
#
#  $ ruby -Ilib sample/demo_genscan_report.rb test/data/genscan/sample.report
#
# == Development information
#
# The code was moved from lib/bio/appl/genscan/report.rb and modified:
# * Changed the way to read preset sample data.
#

require 'bio'

#if __FILE__ == $0

  if ARGV.empty?
  then
    report = DATA.read
  elsif ARGV.size == 1 and ARGV[0] == '-' then
    ARGV.shift
    report = $<.read
  else
    report = ARGF.read
  end

  puts "= class Bio::Genscan::Report "
  report = Bio::Genscan::Report.new(report)

  print " report.genscan_version #=> "
  p report.genscan_version
  print " report.date_run #=> "
  p report.date_run
  print " report.time #=> "
  p report.time

  print " report.query_name #=> "
  p report.query_name
  print " report.length #=> "
  p report.length
  print " report.gccontent #=> "
  p report.gccontent
  print " report.isochore #=> "
  p report.isochore

  print " report.matrix #=> "
  p report.matrix

  puts " report.predictions (Array of Bio::Genscan::Report::Gene) "
  print " report.predictions.size #=> "
  p report.predictions.size

  report.predictions.each {|gene|
    puts "\n== class Bio::Genscan::Report::Gene "
    print " gene.number #=> "
    p gene.number
    print " gene.aaseq (Bio::FastaFormat) #=> "
    p gene.aaseq
    print " gene.naseq (Bio::FastaFormat) #=> "
    p gene.naseq
    print " gene.promoter (Bio::Genscan::Report::Exon) #=> "
    p gene.promoter
    print " gene.polyA (Bio::Genscan::Report::Exon) #=> "
    p gene.polyA
    puts " gene.exons (Array of Bio::Genscan::Report::Exon) "
    print " gene.exons.size #=> "
    p gene.exons.size

    gene.exons.each {|exon|
      puts "\n== class Bio::Genscan::Report::Exon "
      print " exon.number #=> "
      p exon.number
      print " exon.exon_type #=> "
      p exon.exon_type
      print " exon.exon_type_long #=> "
      p exon.exon_type_long
      print " exon.strand #=> "
      p exon.strand
      print " exon.first #=> "
      p exon.first
      print " exon.last #=> "
      p exon.last
      print " exon.range (Range) #=> "
      p exon.range
      print " exon.frame #=> "
      p exon.frame
      print " exon.phase #=> "
      p exon.phase
      print " exon.acceptor_score #=> "
      p exon.acceptor_score
      print " exon.donor_score #=> "
      p exon.donor_score
      print " exon.initiation_score #=> "
      p exon.initiation_score
      print " exon.termination_score #=> "
      p exon.termination_score
      print " exon.score #=> "
      p exon.score
      print " exon.p_value #=> "
      p exon.p_value
      print " exon.t_score #=> "
      p exon.t_score
puts } puts } #end ### Sample Genscan report is attached below. ### The lines after the "__END__" can be accessed by using "DATA". __END__ GENSCAN 1.0 Date run: 30-May-103 Time: 14:06:28 Sequence HUMRASH : 12942 bp : 68.17% C+G : Isochore 4 (57 - 100 C+G%) Parameter matrix: HumanIso.smat Predicted genes/exons: Gn.Ex Type S .Begin ...End .Len Fr Ph I/Ac Do/T CodRg P.... Tscr.. ----- ---- - ------ ------ ---- -- -- ---- ---- ----- ----- ------ 1.01 Init + 1664 1774 111 1 0 94 83 212 0.997 21.33 1.02 Intr + 2042 2220 179 1 2 104 66 408 0.997 40.12 1.03 Intr + 2374 2533 160 1 1 89 94 302 0.999 32.08 1.04 Term + 3231 3350 120 2 0 115 48 202 0.980 18.31 1.05 PlyA + 3722 3727 6 -5.80 2.00 Prom + 6469 6508 40 -7.92 2.01 Init + 8153 8263 111 1 0 94 83 212 0.998 21.33 2.02 Intr + 8531 8709 179 1 2 104 66 408 0.997 40.12 2.03 Intr + 8863 9022 160 1 1 89 94 302 0.999 32.08 2.04 Term + 9720 9839 120 2 0 115 48 202 0.961 18.31 Predicted peptide sequence(s): Predicted coding sequence(s): >HUMRASH|GENSCAN_predicted_peptide_1|189_aa MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAG QEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDL AARTVESRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLVREIRQHKLRKLNPPDESGPG CMSCKCVLS >HUMRASH|GENSCAN_predicted_CDS_1|570_bp atgacggaatataagctggtggtggtgggcgccggcggtgtgggcaagagtgcgctgacc atccagctgatccagaaccattttgtggacgaatacgaccccactatagaggattcctac cggaagcaggtggtcattgatggggagacgtgcctgttggacatcctggataccgccggc caggaggagtacagcgccatgcgggaccagtacatgcgcaccggggagggcttcctgtgt gtgtttgccatcaacaacaccaagtcttttgaggacatccaccagtacagggagcagatc aaacgggtgaaggactcggatgacgtgcccatggtgctggtggggaacaagtgtgacctg gctgcacgcactgtggaatctcggcaggctcaggacctcgcccgaagctacggcatcccc tacatcgagacctcggccaagacccggcagggagtggaggatgccttctacacgttggtg cgtgagatccggcagcacaagctgcggaagctgaaccctcctgatgagagtggccccggc tgcatgagctgcaagtgtgtgctctcctga >HUMRASH|GENSCAN_predicted_peptide_2|189_aa MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAG 
QEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDL AARTVESRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLVREIRQHKLRKLNPPDESGPG CMSCKCVLS >HUMRASH|GENSCAN_predicted_CDS_2|570_bp atgacggaatataagctggtggtggtgggcgccggcggtgtgggcaagagtgcgctgacc atccagctgatccagaaccattttgtggacgaatacgaccccactatagaggattcctac cggaagcaggtggtcattgatggggagacgtgcctgttggacatcctggataccgccggc caggaggagtacagcgccatgcgggaccagtacatgcgcaccggggagggcttcctgtgt gtgtttgccatcaacaacaccaagtcttttgaggacatccaccagtacagggagcagatc aaacgggtgaaggactcggatgacgtgcccatggtgctggtggggaacaagtgtgacctg gctgcacgcactgtggaatctcggcaggctcaggacctcgcccgaagctacggcatcccc tacatcgagacctcggccaagacccggcagggagtggaggatgccttctacacgttggtg cgtgagatccggcagcacaagctgcggaagctgaaccctcctgatgagagtggccccggc tgcatgagctgcaagtgtgtgctctcctga bio-2.0.3/sample/fastasort.rb0000755000175000017500000000337314141516614015443 0ustar nileshnilesh#!/usr/bin/env ruby # # fastasort: Sorts a FASTA file (in fact it can use any flat file input supported # by BIORUBY) while modifying the definition of each record in the # process so it is suitable for processing with (for example) pal2nal # and PAML. # # Copyright (C) 2008 KATAYAMA Toshiaki & Pjotr Prins # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # $Id: fastasort.rb,v 1.2 2008/05/19 12:22:05 pjotr Exp $ # require 'bio' include Bio table = Hash.new # table to sort objects ARGV.each do | fn | Bio::FlatFile.auto(fn).each do | item | # Some procession of the definition for external programs (just # an example): # strip JALView extension from definition e.g. 
    # .../1-212
    if item.definition =~ /\/\d+-\d+$/
      item.definition = $`
    end
    # substitute slashes:
    definition = item.definition.gsub(/\//,'-')
    # substitute quotes and ampersands (applied to the result of the
    # previous step so that the slash substitution is not discarded):
    definition = definition.gsub(/['"&]/,'x')
    # prefix letters if the first position is a number:
    definition = 'seq'+definition if definition =~ /^\d/
    # Now add the data to the sort table
    table[definition] = item.data
  end
end

# Output sorted table
table.sort.each do | definition, data |
  rec = Bio::FastaFormat.new('> '+definition.strip+"\n"+data)
  print rec
end
bio-2.0.3/sample/demo_hmmer_report.rb0000644000175000017500000000733514141516614017143 0ustar nileshnilesh
#
# = sample/demo_hmmer_report.rb - demonstration of Bio::HMMER::Report
#
# Copyright:: Copyright (C) 2002
#             Hiroshi Suga ,
# Copyright:: Copyright (C) 2005
#             Masashi Fujita
# License:: The Ruby License
#
#
# == Description
#
# Demonstration of Bio::HMMER::Report (HMMER output parser).
#
# Note that it (and Bio::HMMER::Report) supports HMMER 2.x.
# HMMER 3.x is currently not supported.
#
# == Usage
#
# Specify a file containing a HMMER result.
#
#  $ ruby demo_hmmer_report.rb file
#
# Example usage using test data:
#
#  $ ruby -Ilib sample/demo_hmmer_report.rb test/data/HMMER/hmmsearch.out
#  $ ruby -Ilib sample/demo_hmmer_report.rb test/data/HMMER/hmmpfam.out
#
# == Development information
#
# The code was moved from lib/bio/appl/hmmer/report.rb.
# require 'bio' #if __FILE__ == $0 =begin # # for multiple reports in a single output file (hmmpfam) # Bio::HMMER.reports(ARGF.read) do |report| report.hits.each do |hit| hit.hsps.each do |hsp| end end end =end begin require 'pp' alias p pp rescue LoadError end rep = Bio::HMMER::Report.new(ARGF.read) p rep indent = 18 puts "### hmmer result" print "name : ".rjust(indent) p rep.program['name'] print "version : ".rjust(indent) p rep.program['version'] print "copyright : ".rjust(indent) p rep.program['copyright'] print "license : ".rjust(indent) p rep.program['license'] print "HMM file : ".rjust(indent) p rep.parameter['HMM file'] print "Sequence file : ".rjust(indent) p rep.parameter['Sequence file'] print "Query sequence : ".rjust(indent) p rep.query_info['Query sequence'] print "Accession : ".rjust(indent) p rep.query_info['Accession'] print "Description : ".rjust(indent) p rep.query_info['Description'] rep.each do |hit| puts "## each hit" print "accession : ".rjust(indent) p [ hit.accession, hit.target_id, hit.hit_id, hit.entry_id ] print "description : ".rjust(indent) p [ hit.description, hit.definition ] print "target_def : ".rjust(indent) p hit.target_def print "score : ".rjust(indent) p [ hit.score, hit.bit_score ] print "evalue : ".rjust(indent) p hit.evalue print "num : ".rjust(indent) p hit.num hit.each do |hsp| puts "## each hsp" print "accession : ".rjust(indent) p [ hsp.accession, hsp.target_id ] print "domain : ".rjust(indent) p hsp.domain print "seq_f : ".rjust(indent) p hsp.seq_f print "seq_t : ".rjust(indent) p hsp.seq_t print "seq_ft : ".rjust(indent) p hsp.seq_ft print "hmm_f : ".rjust(indent) p hsp.hmm_f print "hmm_t : ".rjust(indent) p hsp.hmm_t print "hmm_ft : ".rjust(indent) p hsp.hmm_ft print "score : ".rjust(indent) p [ hsp.score, hsp.bit_score ] print "evalue : ".rjust(indent) p hsp.evalue print "midline : ".rjust(indent) p hsp.midline print "hmmseq : ".rjust(indent) p hsp.hmmseq print "flatseq : ".rjust(indent) p hsp.flatseq print 
      "query_frame : ".rjust(indent)
      p hsp.query_frame
      print "target_frame : ".rjust(indent)
      p hsp.target_frame

      print "query_seq : ".rjust(indent)
      p hsp.query_seq # hmmseq, flatseq
      print "target_seq : ".rjust(indent)
      p hsp.target_seq # flatseq, hmmseq
      print "target_from : ".rjust(indent)
      p hsp.target_from # seq_f, hmm_f
      print "target_to : ".rjust(indent)
      p hsp.target_to # seq_t, hmm_t
      print "query_from : ".rjust(indent)
      p hsp.query_from # hmm_f, seq_f
      print "query_to : ".rjust(indent)
      p hsp.query_to # hmm_t, seq_t
    end
  end

#end
bio-2.0.3/sample/seqdatabase.ini0000644000175000017500000000664014141516614016063 0ustar nileshnilesh
VERSION=1.00

[embl]
protocol=biofetch
location=http://bioruby.org/cgi-bin/biofetch.rb
dbname=embl

[embl-upd]
protocol=biofetch
location=http://bioruby.org/cgi-bin/biofetch.rb
dbname=embl-upd

[embl_biofetch]
protocol=biofetch
location=http://www.ebi.ac.uk/cgi-bin/dbfetch
dbname=embl

[embl_biosql]
protocol=biosql
location=localhost
dbname=biosql
driver=postgres
user=hack
pass=
biodbname=embl

[embl_biocorba]
protocol=bsane-corba
location=sqldbsrv.ior

[embl_xembl]
protocol=xembl
location=http://www.ebi.ac.uk/xembl/XEMBL.wsdl
format=Bsml

[embl_flat]
protocol=flat
location=/export/database/
dbname=embl

[genbank_bdb]
protocol=flat
location=/export/database/
dbname=genbank

[swissprot]
protocol=biofetch
location=http://bioruby.org/cgi-bin/biofetch.rb
dbname=swissprot

[swissprot-upd]
protocol=biofetch
location=http://bioruby.org/cgi-bin/biofetch.rb
dbname=swissprot-upd

[swissprot_biofetch]
protocol=biofetch
location=http://www.ebi.ac.uk/cgi-bin/dbfetch
dbname=swall

[swissprot_biosql]
protocol=biosql
location=db.bioruby.org
dbname=biosql
driver=mysql
user=root
pass=
biodbname=sp

[genbank]
protocol=biofetch
location=http://bioruby.org/cgi-bin/biofetch.rb
dbname=genbank

[genbank-upd]
protocol=biofetch
location=http://bioruby.org/cgi-bin/biofetch.rb
dbname=genbank-upd

[genbank_biosql]
protocol=biosql
location=db.bioruby.org
dbname=biosql
driver=mysql
user=root
pass= biodbname=gb [refseq] protocol=biofetch location=http://bioruby.org/cgi-bin/biofetch.rb dbname=refseq [refseq_biosql] protocol=biosql location=db.bioruby.org dbname=biosql driver=mysql user= pass= biodbname=rs [kegg-pathway] protocol=biofetch location=http://bioruby.org/cgi-bin/biofetch.rb dbname=pathway [kegg-genome] protocol=biofetch location=http://bioruby.org/cgi-bin/biofetch.rb dbname=genome [kegg-genes] protocol=biofetch location=http://bioruby.org/cgi-bin/biofetch.rb dbname=genes [kegg-vgenes] protocol=biofetch location=http://bioruby.org/cgi-bin/biofetch.rb dbname=vgenes [aaindex] protocol=biofetch location=http://bioruby.org/cgi-bin/biofetch.rb dbname=aaindex [blocks] protocol=biofetch location=http://bioruby.org/cgi-bin/biofetch.rb dbname=blocks [enzyme] protocol=biofetch location=http://bioruby.org/cgi-bin/biofetch.rb dbname=enzyme [epd] protocol=biofetch location=http://bioruby.org/cgi-bin/biofetch.rb dbname=epd [litdb] protocol=biofetch location=http://bioruby.org/cgi-bin/biofetch.rb dbname=litdb [omim] protocol=biofetch location=http://bioruby.org/cgi-bin/biofetch.rb dbname=omim [pdb] protocol=biofetch location=http://bioruby.org/cgi-bin/biofetch.rb dbname=pdb [pdbstr] protocol=biofetch location=http://bioruby.org/cgi-bin/biofetch.rb dbname=pdbstr [pfam] protocol=biofetch location=http://bioruby.org/cgi-bin/biofetch.rb dbname=pfam [pir] protocol=biofetch location=http://bioruby.org/cgi-bin/biofetch.rb dbname=pir [pmd] protocol=biofetch location=http://bioruby.org/cgi-bin/biofetch.rb dbname=pmd [prf] protocol=biofetch location=http://bioruby.org/cgi-bin/biofetch.rb dbname=prf [prints] protocol=biofetch location=http://bioruby.org/cgi-bin/biofetch.rb dbname=prints [prodom] protocol=biofetch location=http://bioruby.org/cgi-bin/biofetch.rb dbname=prodom [prosdoc] protocol=biofetch location=http://bioruby.org/cgi-bin/biofetch.rb dbname=prosdoc [prosite] protocol=biofetch location=http://bioruby.org/cgi-bin/biofetch.rb dbname=prosite [transfac] 
protocol=biofetch location=http://bioruby.org/cgi-bin/biofetch.rb dbname=transfac bio-2.0.3/sample/demo_kegg_compound.rb0000644000175000017500000000236514141516614017257 0ustar nileshnilesh# # = sample/demo_kegg_compound.rb - demonstration of Bio::KEGG::COMPOUND # # Copyright:: Copyright (C) 2001, 2002, 2004, 2007 Toshiaki Katayama # Copyright:: Copyright (C) 2009 Kozo Nishida # License:: The Ruby License # # # == Description # # Demonstration of Bio::KEGG::COMPOUND, a parser class for the KEGG COMPOUND # chemical structure database. # # == Usage # # Specify files containing KEGG COMPOUND data. # # $ ruby demo_kegg_compound.rb files... # # Example usage using test data: # # $ ruby -Ilib sample/demo_kegg_compound.rb test/data/KEGG/C00025.compound # # == Development information # # The code was moved from lib/bio/db/kegg/compound.rb and modified. # require 'bio' Bio::FlatFile.foreach(Bio::KEGG::COMPOUND, ARGF) do |cpd| puts "### cpd = Bio::KEGG::COMPOUND.new(str)" puts "# cpd.entry_id" p cpd.entry_id puts "# cpd.names" p cpd.names puts "# cpd.name" p cpd.name puts "# cpd.formula" p cpd.formula puts "# cpd.mass" p cpd.mass puts "# cpd.reactions" p cpd.reactions puts "# cpd.rpairs" p cpd.rpairs puts "# cpd.pathways" p cpd.pathways puts "# cpd.enzymes" p cpd.enzymes puts "# cpd.dblinks" p cpd.dblinks puts "# cpd.kcf" p cpd.kcf puts "=" * 78 end bio-2.0.3/sample/demo_ncbi_rest.rb0000644000175000017500000000675314141516614016413 0ustar nileshnilesh# # = sample/demo_ncbi_rest.rb - demonstration of Bio::NCBI::REST, NCBI E-Utilities client # # Copyright:: Copyright (C) 2008 Toshiaki Katayama # License:: The Ruby License # # # == Description # # Demonstration of Bio::NCBI::REST, NCBI E-Utilities client. # # == Requirements # # Internet connection is needed. # # == Usage # # Simply run this script. # # $ ruby demo_ncbi_rest.rb # # == Development information # # The code was moved from lib/bio/io/ncbirest.rb. 
# require 'bio' Bio::NCBI.default_email = 'staff@bioruby.org' #if __FILE__ == $0 gbopts = {"db"=>"nuccore", "rettype"=>"gb"} pmopts = {"db"=>"pubmed", "rettype"=>"medline"} count = {"rettype" => "count"} xml = {"retmode"=>"xml"} max = {"retmax"=>5} puts "=== class methods ===" puts "--- Search NCBI by E-Utils ---" puts Time.now puts "# count of 'tardigrada' in nuccore" puts Bio::NCBI::REST.esearch("tardigrada", gbopts.merge(count)) puts Time.now puts "# max 5 'tardigrada' entries in nuccore" puts Bio::NCBI::REST.esearch("tardigrada", gbopts.merge(max)) puts Time.now puts "# count of 'yeast kinase' in nuccore" puts Bio::NCBI::REST.esearch("yeast kinase", gbopts.merge(count)) puts Time.now puts "# max 5 'yeast kinase' entries in nuccore (XML)" puts Bio::NCBI::REST.esearch("yeast kinase", gbopts.merge(xml).merge(max)) puts Time.now puts "# count of 'genome&analysis|bioinformatics' in pubmed" puts Bio::NCBI::REST.esearch("(genome AND analysis) OR bioinformatics", pmopts.merge(count)) puts Time.now puts "# max 5 'genome&analysis|bioinformatics' entries in pubmed (XML)" puts Bio::NCBI::REST.esearch("(genome AND analysis) OR bioinformatics", pmopts.merge(xml).merge(max)) puts Time.now Bio::NCBI::REST.esearch("(genome AND analysis) OR bioinformatics", pmopts.merge(max)).each do |x| puts "# each of 5 'genome&analysis|bioinformatics' entries in pubmed" puts x end puts "--- Retrieve NCBI entry by E-Utils ---" puts Time.now puts "# '185041' entry in nuccore" puts Bio::NCBI::REST.efetch("185041", gbopts) puts Time.now puts "# 'J00231' entry in nuccore (XML)" puts Bio::NCBI::REST.efetch("J00231", gbopts.merge(xml)) puts Time.now puts "# 16381885 entry in pubmed" puts Bio::NCBI::REST.efetch(16381885, pmopts) puts Time.now puts "# '16381885' entry in pubmed" puts Bio::NCBI::REST.efetch("16381885", pmopts) puts Time.now puts "# [10592173,14693808] entries in pubmed" puts Bio::NCBI::REST.efetch([10592173, 14693808], pmopts) puts Time.now puts "# [10592173,14693808] entries in pubmed 
(XML)" puts Bio::NCBI::REST.efetch([10592173, 14693808], pmopts.merge(xml)) puts "=== instance methods ===" ncbi = Bio::NCBI::REST.new puts "--- Search NCBI by E-Utils ---" puts Time.now puts "# count of 'genome&analysis|bioinformatics' in pubmed" puts ncbi.esearch("(genome AND analysis) OR bioinformatics", pmopts.merge(count)) puts Time.now puts "# max 5 'genome&analysis|bioinformatics' entries in pubmed" puts ncbi.esearch("(genome AND analysis) OR bioinformatics", pmopts.merge(max)) puts Time.now ncbi.esearch("(genome AND analysis) OR bioinformatics", pmopts).each do |x| puts "# each 'genome&analysis|bioinformatics' entries in pubmed" puts x end puts "--- Retrieve NCBI entry by E-Utils ---" puts Time.now puts "# 16381885 entry in pubmed" puts ncbi.efetch(16381885, pmopts) puts Time.now puts "# [10592173,14693808] entries in pubmed" puts ncbi.efetch([10592173, 14693808], pmopts) #end bio-2.0.3/sample/demo_sequence.rb0000644000175000017500000001036614141516614016246 0ustar nileshnilesh# # = sample/demo_sequence.rb - demonstration of sequence manipulation # # Copyright:: Copyright (C) 2000-2006 # Toshiaki Katayama , # Mitsuteru C. Nakao # License:: The Ruby License # # $Id:$ # # == Description # # Demonstration of biological sequence manipulation. # # == Usage # # Simply run this script. # # $ ruby demo_sequence.rb # # == Development information # # The code was moved from lib/bio/sequence.rb. 
# require 'bio' #if __FILE__ == $0 puts "== Test Bio::Sequence::NA.new" p Bio::Sequence::NA.new('') p na = Bio::Sequence::NA.new('atgcatgcATGCATGCAAAA') p rna = Bio::Sequence::NA.new('augcaugcaugcaugcaaaa') puts "\n== Test Bio::Sequence::AA.new" p Bio::Sequence::AA.new('') p aa = Bio::Sequence::AA.new('ACDEFGHIKLMNPQRSTVWYU') puts "\n== Test Bio::Sequence#to_s" p na.to_s p aa.to_s puts "\n== Test Bio::Sequence#subseq(2,6)" p na p na.subseq(2,6) puts "\n== Test Bio::Sequence#[2,6]" p na p na[2,6] puts "\n== Test Bio::Sequence#to_fasta('hoge', 8)" puts na.to_fasta('hoge', 8) puts "\n== Test Bio::Sequence#window_search(15)" p na na.window_search(15) {|x| p x} puts "\n== Test Bio::Sequence#total({'a'=>0.1,'t'=>0.2,'g'=>0.3,'c'=>0.4})" p na.total({'a'=>0.1,'t'=>0.2,'g'=>0.3,'c'=>0.4}) puts "\n== Test Bio::Sequence#composition" p na p na.composition p rna p rna.composition puts "\n== Test Bio::Sequence::NA#splicing('complement(join(1..5,16..20))')" p na p na.splicing("complement(join(1..5,16..20))") p rna p rna.splicing("complement(join(1..5,16..20))") puts "\n== Test Bio::Sequence::NA#complement" p na.complement p rna.complement p Bio::Sequence::NA.new('tacgyrkmhdbvswn').complement p Bio::Sequence::NA.new('uacgyrkmhdbvswn').complement puts "\n== Test Bio::Sequence::NA#translate" p na p na.translate p rna p rna.translate puts "\n== Test Bio::Sequence::NA#gc_percent" p na.gc_percent p rna.gc_percent puts "\n== Test Bio::Sequence::NA#illegal_bases" p na.illegal_bases p Bio::Sequence::NA.new('tacgyrkmhdbvswn').illegal_bases p Bio::Sequence::NA.new('abcdefghijklmnopqrstuvwxyz-!%#$@').illegal_bases puts "\n== Test Bio::Sequence::NA#molecular_weight" p na p na.molecular_weight p rna p rna.molecular_weight puts "\n== Test Bio::Sequence::NA#to_re" p Bio::Sequence::NA.new('atgcrymkdhvbswn') p Bio::Sequence::NA.new('atgcrymkdhvbswn').to_re p Bio::Sequence::NA.new('augcrymkdhvbswn') p Bio::Sequence::NA.new('augcrymkdhvbswn').to_re puts "\n== Test Bio::Sequence::NA#names" p na.names 
puts "\n== Test Bio::Sequence::NA#pikachu" p na.pikachu puts "\n== Test Bio::Sequence::NA#randomize" print "Orig : "; p na print "Rand : "; p na.randomize print "Rand : "; p na.randomize print "Rand : "; p na.randomize.randomize print "Block : "; na.randomize do |x| print x end; puts print "Orig : "; p rna print "Rand : "; p rna.randomize print "Rand : "; p rna.randomize print "Rand : "; p rna.randomize.randomize print "Block : "; rna.randomize do |x| print x end; puts puts "\n== Test Bio::Sequence::NA.randomize(counts)" print "Count : "; p counts = {'a'=>10,'c'=>20,'g'=>30,'t'=>40} print "Rand : "; p Bio::Sequence::NA.randomize(counts) print "Count : "; p counts = {'a'=>10,'c'=>20,'g'=>30,'u'=>40} print "Rand : "; p Bio::Sequence::NA.randomize(counts) print "Block : "; Bio::Sequence::NA.randomize(counts) {|x| print x}; puts puts "\n== Test Bio::Sequence::AA#codes" p aa p aa.codes puts "\n== Test Bio::Sequence::AA#names" p aa p aa.names puts "\n== Test Bio::Sequence::AA#molecular_weight" p aa.subseq(1,20) p aa.subseq(1,20).molecular_weight puts "\n== Test Bio::Sequence::AA#randomize" aaseq = 'MRVLKFGGTSVANAERFLRVADILESNARQGQVATVLSAPAKITNHLVAMIEKTISGQDA' s = Bio::Sequence::AA.new(aaseq) print "Orig : "; p s print "Rand : "; p s.randomize print "Rand : "; p s.randomize print "Rand : "; p s.randomize.randomize print "Block : "; s.randomize {|x| print x}; puts puts "\n== Test Bio::Sequence::AA.randomize(counts)" print "Count : "; p counts = s.composition print "Rand : "; puts Bio::Sequence::AA.randomize(counts) print "Block : "; Bio::Sequence::AA.randomize(counts) {|x| print x}; puts #end bio-2.0.3/sample/fastq2html.testdata.yaml0000644000175000017500000000025314141516614017660 0ustar nileshnileshfastq: - class: File location: ../test/data/fastq/longreads_as_sanger.fastq - class: File location: ../test/data/fastq/sanger_full_range_original_sanger.fastq bio-2.0.3/sample/demo_bl2seq_report.rb0000644000175000017500000001710614141516614017220 0ustar nileshnilesh# # = 
sample/demo_bl2seq_report.rb - demo of bl2seq (BLAST 2 sequences) parser # # Copyright:: Copyright (C) 2005 Naohisa Goto # License:: The Ruby License # # == Description # # Demonstration of Bio::Blast::Bl2seq::Report, bl2seq (BLAST 2 sequences) # parser class. # # == Usage # # Run this script with specifying filename(s) containing bl2seq result(s). # # $ ruby demo_bl2seq_report.rb files... # # Example usage using test data: # # $ ruby -I lib sample/demo_bl2seq_report.rb test/data/bl2seq/cd8a_cd8b_blastp.bl2seq # # == Development information # # The code was moved from lib/bio/appl/bl2seq/report.rb # require 'bio' if ARGV.empty? then puts "Demonstration of bl2seq (BLAST 2 sequences) parser." puts "Usage: #{$0} files..." exit(0) end Bio::FlatFile.open(Bio::Blast::Bl2seq::Report, ARGF) do |ff| ff.each do |rep| print "# === Bio::Blast::Bl2seq::Report\n" puts #@#print " rep.program #=> "; p rep.program #@#print " rep.version #=> "; p rep.version #@#print " rep.reference #=> "; p rep.reference #@#print " rep.db #=> "; p rep.db #print " rep.query_id #=> "; p rep.query_id print " rep.query_def #=> "; p rep.query_def print " rep.query_len #=> "; p rep.query_len #puts #@#print " rep.version_number #=> "; p rep.version_number #@#print " rep.version_date #=> "; p rep.version_date puts print "# === Parameters\n" #puts #print " rep.parameters #=> "; p rep.parameters puts print " rep.matrix #=> "; p rep.matrix print " rep.expect #=> "; p rep.expect #print " rep.inclusion #=> "; p rep.inclusion print " rep.sc_match #=> "; p rep.sc_match print " rep.sc_mismatch #=> "; p rep.sc_mismatch print " rep.gap_open #=> "; p rep.gap_open print " rep.gap_extend #=> "; p rep.gap_extend #print " rep.filter #=> "; p rep.filter #@#print " rep.pattern #=> "; p rep.pattern #print " rep.entrez_query #=> "; p rep.entrez_query #puts #@#print " rep.pattern_positions #=> "; p rep.pattern_positions puts print "# === Statistics (last iteration's)\n" #puts #print " rep.statistics #=> "; p rep.statistics 
      puts
      print " rep.db_num #=> "; p rep.db_num
      print " rep.db_len #=> "; p rep.db_len
      #print " rep.hsp_len #=> "; p rep.hsp_len
      print " rep.eff_space #=> "; p rep.eff_space
      print " rep.kappa #=> "; p rep.kappa
      print " rep.lambda #=> "; p rep.lambda
      print " rep.entropy #=> "; p rep.entropy
      puts
      print " rep.num_hits #=> "; p rep.num_hits
      print " rep.gapped_kappa #=> "; p rep.gapped_kappa
      print " rep.gapped_lambda #=> "; p rep.gapped_lambda
      print " rep.gapped_entropy #=> "; p rep.gapped_entropy
      print " rep.posted_date #=> "; p rep.posted_date
      puts

      #@#print "# === Message (last iteration's)\n"
      #@#puts
      #@#print " rep.message #=> "; p rep.message
      #puts
      #@#print " rep.converged? #=> "; p rep.converged?
      #@#puts

      print "# === Iterations\n"
      puts
      print " rep.iterations.each do |itr|\n"
      puts

      rep.iterations.each do |itr|

        print "# --- Bio::Blast::Bl2seq::Report::Iteration\n"
        puts

        print " itr.num #=> "; p itr.num
        #print " itr.statistics #=> "; p itr.statistics
        #@#print " itr.message #=> "; p itr.message
        print " itr.hits.size #=> "; p itr.hits.size
        #puts
        #@#print " itr.hits_newly_found.size #=> "; p itr.hits_newly_found.size;
        #@#print " itr.hits_found_again.size #=> "; p itr.hits_found_again.size;
        #@#if itr.hits_for_pattern then
        #@#itr.hits_for_pattern.each_with_index do |hp, hpi|
        #@#print " itr.hits_for_pattern[#{hpi}].size #=> "; p hp.size;
        #@#end
        #@#end
        #@#print " itr.converged? #=> "; p itr.converged?
        puts

        print " itr.hits.each do |hit|\n"
        puts

        itr.hits.each_with_index do |hit, i|

          print "# --- Bio::Blast::Bl2seq::Default::Report::Hit"
          print " ([#{i}])\n"
          puts

          #print " hit.num #=> "; p hit.num
          #print " hit.hit_id #=> "; p hit.hit_id
          print " hit.len #=> "; p hit.len
          print " hit.definition #=> "; p hit.definition
          #print " hit.accession #=> "; p hit.accession
          #puts
          print " hit.found_again? #=> "; p hit.found_again?
          print " --- compatible/shortcut ---\n"
          #print " hit.query_id #=> "; p hit.query_id
          #print " hit.query_def #=> "; p hit.query_def
          #print " hit.query_len #=> "; p hit.query_len
          #print " hit.target_id #=> "; p hit.target_id
          print " hit.target_def #=> "; p hit.target_def
          print " hit.target_len #=> "; p hit.target_len

          print " --- first HSP's values (shortcut) ---\n"
          print " hit.evalue #=> "; p hit.evalue
          print " hit.bit_score #=> "; p hit.bit_score
          print " hit.identity #=> "; p hit.identity
          #print " hit.overlap #=> "; p hit.overlap

          print " hit.query_seq #=> "; p hit.query_seq
          print " hit.midline #=> "; p hit.midline
          print " hit.target_seq #=> "; p hit.target_seq

          print " hit.query_start #=> "; p hit.query_start
          print " hit.query_end #=> "; p hit.query_end
          print " hit.target_start #=> "; p hit.target_start
          print " hit.target_end #=> "; p hit.target_end
          print " hit.lap_at #=> "; p hit.lap_at
          print " --- first HSP's values (shortcut) ---\n"
          print " --- compatible/shortcut ---\n"

          puts
          print " hit.hsps.size #=> "; p hit.hsps.size
          if hit.hsps.size == 0 then
            puts " (HSP not found: please see blastall's -b and -v options)"
            puts
          else

            puts
            print " hit.hsps.each do |hsp|\n"
            puts

            hit.hsps.each_with_index do |hsp, j|

              print "# --- Bio::Blast::Default::Report::HSP (Bio::Blast::Bl2seq::Report::HSP)"
              print " ([#{j}])\n"
              puts
              #print " hsp.num #=> "; p hsp.num
              print " hsp.bit_score #=> "; p hsp.bit_score
              print " hsp.score #=> "; p hsp.score
              print " hsp.evalue #=> "; p hsp.evalue
              print " hsp.identity #=> "; p hsp.identity
              print " hsp.gaps #=> "; p hsp.gaps
              print " hsp.positive #=> "; p hsp.positive
              print " hsp.align_len #=> "; p hsp.align_len
              #print " hsp.density #=> "; p hsp.density

              print " hsp.query_frame #=> "; p hsp.query_frame
              print " hsp.query_from #=> "; p hsp.query_from
              print " hsp.query_to #=> "; p hsp.query_to

              print " hsp.hit_frame #=> "; p hsp.hit_frame
              print " hsp.hit_from #=> "; p hsp.hit_from
              print " hsp.hit_to #=> "; p hsp.hit_to

              #print " hsp.pattern_from#=> "; p
hsp.pattern_from #print " hsp.pattern_to #=> "; p hsp.pattern_to print " hsp.qseq #=> "; p hsp.qseq print " hsp.midline #=> "; p hsp.midline print " hsp.hseq #=> "; p hsp.hseq puts print " hsp.percent_identity #=> "; p hsp.percent_identity #print " hsp.mismatch_count #=> "; p hsp.mismatch_count # print " hsp.query_strand #=> "; p hsp.query_strand print " hsp.hit_strand #=> "; p hsp.hit_strand print " hsp.percent_positive #=> "; p hsp.percent_positive print " hsp.percent_gaps #=> "; p hsp.percent_gaps puts end #each end #if hit.hsps.size == 0 end end end #ff.each end #FlatFile.open bio-2.0.3/sample/fastagrep.rb0000755000175000017500000000344114141516614015405 0ustar nileshnilesh#!/usr/bin/env ruby # # fastagrep: Greps a FASTA file (in fact it can use any flat file input supported # by BIORUBY) and outputs sorted FASTA # # Copyright (C) 2008 KATAYAMA Toshiaki & Pjotr Prins # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
# # $Id: fastagrep.rb,v 1.1 2008/05/19 12:22:05 pjotr Exp $ # require 'bio' include Bio usage = < reduced.fasta As the result is a FASTA stream you could pipe it for sorting: fastagrep.rb "/Arabidopsis|Drosophila/i" *.seq | fastasort.rb USAGE if ARGV.size == 0 print usage exit 1 end skip = (ARGV[0] == '-v') ARGV.shift if skip # ---- Valid regular expression - if it is not a file regex = ARGV[0] if regex=~/^\// and !File.exist?(regex) ARGV.shift else print usage exit 1 end ARGV.each do | fn | Bio::FlatFile.auto(fn).each do | item | if skip next if eval("item.definition =~ #{regex}") else next if eval("item.definition !~ #{regex}") end rec = Bio::FastaFormat.new('> '+item.definition.strip+"\n"+item.data) print rec end end bio-2.0.3/sample/biofetch.rb0000755000175000017500000003347114141516614015222 0ustar nileshnilesh#!/usr/bin/env ruby # coding: utf-8 # # biofetch.rb : BioFetch server (interface to TogoWS) # # Copyright (C) 2002-2004 KATAYAMA Toshiaki # 2013 GOTO Naohisa # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
# # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # # require 'cgi' require 'erb' require 'open-uri' require 'fileutils' require 'tempfile' MAX_ID_NUM = 50 # script name SCRIPT_NAME = File.basename(__FILE__) # full URL for this CGI BASE_URL = "http://bioruby.org/cgi-bin/#{SCRIPT_NAME}" # cache directory for metadata # Note: The cache is only for metadata (database list and format list). # Data entries are NOT cached. CACHE_DIR = '/tmp/biofetch_rb.cache' # cache lifetime CACHE_LIFETIME = 60 * 60 # 1 hour module TogoWS TOGOWS_URL = 'http://togows.dbcls.jp/' def togows_database_complete_list result = togows_get_cached('/entry/') result.to_s.split(/\n/).collect {|x| x.split(/\t/) } end def togows_database_formats(db) db = CGI.escape(db) result = togows_get_cached("/entry/#{db}/?formats") end def togows_get(path) uristr = TOGOWS_URL + path begin result = OpenURI.open_uri(uristr).read rescue OpenURI::HTTPError result = nil end result end private def togows_get_cached(path) filepath = path.sub(/\A\//, '').sub(/\/\z/, '') filepath = filepath.gsub(/\//, " ") filepath = filepath.sub(/\?/, '_') filepath = File.join(CACHE_DIR, filepath) result = nil begin if Time.now - File.mtime(filepath) > CACHE_LIFETIME # delete expired cache file File.delete(filepath) end result = File.read(filepath) rescue IOError, SystemCallError result = nil end unless result then # valid cache is not found result = togows_get(path) if result then # create cache directory if not found FileUtils.mkdir_p(CACHE_DIR, :mode => 0700) # simple security check for the cache dir if File.stat(CACHE_DIR).mode & 0022 != 0 then raise SecurityError, "CACHE_DIR #{CACHE_DIR} is writeable by others" end # write to temporary file tmp = Tempfile.open('temp', CACHE_DIR) tmp.print result tmp.close # create a hard link from the temporary to the cache file begin 
          File.link(tmp.path, filepath)
        rescue IOError, SystemCallError
        end
        # the temporary file will be automatically removed at exit
      end
    end
    result
  end

end #module TogoWS

module BioFetchError

  def print_text_page(str)
    print "Content-type: text/plain; charset=UTF-8\n\n"
    puts str
    exit
  end

  def print_html_page(str)
    print "Content-type: text/html; charset=UTF-8\n\n"
    print "
", CGI.escapeHTML(str), "
\n"
    exit
  end

  def error1(db)
    db = CGI.escapeHTML(db.to_s) # to avoid potential XSS with old IE
    str = "ERROR 1 Unknown database [#{db}]."
    print_text_page(str)
  end

  def error2(style)
    style = CGI.escapeHTML(style.to_s) # to avoid potential XSS with old IE
    str = "ERROR 2 Unknown style [#{style}]."
    print_text_page(str)
  end

  def error3(format, db)
    # to avoid potential XSS with old IE which ignores Content-Type
    db = CGI.escapeHTML(db.to_s)
    format = CGI.escapeHTML(format.to_s)
    str = "ERROR 3 Format [#{format}] not known for database [#{db}]."
    print_text_page(str)
  end

  def error4(entry_id, db)
    # to avoid potential XSS with old IE which ignores Content-Type
    entry_id = CGI.escapeHTML(entry_id.to_s)
    db = CGI.escapeHTML(db.to_s)
    str = "ERROR 4 ID [#{entry_id}] not found in database [#{db}]."
    print_text_page(str)
  end

  def error5(count)
    # to avoid potential XSS with old IE which ignores Content-Type
    count = CGI.escapeHTML(count.to_s)
    str = "ERROR 5 Too many IDs [#{count}]. Max [#{MAX_ID_NUM}] allowed."
    print_text_page(str)
  end

  def error6(info)
    # to avoid potential XSS with old IE which ignores Content-Type
    # (the escaped value must be assigned back to info, which is what is printed)
    info = CGI.escapeHTML(info.to_s)
    str = "ERROR 6 Illegal information request [#{info}]."
    print_text_page(str)
  end

end

module ApiBridge

  include BioFetchError
  include TogoWS

  def list_databases_with_synonyms
    togows_database_complete_list
  end

  def list_databases
    list_databases_with_synonyms.flatten
  end

  def bget(db, id_list, format)
    case format
    when 'fasta'
      format = '.fasta'
    else
      format = ''
    end
    db = CGI.escape(db)
    results = ''
    id_list.each do |query_id|
      query_id = CGI.escape(query_id)
      path = "/entry/#{db}/#{query_id}#{format}"
      result = togows_get(path)
      if !result or result.empty?
         or /\AError\: / =~ result then
        error4(query_id, db)
      else
        results << result
      end
    end
    return results
  end

  def check_fasta_ok?(db)
    result = togows_database_formats(db)
    /^fasta$/ =~ result.to_s
  end

end #module ApiBridge

module BioFetchCheck

  include ApiBridge

  private

  def check_style(style)
    style = style.to_s.downcase
    error2(style) unless /\A(html|raw)\z/.match(style)
    style
  end

  def check_format(format, db)
    fmt = format ? format.to_s.downcase : nil
    case fmt
    when 'fasta'
      db = check_dbname(db)
      fmt = nil unless check_fasta_ok?(db)
    when 'default'
      # do nothing
    when nil
      fmt = 'default'
    else
      fmt = nil
    end
    error3(format, db) unless fmt
    fmt
  end

  def check_number_of_id(num)
    error5(num) if num > MAX_ID_NUM
  end

  def check_dbname(db)
    db = db.to_s.downcase
    error1(db) unless list_databases.include?(db)
    db
  end

end #module BioFetchCheck

class BioFetch

  include BioFetchCheck
  include BioFetchError
  include ApiBridge

  def initialize(db, id_list, style, format)
    style = check_style(style)
    format = check_format(format, db)
    check_number_of_id(id_list.length)
    db = check_dbname(db)

    entries = bget(db, id_list, format)

    if style == 'html' then
      print_html_page(entries)
    else
      print_text_page(entries)
    end
  end

end #class BioFetch

class BioFetchInfo

  include BioFetchCheck
  include BioFetchError
  include ApiBridge

  def initialize(info, db)
    @db = db
    begin
      check_info(info) ? __send__(info) : raise
    rescue
      error6(info)
    end
  end

  private

  def check_info(meth_name)
    /\A(dbs|formats|maxids)\z/ =~ meth_name
  end

  def dbs
    str = list_databases.sort.join(' ')
    print_text_page(str)
  end

  def formats
    db = check_dbname(@db)
    fasta = " fasta" if check_fasta_ok?(db)
    str = "default#{fasta}"
    print_text_page(str)
  end

  def maxids
    str = MAX_ID_NUM.to_s
    print_text_page(str)
  end

end #class BioFetchInfo

class BioFetchCGI

  include ApiBridge

  def initialize(cgi)
    @cgi = cgi
    show_page
  end

  private

  def show_page
    if info.empty?
      if id_list.empty?
        show_query_page
      else
        show_result_page(db, id_list, style, format)
      end
    else
      show_info_page(info, db)
    end
  end

  def show_query_page
    html = ERB.new(DATA.read)
    max_id_num = MAX_ID_NUM
    databases_with_synonyms = list_databases_with_synonyms
    databases = list_databases
    script_name = SCRIPT_NAME
    base_url = BASE_URL
    @cgi.out({ "type" => "text/html", "charset" => "utf-8" }) do
      html.result(binding)
    end
  end

  def show_result_page(db, id_list, style, format)
    BioFetch.new(db, id_list, style, format)
  end

  def show_info_page(info, db)
    BioFetchInfo.new(info, db)
  end

  def info
    @cgi['info'].downcase
  end

  def db
    @cgi['db'].downcase
  end

  def id_list
    @cgi['id'].strip.split(/[\,\s]+/)
  end

  def style
    s = @cgi['style'].downcase
    return s.empty? ? "html" : s
  end

  def format
    f = @cgi['format'].downcase
    return f.empty? ? "default" : f
  end

end

BioFetchCGI.new(CGI.new)

=begin

This program was created during BioHackathon 2002, Tucson, and updated in
Cape Town :)

Rewritten in 2013 to use the TogoWS API, because the bioruby.org server left
The University of Tokyo and the old SOAP-based KEGG API was discontinued.

=end

__END__
BioFetch interface to TogoWS

BioFetch interface to TogoWS

This page allows you to retrieve up to <%= max_id_num %> entries at a time from various up-to-date biological databases.



Direct access

<%= base_url %>?format=(default|fasta|...);style=(html|raw);db=(nuccore|embl|...);id=ID[,ID,ID,...]

(NOTE: the option separator ';' can be '&')

format (optional)
default|fasta|...
style (required; this server falls back to html when omitted)
html|raw
db (required)
<%= databases.join('|') %>
id (required)
comma separated list of IDs
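
The way the server treats the id parameter can be sketched as follows. This is a minimal illustration that mirrors the `split(/[\,\s]+/)` call and the `MAX_ID_NUM = 50` limit found in biofetch.rb. The names `MAX_IDS`, `split_ids` and `too_many_ids?` are hypothetical and do not exist in the script.

```ruby
# Hypothetical constant mirroring MAX_ID_NUM in biofetch.rb.
MAX_IDS = 50

# Split an id parameter on commas and/or whitespace,
# the same way BioFetchCGI#id_list does.
def split_ids(param)
  param.to_s.strip.split(/[,\s]+/)
end

# True when the request would trigger "ERROR 5 Too many IDs".
def too_many_ids?(ids, max = MAX_IDS)
  ids.length > max
end

ids = split_ids("AJ617376, AJ617377")
puts ids.inspect
puts too_many_ids?(ids)
```

Because the separator pattern accepts whitespace as well as commas, IDs pasted one per line are handled the same way as a comma separated list.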

See the BioFetch specification for more details.

Server information

What databases are available?
<%= base_url %>?info=dbs
What formats does the database X have?
<%= base_url %>?info=formats;db=embl
How many entries can be retrieved simultaneously?
<%= base_url %>?info=maxids

Examples

nuccore/AJ617376 (default/raw)
<%= base_url %>?format=default;style=raw;db=nuccore;id=AJ617376
nuccore/AJ617376 (fasta/raw)
<%= base_url %>?format=fasta;style=raw;db=nuccore;id=AJ617376
nuccore/AJ617376 (default/html)
<%= base_url %>?format=default;style=html;db=nuccore;id=AJ617376
nuccore/AJ617376,AJ617377 (default/raw, multiple)
<%= base_url %>?format=default;style=raw;db=nuccore;id=AJ617376,AJ617377
embl/J00231 (default/raw)
<%= base_url %>?format=default;style=raw;db=embl;id=J00231
uniprot/CYC_BOVIN (default/raw)
<%= base_url %>?format=default;style=raw;db=uniprot;id=CYC_BOVIN
uniprot/CYC_BOVIN (fasta/raw)
<%= base_url %>?format=fasta;style=raw;db=uniprot;id=CYC_BOVIN
genes/eco:b0015 (default/raw)
<%= base_url %>?format=default;style=raw;db=genes;id=eco%3Ab0015
<%= base_url %>?format=default;style=raw;db=genes;id=eco:b0015
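
The example URLs above can also be assembled programmatically. The following is a minimal sketch, assuming a client that only needs the four documented parameters. The helper name `biofetch_url` and the `example.org` base URL are hypothetical. It uses '&' as the option separator, which the note above says is accepted in place of ';'.

```ruby
require 'uri'

# Hypothetical helper (not part of biofetch.rb): builds a BioFetch query URL.
# URI.encode_www_form percent-encodes reserved characters, so "eco:b0015"
# becomes "eco%3Ab0015" and the comma between IDs becomes "%2C".
def biofetch_url(base_url, db:, ids:, style: 'raw', format: 'default')
  query = URI.encode_www_form(format: format, style: style,
                              db: db, id: ids.join(','))
  "#{base_url}?#{query}"
end

puts biofetch_url('http://example.org/cgi-bin/biofetch.rb',
                  db: 'genes', ids: ['eco:b0015'])
```

Encoding the ID is the safer choice for entries such as `eco:b0015`, although the examples above show that the unencoded form is expected to work as well.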

Errors

Error1 sample : DB not found
<%= base_url %>?format=default;style=raw;db=nonexistent;id=AJ617376
Error2 sample : unknown style
<%= base_url %>?format=default;style=nonexistent;db=nuccore;id=AJ617376
Error3 sample : unknown format
<%= base_url %>?format=nonexistent;style=raw;db=nuccore;id=AJ617376
Error4 sample : ID not found
<%= base_url %>?format=default;style=raw;db=nuccore;id=nonexistent
Error5 sample : too many IDs
<%= base_url %>?style=raw;db=genes;id=1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51
Error6 sample : unknown info
<%= base_url %>?info=nonexistent
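
A client may want to tell these error replies apart from real entries. The sketch below assumes only what the error1 to error6 methods in the script show: each reply is plain text starting with `ERROR`, a numeric code and a message. The names `BIOFETCH_ERROR` and `parse_biofetch_error` are hypothetical.

```ruby
# Hypothetical pattern: "ERROR", a numeric code, then the message text.
BIOFETCH_ERROR = /\AERROR\s+(\d+)\s+(.*)\z/m

# Returns { code:, message: } for an error reply, or nil for anything else.
def parse_biofetch_error(body)
  m = BIOFETCH_ERROR.match(body.to_s.strip)
  return nil unless m
  { code: m[1].to_i, message: m[2] }
end

reply = "ERROR 4 ID [nonexistent] not found in database [nuccore].\n"
err = parse_biofetch_error(reply)
puts "code=#{err[:code]} message=#{err[:message]}" if err
```

The pattern uses `\s+` between the fields so that it tolerates any amount of spacing after the code.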

Other BioFetch implementations


staff@BioRuby.org

bio-2.0.3/sample/na2aa.rb0000755000175000017500000000063514141516614014415 0ustar nileshnilesh#!/usr/bin/env ruby # # na2aa.rb - translate any NA input into AA FASTA format # # Copyright:: Copyright (C) 2019 BioRuby Project # License:: The Ruby License # require 'bio' ARGV.each do |fn| Bio::FlatFile.open(fn) do |ff| ff.each do |entry| next if /\A\s*\z/ =~ ff.entry_raw.to_s na = entry.naseq aa = na.translate print aa.to_fasta(entry.definition, 70) end end end bio-2.0.3/sample/demo_codontable.rb0000644000175000017500000000553214141516614016547 0ustar nileshnilesh# # = sample/demo_codontable.rb - demonstration of Bio::CodonTable # # Copyright:: Copyright (C) 2001, 2004 # Toshiaki Katayama # License:: The Ruby License # # # == Description # # Demonstration of Bio::CodonTable. # # == Usage # # Simply run this script. # # $ ruby demo_codontable.rb # # == Development information # # The code was moved from lib/bio/data/codontable.rb. # require 'bio' #if __FILE__ == $0 begin require 'pp' alias p pp rescue LoadError end puts "### Bio::CodonTable[1]" p ct1 = Bio::CodonTable[1] puts ">>> Bio::CodonTable#table" p ct1.table puts ">>> Bio::CodonTable#each" ct1.each do |codon, aa| puts "#{codon} -- #{aa}" end puts ">>> Bio::CodonTable#definition" p ct1.definition puts ">>> Bio::CodonTable#['atg']" p ct1['atg'] puts ">>> Bio::CodonTable#revtrans('A')" p ct1.revtrans('A') puts ">>> Bio::CodonTable#start_codon?('atg')" p ct1.start_codon?('atg') puts ">>> Bio::CodonTable#start_codon?('aaa')" p ct1.start_codon?('aaa') puts ">>> Bio::CodonTable#stop_codon?('tag')" p ct1.stop_codon?('tag') puts ">>> Bio::CodonTable#stop_codon?('aaa')" p ct1.stop_codon?('aaa') puts ">>> ct1_copy = Bio::CodonTable.copy(1)" p ct1_copy = Bio::CodonTable.copy(1) puts ">>> ct1_copy['tga'] = 'U'" p ct1_copy['tga'] = 'U' puts " orig : #{ct1['tga']}" puts " copy : #{ct1_copy['tga']}" puts "### ct = Bio::CodonTable.new(hash, definition)" hash = { 'ttt' => 'F', 'tct' => 'S', 'tat' => 'Y', 'tgt' => 'C', 'ttc' => 
'F', 'tcc' => 'S', 'tac' => 'Y', 'tgc' => 'C', 'tta' => 'L', 'tca' => 'S', 'taa' => '*', 'tga' => 'U', 'ttg' => 'L', 'tcg' => 'S', 'tag' => '*', 'tgg' => 'W', 'ctt' => 'L', 'cct' => 'P', 'cat' => 'H', 'cgt' => 'R', 'ctc' => 'L', 'ccc' => 'P', 'cac' => 'H', 'cgc' => 'R', 'cta' => 'L', 'cca' => 'P', 'caa' => 'Q', 'cga' => 'R', 'ctg' => 'L', 'ccg' => 'P', 'cag' => 'Q', 'cgg' => 'R', 'att' => 'I', 'act' => 'T', 'aat' => 'N', 'agt' => 'S', 'atc' => 'I', 'acc' => 'T', 'aac' => 'N', 'agc' => 'S', 'ata' => 'I', 'aca' => 'T', 'aaa' => 'K', 'aga' => 'R', 'atg' => 'M', 'acg' => 'T', 'aag' => 'K', 'agg' => 'R', 'gtt' => 'V', 'gct' => 'A', 'gat' => 'D', 'ggt' => 'G', 'gtc' => 'V', 'gcc' => 'A', 'gac' => 'D', 'ggc' => 'G', 'gta' => 'V', 'gca' => 'A', 'gaa' => 'E', 'gga' => 'G', 'gtg' => 'V', 'gcg' => 'A', 'gag' => 'E', 'ggg' => 'G', } my_ct = Bio::CodonTable.new(hash, "my codon table") puts ">>> ct.definition" puts my_ct.definition puts ">>> ct.definition=(str)" my_ct.definition = "selenoproteins (Eukaryote)" puts my_ct.definition puts ">>> ct['tga']" puts my_ct['tga'] puts ">>> ct.revtrans('U')" puts my_ct.revtrans('U') puts ">>> ct.stop_codon?('tga')" puts my_ct.stop_codon?('tga') puts ">>> ct.stop_codon?('tag')" puts my_ct.stop_codon?('tag') #end bio-2.0.3/sample/demo_genbank.rb0000644000175000017500000000475514141516614016050 0ustar nileshnilesh# # = sample/demo_genbank.rb - demonstration of Bio::GenBank # # Copyright:: Copyright (C) 2000-2005 Toshiaki Katayama # License:: The Ruby License # # # == Description # # Demonstration of Bio::GenBank, the parser class for the GenBank entry. # # == Usage # # Usage 1: Without arguments, showing demo with a GenBank entry. # Internet connection is needed. # # $ ruby demo_genbank.rb # # Usage 2: IDs or accession numbers are given as the arguments. # Internet connection is needed. 
# # $ ruby demo_genbank.rb X94434 NM_000669 # # Usage 3: When the first argument is "--files", "-files", "--file", or # "-file", filenames are given as the arguments. # # $ ruby demo_genbank.rb --files file1.gbk file2.gbk ... # # == Development information # # The code was moved from lib/bio/db/genbank/genbank.rb, and modified # as below: # * To get sequences from the NCBI web service. # * By default, arguments are sequence IDs (accession numbers). # * New option "--files" (or "-files", "--file", or "-file") to # read sequences from file(s). # require 'bio' begin require 'pp' alias p pp rescue LoadError end def demo_genbank(gb) puts "### GenBank" puts "## LOCUS" puts "# GenBank.locus" p gb.locus puts "# GenBank.entry_id" p gb.entry_id puts "# GenBank.nalen" p gb.nalen puts "# GenBank.strand" p gb.strand puts "# GenBank.natype" p gb.natype puts "# GenBank.circular" p gb.circular puts "# GenBank.division" p gb.division puts "# GenBank.date" p gb.date puts "## DEFINITION" p gb.definition puts "## ACCESSION" p gb.accession puts "## VERSION" p gb.versions p gb.version p gb.gi puts "## NID" p gb.nid puts "## KEYWORDS" p gb.keywords puts "## SEGMENT" p gb.segment puts "## SOURCE" p gb.source p gb.common_name p gb.vernacular_name p gb.organism p gb.taxonomy puts "## REFERENCE" p gb.references puts "## COMMENT" p gb.comment puts "## FEATURES" p gb.features puts "## BASE COUNT" p gb.basecount p gb.basecount('a') p gb.basecount('A') puts "## ORIGIN" p gb.origin p gb.naseq puts "=" * 78 end case ARGV[0] when '-file', '--file', '-files', '--files' ARGV.shift ARGV.each do |filename| Bio::FlatFile.foreach(filename) do |gb| demo_genbank(gb) end end else efetch = Bio::NCBI::REST::EFetch.new argv = ARGV.empty? ? 
[ 'X94434' ] : ARGV argv.each do |id_or_accession| raw = efetch.sequence(id_or_accession) gb = Bio::GenBank.new(raw) demo_genbank(gb) end end bio-2.0.3/sample/tfastx2tab.rb0000755000175000017500000000462114141516614015514 0ustar nileshnilesh#!/usr/bin/env ruby # # tfastx2tab.rb - convert TFASTX (-m 6) output into tab delimited data for MySQL # # Usage: # # % tfastx2tab.rb TFASTX-output-file[s] > tfastx_results.tab # % mysql < tfastx_results.sql (use sample at the end of this file) # # Format accepted: # # % tfastx3[3][_t] -Q -H -m 6 query.f target.f ktup > TFASTX-output-file # # Copyright (C) 2001 KATAYAMA Toshiaki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
# # $Id: tfastx2tab.rb,v 0.1 2001/06/21 08:26:14 katayama Exp $ # while gets # query if /^\S+: (\d+) aa$/ q_len = $1 end # each hit if /^>>([^>]\S+).*\((\d+) aa\)$/ target = $1 t_len = $2 # d = dummy variable d, frame, d, initn, d, init1, d, opt, d, zscore, d, bits, d, evalue = gets.split(/\s+/) d, d, sw, ident, d, ugident, d, d, overlap, d, d, lap = gets.split(/\s+/) # query-hit pair print "#{$FILENAME}\t#{q_len}\t#{target}\t#{t_len}" # pick up values ary = [ initn, init1, opt, zscore, bits, evalue, sw, ident, ugident, overlap, lap ] # print values for i in ary i.tr!('^0-9.:e\-','') print "\t#{i}" end print "\t#{frame}\n" end end =begin MySQL tfastx_results.sql sample CREATE DATABASE IF NOT EXISTS db_name; CREATE TABLE IF NOT EXISTS db_name.table_name ( query varchar(25) not NULL, q_len integer unsigned default 0, target varchar(25) not NULL, t_len integer unsigned default 0, initn integer unsigned default 0, init1 integer unsigned default 0, opt integer unsigned default 0, zscore float default 0.0, bits float default 0.0, evalue float default 0.0, sw integer unsigned default 0, ident float default 0.0, ugident float default 0.0, overlap integer unsigned default 0, lap_at varchar(25) default NULL, frame varchar(5) default NULL ); LOAD DATA LOCAL INFILE 'tfastx_results.tab' INTO TABLE db_name.table_name; =end bio-2.0.3/sample/gb2tab.rb0000755000175000017500000001323614141516614014575 0ustar nileshnilesh#!/usr/bin/env ruby # # gb2tab.rb - convert GenBank into tab delimited data for MySQL # # Usage: # # % gb2tab.rb gb*.seq # # Copyright (C) 2001 KATAYAMA Toshiaki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. 
# # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # $Id: gb2tab.rb,v 0.11 2002/04/22 09:10:10 k Exp $ # require 'bio' $stderr.puts Time.now ARGV.each do |gbkfile| gbk = open("#{gbkfile}") ent = open("#{gbkfile}.ent.tab", "w") ft = open("#{gbkfile}.ft.tab", "w") ref = open("#{gbkfile}.ref.tab", "w") seq = open("#{gbkfile}.seq.tab", "w") while entry = gbk.gets(Bio::GenBank::DELIMITER) gb = Bio::GenBank.new(entry) ### MAIN BODY ary = [ gb.entry_id, gb.nalen, gb.strand, gb.natype, gb.circular, gb.division, gb.date, gb.definition, gb.accession, gb.versions.inspect, gb.keywords.inspect, gb.segment.inspect, gb.common_name, gb.organism, gb.taxonomy, gb.comment, gb.basecount.inspect, gb.origin, ] ent.puts ary.join("\t") ### FEATURES num = 0 gb.features.each do |f| num += 1 span_min, span_max = f.locations.span if f.qualifiers.empty? 
ary = [ gb.entry_id, num, f.feature, f.position, span_min, span_max, '', '', ] ft.puts ary.join("\t") else f.each do |q| ary = [ gb.entry_id, num, f.feature, f.position, span_min, span_max, q.qualifier, q.value, ] ft.puts ary.join("\t") end end end ### REFERENCE num = 0 gb.references.each do |r| num += 1 ary = [ gb.entry_id, num, r.authors.inspect, r.title, r.journal, r.medline, r.pubmed, ] ref.puts ary.join("\t") end ### SEQUENCE maxlen = 16 * 10 ** 6 num = 0 0.step(gb.nalen, maxlen) do |i| num += 1 ary = [ gb.entry_id, num, gb.naseq[i, maxlen] ] seq.puts ary.join("\t") end end gbk.close ent.close ft.close ref.close seq.close end $stderr.puts Time.now =begin Example usage in zsh: % gb2tab.rb *.seq % for i in *.seq > do > base=`basename $i .seq` > ruby -pe "gsub(/%HOGE%/,'$base')" gb2tab.sql | mysql > done gb2tab.sql: CREATE DATABASE IF NOT EXISTS genbank; USE genbank; CREATE TABLE IF NOT EXISTS %HOGE% ( id varchar(16) NOT NULL PRIMARY KEY, nalen integer, strand varchar(5), natype varchar(5), circular varchar(10), division varchar(5), date varchar(12), definition varchar(255), accession varchar(30), versions varchar(30), keywords varchar(255), segment varchar(255), source varchar(255), organism varchar(255), taxonomy varchar(255), comment text, basecount varchar(255), origin varchar(255), KEY (nalen), KEY (division), KEY (accession), KEY (organism), KEY (taxonomy) ); LOAD DATA LOCAL INFILE '%HOGE%.seq.ent.tab' INTO TABLE %HOGE%; CREATE TABLE IF NOT EXISTS %HOGE%ft ( id varchar(16) NOT NULL, num integer, feature varchar(30), position text, span_min integer, span_max integer, qualifier varchar(30), value text, KEY (id), KEY (num), KEY (feature), KEY (span_min), KEY (span_max), KEY (qualifier) ); LOAD DATA LOCAL INFILE '%HOGE%.seq.ft.tab' INTO TABLE %HOGE%ft; CREATE TABLE IF NOT EXISTS %HOGE%ref ( id varchar(16) NOT NULL, num integer, authors text, title text, journal text, medline varchar(255), pubmed varchar(255), KEY (id), KEY (medline), KEY (pubmed) ); LOAD DATA 
LOCAL INFILE '%HOGE%.seq.ref.tab' INTO TABLE %HOGE%ref; CREATE TABLE IF NOT EXISTS %HOGE%seq ( id varchar(16) NOT NULL, num integer, naseq mediumtext, KEY (id) ); LOAD DATA LOCAL INFILE '%HOGE%.seq.seq.tab' INTO TABLE %HOGE%seq; gbmerge.sql sample: CREATE TABLE IF NOT EXISTS ent ( id varchar(16) NOT NULL PRIMARY KEY, nalen integer, strand varchar(5), natype varchar(5), circular varchar(10), division varchar(5), date varchar(12), definition varchar(255), accession varchar(30), versions varchar(30), keywords varchar(255), segment varchar(255), source varchar(255), organism varchar(255), taxonomy varchar(255), comment text, basecount varchar(255), origin varchar(255), KEY (nalen), KEY (division), KEY (accession), KEY (organism), KEY (taxonomy) ) TYPE=MERGE UNION=( gbbct1, gbbct2, ..., # list up all tables by yourself gbvrt ); CREATE TABLE IF NOT EXISTS ft ( id varchar(16) NOT NULL, num integer, feature varchar(30), position text, span_min integer, span_max integer, qualifier varchar(30), value text, KEY (id), KEY (num), KEY (feature), KEY (span_min), KEY (span_max), KEY (qualifier) ) TYPE=MERGE UNION=( gbbct1ft, gbbct2ft, ..., # list up all ft tables by yourself gbvrtft ); CREATE TABLE IF NOT EXISTS ref ( id varchar(16) NOT NULL, num integer, authors text, title text, journal text, medline varchar(255), pubmed varchar(255), KEY (id), KEY (medline), KEY (pubmed) ) TYPE=MERGE UNION=( gbbct1ref, gbbct2ref, ..., # list up all ref tables by yourself gbvrtref ); CREATE TABLE IF NOT EXISTS seq ( id varchar(16) NOT NULL, num integer, naseq mediumtext, KEY (id) ) TYPE=MERGE UNION=( gbbct1seq, gbbct2seq, ..., # list up all seq tables by yourself gbvrtseq ); =end bio-2.0.3/sample/color_scheme_aa.rb0000644000175000017500000000337514141516614016537 0ustar nileshnilesh#!/usr/bin/env ruby # # color_scheme_aa.rb - A Bio::ColorScheme demo script for Amino Acid sequences. 
# # Usage: # # % ruby color_scheme_aa.rb > cs-seq-faa.html # # % cat seq.faa # >AA_sequence # MKRISTTITTTITITTGNGAG # % ruby color_scheme_aa.rb seq.faa > colored-seq-faa.html # # # Copyright:: Copyright (C) 2005 # Mitsuteru C. Nakao # License:: The Ruby License # require 'bio' # returns folded sequence with
. def br(i, width = 80) return "" if i % width == 0 "" end # returns sequence html doc def display(seq, cs) html = '

' postfix = '' i = 0 seq.each_char do |c| color = cs[c] prefix = %Q() html += prefix + c + postfix html += br(i += 1) end html + '

' end # returns scheme wise html doc def display_scheme(scheme, aaseq) html = '' cs = Bio::ColorScheme.const_get(scheme.intern) [aaseq].each do |seq| html += display(seq, cs) end return ['
', "

#{cs}

", html, '
'] end if fna = ARGV.shift aaseq = Bio::FlatFile.open(fna) { |ff| ff.next_entry.aaseq } else aaseq = Bio::Sequence::AA.new('ARNDCQEGHILKMFPSTWYV' * 20).randomize end title = 'Bio::ColorScheme for amino acid sequences' doc = ['', '
', '', title, '', '
', '', '

', title, '

'] doc << ['
', '

', 'Simple colors', '

'] ['Zappo', 'Taylor' ].each do |scheme| doc << display_scheme(scheme, aaseq) end doc << ['
'] doc << ['
', '

', 'Score colors', '

'] ['Buried', 'Helix', 'Hydropathy', 'Strand', 'Turn'].each do |score| doc << display_scheme(score, aaseq) end doc << ['
'] puts doc + ['',''] bio-2.0.3/sample/demo_fastaformat.rb0000644000175000017500000000467414141516614016752 0ustar nileshnilesh# # = sample/demo_fastaformat.rb - demonstration of the FASTA format parser # # Copyright:: Copyright (C) 2001, 2002 # Naohisa Goto , # Toshiaki Katayama # License:: The Ruby License # # $Id:$ # # == Description # # Demonstration of FASTA format parser. # # == Usage # # Simply run the script. # # $ ruby demo_fastaformat.rb # # == Development information # # The code was moved from lib/bio/db/fasta.rb. # require 'bio' f_str = <sce:YBR160W CDC28, SRM5; cyclin-dependent protein kinase catalytic subunit [EC:2.7.1.-] [SP:CC28_YEAST] MSGELANYKRLEKVGEGTYGVVYKALDLRPGQGQRVVALKKIRLESEDEG VPSTAIREISLLKELKDDNIVRLYDIVHSDAHKLYLVFEFLDLDLKRYME GIPKDQPLGADIVKKFMMQLCKGIAYCHSHRILHRDLKPQNLLINKDGNL KLGDFGLARAFGVPLRAYTHEIVTLWYRAPEVLLGGKQYSTGVDTWSIGC IFAEMCNRKPIFSGDSEIDQIFKIFRVLGTPNEAIWPDIVYLPDFKPSFP QWRRKDLSQVVPSLDPRGIDLLDKLLAYDPINRISARRAAIHPYFQES >sce:YBR274W CHK1; probable serine/threonine-protein kinase [EC:2.7.1.-] [SP:KB9S_YEAST] MSLSQVSPLPHIKDVVLGDTVGQGAFACVKNAHLQMDPSIILAVKFIHVP TCKKMGLSDKDITKEVVLQSKCSKHPNVLRLIDCNVSKEYMWIILEMADG GDLFDKIEPDVGVDSDVAQFYFQQLVSAINYLHVECGVAHRDIKPENILL DKNGNLKLADFGLASQFRRKDGTLRVSMDQRGSPPYMAPEVLYSEEGYYA DRTDIWSIGILLFVLLTGQTPWELPSLENEDFVFFIENDGNLNWGPWSKI EFTHLNLLRKILQPDPNKRVTLKALKLHPWVLRRASFSGDDGLCNDPELL AKKLFSHLKVSLSNENYLKFTQDTNSNNRYISTQPIGNELAELEHDSMHF QTVSNTQRAFTSYDSNTNYNSGTGMTQEAKWTQFISYDIAALQFHSDEND CNELVKRHLQFNPNKLTKFYTLQPMDVLLPILEKALNLSQIRVKPDLFAN FERLCELLGYDNVFPLIINIKTKSNGGYQLCGSISIIKIEEELKSVGFER KTGDPLEWRRLFKKISTICRDIILIPN END f = Bio::FastaFormat.new(f_str) puts "### FastaFormat" puts "# entry" puts f.entry puts "# entry_id" p f.entry_id puts "# definition" p f.definition puts "# data" p f.data puts "# seq" p f.seq puts "# seq.type" p f.seq.type puts "# length" p f.length puts "# aaseq" p f.aaseq puts "# aaseq.type" p f.aaseq.type puts "# aaseq.composition" p f.aaseq.composition puts "# aalen" p f.aalen puts n_str = 
<CRA3575282.F 24 15 23 29 20 13 20 21 21 23 22 25 13 22 17 15 25 27 32 26 32 29 29 25 END n = Bio::FastaNumericFormat.new(n_str) puts "### FastaNumericFormat" puts "# entry" puts n.entry puts "# entry_id" p n.entry_id puts "# definition" p n.definition puts "# data" p n.data puts "# length" p n.length #puts "# percent to ratio by yield" #n.each do |x| # p x/100.0 #end puts "# first three" p n[0] p n[1] p n[2] puts "# last one" p n[-1] bio-2.0.3/sample/any2fasta.rb0000755000175000017500000000252314141516614015321 0ustar nileshnilesh#!/usr/bin/env ruby # # any2fasta.rb - convert input file into FASTA format using a regex # filter # # Copyright (C) 2006 Pjotr Prins # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
# # $Id: any2fasta.rb,v 1.1 2006/02/17 14:59:27 pjotr Exp $ # require 'bio/io/flatfile' include Bio usage = < reduced.fasta USAGE if ARGV.size == 0 print usage exit 1 end # ---- Valid regular expression - if it is not a file regex = ARGV[0] if regex=~/^\// and !File.exist?(regex) ARGV.shift else regex = nil end ARGV.each do | fn | ff = Bio::FlatFile.auto(fn) ff.each_entry do |entry| if regex != nil next if eval("entry.seq !~ #{regex}") end print entry.seq.to_fasta(entry.definition,70) end end bio-2.0.3/sample/rev_comp.testdata.yaml0000644000175000017500000000030414141516614017402 0ustar nileshnileshseqFile: - class: File location: ../test/data/fasta/example2.txt - class: File location: ../test/data/fasta/example1.txt - class: File location: ../test/data/genbank/SCU49845.gb bio-2.0.3/sample/vs-genes.rb0000755000175000017500000001351414141516614015162 0ustar nileshnilesh#!/usr/bin/env ruby # # vs-genes.rb - homology/motif search wrapper # # FASTA/BLAST/Pfam interface for the multiple query in the FASTA format # # Copyright (C) 2001 KATAYAMA Toshiaki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
# # $Id: vs-genes.rb,v 0.1 2001/06/21 08:26:31 katayama Exp $ # def usage(cpu, ktup, skip, resultdir, verbose) print <<-END Usage: % #{$0} -p PROG -q QUERY -t TARGET [-c #] [-k #] [-s #] [-d DIR] [-v on] options -p PROG : (fasta3|ssearch3|tfasta3|fastx3|tfastx3)[3] or (blastp|blastn|blastx|tblastn|tblastx) or (hmmpfam|hmmpfam_n) -q QUERY : query nucleotide or peptide sequences in the FASTA format -t TARGET : target DB (FASTA or BLAST2 formatdb or Pfam format) optional arguments -c num : number of CPUs (for the SMP machines, default is #{cpu}) -k num : FASTA ktup value (2 for pep, 6 for nuc, default is #{ktup}) -s num : skip query (for the resume session, default is #{skip}) -d DIR : result output directory (default is "#{resultdir}") -v on/off : verbose output of processing if on (default is "#{verbose}") END exit 1 end ### initialize def init arg = {} # default values arg['c'] = 1 # num of CPUs arg['k'] = 2 # ktup value for FASTA arg['s'] = 0 # skip query arg['d'] = "./result" # result directory arg['v'] = 'off' # verbose mode # parse options ARGV.join(' ').scan(/-(\w) (\S+)/).each do |key, val| arg[key] = val end # check program, query, target or print usage unless arg['p'] and arg['q'] and arg['t'] usage(arg['c'], arg['k'], arg['s'], arg['d'], arg['v']) end # create result output directory unless test(?d, "#{arg['d']}") Dir.mkdir("#{arg['d']}", 0755) end # print status if arg['v'] != 'off' puts "PROG : #{arg['p']}" puts " ktup : #{arg['k']}" if arg['p'] =~ /fast/ puts "QUERY : #{arg['q']}" puts " skip : #{arg['s']}" puts "TARGET : #{arg['t']}" puts "RESULT : #{arg['d']}" end return arg end ### generate command line def cmd_line(arg, orf) # program with default command line options # query -> target DB opt = { # FASTA : "-b n" for best n scores, "-d n" for best n alignment 'fasta3' => "fasta3 -Q -H -m 6", # pep -> pep or nuc -> nuc 'ssearch3' => "ssearch3 -Q -H -m 6", # pep -> pep or nuc -> nuc 'tfasta3' => "tfasta3 -Q -H -m 6", # pep -> nuc 'fastx3' => "fastx3 
-Q -H -m 6", # nuc -> pep 'tfastx3' => "tfastx3 -Q -H -m 6", # pep -> nuc (with frameshifts) 'fasta33' => "fasta33 -Q -H -m 6", # pep -> pep or nuc -> nuc 'ssearch33' => "ssearch33 -Q -H -m 6", # pep -> pep or nuc -> nuc 'tfasta33' => "tfasta33 -Q -H -m 6", # pep -> nuc 'fastx33' => "fastx33 -Q -H -m 6", # nuc -> pep 'tfastx33' => "tfastx33 -Q -H -m 6", # pep -> nuc (with frameshifts) # BLAST : outputs XML 'blastp' => "blastall -m 7 -p blastp -d", # pep -> pep 'blastn' => "blastall -m 7 -p blastn -d", # nuc -> nuc 'blastx' => "blastall -m 7 -p blastx -d", # nuc -> pep 'tblastn' => "blastall -m 7 -p tblastn -d", # pep -> nuc 'tblastx' => "blastall -m 7 -p tblastx -d", # nuc -> nuc (by trans) # Pfam : "-A n" for best n alignment, "-E n" for E value cutoff etc. 'hmmpfam' => "hmmpfam", # pep -> Pfam DB 'hmmpfam_n' => "hmmpfam -n", # nuc -> Pfam DB } # arguments used in the command line cpu = arg['c'].to_i ktup = arg['k'] target = arg['t'] query = arg['d'] + "/query." + orf result = arg['d'] + "/" + orf prog = opt[arg['p']] if cpu > 1 # use multiple CPUs case arg['p'] when /(fast|ssearch)/ prog += " -T #{cpu}" prog.sub!(' ', '_t ') # rename program with "_t" when /pfam/ prog += " --cpu #{cpu}" end end # generate complete command line to execute case arg['p'] when /fast/ command = "#{prog} #{query} #{target} #{ktup} > #{result}" when /ssearch/ command = "#{prog} #{query} #{target} > #{result}" when /blast/ command = "#{prog} #{target} -i #{query} > #{result}" when /pfam/ command = "#{prog} #{target} #{query} > #{result}" end return command end ### main begin arg = init count = 0 open(arg['q'], "r") do |f| while seq = f.gets("\n>") count += 1 # skip (-s option) next unless count > arg['s'].to_i # clean up seq.sub!(/^>?[ \t]*/, '') # delete '>' and SPACEs or TABs at the head seq.sub!(/>$/, '') # delete '>' at the tail (separator) # get ORF name if seq[/^$/] # no definition (e.g. 
">\nSEQ>" or ">\n>") next # -> useless for the multiple query else orf = seq[/^\S+/] # the first word in the definition line end # KEGG uses ">DB:ENTRY" format in the definition line if orf =~ /:/ db,orf = orf.split(/:/) end # add time if the same ORF name was already used if test(?f, "#{arg['d']}/#{orf}") orf = "#{orf}.#{Time.now.to_f.to_s}" end # create temporal file of the query open("#{arg['d']}/query.#{orf}", "w+") do |tmp| tmp.print(">#{seq}") end command = cmd_line(arg, orf) # print status if arg['v'] != 'off' puts "#{count} : #{orf} ..." puts " #{command}" end # execute system("#{command}") # remove temporal file File.delete("#{arg['d']}/query.#{orf}") end end end bio-2.0.3/sample/demo_fasta_remote.rb0000644000175000017500000000221614141516614017102 0ustar nileshnilesh# # = sample/demo_fasta_remote.rb - demonstration of FASTA execution using GenomeNet web service # # Copyright:: Copyright (C) 2001, 2002 Toshiaki Katayama # License:: The Ruby License # # == Description # # Demonstration of Bio::Fasta.remote, wrapper class for FASTA execution using # GenomeNet fasta.genome.jp web service. # # == Requirements # # * Internet connection # # == Usage # # Specify a files containing a nucleic acid sequence. # The file format should be the fasta format. # # $ ruby demo_fasta_remote.rb file.fst # # Example usage using test data: # # $ ruby -Ilib sample/demo_fasta_remote.rb test/data/blast/b0002.faa # # Note that it may take very long time. Please wait for 3 to 5 minutes. # # == Development information # # The code was moved from lib/bio/appl/fasta.rb. # require 'bio' #if __FILE__ == $0 begin require 'pp' alias p pp rescue end # serv = Bio::Fasta.local('fasta34', 'hoge.nuc') # serv = Bio::Fasta.local('fasta34', 'hoge.pep') # serv = Bio::Fasta.local('ssearch34', 'hoge.pep') # This may take 3 minutes or so. 
serv = Bio::Fasta.remote('fasta', 'genes') p serv.query(ARGF.read) #end bio-2.0.3/sample/demo_prosite.rb0000644000175000017500000000416614141516614016124 0ustar nileshnilesh# # = sample/demo_prosite.rb - demonstration of Bio::PROSITE # # Copyright:: Copyright (C) 2001 Toshiaki Katayama # License:: The Ruby License # # # == Description # # Demonstration of Bio::PROSITE, parser class for PROSITE database entry. # # == Usage # # Specify files containing PROSITE data. # # $ ruby demo_prosite.rb files... # # Example usage using test data: # # $ ruby -Ilib sample/demo_prosite.rb test/data/prosite/prosite.dat # # == Development information # # The code was moved from lib/bio/db/prosite.rb. # require 'bio' begin require 'pp' alias p pp rescue LoadError end Bio::FlatFile.foreach(Bio::PROSITE, ARGF) do |ps| puts "### ps = Bio::PROSITE.new(str)" list = %w( name division ac entry_id dt date de definition pa pattern ma profile ru rule nr statistics release swissprot_release_number swissprot_release_sequences total total_hits total_sequences positive positive_hits positive_sequences unknown unknown_hits unknown_sequences false_pos false_positive_hits false_positive_sequences false_neg false_negative_hits partial cc comment max_repeat site skip_flag dr sp_xref pdb_xref pdoc_xref ) list.each do |method| puts ">>> #{method}" p ps.__send__(method) end puts ">>> taxon_range" p ps.taxon_range puts ">>> taxon_range(expand)" p ps.taxon_range(true) puts ">>> list_truepositive" p ps.list_truepositive puts ">>> list_truepositive(by_name)" p ps.list_truepositive(true) puts ">>> list_falsenegative" p ps.list_falsenegative puts ">>> list_falsenegative(by_name)" p ps.list_falsenegative(true) puts ">>> list_falsepositive" p ps.list_falsepositive puts ">>> list_falsepositive(by_name)" p ps.list_falsepositive(true) puts ">>> list_potentialhit" p ps.list_potentialhit puts ">>> list_potentialhit(by_name)" p ps.list_potentialhit(true) puts ">>> list_unknown" p ps.list_unknown puts ">>> 
list_unknown(by_name)" p ps.list_unknown(true) puts "=" * 78 end bio-2.0.3/sample/demo_gff1.rb0000644000175000017500000000227014141516614015254 0ustar nileshnilesh# # = sample/demo_gff1.rb - very simple demonstration of Bio::GFF # # Copyright:: Copyright (C) 2003, 2005 # Toshiaki Katayama # 2006 Jan Aerts # 2008 Naohisa Goto # License:: The Ruby License # # # == Description # # Very simple demonstration of Bio::GFF, parser classes for GFF formatted # text. # # == Usage # # Simply run this script. # # $ ruby demo_gff1.rb # # == To do # # Bio::GFF and related classes have many functions, and we should write # more example and/or demonstration codes. # # == Development information # # The code was moved from lib/bio/db/gff.rb. # require 'bio' #if __FILE__ == $0 begin require 'pp' alias p pp rescue LoadError end this_gff = "SEQ1\tEMBL\tatg\t103\t105\t.\t+\t0\n" this_gff << "SEQ1\tEMBL\texon\t103\t172\t.\t+\t0\n" this_gff << "SEQ1\tEMBL\tsplice5\t172\t173\t.\t+\t.\n" this_gff << "SEQ1\tnetgene\tsplice5\t172\t173\t0.94\t+\t.\n" this_gff << "SEQ1\tgenie\tsp5-20\t163\t182\t2.3\t+\t.\n" this_gff << "SEQ1\tgenie\tsp5-10\t168\t177\t2.1\t+\t.\n" this_gff << "SEQ1\tgrail\tATG\t17\t19\t2.1\t-\t0\n" p Bio::GFF.new(this_gff) #end bio-2.0.3/sample/genome2tab.rb0000755000175000017500000000323614141516614015456 0ustar nileshnilesh#!/usr/bin/env ruby # # genome2tab.rb - convert KEGG/GENOME into tab delimited data for MySQL # # Usage: # # % genome2tab.rb /bio/db/kegg/genome/genome > genome.tab # # Copyright (C) 2001 KATAYAMA Toshiaki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the # GNU General Public License for more details. # # $Id: genome2tab.rb,v 0.5 2002/06/23 20:21:56 k Exp $ # require 'bio/db/kegg/genome' include Bio while entry = gets(KEGG::GENOME::DELIMITER) genome = KEGG::GENOME.new(entry) ref = genome.references.inspect chr = genome.chromosomes.inspect ary = [ genome.entry_id, genome.name, genome.definition, genome.taxid, genome.taxonomy, genome.comment, ref, chr, genome.nalen, genome.num_gene, genome.num_rna, genome.gc, genome.genomemap, ] puts ary.join("\t") end =begin CREATE DATABASE IF NOT EXISTS db_name; CREATE TABLE IF NOT EXISTS db_name.genome ( id varchar(30) not NULL, name varchar(80), definition varchar(255), taxid varchar(30), taxonomy varchar(255), comment varchar(255), reference text, chromosome text, nalen integer, num_gene integer, num_rna integer, gc float, genomemap varchar(30) ); LOAD DATA LOCAL INFILE 'genome.tab' INTO TABLE db_name.genome; =end bio-2.0.3/sample/fasta2tab.rb0000755000175000017500000000452514141516614015304 0ustar nileshnilesh#!/usr/bin/env ruby # # fasta2tab.rb - convert FASTA (-m 6) output into tab delimited data for MySQL # # Usage: # # % fasta2tab.rb FASTA-output-file[s] > fasta_results.tab # % mysql < fasta_results.sql (use sample at the end of this file) # # Format accepted: # # % fasta3[3][_t] -Q -H -m 6 query.f target.f ktup > FASTA-output-file # # Copyright (C) 2001 KATAYAMA Toshiaki # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details.
# # $Id: fasta2tab.rb,v 0.1 2001/06/21 08:21:58 katayama Exp $ # while gets # query if /^\S+: (\d+) aa$/ q_len = $1 end # each hit if /^>>([^>]\S+).*\((\d+) aa\)$/ target = $1 t_len = $2 # d = dummy variable d, d, initn, d, init1, d, opt, d, zscore, d, bits, d, evalue = gets.split(/\s+/) d, d, sw, ident, d, ugident, d, d, overlap, d, d, lap = gets.split(/\s+/) # query-hit pair print "#{$FILENAME}\t#{q_len}\t#{target}\t#{t_len}" # pick up values ary = [ initn, init1, opt, zscore, bits, evalue, sw, ident, ugident, overlap, lap ] # print values for i in ary i.tr!('^0-9.:e\-','') print "\t#{i}" end print "\n" end end =begin MySQL fasta_results.sql sample CREATE DATABASE IF NOT EXISTS db_name; CREATE TABLE IF NOT EXISTS db_name.table_name ( query varchar(25) not NULL, q_len integer unsigned default 0, target varchar(25) not NULL, t_len integer unsigned default 0, initn integer unsigned default 0, init1 integer unsigned default 0, opt integer unsigned default 0, zscore float default 0.0, bits float default 0.0, evalue float default 0.0, sw integer unsigned default 0, ident float default 0.0, ugident float default 0.0, overlap integer unsigned default 0, lap_at varchar(25) default NULL ); LOAD DATA LOCAL INFILE 'fasta_results.tab' INTO TABLE db_name.table_name; =end bio-2.0.3/sample/demo_psort_report.rb0000644000175000017500000000266014141516614017176 0ustar nileshnilesh# # = sample/demo_psort_report.rb - demonstration of Bio::PSORT::PSORT2::Report # # Copyright:: Copyright (C) 2003 # Mitsuteru C. Nakao # License:: The Ruby License # # # == IMPORTANT NOTE # # The sample may not work because it has not been tested for a long time. # # == Description # # Demonstration of Bio::PSORT::PSORT2::Report, parser class for the PSORT # systems output. # # == Usage # # Specify a file containing PSORT2 output. # # $ ruby demo_psort_report.rb # # == Development information # # The code was moved from lib/bio/appl/psort/report.rb. 
# require 'bio' # testing code #if __FILE__ == $0 while entry = $<.gets(Bio::PSORT::PSORT2::Report::DELIMITER) puts "\n ==> a = Bio::PSORT::PSORT2::Report.parser(entry)" a = Bio::PSORT::PSORT2::Report.parser(entry) puts "\n ==> a.entry_id " p a.entry_id puts "\n ==> a.scl " p a.scl puts "\n ==> a.pred " p a.pred puts "\n ==> a.prob " p a.prob p a.prob.keys.sort.map {|k| k.rjust(4)}.inspect.gsub('"','') p a.prob.keys.sort.map {|k| a.prob[k].to_s.rjust(4) }.inspect.gsub('"','') puts "\n ==> a.k " p a.k puts "\n ==> a.definition" p a.definition puts "\n ==> a.seq" p a.seq puts "\n ==> a.features.keys.sort " p a.features.keys.sort a.features.keys.sort.each do |key| puts "\n ==> a.features['#{key}'] " puts a.features[key] end end #end bio-2.0.3/sample/rev_comp.cwl0000644000175000017500000000060114141516614015415 0ustar nileshnilesh#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool baseCommand: [ruby] inputs: - id: script type: File default: class: File location: rev_comp.rb inputBinding: position: -1 - id: seqFile type: File[] inputBinding: position: 1 outputs: - id: out type: stdout stdout: $(inputs.script.nameroot)-$(inputs.seqFile[0].nameroot).fst bio-2.0.3/sample/demo_psort.rb0000644000175000017500000000657114141516614015610 0ustar nileshnilesh# # = sample/demo_psort.rb - demonstration of Bio::PSORT, client for PSORT WWW server # # Copyright:: Copyright (C) 2003-2006 # Mitsuteru C. Nakao # License:: The Ruby License # # # == Description # # Demonstration of Bio::PSORT, client for PSORT (protein sorting site # prediction systems) WWW server. # # == Requirements # # Internet connection is needed. # # == Usage # # Simply run this script. # # $ ruby demo_psort.rb # # == Development information # # The code was moved from lib/bio/appl/psort.rb. 
# require 'bio' #if __FILE__ == $0 #begin # require 'psort/report.rb' #rescue LoadError #end seq = ">hoge mit MALEPIDYTT RDEDDLDENE LLMKISNAAG SSRVNDNNDD LTFVENDKII ARYSIQTSSK QQGKASTPPV EEAEEAAPQL PSRSSAAPPP PPRRATPEKK DVKDLKSKFE GLAASEKEEE EMENKFAPPP KKSEPTIISP KPFSKPQEPV FKGYHVQVTA HSREIDAEYL KIVRGSDPDT TWLIISPNAK KEYEPESTGS KKSFTPSKSP APVSKKEPVK TPSPAPAAKI PKENPWATAE YDYDAAEDNE NIEFVDDDWW LGELEKDGSK GLFPSNYVSL LPSRNVASGA PVQKEEPEQE SFHDFLQLFD ETKVQYGLAR RKAKQNSGNA ETKAEAPKPE VPEDEPEGEP DDWNEPELKE RDFDQAPLKP NQSSYKPIGK IDLQKVIAEE KAKEDPRLVQ DYKKIGNPLP GMHIEADNEE EPEENDDDWD DDEDEAAQPP ANFAAVANNL KPTAAGSKID DDKVIKGFRN EKSPAQLWAE VSPPGSDVEK IIIIGWCPDS APLKTRASFA PSSDIANLKN ESKLKRDSEF NSFLGTTKPP SMTESSLKND KAEEAEQPKT EIAPSLPSRN SIPAPKQEEA PEQAPEEEIE GN " Seq1 = ">hgoe LTFVENDKII NI " puts "\n Bio::PSORT::PSORT" puts "\n ==> p serv = Bio::PSORT::PSORT.imsut" p serv = Bio::PSORT::PSORT1.imsut puts "\n ==> p serv.class " p serv.class puts "\n ==> p serv.title = 'Query_title_splited_by_white space'" p serv.title = 'Query_title_splited_by_white space' puts "\n ==> p serv.exec(seq, false) " p serv.exec(seq, false) puts "\n ==> p serv.exec(seq) " p serv.exec(seq) puts "\n ==> p report = serv.exec(Bio::FastaFormat.new(seq)) " p report = serv.exec(Bio::FastaFormat.new(seq)) puts "\n ==> p report.class" p report.class puts "\n ==> p report_raw = serv.exec(Bio::FastaFormat.new(seq), false) " p report_raw = serv.exec(Bio::FastaFormat.new(seq), false) puts "\n ==> p report_raw.class" p report_raw.class puts "\n ==> p report.methods" p report.methods methods = ['entry_id', 'origin', 'title', 'sequence','result_info', 'reasoning', 'final_result', 'raw'] methods.each do |method| puts "\n ==> p report.#{method}" p eval("report.#{method}") end puts "\n Bio::PSORT::PSORT2" puts "\n ==> p serv = Bio::PSORT::PSORT2.imsut" p serv = Bio::PSORT::PSORT2.imsut puts "\n ==> p serv.class " p serv.class puts "\n ==> p seq " p seq puts "\n ==> p serv.title = 'Query_title_splited_by_white space'" 
p serv.title = 'Query_title_splited_by_white space' puts "\n ==> p serv.exec(seq) # parsed report" p serv.exec(seq) puts "\n ==> p report = serv.exec(Bio::FastaFormat.new(seq)) # parsed report" p report = serv.exec(Bio::FastaFormat.new(seq)) puts "\n ==> p serv.exec(seq, false) # report in plain text" p serv.exec(seq, false) puts "\n ==> p report_raw = serv.exec(Bio::FastaFormat.new(seq), false) # report in plain text" p report_raw = serv.exec(Bio::FastaFormat.new(seq), false) puts "\n ==> p report.methods" p report.methods methods = ['entry_id', 'scl', 'definition', 'seq', 'features', 'prob', 'pred', 'k', 'raw'] methods.each do |method| puts "\n ==> p report.#{method}" p eval("report.#{method}") end #end bio-2.0.3/LEGAL0000644000175000017500000001031014141516614012360 0ustar nileshnileshLEGAL NOTICE INFORMATION ------------------------ All the files in this distribution are covered under either the Ruby's license (see the file COPYING) or public-domain except some files mentioned below. sample/any2fasta.rb: Copyright (C) 2006 Pjotr Prins This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. sample/biofetch.rb: Copyright (C) 2002-2004 KATAYAMA Toshiaki This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. sample/enzymes.rb: Copyright (C) 2006 Pjotr Prins and Trevor Wennblom This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. 
sample/fasta2tab.rb: sample/fsplit.rb: sample/gb2tab.rb: sample/genes2nuc.rb: sample/genes2pep.rb: sample/genes2tab.rb: sample/genome2tab.rb: sample/gt2fasta.rb: sample/ssearch2tab.rb: sample/tfastx2tab.rb: sample/vs-genes.rb: Copyright (C) 2001 KATAYAMA Toshiaki This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. sample/fastagrep.rb: sample/fastasort.rb: Copyright (C) 2008 KATAYAMA Toshiaki & Pjotr Prins This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. sample/gb2fasta.rb: Copyright (C) 2001 KATAYAMA Toshiaki Copyright (C) 2002 Yoshinori K. Okuji This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. sample/gbtab2mysql.rb: sample/genome2rb.rb: sample/pmfetch.rb: sample/pmsearch.rb: Copyright (C) 2002 KATAYAMA Toshiaki This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. sample/tdiary.rb: Copyright (C) 2003 KATAYAMA Toshiaki Mitsuteru C. Nakao Itoshi NIKAIDO Takeya KASUKAWA This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. test/data/uniprot/p53_human.uniprot: This Swiss-Prot entry is copyright. 
It is produced through a collaboration between the Swiss Institute of Bioinformatics and the EMBL outstation - the European Bioinformatics Institute. There are no restrictions on its use as long as its content is in no way modified and this statement is not removed.

GPL: Copyright (C) 1989, 1991 Free Software Foundation, Inc. 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

LGPL: Copyright (C) 1991, 1999 Free Software Foundation, Inc. 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

bio-2.0.3/KNOWN_ISSUES.rdoc0000644000175000017500000001435214141516614014463 0ustar nileshnilesh
= KNOWN_ISSUES.rdoc - Known issues and bugs in BioRuby

Copyright:: Copyright (C) 2009-2020 Naohisa Goto
License:: The Ruby License

= Known issues and bugs in BioRuby

Below are known issues and bugs in BioRuby. Patches to fix them are welcome.
We hope they will be fixed in the future.

Items marked with (WONT_FIX) tags will not be fixed within BioRuby because
they are not BioRuby's issues and/or it is very difficult to fix them.

== 1. Ruby version specific issues

==== String encodings

Currently, BioRuby does not take string encodings into account.
In some cases, Encoding::CompatibilityError or
"ArgumentError: invalid byte sequence in (encoding name)" may be raised.

=== End-of-life Ruby versions

==== Ruby 1.9.0

(WONT_FIX) Ruby 1.9.0 is NOT supported because it isn't a stable release.

==== Ruby 1.9.1 or earlier (including Ruby 1.8.7)

(WONT_FIX) Problems observed only with Ruby 1.9.1 or earlier will not be
fixed. Note that Ruby 1.9.1 or earlier is no longer supported, as described
in README.rdoc.

==== Ruby 1.8.2 or earlier

(WONT_FIX) In some cases, temporary files and directories may not be
removed because of the lack of FileUtils.remove_entry_secure.
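
The cleanup problem noted for very old Ruby versions can be illustrated with a small guard around the removal call. This is only a sketch under stated assumptions, not BioRuby's actual code: the helper name +remove_tmp_entry+ and the file name +query.tmp+ are hypothetical, and the fallback to FileUtils.rm_rf is an illustrative choice.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper (not part of BioRuby): remove a temporary file or
# directory, preferring FileUtils.remove_entry_secure when it is available.
def remove_tmp_entry(path)
  if FileUtils.respond_to?(:remove_entry_secure)
    # Available in modern Ruby; avoids a symlink race in shared temp dirs.
    FileUtils.remove_entry_secure(path)
    :secure
  else
    # Assumed fallback for very old Ruby: plain recursive removal.
    FileUtils.rm_rf(path)
    :plain
  end
end

# Usage: create a scratch directory with one file, then remove it.
dir = Dir.mktmpdir('bioruby-sketch')
File.open(File.join(dir, 'query.tmp'), 'w') { |f| f.puts '>seq' }
how = remove_tmp_entry(dir)
puts "removed via #{how} path; still exists? #{File.exist?(dir)}"
```

Checking with +respond_to?+ rather than comparing RUBY_VERSION keeps the sketch independent of any particular version numbering.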
=== Problem with REXML DoS vulnerability patch before 09-Nov-2008

(WONT_FIX) If you have applied a patch taken from
http://www.ruby-lang.org/en/news/2008/08/23/dos-vulnerability-in-rexml/
before 09 Nov 2008 12:40 +0900, parsing of Blast XML results with the REXML
parser may fail because of a bug in the patch. The bug is already fixed and
a new patch is available at the above URL. Note that some Linux
distributions may have incorporated the patch in their own way, and may
have the same problem.

=== RubyGems 0.8.11 or earlier

(WONT_FIX) With very old versions of RubyGems, use 'require_gem', which was
deprecated in RubyGems 0.9.0 and removed in RubyGems 1.0.1.

  #!/usr/bin/env ruby
  require 'rubygems'
  require_gem 'bio'

=== JRuby

On JRuby, errors may be raised due to the following unfixed bugs in JRuby.

* {JRUBY-6195}[http://jira.codehaus.org/browse/JRUBY-6195]
  Process.spawn (and related methods) ignore option hash
* {JRUBY-6818}[http://jira.codehaus.org/browse/JRUBY-6818]
  Kernel.exec, Process.spawn (and IO.popen etc.) raise error when program
  is an array containing two strings

(WONT_FIX) With older versions of JRuby, you may be bothered by the
following bugs that have already been fixed in the head of JRuby.

* {JRUBY-6658}[http://jira.codehaus.org/browse/JRUBY-6658]
  Problem when setting up an autoload entry, defining a class via require,
  then redefining the autoload entry
* {JRUBY-6666}[http://jira.codehaus.org/browse/JRUBY-6666]
  Open3.popen3 failing due to missing handling for [path, argv[0]] array
* {JRUBY-6819}[http://jira.codehaus.org/browse/JRUBY-6819]
  java.lang.ArrayIndexOutOfBoundsException in String#each_line

(WONT_FIX) Due to JRUBY-5678 (resolved issue) and the difference in
behavior between CRuby and JRuby described in the comments on the issue
tracking page, when running BioRuby on JRuby with sudo or root rights, the
TMPDIR environment variable should be set to a directory that is not
world-writable.
Currently, the workaround is needed for running BioRuby tests with JRuby on
Travis-CI.

* {JRUBY-5678}[http://jira.codehaus.org/browse/JRUBY-5678]
  tmpdir cannot be delete when jruby has sudo/root rights

=== Rubinius

According to Travis-CI, unit tests have failed on 1.9 mode of Rubinius.

(WONT_FIX) With older versions of Rubinius, you may be bothered by the
following bugs that have already been fixed in the head of Rubinius.

* {Rubinius Issue #1693}[https://github.com/rubinius/rubinius/issues/1693]
  String#split gives incorrect output when splitting by /^/
* {Rubinius Issue #1724}[https://github.com/rubinius/rubinius/issues/1724]
  Creating Struct class with length attribute

== 2. OS and/or architecture-dependent issues

=== Microsoft Windows

==== Text mode issues

The following 4 tests failed on mswin32 (and maybe on mingw32 and bccwin32)
because of the conversion of line feed codes in the text mode.

* test_ended_pos and test_start_pos in test/unit/bio/io/test_flatfile.rb
* test_pos in test/unit/bio/io/flatfile/test_buffer.rb
* test_entry_pos in test/unit/bio/appl/blast/test_rpsblast.rb

This indicates that br_bioflat.rb and Bio::FlatFileIndex may create
incorrect indexes on mswin32, mingw32, and bccwin32. In addition,
Bio::FlatFile may return incorrect data.

==== String escaping of command-line arguments

After BioRuby 1.4.1, in Ruby 1.9.X running on Windows, escaping of
command-line arguments is processed by the Ruby interpreter. Before
BioRuby 1.4.0, the escaping was executed in
Bio::Command#escape_shell_windows, and its behavior differs from that of
the Ruby interpreter.

Currently, due to the change, test/functional/bio/test_command.rb may fail
on Windows with Ruby 1.9.X.

==== Windows 95/98/98SE/ME

(WONT_FIX) Some methods that call external programs may not work in
Windows 95/98/98SE/ME because of the limitation of COMMAND.COM.

=== OpenVMS, BeOS, OS/2, djgpp, Windows CE

(WONT_FIX) BioRuby may not work on these platforms.

== 3.
Known issues and bugs in BioRuby

=== Bio::UniProtKB

Bio::UniProtKB should be updated to follow UniProtKB format changes
described in http://www.uniprot.org/docs/sp_news.htm .

=== Bio::PDB

Bio::PDB should be updated to follow PDB format version 3.3.

=== Bio::Blast::Report

NCBI has announced that they are making a new version of the BLAST XML data
format. BioRuby should support it.

=== Bio::Blast::Default::Report

Bio::Blast::Default::Report currently supports legacy BLAST only.
It may be better to support the BLAST+ text output format, although NCBI
does not recommend doing so because the format is unstable.

== 4. Compatibility issues with other libraries/extensions

=== Ruby on Rails

BioRuby Shell on Web uses Ruby on Rails, but the author of the document
does not know which version is suitable.

== 5. Historical descriptions

=== CVS

For historical purposes: the anonymous CVS was provided at

* http://cvs.bioruby.org/

and could be obtained by the following procedure.

  % cvs -d :pserver:cvs@code.open-bio.org:/home/repository/bioruby login
  CVS password: cvs (login with a password 'cvs' for the first time)
  % cvs -d :pserver:cvs@code.open-bio.org:/home/repository/bioruby co bioruby

These may be closed without any prior notice.
bio-2.0.3/etc/0000755000175000017500000000000014141516614012371 5ustar nileshnileshbio-2.0.3/etc/bioinformatics/0000755000175000017500000000000014141516614015401 5ustar nileshnileshbio-2.0.3/etc/bioinformatics/seqdatabase.ini0000644000175000017500000000075414141516614020365 0ustar nileshnileshVERSION=1.00 [embl] protocol=biofetch location=http://www.ebi.ac.uk/Tools/dbfetch/dbfetch dbname=embl [emblcds] protocol=biofetch location=http://www.ebi.ac.uk/Tools/dbfetch/dbfetch dbname=emblcds [uniprotkb] protocol=biofetch location=http://www.ebi.ac.uk/Tools/dbfetch/dbfetch dbname=uniprotkb [refseqn] protocol=biofetch location=http://www.ebi.ac.uk/Tools/dbfetch/dbfetch dbname=refseqn [refseqp] protocol=biofetch location=http://www.ebi.ac.uk/Tools/dbfetch/dbfetch dbname=refseqp bio-2.0.3/appveyor.yml0000644000175000017500000000120714141516614014206 0ustar nileshnilesh--- version: "{build}" branches: only: - master clone_depth: 10 install: - SET PATH=C:\Ruby%ruby_version%\bin;%PATH% - SET BUNDLE_GEMFILE=gemfiles/Gemfile.windows - bundle install - bundle exec rake regemspec - bundle exec rake gem - bundle exec gem install pkg/bio-*.gem - echo gem "bio" >> gemfiles\Gemfile.windows build: off before_test: - ruby --version - gem --version - bundle --version test_script: - bundle exec rake gem-test environment: matrix: - ruby_version: "23" - ruby_version: "23-x64" - ruby_version: "24" - ruby_version: "24-x64" - ruby_version: "25" - ruby_version: "25-x64" bio-2.0.3/lib/0000755000175000017500000000000014141516614012364 5ustar nileshnileshbio-2.0.3/lib/bio.rb0000644000175000017500000002042314141516614013463 0ustar nileshnilesh# # = bio.rb - Loading all BioRuby modules # # Copyright:: Copyright (C) 2001-2007 # Toshiaki Katayama # License:: The Ruby License # # $Id:$ # module Bio autoload :BIORUBY_VERSION, 'bio/version' autoload :BIORUBY_EXTRA_VERSION, 'bio/version' autoload :BIORUBY_VERSION_ID, 'bio/version' ### Basic data types ## Sequence autoload :Sequence, 'bio/sequence' ## below 
are described in bio/sequence.rb #class Sequence # autoload :Common, 'bio/sequence/common' # autoload :NA, 'bio/sequence/na' # autoload :AA, 'bio/sequence/aa' # autoload :Generic, 'bio/sequence/generic' # autoload :Format, 'bio/sequence/format' # autoload :Adapter, 'bio/sequence/adapter' #end ## Locations/Location autoload :Location, 'bio/location' autoload :Locations, 'bio/location' ## Features/Feature autoload :Feature, 'bio/feature' autoload :Features, 'bio/compat/features' ## References/Reference autoload :Reference, 'bio/reference' autoload :References, 'bio/compat/references' ## Pathway/Relation autoload :Pathway, 'bio/pathway' autoload :Relation, 'bio/pathway' ## Alignment autoload :Alignment, 'bio/alignment' ## Tree autoload :Tree, 'bio/tree' ## Map autoload :Map, 'bio/map' ### Constants autoload :NucleicAcid, 'bio/data/na' autoload :AminoAcid, 'bio/data/aa' autoload :CodonTable, 'bio/data/codontable' ### DB parsers autoload :DB, 'bio/db' autoload :NCBIDB, 'bio/db' autoload :KEGGDB, 'bio/db' autoload :EMBLDB, 'bio/db' ## GenBank, GenPept autoload :GenBank, 'bio/db/genbank/genbank' autoload :GenPept, 'bio/db/genbank/genpept' ## (deprecated) Bio::RefSeq, Bio::DDBJ autoload :RefSeq, 'bio/db/genbank/refseq' autoload :DDBJ, 'bio/db/genbank/ddbj' ## EMBL, UniProtKB autoload :EMBL, 'bio/db/embl/embl' autoload :UniProtKB, 'bio/db/embl/uniprotkb' ## aliases of Bio::UniProtKB autoload :SPTR, 'bio/db/embl/sptr' autoload :UniProt, 'bio/db/embl/uniprot' ## (deprecated) Bio::TrEMBL, Bio::SwissProt autoload :TrEMBL, 'bio/db/embl/trembl' autoload :SwissProt, 'bio/db/embl/swissprot' ## KEGG class KEGG autoload :GENOME, 'bio/db/kegg/genome' autoload :GENES, 'bio/db/kegg/genes' autoload :ENZYME, 'bio/db/kegg/enzyme' autoload :COMPOUND, 'bio/db/kegg/compound' autoload :DRUG, 'bio/db/kegg/drug' autoload :GLYCAN, 'bio/db/kegg/glycan' autoload :REACTION, 'bio/db/kegg/reaction' autoload :BRITE, 'bio/db/kegg/brite' autoload :CELL, 'bio/db/kegg/cell' autoload :EXPRESSION, 
'bio/db/kegg/expression'
    autoload :ORTHOLOGY, 'bio/db/kegg/orthology'
    autoload :KGML, 'bio/db/kegg/kgml'
    autoload :PATHWAY, 'bio/db/kegg/pathway'
    autoload :MODULE, 'bio/db/kegg/module'
  end

  ## other formats

  autoload :FastaFormat, 'bio/db/fasta'
  autoload :FastaNumericFormat, 'bio/db/fasta/qual' # change to FastaFormat::Numeric ?
  autoload :FastaDefline, 'bio/db/fasta/defline' # change to FastaFormat::Defline ?
  autoload :Fastq, 'bio/db/fastq'
  autoload :GFF, 'bio/db/gff'
  autoload :AAindex, 'bio/db/aaindex'
  autoload :AAindex1, 'bio/db/aaindex' # change to AAindex::AAindex1 ?
  autoload :AAindex2, 'bio/db/aaindex' # change to AAindex::AAindex2 ?
  autoload :TRANSFAC, 'bio/db/transfac'
  autoload :PROSITE, 'bio/db/prosite'
  autoload :LITDB, 'bio/db/litdb'
  autoload :MEDLINE, 'bio/db/medline'
  autoload :FANTOM, 'bio/db/fantom'
  autoload :GO, 'bio/db/go'
  autoload :PDB, 'bio/db/pdb'
  autoload :NBRF, 'bio/db/nbrf'
  autoload :REBASE, 'bio/db/rebase'
  autoload :SOFT, 'bio/db/soft'
  autoload :Lasergene, 'bio/db/lasergene'
  autoload :SangerChromatogram, 'bio/db/sanger_chromatogram/chromatogram'
  autoload :Scf, 'bio/db/sanger_chromatogram/scf'
  autoload :Abif, 'bio/db/sanger_chromatogram/abif'
  autoload :Newick, 'bio/db/newick'
  autoload :Nexus, 'bio/db/nexus'

  ### IO interface modules

  autoload :Registry, 'bio/io/registry'
  autoload :Fetch, 'bio/io/fetch'
  autoload :FlatFile, 'bio/io/flatfile'
  autoload :FlatFileIndex, 'bio/io/flatfile/index' # change to FlatFile::Index ?
## below are described in bio/io/flatfile/index.rb #class FlatFileIndex # autoload :Indexer, 'bio/io/flatfile/indexer' # autoload :BDBdefault, 'bio/io/flatfile/bdb' # autoload :BDBwrapper, 'bio/io/flatfile/bdb' # autoload :BDB_1, 'bio/io/flatfile/bdb' #end autoload :PubMed, 'bio/io/pubmed' autoload :DAS, 'bio/io/das' autoload :Hinv, 'bio/io/hinv' ## below are described in bio/appl/blast.rb #class Blast # autoload :Fastacmd, 'bio/io/fastacmd' #end autoload :NCBI, 'bio/io/ncbirest' ## below are described in bio/io/ncbirest.rb #class NCBI # autoload :REST, 'bio/io/ncbirest' #end autoload :TogoWS, 'bio/io/togows' ### Applications autoload :Fasta, 'bio/appl/fasta' ## below are described in bio/appl/fasta.rb #class Fasta # autoload :Report, 'bio/appl/fasta/format10' #end autoload :Blast, 'bio/appl/blast' ## below are described in bio/appl/blast.rb #class Blast # autoload :Fastacmd, 'bio/io/fastacmd' # autoload :Report, 'bio/appl/blast/report' # autoload :Default, 'bio/appl/blast/format0' # autoload :WU, 'bio/appl/blast/wublast' # autoload :Bl2seq, 'bio/appl/bl2seq/report' # autoload :RPSBlast, 'bio/appl/blast/rpsblast' # autoload :NCBIOptions, 'bio/appl/blast/ncbioptions' # autoload :Remote, 'bio/appl/blast/remote' #end autoload :HMMER, 'bio/appl/hmmer' ## below are described in bio/appl/hmmer.rb #class HMMER # autoload :Report, 'bio/appl/hmmer/report' #end autoload :EMBOSS, 'bio/appl/emboss' # use bio/command, improve autoload :PSORT, 'bio/appl/psort' ## below are described in bio/appl/psort.rb #class PSORT # class PSORT1 # autoload :Report, 'bio/appl/psort/report' # end # class PSORT2 # autoload :Report, 'bio/appl/psort/report' # end #end autoload :TMHMM, 'bio/appl/tmhmm/report' autoload :TargetP, 'bio/appl/targetp/report' autoload :SOSUI, 'bio/appl/sosui/report' autoload :Genscan, 'bio/appl/genscan/report' autoload :ClustalW, 'bio/appl/clustalw' ## below are described in bio/appl/clustalw.rb #class ClustalW # autoload :Report, 'bio/appl/clustalw/report' #end autoload 
:MAFFT, 'bio/appl/mafft' ## below are described in bio/appl/mafft.rb #class MAFFT # autoload :Report, 'bio/appl/mafft/report' #end autoload :Tcoffee, 'bio/appl/tcoffee' autoload :Muscle, 'bio/appl/muscle' autoload :Probcons, 'bio/appl/probcons' autoload :Sim4, 'bio/appl/sim4' ## below are described in bio/appl/sim4.rb #class Sim4 # autoload :Report, 'bio/appl/sim4/report' #end autoload :Spidey, 'bio/appl/spidey/report' autoload :Blat, 'bio/appl/blat/report' module GCG autoload :Msf, 'bio/appl/gcg/msf' autoload :Seq, 'bio/appl/gcg/seq' end module Phylip autoload :PhylipFormat, 'bio/appl/phylip/alignment' autoload :DistanceMatrix, 'bio/appl/phylip/distance_matrix' end autoload :Iprscan, 'bio/appl/iprscan/report' autoload :PAML, 'bio/appl/paml/common' ## below are described in bio/appl/paml/common.rb # module PAML # autoload :Codeml, 'bio/appl/paml/codeml' # autoload :Baseml, 'bio/appl/paml/baseml' # autoload :Yn00, 'bio/appl/paml/yn00' # end ### Utilities autoload :SiRNA, 'bio/util/sirna' autoload :ColorScheme, 'bio/util/color_scheme' autoload :ContingencyTable, 'bio/util/contingency_table' autoload :RestrictionEnzyme, 'bio/util/restriction_enzyme' ### Service libraries autoload :Command, 'bio/command' end bio-2.0.3/lib/bio/0000755000175000017500000000000014141516614013135 5ustar nileshnileshbio-2.0.3/lib/bio/alignment.rb0000644000175000017500000022253114141516614015445 0ustar nileshnilesh# # = bio/alignment.rb - multiple alignment of sequences # # Copyright:: Copyright (C) 2003, 2005, 2006 # GOTO Naohisa # # License:: The Ruby License # # $Id: alignment.rb,v 1.24 2007/12/26 14:08:02 ngoto Exp $ # # = About Bio::Alignment # # Please refer document of Bio::Alignment module. # # = References # # * Bio::Align::AlignI class of the BioPerl. # http://doc.bioperl.org/releases/bioperl-1.4/Bio/Align/AlignI.html # # * Bio::SimpleAlign class of the BioPerl. 
# http://doc.bioperl.org/releases/bioperl-1.4/Bio/SimpleAlign.html # require 'tempfile' require 'bio/command' require 'bio/sequence' #--- # (depends on autoload) #require 'bio/appl/gcg/seq' #+++ module Bio # # = About Bio::Alignment # # Bio::Alignment is a namespace of classes/modules for multiple sequence # alignment. # # = Multiple alignment container classes # # == Bio::Alignment::OriginalAlignment # # == Bio::Alignment::SequenceArray # # == Bio::Alignment::SequenceHash # # = Bio::Alignment::Site # # = Modules # # == Bio::Alignment::EnumerableExtension # # Mix-in for classes included Enumerable. # # == Bio::Alignment::ArrayExtension # # Mix-in for Array or Array-like classes. # # == Bio::Alignment::HashExtension # # Mix-in for Hash or Hash-like classes. # # == Bio::Alignment::SiteMethods # # == Bio::Alignment::PropertyMethods # # = Bio::Alignment::GAP # # = Compatibility from older BioRuby # module Alignment autoload :MultiFastaFormat, 'bio/appl/mafft/report' # Bio::Alignment::PropertyMethods is a set of methods to treat # the gap character and so on. module PropertyMethods # regular expression for detecting gaps. GAP_REGEXP = /[^a-zA-Z]/ # gap character GAP_CHAR = '-'.freeze # missing character MISSING_CHAR = '?'.freeze # If given character is a gap, returns true. # Otherwise, return false. # Note that s must be a String which contain a single character. def is_gap?(s) (gap_regexp =~ s) ? true : false end # Returns regular expression for checking gap. def gap_regexp ((defined? @gap_regexp) ? @gap_regexp : nil) or GAP_REGEXP end # regular expression for checking gap attr_writer :gap_regexp # Gap character. def gap_char ((defined? @gap_char) ? @gap_char : nil) or GAP_CHAR end # gap character attr_writer :gap_char # Character if the site is missing or unknown. def missing_char ((defined? @missing_char) ? @missing_char : nil) or MISSING_CHAR end # Character if the site is missing or unknown. attr_writer :missing_char # Returns class of the sequence. 
      # If instance variable @seqclass (which can be
      # set by 'seqclass=' method) is set, simply returns the value.
      # Otherwise, returns String.
      def seqclass
        ((defined? @seqclass) ? @seqclass : nil) or String
      end

      # The class of the sequence.
      # The value must be String or its derivatives.
      attr_writer :seqclass

      # Returns properties defined in the object as a hash.
      def get_all_property
        ret = {}
        if defined? @gap_regexp
          ret[:gap_regexp] = @gap_regexp
        end
        if defined? @gap_char
          ret[:gap_char] = @gap_char
        end
        if defined? @missing_char
          ret[:missing_char] = @missing_char
        end
        if defined? @seqclass
          ret[:seqclass] = @seqclass
        end
        ret
      end

      # Sets properties from given hash.
      # hash would be a return value of get_all_property method.
      def set_all_property(hash)
        @gap_regexp = hash[:gap_regexp] if hash.has_key?(:gap_regexp)
        @gap_char = hash[:gap_char] if hash.has_key?(:gap_char)
        @missing_char = hash[:missing_char] if hash.has_key?(:missing_char)
        @seqclass = hash[:seqclass] if hash.has_key?(:seqclass)
        self
      end
    end #module PropertyMethods

    # Bio::Alignment::SiteMethods is a set of methods for
    # Bio::Alignment::Site.
    # It can also be used for extending an array of single-letter strings.
    module SiteMethods
      include PropertyMethods

      # If there are gaps, returns true. Otherwise, returns false.
      def has_gap?
        (find { |x| is_gap?(x) }) ? true : false
      end

      # Removes gaps in the site. (destructive method)
      def remove_gaps!
        flag = nil
        self.collect! do |x|
          if is_gap?(x) then flag = self; nil; else x; end
        end
        self.compact!
        flag
      end

      # Returns consensus character of the site.
      # If consensus is found, returns a single-letter string.
      # If not, returns nil.
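      # The threshold rule can be tried outside the library with the
      # standalone sketch below. It does not use any Bio class;
      # site_consensus is a hypothetical helper name chosen for this
      # illustration, and the site is assumed to be a plain Array of
      # single-letter strings.

```ruby
# Hypothetical helper (not part of BioRuby): consensus of one alignment column.
# Returns the most frequent character if it reaches the threshold, else nil.
def site_consensus(site, threshold = 1.0)
  return nil if site.empty?
  counts = Hash.new(0)
  site.each { |c| counts[c] += 1 }
  # Highest count wins; ties go to the character that appears first.
  best, n = counts.max_by { |c, k| [k, -site.index(c)] }
  site.size * threshold <= n ? best : nil
end

p site_consensus(%w(a a a t), 0.75)  # => "a"
p site_consensus(%w(a a a t), 1.0)   # => nil
```

      # With threshold 1.0 (the default) only a unanimous column
      # yields a consensus character.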
      def consensus_string(threshold = 1.0)
        return nil if self.size <= 0
        return self[0] if self.sort.uniq.size == 1
        h = Hash.new(0)
        self.each { |x| h[x] += 1 }
        total = self.size
        b = h.to_a.sort do |x,y|
          z = (y[1] <=> x[1])
          z = (self.index(x[0]) <=> self.index(y[0])) if z == 0
          z
        end
        if total * threshold <= b[0][1] then
          b[0][0]
        else
          nil
        end
      end

      # IUPAC nucleotide groups. Internal use only.
      IUPAC_NUC = [
        %w( t u ),
        %w( m a c ),
        %w( r a g ),
        %w( w a t u ),
        %w( s c g ),
        %w( y c t u ),
        %w( k g t u ),
        %w( v a c g m r s ),
        %w( h a c t u m w y ),
        %w( d a g t u r w k ),
        %w( b c g t u s y k ),
        %w( n a c g t u m r w s y k v h d b )
      ]

      # Returns an IUPAC consensus base for the site.
      # If consensus is found, returns a single-letter string.
      # If not, returns nil.
      def consensus_iupac
        a = self.collect { |x| x.downcase }.sort.uniq
        if a.size == 1 then
          case a[0]
          when 'a', 'c', 'g', 't'
            a[0]
          when 'u'
            't'
          else
            IUPAC_NUC.find { |x| a[0] == x[0] } ? a[0] : nil
          end
        elsif r = IUPAC_NUC.find { |x| (a - x).size <= 0 } then
          r[0]
        else
          nil
        end
      end

      # Table of strongly conserved amino-acid groups.
      #
      # The values of the tables are taken from BioPerl
      # (Bio/SimpleAlign.pm in BioPerl 1.0),
      # and the BioPerl's document says that
      # it is taken from Clustalw documentation and
      #   These are all the positively scoring groups that occur in the
      #   Gonnet Pam250 matrix. The strong and weak groups are
      #   defined as strong score >0.5 and weak score =<0.5 respectively.
      #
      StrongConservationGroups = %w(STA NEQK NHQK NDEQ QHRK MILV MILF
        HY FYW).collect { |x| x.split('').sort }

      # Table of weakly conserved amino-acid groups.
      #
      # Please refer to the StrongConservationGroups documentation
      # for the origin of the table.
      WeakConservationGroups = %w(CSA ATV SAG STNK STPA SGND
        SNDEQK NDEQHK NEQHRK FVLIM HFY).collect { |x| x.split('').sort }

      # Returns the match-line character for the site.
      # This is amino-acid version.
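      # How the conservation groups translate into a match-line
      # character can be sketched without the library as follows.
      # The constants STRONG and WEAK and the method match_char are
      # hypothetical names for this illustration; the group tables
      # are the ones listed above, and the output characters are the
      # ClustalW-style defaults ('*', ':', '.', ' ').

```ruby
# Standalone sketch (not BioRuby's API) of the amino-acid match-line rule.
STRONG = %w(STA NEQK NHQK NDEQ QHRK MILV MILF HY FYW).map { |g| g.chars.sort }
WEAK   = %w(CSA ATV SAG STNK STPA SGND SNDEQK NDEQHK NEQHRK FVLIM HFY).map { |g| g.chars.sort }

# column: Array of single-letter strings taken from one alignment position.
def match_char(column)
  residues = column.map(&:upcase).uniq
  return ' ' if residues.any? { |c| c =~ /[^a-zA-Z]/ }  # any gap => mismatch
  return '*' if residues.size == 1                       # fully conserved
  return ':' if STRONG.any? { |g| (residues - g).empty? }
  return '.' if WEAK.any? { |g| (residues - g).empty? }
  ' '
end

p match_char(%w(A A A))  # => "*"
p match_char(%w(S T A))  # => ":"
p match_char(%w(C S))    # => "."
p match_char(%w(W P))    # => " "
```

      # A column is "strong" or "weak" when all of its residues fit
      # inside a single group of the corresponding table.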
def match_line_amino(opt = {}) # opt[:match_line_char] ==> 100% equal default: '*' # opt[:strong_match_char] ==> strong match default: ':' # opt[:weak_match_char] ==> weak match default: '.' # opt[:mismatch_char] ==> mismatch default: ' ' mlc = (opt[:match_line_char] or '*') smc = (opt[:strong_match_char] or ':') wmc = (opt[:weak_match_char] or '.') mmc = (opt[:mismatch_char] or ' ') a = self.collect { |c| c.upcase }.sort.uniq a.extend(SiteMethods) if a.has_gap? then mmc elsif a.size == 1 then mlc elsif StrongConservationGroups.find { |x| (a - x).empty? } then smc elsif WeakConservationGroups.find { |x| (a - x).empty? } then wmc else mmc end end # Returns the match-line character for the site. # This is nucleic-acid version. def match_line_nuc(opt = {}) # opt[:match_line_char] ==> 100% equal default: '*' # opt[:mismatch_char] ==> mismatch default: ' ' mlc = (opt[:match_line_char] or '*') mmc = (opt[:mismatch_char] or ' ') a = self.collect { |c| c.upcase }.sort.uniq a.extend(SiteMethods) if a.has_gap? then mmc elsif a.size == 1 then mlc else mmc end end end #module SiteMethods # Bio::Alignment::Site stores bases or amino-acids in a # site of the alignment. # It would store multiple String objects of length 1. # Please refer to the document of Array and SiteMethods for methods. class Site < Array include SiteMethods end #module Site # The module Bio::Alignment::EnumerableExtension is a set of useful # methods for multiple sequence alignment. # It can be included by any classes or can be extended to any objects. # The classes or objects must have methods defined in Enumerable, # and must have the each method # which iterates over each sequence (or string) and yields # a sequence (or string) object. # # Optionally, if each_seq method is defined, # which iterates over each sequence (or string) and yields # each sequence (or string) object, it is used instead of each. # # Note that the each or each_seq method would be # called multiple times. 
# This means that the module is not suitable for IO objects. # In addition, break would be used in the given block and # destructive methods would be used to the sequences. # # For Array or Hash objects, you'd better using # ArrayExtension or HashExtension modules, respectively. # They would have built-in each_seq method and/or # some methods would be redefined. # module EnumerableExtension include PropertyMethods # Iterates over each sequences. # Yields a sequence. # It acts the same as Enumerable#each. # # You would redefine the method suitable for the class/object. def each_seq(&block) #:yields: seq each(&block) end # Returns class of the sequence. # If instance variable @seqclass (which can be # set by 'seqclass=' method) is set, simply returns the value. # Otherwise, returns the first sequence's class. # If no sequences are found, returns nil. def seqclass if (defined? @seqclass) and @seqclass then @seqclass else klass = nil each_seq do |s| if s then klass = s.class break if klass end end (klass or String) end end # Returns the alignment length. # Returns the longest length of the sequence in the alignment. def alignment_length maxlen = 0 each_seq do |s| x = s.length maxlen = x if x > maxlen end maxlen end alias seq_length alignment_length # Gets a site of the position. # Returns a Bio::Alignment::Site object. # # If the position is out of range, it returns the site # of which all are gaps. # # It is a private method. # Only difference from public alignment_site method is # it does not do set_all_property(get_all_property). def _alignment_site(position) site = Site.new each_seq do |s| c = s[position, 1] if c.to_s.empty? c = seqclass.new(gap_char) end site << c end site end private :_alignment_site # Gets a site of the position. # Returns a Bio::Alignment::Site object. # # If the position is out of range, it returns the site # of which all are gaps. 
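      # The out-of-range behaviour can be illustrated with a small
      # standalone sketch. column_at is a hypothetical helper (not a
      # BioRuby method) that works on plain String objects and uses
      # '-' as the assumed gap character.

```ruby
# Standalone sketch: extract one alignment column, padding with gaps
# wherever a sequence is too short to have a character at that position.
def column_at(seqs, position, gap_char = '-')
  seqs.map do |s|
    c = s[position, 1]          # nil or "" when position is past the end
    c.to_s.empty? ? gap_char : c
  end
end

seqs = %w(ACGT AC-T AC)
p column_at(seqs, 1)  # => ["C", "C", "C"]
p column_at(seqs, 3)  # => ["T", "T", "-"]
p column_at(seqs, 9)  # => ["-", "-", "-"]
```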
def alignment_site(position) site = _alignment_site(position) site.set_all_property(get_all_property) site end # Iterates over each site of the alignment. # It yields a Bio::Alignment::Site object (which inherits Array). # It returns self. def each_site cp = get_all_property (0...alignment_length).each do |i| site = _alignment_site(i) site.set_all_property(cp) yield(site) end self end # Iterates over each site of the alignment, with specifying # start, stop positions and step. # It yields Bio::Alignment::Site object (which inherits Array). # It returns self. # It is same as # start.step(stop, step) { |i| yield alignment_site(i) }. def each_site_step(start, stop, step = 1) cp = get_all_property start.step(stop, step) do |i| site = _alignment_site(i) site.set_all_property(cp) yield(site) end self end # Iterates over each sequence and results running blocks # are collected and returns a new alignment as a # Bio::Alignment::SequenceArray object. # # Note that it would be redefined if you want to change # return value's class. # def alignment_collect a = SequenceArray.new a.set_all_property(get_all_property) each_seq do |str| a << yield(str) end a end # Returns specified range of the alignment. # For each sequence, the '[]' method (it may be String#[]) # is executed, and returns a new alignment # as a Bio::Alignment::SequenceArray object. # # Unlike alignment_slice method, the result alignment are # guaranteed to contain String object if the range specified # is out of range. # # If you want to change return value's class, you should redefine # alignment_collect method. # def alignment_window(*arg) alignment_collect do |s| s[*arg] or seqclass.new('') end end alias window alignment_window # Iterates over each sliding window of the alignment. # window_size is the size of sliding window. # step is the step of each sliding. # It yields a Bio::Alignment::SequenceArray object which contains # each sliding window. 
# It returns a Bio::Alignment::SequenceArray object which contains # remainder alignment at the terminal end. # If window_size is smaller than 0, it returns nil. def each_window(window_size, step_size = 1) return nil if window_size < 0 if step_size >= 0 then last_step = nil 0.step(alignment_length - window_size, step_size) do |i| yield alignment_window(i, window_size) last_step = i end alignment_window((last_step + window_size)..-1) else i = alignment_length - window_size while i >= 0 yield alignment_window(i, window_size) i += step_size end alignment_window(0...(i-step_size)) end end # Iterates over each site of the alignment and results running the # block are collected and returns an array. # It yields a Bio::Alignment::Site object. def collect_each_site ary = [] each_site do |site| ary << yield(site) end ary end # Helper method for calculating consensus sequence. # It iterates over each site of the alignment. # In each site, gaps will be removed if specified with opt. # It yields a Bio::Alignment::Site object. # Results running the block (String objects are expected) # are joined to a string and it returns the string. # # opt[:gap_mode] ==> 0 -- gaps are regarded as normal characters # 1 -- a site within gaps is regarded as a gap # -1 -- gaps are eliminated from consensus calculation # default: 0 # def consensus_each_site(opt = {}) mchar = (opt[:missing_char] or self.missing_char) gap_mode = opt[:gap_mode] case gap_mode when 0, nil collect_each_site do |a| yield(a) or mchar end.join('') when 1 collect_each_site do |a| a.has_gap? ? gap_char : (yield(a) or mchar) end.join('') when -1 collect_each_site do |a| a.remove_gaps! a.empty? ? gap_char : (yield(a) or mchar) end.join('') else raise ':gap_mode must be 0, 1 or -1' end end # Returns the consensus string of the alignment. # 0.0 <= threshold <= 1.0 is expected. # # It resembles the BioPerl's AlignI::consensus_string method. # # Please refer to the consensus_each_site method for opt. 
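      # The three :gap_mode values of consensus_each_site can be
      # compared with the standalone sketch below. It is a simplified
      # model, not the library code: consensus is a hypothetical
      # method, only unanimous columns produce a character (as with
      # threshold 1.0), and '-' and '?' are assumed as the gap and
      # missing characters.

```ruby
GAP = /[^a-zA-Z]/

# Standalone sketch of a per-column consensus with three gap modes:
#    0  gaps count as ordinary characters
#    1  any gap in the column makes the result a gap
#   -1  gaps are dropped before the column is evaluated
def consensus(seqs, gap_mode: 0, gap_char: '-', missing_char: '?')
  len = seqs.map(&:length).max.to_i
  (0...len).map do |i|
    col = seqs.map { |s| s[i] || gap_char }
    case gap_mode
    when 0
      # nothing to do
    when 1
      next gap_char if col.any? { |c| c =~ GAP }
    when -1
      col = col.reject { |c| c =~ GAP }
      next gap_char if col.empty?
    else
      raise ArgumentError, 'gap_mode must be 0, 1 or -1'
    end
    col.uniq.size == 1 ? col.first : missing_char
  end.join
end

seqs = %w(AC-T ACGT A-GT)
puts consensus(seqs, gap_mode: 0)   # A??T
puts consensus(seqs, gap_mode: 1)   # A--T
puts consensus(seqs, gap_mode: -1)  # ACGT
```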
      #
      def consensus_string(threshold = 1.0, opt = {})
        consensus_each_site(opt) do |a|
          a.consensus_string(threshold)
        end
      end

      # Returns the IUPAC consensus string of the alignment
      # of nucleic-acid sequences.
      #
      # It resembles the BioPerl's AlignI::consensus_iupac method.
      #
      # Please refer to the consensus_each_site method for opt.
      #
      def consensus_iupac(opt = {})
        consensus_each_site(opt) do |a|
          a.consensus_iupac
        end
      end

      # Returns the match line string of the alignment
      # of amino-acid sequences.
      #
      # It resembles the BioPerl's AlignI::match_line method.
      #
      #   opt[:match_line_char]   ==> 100% equal    default: '*'
      #   opt[:strong_match_char] ==> strong match  default: ':'
      #   opt[:weak_match_char]   ==> weak match    default: '.'
      #   opt[:mismatch_char]     ==> mismatch      default: ' '
      #
      # More opt can be accepted.
      # Please refer to the consensus_each_site method for opt.
      #
      def match_line_amino(opt = {})
        collect_each_site do |a|
          a.match_line_amino(opt)
        end.join('')
      end

      # Returns the match line string of the alignment
      # of nucleic-acid sequences.
      #
      # It resembles the BioPerl's AlignI::match_line method.
      #
      #   opt[:match_line_char]   ==> 100% equal    default: '*'
      #   opt[:mismatch_char]     ==> mismatch      default: ' '
      #
      # More opt can be accepted.
      # Please refer to the consensus_each_site method for opt.
      #
      def match_line_nuc(opt = {})
        collect_each_site do |a|
          a.match_line_nuc(opt)
        end.join('')
      end

      # Returns the match line string of the alignment
      # of nucleic- or amino-acid sequences.
      # The type of the sequence is automatically determined
      # or you can specify with opt[:type].
      #
      # It resembles the BioPerl's AlignI::match_line method.
      #
      #   opt[:type] ==> :na or :aa (or determined by sequence class)
      #   opt[:match_line_char]   ==> 100% equal    default: '*'
      #   opt[:strong_match_char] ==> strong match  default: ':'
      #   opt[:weak_match_char]   ==> weak match    default: '.'
      #   opt[:mismatch_char]     ==> mismatch      default: ' '
      #   :strong_ and :weak_match_char are used only in amino mode (:aa)
      #
      # More opt can be accepted.
      # Please refer to the consensus_each_site method for opt.
      #
      def match_line(opt = {})
        case opt[:type]
        when :aa
          amino = true
        when :na, :dna, :rna
          amino = false
        else
          if seqclass == Bio::Sequence::AA then
            amino = true
          elsif seqclass == Bio::Sequence::NA then
            amino = false
          else
            amino = nil
            self.each_seq do |x|
              if /[EFILPQ]/i =~ x
                amino = true
                break
              end
            end
          end
        end
        if amino then
          match_line_amino(opt)
        else
          match_line_nuc(opt)
        end
      end

      # This is the BioPerl's AlignI::match like method.
      #
      # Changes second to last sequences' sites to match_char(default: '.')
      # when a site is equal to the first sequence's corresponding site.
      #
      # Note that it is a destructive method.
      #
      # For Hash, please use it carefully because
      # the order of the sequences is inconstant.
      #
      def convert_match(match_char = '.')
        #(BioPerl) AlignI::match like method
        len = alignment_length
        firstseq = nil
        each_seq do |s|
          unless firstseq then
            firstseq = s
          else
            (0...len).each do |i|
              if s[i] and firstseq[i] == s[i] and !is_gap?(firstseq[i..i])
                s[i..i] = match_char
              end
            end
          end
        end
        self
      end

      # This is the BioPerl's AlignI::unmatch like method.
      #
      # Changes second to last sequences' sites match_char(default: '.')
      # to original sites' characters.
      #
      # Note that it is a destructive method.
      #
      # For Hash, please use it carefully because
      # the order of the sequences is inconstant.
      #
      def convert_unmatch(match_char = '.')
        #(BioPerl) AlignI::unmatch like method
        len = alignment_length
        firstseq = nil
        each_seq do |s|
          unless firstseq then
            firstseq = s
          else
            (0...len).each do |i|
              if s[i..i] == match_char then
                s[i..i] = (firstseq[i..i] or match_char)
              end
            end
          end
        end
        self
      end

      # Fills gaps to the tail of each sequence if the length of
      # the sequence is shorter than the alignment length.
      #
      # Note that it is a destructive method.
      def alignment_normalize!
        #(original)
        len = alignment_length
        each_seq do |s|
          s << (gap_char * (len - s.length)) if s.length < len
        end
        self
      end
      alias normalize! alignment_normalize!
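
      # The three destructive operations above (padding with gaps,
      # replacing matches with a marker, and restoring them) can be
      # followed step by step in this standalone sketch. The method
      # names normalize!, convert_match! and convert_unmatch! are
      # hypothetical top-level helpers working on an Array of plain
      # Strings; they mimic the logic but are not the library methods.

```ruby
GAP_RE = /[^a-zA-Z]/

# Pad every sequence with gap characters up to the longest length.
def normalize!(seqs, gap_char = '-')
  len = seqs.map(&:length).max.to_i
  seqs.each { |s| s << gap_char * (len - s.length) if s.length < len }
  seqs
end

# Replace characters equal to the first sequence with match_char,
# except where the first sequence has a gap.
def convert_match!(seqs, match_char = '.')
  first, *rest = seqs
  rest.each do |s|
    s.length.times do |i|
      s[i] = match_char if first[i] == s[i] && first[i] !~ GAP_RE
    end
  end
  seqs
end

# Reverse of convert_match!: copy the first sequence's character back.
def convert_unmatch!(seqs, match_char = '.')
  first, *rest = seqs
  rest.each do |s|
    s.length.times do |i|
      s[i] = (first[i] || match_char) if s[i] == match_char
    end
  end
  seqs
end

seqs = ['ACGT-A', 'ACCT-A', 'AG'].map(&:dup)
p normalize!(seqs)        # => ["ACGT-A", "ACCT-A", "AG----"]
p convert_match!(seqs)    # => ["ACGT-A", "..C.-.", ".G----"]
p convert_unmatch!(seqs)  # => ["ACGT-A", "ACCT-A", "AG----"]
```

      # The round trip is lossless only if no sequence contained
      # match_char before convert_match! was applied.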
# Removes excess gaps in the tail of the sequences. # If removes nothing, returns nil. # Otherwise, returns self. # # Note that it is a destructive method. def alignment_rstrip! #(String-like) len = alignment_length newlen = len each_site_step(len - 1, 0, -1) do |a| a.remove_gaps! if a.empty? then newlen -= 1 else break end end return nil if newlen >= len each_seq do |s| s[newlen..-1] = '' if s.length > newlen end self end alias rstrip! alignment_rstrip! # Removes excess gaps in the head of the sequences. # If removes nothing, returns nil. # Otherwise, returns self. # # Note that it is a destructive method. def alignment_lstrip! #(String-like) pos = 0 each_site do |a| a.remove_gaps! if a.empty? pos += 1 else break end end return nil if pos <= 0 each_seq { |s| s[0, pos] = '' } self end alias lstrip! alignment_lstrip! # Removes excess gaps in the sequences. # If removes nothing, returns nil. # Otherwise, returns self. # # Note that it is a destructive method. def alignment_strip! #(String-like) r = alignment_rstrip! l = alignment_lstrip! (r or l) end alias strip! alignment_strip! # Completely removes ALL gaps in the sequences. # If removes nothing, returns nil. # Otherwise, returns self. # # Note that it is a destructive method. def remove_all_gaps! ret = nil each_seq do |s| x = s.gsub!(gap_regexp, '') ret ||= x end ret ? self : nil end # Returns the specified range of the alignment. # For each sequence, the 'slice' method (it may be String#slice, # which is the same as String#[]) is executed, and # returns a new alignment as a Bio::Alignment::SequenceArray object. # # Unlike alignment_window method, the result alignment # might contain nil. # # If you want to change return value's class, you should redefine # alignment_collect method. 
      #
      def alignment_slice(*arg)
        #(String-like)
        #(BioPerl) AlignI::slice like method
        alignment_collect do |s|
          s.slice(*arg)
        end
      end
      alias slice alignment_slice

      # For each sequence, the 'subseq' method (Bio::Sequence::Common#subseq is
      # expected) is executed, and returns a new alignment as
      # a Bio::Alignment::SequenceArray object.
      #
      # All sequences in the alignment are expected to be kind of
      # Bio::Sequence::NA or Bio::Sequence::AA objects.
      #
      # Unlike alignment_window method, the result alignment
      # might contain nil.
      #
      # If you want to change return value's class, you should redefine
      # alignment_collect method.
      #
      def alignment_subseq(*arg)
        #(original)
        alignment_collect do |s|
          s.subseq(*arg)
        end
      end
      alias subseq alignment_subseq

      # Concatenates the given alignment.
      # align must have each_seq
      # or each method.
      #
      # Returns self.
      #
      # Note that it is a destructive method.
      #
      # For Hash, please use it carefully because
      # the order of the sequences is inconstant and
      # key information is completely ignored.
      #
      def alignment_concat(align)
        flag = nil
        a = []
        each_seq { |s| a << s }
        i = 0
        begin
          align.each_seq do |seq|
            flag = true
            a[i].concat(seq) if a[i] and seq
            i += 1
          end
          return self
        rescue NoMethodError, ArgumentError => evar
          raise evar if flag
        end
        align.each do |seq|
          a[i].concat(seq) if a[i] and seq
          i += 1
        end
        self
      end
    end #module EnumerableExtension

    module Output
      def output(format, *arg)
        case format
        when :clustal
          output_clustal(*arg)
        when :fasta
          output_fasta(*arg)
        when :phylip
          output_phylip(*arg)
        when :phylipnon
          output_phylipnon(*arg)
        when :msf
          output_msf(*arg)
        when :molphy
          output_molphy(*arg)
        else
          raise "Unknown format: #{format.inspect}"
        end
      end

      # Check whether there are same names for ClustalW format.
      #
      # array:: names of the sequences (array of string)
      # len:: length to check (default:30)
      def __clustal_have_same_name?(array, len = 30)
        na30 = array.collect do |k|
          k.to_s.split(/[\x00\s]/)[0].to_s[0, len].gsub(/[\:\;\,\(\)]/, '_').to_s
        end
        #p na30
        na30idx = (0...(na30.size)).to_a
        na30idx.sort! do |x,y|
          na30[x] <=> na30[y]
        end
        #p na30idx
        y = nil
        dupidx = []
        na30idx.each do |x|
          if y and na30[y] == na30[x] then
            dupidx << y
            dupidx << x
          end
          y = x
        end
        if dupidx.size > 0 then
          dupidx.sort!
          dupidx.uniq!
          dupidx
        else
          false
        end
      end
      private :__clustal_have_same_name?

      # Changes sequence names if there are conflicted names
      # for ClustalW format.
      #
      # array:: names of the sequences (array of string)
      # len:: length to check (default:30)
      def __clustal_avoid_same_name(array, len = 30)
        na = array.collect { |k| k.to_s.gsub(/[\r\n\x00]/, ' ') }
        if dupidx = __clustal_have_same_name?(na, len)
          procs = [
            Proc.new { |s, i|
              s[0, len].to_s.gsub(/\s/, '_') + s[len..-1].to_s
            },
            # Proc.new { |s, i|
            #   "#{i}_#{s}"
            # },
          ]
          procs.each do |pr|
            dupidx.each do |i|
              s = array[i]
              na[i] = pr.call(s.to_s, i)
            end
            dupidx = __clustal_have_same_name?(na, len)
            break unless dupidx
          end
          if dupidx then
            na.each_with_index do |s, i|
              na[i] = "#{i}_#{s}"
            end
          end
        end
        na
      end
      private :__clustal_avoid_same_name

      # Generates ClustalW-formatted text
      # seqs:: sequences (must be an alignment object)
      # names:: names of the sequences
      # options:: options
      def __clustal_formatter(seqs, names, options = {})
        #(original)
        aln = [ "CLUSTAL (0.00) multiple sequence alignment\n\n" ]
        len = seqs.seq_length
        sn = names.collect { |x| x.to_s.gsub(/[\r\n\x00]/, ' ') }
        if options[:replace_space]
          sn.collect! { |x| x.gsub(/\s/, '_') }
        end
        if !options.has_key?(:escape) or options[:escape]
          sn.collect! { |x| x.gsub(/[\:\;\,\(\)]/, '_') }
        end
        if !options.has_key?(:split) or options[:split]
          sn.collect!
{ |x| x.split(/\s/)[0].to_s } end if !options.has_key?(:avoid_same_name) or options[:avoid_same_name] sn = __clustal_avoid_same_name(sn) end if sn.find { |x| x.length > 10 } then seqwidth = 50 namewidth = 30 sep = ' ' * 6 else seqwidth = 60 namewidth = 10 sep = ' ' * 6 end seqregexp = Regexp.new("(.{1,#{seqwidth}})") gchar = (options[:gap_char] or '-') case options[:type].to_s when /protein/i, /aa/i mopt = { :type => :aa } when /na/i mopt = { :type => :na } else mopt = {} end mline = (options[:match_line] or seqs.match_line(mopt)) aseqs = Array.new(seqs.number_of_sequences).clear seqs.each_seq do |s| aseqs << s.to_s.gsub(seqs.gap_regexp, gchar) end case options[:case].to_s when /lower/i aseqs.each { |s| s.downcase! } when /upper/i aseqs.each { |s| s.upcase! } end aseqs << mline aseqs.collect! do |s| snx = sn.shift head = sprintf("%*s", -namewidth, snx.to_s)[0, namewidth] + sep s << (gchar * (len - s.length)) s.gsub!(seqregexp, "\\1\n") a = s.split(/^/) if options[:seqnos] and snx then i = 0 a.each do |x| x.chomp! l = x.tr(gchar, '').length i += l x.concat(l > 0 ? " #{i}\n" : "\n") end end a.collect { |x| head + x } end lines = (len + seqwidth - 1).div(seqwidth) lines.times do aln << "\n" aseqs.each { |a| aln << a.shift } end aln.join('') end private :__clustal_formatter # Generates ClustalW-formatted text # seqs:: sequences (must be an alignment object) # names:: names of the sequences # options:: options def output_clustal(options = {}) __clustal_formatter(self, self.sequence_names, options) end # to_clustal is deprecated. Instead, please use output_clustal. #--- #alias to_clustal output_clustal #+++ def to_clustal(*arg) warn "to_clustal is deprecated. Please use output_clustal." output_clustal(*arg) end # Generates fasta format text and returns a string. 
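      # The line wrapping used for FASTA output relies on a regexp of
      # the form /.{1,width}/ with gsub. The standalone sketch below
      # shows that idea in isolation; to_fasta is a hypothetical
      # helper taking parallel arrays of names and plain Strings, and
      # it is not the library's method.

```ruby
# Standalone sketch: format names and sequences as FASTA text,
# wrapping each sequence at `width` characters per line.
def to_fasta(names, seqs, width = 70)
  wrap = Regexp.new(".{1,#{width}}")
  names.zip(seqs).map do |name, seq|
    header = ">#{name.to_s.gsub(/[\r\n\x00]/, ' ')}\n"
    # "\\0" is the whole match, so every chunk gets a trailing newline.
    header + seq.gsub(wrap, "\\0\n")
  end.join
end

puts to_fasta(%w(seq1 seq2), %w(ACGTACGTAC ACGT), 4)
```

      # Control characters in names are replaced by spaces so that a
      # name can never break the one-line header.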
def output_fasta(options={}) #(original) width = (options[:width] or 70) if options[:avoid_same_name] then na = __clustal_avoid_same_name(self.sequence_names, 30) else na = self.sequence_names.collect do |k| k.to_s.gsub(/[\r\n\x00]/, ' ') end end if width and width > 0 then w_reg = Regexp.new(".{1,#{width}}") self.collect do |s| ">#{na.shift}\n" + s.to_s.gsub(w_reg, "\\0\n") end.join('') else self.collect do |s| ">#{na.shift}\n" + s.to_s + "\n" end.join('') end end # generates phylip interleaved alignment format as a string def output_phylip(options = {}) aln, aseqs, lines = __output_phylip_common(options) lines.times do aseqs.each { |a| aln << a.shift } aln << "\n" end aln.pop if aln[-1] == "\n" aln.join('') end # generates Phylip3.2 (old) non-interleaved format as a string def output_phylipnon(options = {}) aln, aseqs, _ = __output_phylip_common(options) aln.first + aseqs.join('') end # common routine for interleaved/non-interleaved phylip format def __output_phylip_common(options = {}) len = self.alignment_length aln = [ " #{self.number_of_sequences} #{len}\n" ] sn = self.sequence_names.collect { |x| x.to_s.gsub(/[\r\n\x00]/, ' ') } if options[:replace_space] sn.collect! { |x| x.gsub(/\s/, '_') } end if !options.has_key?(:escape) or options[:escape] sn.collect! { |x| x.gsub(/[\:\;\,\(\)]/, '_') } end if !options.has_key?(:split) or options[:split] sn.collect! { |x| x.split(/\s/)[0].to_s } end if !options.has_key?(:avoid_same_name) or options[:avoid_same_name] sn = __clustal_avoid_same_name(sn, 10) end namewidth = 10 seqwidth = (options[:width] or 60) seqwidth = seqwidth.div(10) * 10 seqregexp = Regexp.new("(.{1,#{seqwidth.div(10) * 11}})") gchar = (options[:gap_char] or '-') aseqs = Array.new(self.number_of_sequences).clear self.each_seq do |s| aseqs << s.to_s.gsub(self.gap_regexp, gchar) end case options[:case].to_s when /lower/i aseqs.each { |s| s.downcase! } when /upper/i aseqs.each { |s| s.upcase! } end aseqs.collect! 
do |s| snx = sn.shift head = sprintf("%*s", -namewidth, snx.to_s)[0, namewidth] head2 = ' ' * namewidth s << (gchar * (len - s.length)) s.gsub!(/(.{1,10})/n, " \\1") s.gsub!(seqregexp, "\\1\n") a = s.split(/^/) head += a.shift ret = a.collect { |x| head2 + x } ret.unshift(head) ret end lines = (len + seqwidth - 1).div(seqwidth) [ aln, aseqs, lines ] end # Generates Molphy alignment format text as a string def output_molphy(options = {}) len = self.alignment_length header = "#{self.number_of_sequences} #{len}\n" sn = self.sequence_names.collect { |x| x.to_s.gsub(/[\r\n\x00]/, ' ') } if options[:replace_space] sn.collect! { |x| x.gsub(/\s/, '_') } end if !options.has_key?(:escape) or options[:escape] sn.collect! { |x| x.gsub(/[\:\;\,\(\)]/, '_') } end if !options.has_key?(:split) or options[:split] sn.collect! { |x| x.split(/\s/)[0].to_s } end if !options.has_key?(:avoid_same_name) or options[:avoid_same_name] sn = __clustal_avoid_same_name(sn, 30) end seqwidth = (options[:width] or 60) seqregexp = Regexp.new("(.{1,#{seqwidth}})") gchar = (options[:gap_char] or '-') aseqs = Array.new(len).clear self.each_seq do |s| aseqs << s.to_s.gsub(self.gap_regexp, gchar) end case options[:case].to_s when /lower/i aseqs.each { |s| s.downcase! } when /upper/i aseqs.each { |s| s.upcase! } end aseqs.collect! do |s| s << (gchar * (len - s.length)) s.gsub!(seqregexp, "\\1\n") sn.shift + "\n" + s end aseqs.unshift(header) aseqs.join('') end # Generates msf formatted text as a string def output_msf(options = {}) len = self.seq_length if !options.has_key?(:avoid_same_name) or options[:avoid_same_name] sn = __clustal_avoid_same_name(self.sequence_names) else sn = self.sequence_names.collect do |x| x.to_s.gsub(/[\r\n\x00]/, ' ') end end if !options.has_key?(:replace_space) or options[:replace_space] sn.collect! { |x| x.gsub(/\s/, '_') } end if !options.has_key?(:escape) or options[:escape] sn.collect! 
{ |x| x.gsub(/[\:\;\,\(\)]/, '_') }
        end
        if !options.has_key?(:split) or options[:split]
          sn.collect! { |x| x.split(/\s/)[0].to_s }
        end

        seqwidth = 50
        namewidth = [31, sn.collect { |x| x.length }.max ].min
        sep = ' ' * 2

        seqregexp = Regexp.new("(.{1,#{seqwidth}})")
        gchar = (options[:gap_char] or '.')
        pchar = (options[:padding_char] or '~')

        aseqs = Array.new(self.number_of_sequences).clear
        self.each_seq do |s|
          aseqs << s.to_s.gsub(self.gap_regexp, gchar)
        end
        aseqs.each do |s|
          s.sub!(/\A#{Regexp.escape(gchar)}+/) { |x| pchar * x.length }
          s.sub!(/#{Regexp.escape(gchar)}+\z/, '')
          s << (pchar * (len - s.length))
        end

        case options[:case].to_s
        when /lower/i
          aseqs.each { |s| s.downcase! }
        when /upper/i
          aseqs.each { |s| s.upcase! }
        else #default upcase
          aseqs.each { |s| s.upcase! }
        end

        case options[:type].to_s
        when /protein/i, /aa/i
          amino = true
        when /na/i
          amino = false
        else
          if seqclass == Bio::Sequence::AA then
            amino = true
          elsif seqclass == Bio::Sequence::NA then
            amino = false
          else
            # if we can't determine the type, we assume protein.
            amino = aseqs.size
            aseqs.each { |x| amino -= 1 if /\A[acgt]\z/i =~ x }
            amino = false if amino <= 0
          end
        end

        seq_type = (amino ? 'P' : 'N')

        fn = (options[:entry_id] or self.__id__.abs.to_s + '.msf')
        dt = (options[:time] or Time.now).strftime('%B %d, %Y %H:%M')

        sums = aseqs.collect { |s| GCG::Seq.calc_checksum(s) }
        #sums = aseqs.collect { |s| 0 }
        sum = 0; sums.each { |x| sum += x }; sum %= 10000
        msf =
          [
           "#{seq_type == 'N' ? 'N' : 'A' }A_MULTIPLE_ALIGNMENT 1.0\n",
           "\n",
           "\n",
           " #{fn} MSF: #{len} Type: #{seq_type} #{dt} Check: #{sum} ..\n",
           "\n"
          ]

        sn.each do |snx|
          msf << ' Name: ' +
            sprintf('%*s', -namewidth, snx.to_s)[0, namewidth] +
            " Len: #{len} Check: #{sums.shift} Weight: 1.00\n"
        end
        msf << "\n//\n"

        aseqs.collect!
do |s| snx = sn.shift head = sprintf("%*s", namewidth, snx.to_s)[0, namewidth] + sep s.gsub!(seqregexp, "\\1\n") a = s.split(/^/) a.collect { |x| head + x } end lines = (len + seqwidth - 1).div(seqwidth) i = 1 lines.times do msf << "\n" n_l = i n_r = [ i + seqwidth - 1, len ].min if n_l != n_r then w = [ n_r - n_l + 1 - n_l.to_s.length - n_r.to_s.length, 1 ].max msf << (' ' * namewidth + sep + n_l.to_s + ' ' * w + n_r.to_s + "\n") else msf << (' ' * namewidth + sep + n_l.to_s + "\n") end aseqs.each { |a| msf << a.shift } i += seqwidth end msf << "\n" msf.join('') end end #module Output module EnumerableExtension include Output # Returns number of sequences in this alignment. def number_of_sequences i = 0 self.each_seq { |s| i += 1 } i end # Returns an array of sequence names. # The order of the names must be the same as # the order of each_seq. def sequence_names (0...(self.number_of_sequences)).to_a end end #module EnumerableExtension # Bio::Alignment::ArrayExtension is a set of useful methods for # multiple sequence alignment. # It is designed to be extended to array objects or # included in your own classes which inherit Array. # (It can also be included in Array, though not recommended.) # # It possesses all methods defined in EnumerableExtension. # For usage of methods, please refer to EnumerableExtension. module ArrayExtension include EnumerableExtension # Iterates over each sequences. # Yields a sequence. # # It works the same as Array#each. def each_seq(&block) #:yields: seq each(&block) end # Returns number of sequences in this alignment. def number_of_sequences self.size end end #module ArrayExtension # Bio::Alignment::HashExtension is a set of useful methods for # multiple sequence alignment. # It is designed to be extended to hash objects or # included in your own classes which inherit Hash. # (It can also be included in Hash, though not recommended.) # # It possesses all methods defined in EnumerableExtension. 
# For usage of methods, please refer to EnumerableExtension. # # Because SequenceHash#alignment_collect is redefined, # some methods' return value's class are changed to # SequenceHash instead of SequenceArray. # # Because the order of the objects in a hash is inconstant, # some methods strictly affected with the order of objects # might not work correctly, # e.g. EnumerableExtension#convert_match and #convert_unmatch. module HashExtension include EnumerableExtension # Iterates over each sequences. # Yields a sequence. # # It works the same as Hash#each_value. def each_seq #:yields: seq #each_value(&block) each_key { |k| yield self[k] } end # Iterates over each sequence and each results running block # are collected and returns a new alignment as a # Bio::Alignment::SequenceHash object. # # Note that it would be redefined if you want to change # return value's class. # def alignment_collect a = SequenceHash.new a.set_all_property(get_all_property) each_pair do |key, str| a.store(key, yield(str)) end a end # Concatenates the given alignment. # If align is a Hash (or SequenceHash), # sequences of same keys are concatenated. # Otherwise, align must have each_seq # or each method and # works same as EnumerableExtension#alignment_concat. # # Returns self. # # Note that it is a destructive method. # def alignment_concat(align) flag = nil begin align.each_pair do |key, seq| flag = true if origseq = self[key] origseq.concat(seq) end end return self rescue NoMethodError, ArgumentError =>evar raise evar if flag end a = values i = 0 begin align.each_seq do |seq| flag = true a[i].concat(seq) if a[i] and seq i += 1 end return self rescue NoMethodError, ArgumentError => evar raise evar if flag end align.each do |seq| a[i].concat(seq) if a[i] and seq i += 1 end self end # Returns number of sequences in this alignment. def number_of_sequences self.size end # Returns an array of sequence names. # The order of the names must be the same as # the order of each_seq. 
def sequence_names self.keys end end #module HashExtension # Bio::Alignment::SequenceArray is a container class of # multiple sequence alignment. # Since it inherits Array, it acts completely same as Array. # In addition, methods defined in ArrayExtension and EnumerableExtension # can be used. class SequenceArray < Array include ArrayExtension end #class SequenceArray # Bio::Alignment::SequenceHash is a container class of # multiple sequence alignment. # Since it inherits Hash, it acts completely same as Hash. # In addition, methods defined in HashExtension and EnumerableExtension # can be used. class SequenceHash < Hash include HashExtension end #class SequenceHash # Bio::Alignment::OriginalPrivate is a set of private methods # for Bio::Alignment::OriginalAlignment. module OriginalPrivate # Gets the sequence from given object. def extract_seq(obj) if obj.is_a?(Bio::Sequence::NA) or obj.is_a?(Bio::Sequence::AA) then obj else meth = [ :seq, :naseq, :aaseq ].find {|m| obj.respond_to? m } meth ? obj.__send__(meth) : obj end end module_function :extract_seq # Gets the name or the definition of the sequence from given object. def extract_key(obj) sn = nil for m in [ :definition, :entry_id ] begin sn = obj.send(m) rescue NameError, ArgumentError sn = nil end break if sn end sn end module_function :extract_key end #module OriginalPrivate # Bio::Alignment::OriginalAlignment is # the BioRuby original multiple sequence alignment container class. # It includes HashExtension. # # It is recommended only to use methods defined in EnumerableExtension # (and the each_seq method). # The method only defined in this class might be obsoleted in the future. # class OriginalAlignment include Enumerable include HashExtension include OriginalPrivate # Read files and creates a new alignment object. # # It will be obsoleted. 
def self.readfiles(*files) require 'bio/io/flatfile' aln = self.new files.each do |fn| Bio::FlatFile.open(nil, fn) do |ff| aln.add_sequences(ff) end end aln end # Creates a new alignment object from given arguments. # # It will be obsoleted. def self.new2(*arg) self.new(arg) end # Creates a new alignment object. # seqs may be one of follows: # an array of sequences (or strings), # an array of sequence database objects, # an alignment object. def initialize(seqs = []) @seqs = {} @keys = [] self.add_sequences(seqs) end # If x is the same value, returns true. # Otherwise, returns false. def ==(x) #(original) if x.is_a?(self.class) self.to_hash == x.to_hash else false end end # convert to hash def to_hash #(Hash-like) @seqs end # Adds sequences to the alignment. # seqs may be one of follows: # an array of sequences (or strings), # an array of sequence database objects, # an alignment object. def add_sequences(seqs) if block_given? then seqs.each do |x| s, key = yield x self.store(key, s) end else if seqs.is_a?(self.class) then seqs.each_pair do |k, s| self.store(k, s) end elsif seqs.respond_to?(:each_pair) seqs.each_pair do |k, x| s = extract_seq(x) self.store(k, s) end else seqs.each do |x| s = extract_seq(x) k = extract_key(x) self.store(k, s) end end end self end # identifiers (or definitions or names) of the sequences attr_reader :keys # stores a sequences with the name # key:: name of the sequence # seq:: sequence def __store__(key, seq) #(Hash-like) h = { key => seq } @keys << h.keys[0] @seqs.update(h) seq end # stores a sequence with key # (name or definition of the sequence). # Unlike __store__ method, the method doesn't allow # same keys. # If the key is already used, returns nil. # When succeeded, returns key. def store(key, seq) #(Hash-like) returns key instead of seq if @seqs.has_key?(key) then # don't allow same key # New key is discarded, while existing key is preserved. 
key = nil end unless key then unless defined?(@serial) @serial = 0 end @serial = @seqs.size if @seqs.size > @serial while @seqs.has_key?(@serial) @serial += 1 end key = @serial end self.__store__(key, seq) key end # Reconstructs internal data structure. # (Like Hash#rehash) def rehash @seqs.rehash tmpkeys = @seqs.keys @keys.collect! do |k| tmpkeys.delete(k) end @keys.compact! @keys.concat(tmpkeys) self end # Prepends seq (with key) to the front of the alignment. # (Like Array#unshift) def unshift(key, seq) #(Array-like) self.store(key, seq) k = @keys.pop @keys.unshift(k) k end # Removes the first sequence in the alignment and # returns [ key, seq ]. def shift k = @keys.shift if k then s = @seqs.delete(k) [ k, s ] else nil end end # Gets the n-th sequence. # If not found, returns nil. def order(n) #(original) @seqs[@keys[n]] end # Removes the sequence whose key is key. # Returns the removed sequence. # If not found, returns nil. def delete(key) #(Hash-like) @keys.delete(key) @seqs.delete(key) end # Returns sequences. (Like Hash#values) def values #(Hash-like) @keys.collect { |k| @seqs[k] } end # Adds a sequence without key. # The key is automatically determined. def <<(seq) #(Array-like) self.store(nil, seq) self end # Gets a sequence. (Like Hash#[]) def [](*arg) #(Hash-like) @seqs[*arg] end # Number of sequences in the alignment. def size #(Hash&Array-like) @seqs.size end alias number_of_sequences size # If the key exists, returns true. Otherwise, returns false. # (Like Hash#has_key?) def has_key?(key) #(Hash-like) @seqs.has_key?(key) end # Iterates over each sequence. # (Like Array#each) def each #(Array-like) @keys.each do |k| yield @seqs[k] end end alias each_seq each # Iterates over each key and sequence. # (Like Hash#each_pair) def each_pair #(Hash-like) @keys.each do |k| yield k, @seqs[k] end end # Iterates over each sequence, replacing the sequence with the # value returned by the block. def collect! 
        #(Array-like)
        @keys.each do |k|
          @seqs[k] = yield @seqs[k]
        end
      end

      ###--
      ### note that 'collect' and 'to_a' are defined in Enumerable
      ###
      ### instance-variable-related methods
      ###++

      # Creates new alignment. Internal use only.
      def new(*arg)
        na = self.class.new(*arg)
        na.set_all_property(get_all_property)
        na
      end
      protected :new

      # Duplicates the alignment
      def dup
        #(Hash-like)
        self.new(self)
      end

      #--
      # methods below should not access instance variables
      #++

      # Merges given alignment and returns a new alignment.
      def merge(*other)
        #(Hash-like)
        na = self.new(self)
        na.merge!(*other)
        na
      end

      # Merges given alignment.
      # Note that it is a destructive method.
      def merge!(*other)
        #(Hash-like)
        if block_given? then
          other.each do |aln|
            aln.each_pair do |k, s|
              if self.has_key?(k) then
                s = yield k, self[k], s
                self.to_hash.store(k, s)
              else
                self.store(k, s)
              end
            end
          end
        else
          other.each do |aln|
            aln.each_pair do |k, s|
              self.delete(k) if self.has_key?(k)
              self.store(k, s)
            end
          end
        end
        self
      end

      # Returns the key for a given sequence. If not found, returns nil.
      def index(seq)
        #(Hash-like)
        # Remember the key only when a match is found, so that
        # nil is returned when no sequence matches.
        found_key = nil
        self.each_pair do |k, s|
          if s.class == seq.class then
            r = (s == seq)
          else
            r = (s.to_s == seq.to_s)
          end
          if r then
            found_key = k
            break
          end
        end
        found_key
      end

      # Sequences in the alignment are duplicated.
      # If keys are given to the argument, sequences of given keys are
      # duplicated.
      #
      # It will be obsoleted.
      def isolate(*arg)
        #(original)
        if arg.size == 0 then
          self.collect! do |s|
            seqclass.new(s)
          end
        else
          arg.each do |k|
            if self.has_key?(k) then
              s = self.delete(k)
              self.store(k, seqclass.new(s))
            end
          end
        end
        self
      end

      # Iterates over each sequence, collects the results of running
      # the block, and returns them as a new alignment.
      #
      # The method name 'collect_align' will be obsoleted.
      # Please use 'alignment_collect' instead.
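      #
      # A minimal usage sketch, assuming an alignment built from plain
      # strings (the variable names aln and upper are illustrative only
      # and do not appear elsewhere in this library):
      #
      #   require 'bio'
      #   aln = Bio::Alignment::OriginalAlignment.new([ 'atgc', 'at-c' ])
      #   # the block result for each sequence is stored under the same key
      #   upper = aln.alignment_collect { |s| s.upcase }
      #   upper.each_pair { |key, seq| puts "#{key}: #{seq}" }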
def alignment_collect #(original) na = self.class.new na.set_all_property(get_all_property) self.each_pair do |k, s| na.store(k, yield(s)) end na end alias collect_align alignment_collect # Removes empty sequences or nil in the alignment. # (Like Array#compact!) def compact! #(Array-like) d = [] self.each_pair do |k, s| if !s or s.empty? d << k end end d.each do |k| self.delete(k) end d.empty? ? nil : d end # Removes empty sequences or nil and returns new alignment. # (Like Array#compact) def compact #(Array-like) na = self.dup na.compact! na end # Adds a sequence to the alignment. # Returns key if succeeded. # Returns nil (and not added to the alignment) if key is already used. # # It resembles BioPerl's AlignI::add_seq method. def add_seq(seq, key = nil) #(BioPerl) AlignI::add_seq like method unless seq.is_a?(Bio::Sequence::NA) or seq.is_a?(Bio::Sequence::AA) s = extract_seq(seq) key = extract_key(seq) unless key seq = s end self.store(key, seq) end # Removes given sequence from the alignment. # Returns removed sequence. If nothing removed, returns nil. # # It resembles BioPerl's AlignI::remove_seq. def remove_seq(seq) #(BioPerl) AlignI::remove_seq like method if k = self.index(seq) then self.delete(k) else nil end end # Removes sequences from the alignment by given keys. # Returns an alignment object consists of removed sequences. # # It resembles BioPerl's AlignI::purge method. def purge(*arg) #(BioPerl) AlignI::purge like method purged = self.new arg.each do |k| if self[k] then purged.store(k, self.delete(k)) end end purged end # If block is given, it acts like Array#select (Enumerable#select). # Returns a new alignment containing all sequences of the alignment # for which return value of given block is not false nor nil. # # If no block is given, it acts like the BioPerl's AlignI::select. # Returns a new alignment containing sequences of given keys. # # The BioPerl's AlignI::select-like action will be obsoleted. 
      def select(*arg)
        #(original)
        na = self.new
        if block_given? then
          # 'arg' is ignored
          # nearly same action as Array#select (Enumerable#select)
          self.each_pair do |k, s|
            na.store(k, s) if yield(s)
          end
        else
          # BioPerl's AlignI::select like function
          arg.each do |k|
            if s = self[k] then
              na.store(k, s)
            end
          end
        end
        na
      end

      # The method name slice will be obsoleted.
      # Please use alignment_slice instead.
      alias slice alignment_slice

      # The method name subseq will be obsoleted.
      # Please use alignment_subseq instead.
      alias subseq alignment_subseq

      # Non-destructive version of alignment_normalize!.
      # Returns a new alignment.
      def normalize
        #(original)
        na = self.dup
        na.alignment_normalize!
        na
      end

      # Non-destructive version of alignment_rstrip!.
      # Returns a new alignment.
      def rstrip
        #(String-like)
        na = self.dup
        na.isolate
        na.alignment_rstrip!
        na
      end

      # Non-destructive version of alignment_lstrip!.
      # Returns a new alignment.
      def lstrip
        #(String-like)
        na = self.dup
        na.isolate
        na.alignment_lstrip!
        na
      end

      # Non-destructive version of alignment_strip!.
      # Returns a new alignment.
      def strip
        #(String-like)
        na = self.dup
        na.isolate
        na.alignment_strip!
        na
      end

      # Non-destructive version of remove_all_gaps!.
      # Returns a new alignment.
      #
      # The method name 'remove_gap' will be obsoleted.
      # Please use 'remove_all_gaps' instead.
      def remove_all_gaps
        #(original)
        na = self.dup
        na.isolate
        na.remove_all_gaps!
        na
      end

      # Concatenates a string or an alignment.
      # Returns self.
      #
      # Note that the method will be obsoleted.
      # Please use each_seq { |s| s << str } for concatenating
      # a string and
      # alignment_concat(aln) for concatenating an alignment.
      def concat(aln)
        #(String-like)
        if aln.respond_to?(:to_str) then #aln.is_a?(String)
          self.each do |s|
            s << aln
          end
          self
        else
          alignment_concat(aln)
        end
      end

      # Replaces the specified region of the alignment with aln.
      # aln:: String or Bio::Alignment object
      # arg:: same format as String#slice
      #
      # It will be obsoleted.
      def replace_slice(aln, *arg)
        #(original)
        if aln.respond_to?(:to_str) then #aln.is_a?(String)
          self.each do |s|
            s[*arg] = aln
          end
        elsif aln.is_a?(self.class) then
          aln.each_pair do |k, s|
            self[k][*arg] = s
          end
        else
          i = 0
          aln.each do |s|
            self.order(i)[*arg] = s
            i += 1
          end
        end
        self
      end

      # Performs multiple alignment by using external program.
      def do_align(factory)
        a0 = self.class.new
        (0...self.size).each { |i| a0.store(i, self.order(i)) }
        r = factory.query(a0)
        a1 = r.alignment
        a0.keys.each do |k|
          unless a1[k.to_s] then
            raise 'alignment result is inconsistent with input data'
          end
        end
        a2 = self.new
        a0.keys.each do |k|
          a2.store(self.keys[k], a1[k.to_s])
        end
        a2
      end

      # Converts to fasta format and returns an array of strings.
      #
      # It will be obsoleted.
      def to_fasta_array(*arg)
        #(original)
        width = nil
        if arg[0].is_a?(Integer) then
          width = arg.shift
        end
        options = (arg.shift or {})
        width = options[:width] unless width
        if options[:avoid_same_name] then
          na = __clustal_avoid_same_name(self.keys, 30)
        else
          na = self.keys.collect { |k| k.to_s.gsub(/[\r\n\x00]/, ' ') }
        end
        a = self.collect do |s|
          ">#{na.shift}\n" +
            if width then
              s.to_s.gsub(Regexp.new(".{1,#{width}}"), "\\0\n")
            else
              s.to_s + "\n"
            end
        end
        a
      end

      # Converts to fasta format and returns an array of FastaFormat objects.
      #
      # It will be obsoleted.
      def to_fastaformat_array(*arg)
        #(original)
        require 'bio/db/fasta'
        a = self.to_fasta_array(*arg)
        a.collect! do |x|
          Bio::FastaFormat.new(x)
        end
        a
      end

      # Converts to fasta format and returns a string.
      #
      # The specification of the argument will be changed.
      #
      # Note: to_fasta is deprecated.
      # Please use output_fasta instead.
      def to_fasta(*arg)
        #(original)
        warn "to_fasta is deprecated. Please use output_fasta."
        self.to_fasta_array(*arg).join('')
      end

      # The method name consensus will be obsoleted.
      # Please use consensus_string instead.
      alias consensus consensus_string
    end #class OriginalAlignment

    # Bio::Alignment::GAP is a set of class methods for
    # gap-related position translation.
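    #
    # A minimal usage sketch, assuming "-" is the only gap character
    # (the example string and the gap regular expression are chosen
    # for illustration only; the caller always supplies the regexp).
    # Positions are zero-based, as in String#[].
    #
    #   require 'bio'
    #   seq = "at-g-c"
    #   # "g" is at gapped position 3 and at ungapped position 2
    #   Bio::Alignment::GAP.ungapped_pos(seq, 3, /-/)  #=> 2
    #   Bio::Alignment::GAP.gapped_pos(seq, 2, /-/)    #=> 3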
    module GAP
      # A position with gaps is translated into the position without gaps.
      #seq:: sequence
      #pos:: position with gaps
      #gap_regexp:: regular expression to specify gaps
      def ungapped_pos(seq, pos, gap_regexp)
        p = seq[0..pos].gsub(gap_regexp, '').length
        p -= 1 if p > 0
        p
      end
      module_function :ungapped_pos

      # A position without gaps is translated into the position with gaps.
      #seq:: sequence
      #pos:: position without gaps
      #gap_regexp:: regular expression to specify gaps
      def gapped_pos(seq, pos, gap_regexp)
        olen = seq.gsub(gap_regexp, '').length
        pos = olen if pos >= olen
        pos = olen + pos if pos < 0

        i = 0
        l = pos + 1
        while l > 0 and i < seq.length
          x = seq[i, l].gsub(gap_regexp, '').length
          i += l
          l -= x
        end
        i -= 1 if i > 0
        i
      end
      module_function :gapped_pos
    end # module GAP

    # Creates a new Bio::Alignment::OriginalAlignment object.
    # Please refer to the documentation of OriginalAlignment.new.
    def self.new(*arg)
      OriginalAlignment.new(*arg)
    end

    # Creates a new Bio::Alignment::OriginalAlignment object.
    # Please refer to the documentation of OriginalAlignment.new2.
    def self.new2(*arg)
      OriginalAlignment.new2(*arg)
    end

    # Creates a new Bio::Alignment::OriginalAlignment object.
    # Please refer to the documentation of OriginalAlignment.readfiles.
    def self.readfiles(*files)
      OriginalAlignment.readfiles(*files)
    end

    #---
    # Service classes for multiple alignment applications
    #+++

    #---
    # Templates of alignment application factory
    #+++

    # Namespace for templates for alignment application factory
    module FactoryTemplate

      # Template class for alignment application factory.
      # The program acts:
      # input: stdin or file, format = fasta format
      # output: stdout (parser should be specified by DEFAULT_PARSER)
      class Simple

        # Creates a new alignment factory
        def initialize(program = self.class::DEFAULT_PROGRAM, options = [])
          @program = program
          @options = options
          @command = nil
          @output = nil
          @report = nil
          @exit_status = nil
          @data_stdout = nil
        end

        # program name
        attr_accessor :program

        # options
        attr_accessor :options

        # Last command-line string.
        # Returns nil or an array of String.
        # Note that filenames described in the command-line may already
        # be removed because these files may be temporary files.
        attr_reader :command

        # Last raw result of the program.
        # Returns a string (or nil).
        attr_reader :output

        # Last result object performed by the factory.
        attr_reader :report

        # Last exit status
        attr_reader :exit_status

        # Last output to the stdout.
        attr_accessor :data_stdout

        # Clear the internal data and status, except program and options.
        def reset
          @command = nil
          @output = nil
          @report = nil
          @exit_status = nil
          @data_stdout = nil
        end

        # Executes the program.
        # If +seqs+ is not nil, perform alignment for seqs.
        # If +seqs+ is nil, simply executes the program.
        #
        # Compatibility note: When seqs is nil,
        # returns true if the program exits normally, and
        # returns false if the program exits abnormally.
        def query(seqs)
          if seqs then
            query_alignment(seqs)
          else
            exec_local(@options)
            @exit_status.exitstatus == 0 ? true : false
          end
        end

        # Performs alignment for seqs.
        # +seqs+ should be Bio::Alignment or Array of sequences or nil.
        def query_alignment(seqs)
          unless seqs.respond_to?(:output_fasta) then
            seqs = Bio::Alignment.new(seqs)
          end
          query_string(seqs.output_fasta(:width => 70))
        end

        # alias of query_alignment.
        #
        # Compatibility Note: query_align will be renamed to query_alignment.
        def query_align(seqs)
          #warn 'query_align is renamed to query_alignment.'
          query_alignment(seqs)
        end

        # Performs alignment for +str+.
        # The +str+ should be a string that can be recognized by the program.
        def query_string(str)
          _query_string(str, @options)
          @report
        end

        # Performs alignment of sequences in the file named +filename_in+.
        def query_by_filename(filename_in)
          _query_local(filename_in, @options)
          @report
        end

        private
        # Executes a program in the local machine.
def exec_local(opt, data_stdin = nil) @exit_status = nil @command = [ @program, *opt ] #STDERR.print "DEBUG: ", @command.join(" "), "\n" @data_stdout = Bio::Command.query_command(@command, data_stdin) @exit_status = $? end # prepare temporary file def _prepare_tempfile(str = nil) tf_in = Tempfile.open(str ? 'alignment_i' : 'alignment_o') tf_in.print str if str tf_in.close(false) tf_in end # generates options specifying input/output filename. # nil for filename means stdin or stdout. # +options+ must not contain specify filenames. # returns an array of string. def _generate_options(infile, outfile, options) options + (infile ? _option_input_file(infile) : _option_input_stdin) + (outfile ? _option_output_file(outfile) : _option_output_stdout) end # generates options specifying input filename. # returns an array of string def _option_input_file(fn) [ fn ] end # generates options specifying output filename. # returns an array of string def _option_output_file(fn) raise 'can not specify output file: always stdout' end # generates options specifying that input is taken from stdin. # returns an array of string def _option_input_stdin [] end # generates options specifying output to stdout. # returns an array of string def _option_output_stdout [] end end #class Simple # mix-in module module WrapInputStdin private # Performs alignment for +str+. # The +str+ should be a string that can be recognized by the program. def _query_string(str, opt) _query_local(nil, opt, str) end end #module WrapInputStdin # mix-in module module WrapInputTempfile private # Performs alignment for +str+. # The +str+ should be a string that can be recognized by the program. 
def _query_string(str, opt) begin tf_in = _prepare_tempfile(str) ret = _query_local(tf_in.path, opt, nil) ensure tf_in.close(true) if tf_in end ret end end #module WrapInputTempfile # mix-in module module WrapOutputStdout private # Performs alignment by specified filenames def _query_local(fn_in, opt, data_stdin = nil) opt = _generate_options(fn_in, nil, opt) exec_local(opt, data_stdin) @output = @data_stdout @report = self.class::DEFAULT_PARSER.new(@output) @report end end #module WrapOutputStdout # mix-in module module WrapOutputTempfile private # Performs alignment def _query_local(fn_in, opt, data_stdin = nil) begin tf_out = _prepare_tempfile() opt = _generate_options(fn_in, tf_out.path, opt) exec_local(opt, data_stdin) tf_out.open @output = tf_out.read ensure tf_out.close(true) if tf_out end @report = self.class::DEFAULT_PARSER.new(@output) @report end end #module WrapOutputTempfile # Template class for alignment application factory. # The program needs: # input: file (cannot accept stdin), format = fasta format # output: stdout (parser should be specified by DEFAULT_PARSER) class FileInStdoutOut < Simple include Bio::Alignment::FactoryTemplate::WrapInputTempfile include Bio::Alignment::FactoryTemplate::WrapOutputStdout private # generates options specifying that input is taken from stdin. # returns an array of string def _option_input_stdin raise 'input is always a file' end end #class FileInStdoutOut # Template class for alignment application factory. # The program needs: # input: stdin or file, format = fasta format # output: file (parser should be specified by DEFAULT_PARSER) class StdinInFileOut < Simple include Bio::Alignment::FactoryTemplate::WrapInputStdin include Bio::Alignment::FactoryTemplate::WrapOutputTempfile private # generates options specifying output to stdout. # returns an array of string def _option_output_stdout raise 'output is always a file' end end #class StdinInFileOut # Template class for alignment application factory. 
# The program needs: # input: file (cannot accept stdin), format = fasta format # output: file (parser should be specified by DEFAULT_PARSER) class FileInFileOut < Simple include Bio::Alignment::FactoryTemplate::WrapInputTempfile include Bio::Alignment::FactoryTemplate::WrapOutputTempfile private # generates options specifying that input is taken from stdin. # returns an array of string def _option_input_stdin raise 'input is always a file' end # generates options specifying output to stdout. # returns an array of string def _option_output_stdout raise 'output is always a file' end end #class FileInFileOut # Template class for alignment application factory. # The program needs: # input: file (cannot accept stdin), format = fasta format # output: file (parser should be specified by DEFAULT_PARSER) # Tree (*.dnd) output is also supported. class FileInFileOutWithTree < FileInFileOut # alignment guide tree generated by the program (*.dnd file) attr_reader :output_dnd def reset @output_dnd = nil super end private # Performs alignment def _query_local(fn_in, opt, data_stdin = nil) begin tf_dnd = _prepare_tempfile() opt = opt + _option_output_dndfile(tf_dnd.path) ret = super(fn_in, opt, data_stdin) tf_dnd.open @output_dnd = tf_dnd.read ensure tf_dnd.close(true) if tf_dnd end ret end # generates options specifying output tree file (*.dnd). 
        # returns an array of string
        # fn:: filename of the tree file; _query_local passes one argument,
        # so subclasses overriding this method must accept it.
        def _option_output_dndfile(fn)
          raise NotImplementedError
        end
      end #class FileInFileOutWithTree

    end #module FactoryTemplate

  end #module Alignment

end #module Bio

bio-2.0.3/lib/bio/sequence/sequence_masker.rb
#
# = bio/sequence/sequence_masker.rb - Sequence masking helper methods
#
# Copyright:: Copyright (C) 2010
#             Naohisa Goto
# License::   The Ruby License
#
# == Description
#
# Bio::Sequence::SequenceMasker is a mix-in module to provide helpful
# methods for masking a sequence.
#
# For details, see documentation of Bio::Sequence::SequenceMasker.
#

module Bio

  require 'bio/sequence' unless const_defined?(:Sequence)

  class Sequence

    # Bio::Sequence::SequenceMasker is a mix-in module to provide helpful
    # methods for masking a sequence.
    #
    # It is only expected to be included in Bio::Sequence.
    # In the future, methods in this module might be moved to
    # Bio::Sequence or other module and this module might be removed.
    # Please do not depend on this module.
    #
    module SequenceMasker

      # Masks the sequence with each value in the enum.
      # The enum should be an array or enumerator.
      # A block must be given.
      # When the block returns true, the sequence is masked with
      # mask_char.
      # ---
      # *Arguments*:
      # * (required) enum : Enumerator
      # * (required) mask_char : (String) character used for masking
      # *Returns*:: Bio::Sequence object
      def mask_with_enumerator(enum, mask_char)
        offset = 0
        unit = mask_char.length - 1
        s = self.seq.class.new(self.seq)
        j = 0
        enum.each_with_index do |item, index|
          if yield item then
            j = index + offset
            if j < s.length then
              s[j, 1] = mask_char
              offset += unit
            end
          end
        end
        newseq = self.dup
        newseq.seq = s
        newseq
      end

      # Masks low quality sequence regions.
# For each sequence position, if the quality score is smaller than # the threshold, the sequence in the position is replaced with # mask_char. # # Note: This method does not care quality_score_type. # --- # *Arguments*: # * (required) threshold : (Numeric) threshold # * (required) mask_char : (String) character used for masking # *Returns*:: Bio::Sequence object def mask_with_quality_score(threshold, mask_char) scores = self.quality_scores || [] mask_with_enumerator(scores, mask_char) do |item| item < threshold end end # Masks high error-probability sequence regions. # For each sequence position, if the error probability is larger than # the threshold, the sequence in the position is replaced with # mask_char. # # --- # *Arguments*: # * (required) threshold : (Numeric) threshold # * (required) mask_char : (String) character used for masking # *Returns*:: Bio::Sequence object def mask_with_error_probability(threshold, mask_char) values = self.error_probabilities || [] mask_with_enumerator(values, mask_char) do |item| item > threshold end end end #module SequenceMasker end #class Sequence end #module Bio bio-2.0.3/lib/bio/sequence/common.rb0000644000175000017500000002654714141516614016600 0ustar nileshnilesh# # = bio/sequence/common.rb - common methods for biological sequence # # Copyright:: Copyright (C) 2006 # Toshiaki Katayama , # Ryan Raaum # License:: The Ruby License # module Bio autoload :Locations, 'bio/location' unless const_defined?(:Locations) require 'bio/sequence' unless const_defined?(:Sequence) class Sequence # = DESCRIPTION # Bio::Sequence::Common is a # Mixin[http://www.rubycentral.com/book/tut_modules.html] # implementing methods common to # Bio::Sequence::AA and Bio::Sequence::NA. All of these methods # are available to either Amino Acid or Nucleic Acid sequences, and # by encapsulation are also available to Bio::Sequence objects. 
    #
    # = USAGE
    #
    #   # Create a sequence
    #   dna = Bio::Sequence.auto('atgcatgcatgc')
    #
    #   # Splice out a subsequence using a Genbank-style location string
    #   puts dna.splice('complement(1..4)')
    #
    #   # What is the base composition?
    #   puts dna.composition
    #
    #   # Create a random sequence with the composition of a current sequence
    #   puts dna.randomize
    module Common

      # Return sequence as
      # String[http://corelib.rubyonrails.org/classes/String.html].
      # The original sequence is unchanged.
      #
      #   s = Bio::Sequence::NA.new('atgc')
      #   puts s.to_s                             #=> 'atgc'
      #   puts s.to_s.class                       #=> String
      #   puts s                                  #=> 'atgc'
      #   puts s.class                            #=> Bio::Sequence::NA
      # ---
      # *Returns*:: String object
      def to_s
        String.new(self)
      end
      alias to_str to_s

      # Create a new sequence based on the current sequence.
      # The original sequence is unchanged.
      #
      #   s = Bio::Sequence::NA.new('atgc')
      #   s2 = s.seq
      #   puts s2                                 #=> 'atgc'
      # ---
      # *Returns*:: new Bio::Sequence::NA/AA object
      def seq
        self.class.new(self)
      end

      # Normalize the current sequence, removing all whitespace and
      # transforming all positions to uppercase if the sequence is AA or
      # transforming all positions to lowercase if the sequence is NA.
      # The original sequence is modified.
      #
      #   s = Bio::Sequence::NA.new('atgc')
      #   s.normalize!
      # ---
      # *Returns*:: current Bio::Sequence::NA/AA object (modified)
      def normalize!
        initialize(self)
        self
      end
      alias seq! normalize!

      # Add new data to the end of the current sequence.
      # The original sequence is modified.
      #
      #   s = Bio::Sequence::NA.new('atgc')
      #   s << 'atgc'
      #   puts s                                  #=> "atgcatgc"
      #   s << s
      #   puts s                                  #=> "atgcatgcatgcatgc"
      # ---
      # *Returns*:: current Bio::Sequence::NA/AA object (modified)
      def concat(*arg)
        super(self.class.new(*arg))
      end

      def <<(*arg)
        concat(*arg)
      end

      # Create a new sequence by adding to an existing sequence.
      # The existing sequence is not modified.
# # s = Bio::Sequence::NA.new('atgc') # s2 = s + 'atgc' # puts s2 #=> "atgcatgc" # puts s #=> "atgc" # # The new sequence is of the same class as the existing sequence if # the new data was added to an existing sequence, # # puts s2.class == s.class #=> true # # but if an existing sequence is added to a String, the result is a String # # s3 = 'atgc' + s # puts s3.class #=> String # --- # *Returns*:: new Bio::Sequence::NA/AA *or* String object def +(*arg) self.class.new(super(*arg)) end # Returns a new sequence containing the subsequence identified by the # start and end numbers given as parameters. *Important:* Biological # sequence numbering conventions (one-based) rather than ruby's # (zero-based) numbering conventions are used. # # s = Bio::Sequence::NA.new('atggaatga') # puts s.subseq(1,3) #=> "atg" # # Start defaults to 1 and end defaults to the entire existing string, so # subseq called without any parameters simply returns a new sequence # identical to the existing sequence. # # puts s.subseq #=> "atggaatga" # --- # *Arguments*: # * (optional) _s_(start): Integer (default 1) # * (optional) _e_(end): Integer (default current sequence length) # *Returns*:: new Bio::Sequence::NA/AA object def subseq(s = 1, e = self.length) raise "Error: start/end position must be a positive integer" unless s > 0 and e > 0 s -= 1 e -= 1 self[s..e] end # This method steps through a sequences in steps of 'step_size' by # subsequences of 'window_size'. Typically used with a block. # Any remaining sequence at the terminal end will be returned. 
# # Prints average GC% on each 100bp # # s.window_search(100) do |subseq| # puts subseq.gc # end # # Prints every translated peptide (length 5aa) in the same frame # # s.window_search(15, 3) do |subseq| # puts subseq.translate # end # # Split genome sequence by 10000bp with 1000bp overlap in fasta format # # i = 1 # remainder = s.window_search(10000, 9000) do |subseq| # puts subseq.to_fasta("segment #{i}", 60) # i += 1 # end # puts remainder.to_fasta("segment #{i}", 60) # --- # *Arguments*: # * (required) _window_size_: Fixnum # * (optional) _step_size_: Fixnum (default 1) # *Returns*:: new Bio::Sequence::NA/AA object def window_search(window_size, step_size = 1) last_step = 0 0.step(self.length - window_size, step_size) do |i| yield self[i, window_size] last_step = i end return self[last_step + window_size .. -1] end # Returns a float total value for the sequence given a hash of # base or residue values, # # values = {'a' => 0.1, 't' => 0.2, 'g' => 0.3, 'c' => 0.4} # s = Bio::Sequence::NA.new('atgc') # puts s.total(values) #=> 1.0 # --- # *Arguments*: # * (required) _hash_: Hash object # *Returns*:: Float object def total(hash) hash.default = 0.0 unless hash.default sum = 0.0 self.each_byte do |x| begin sum += hash[x.chr] end end return sum end # Returns a hash of the occurrence counts for each residue or base. # # s = Bio::Sequence::NA.new('atgc') # puts s.composition #=> {"a"=>1, "c"=>1, "g"=>1, "t"=>1} # --- # *Returns*:: Hash object def composition count = Hash.new(0) self.scan(/./) do |x| count[x] += 1 end return count end # Returns a randomized sequence. The default is to retain the same # base/residue composition as the original. If a hash of base/residue # counts is given, the new sequence will be based on that hash # composition. If a block is given, each new randomly selected # position will be passed into the block. In all cases, the # original sequence is not modified. 
# # s = Bio::Sequence::NA.new('atgc') # puts s.randomize #=> "tcag" (for example) # # new_composition = {'a' => 2, 't' => 2} # puts s.randomize(new_composition) #=> "ttaa" (for example) # # count = 0 # s.randomize { |x| count += 1 } # puts count #=> 4 # --- # *Arguments*: # * (optional) _hash_: Hash object # *Returns*:: new Bio::Sequence::NA/AA object def randomize(hash = nil) if hash tmp = '' hash.each {|k, v| tmp += k * v.to_i } else tmp = self end seq = self.class.new(tmp) # Reference: http://en.wikipedia.org/wiki/Fisher-Yates_shuffle seq.length.downto(2) do |n| k = rand(n) c = seq[n - 1] seq[n - 1] = seq[k] seq[k] = c end if block_given? then (0...seq.length).each do |i| yield seq[i, 1] end return self.class.new('') else return seq end end # Return a new sequence extracted from the original using a GenBank style # position string. See also documentation for the Bio::Location class. # # s = Bio::Sequence::NA.new('atgcatgcatgcatgc') # puts s.splice('1..3') #=> "atg" # puts s.splice('join(1..3,8..10)') #=> "atgcat" # puts s.splice('complement(1..3)') #=> "cat" # puts s.splice('complement(join(1..3,8..10))') #=> "atgcat" # # Note that 'complement'ed Genbank position strings will have no # effect on Bio::Sequence::AA objects. # --- # *Arguments*: # * (required) _position_: String *or* Bio::Location object # *Returns*:: Bio::Sequence::NA/AA object def splice(position) unless position.is_a?(Locations) then position = Locations.new(position) end s = '' position.each do |location| if location.sequence s << location.sequence else exon = self.subseq(location.from, location.to) begin exon.complement! if location.strand < 0 rescue NameError end s << exon end end return self.class.new(s) end alias splicing splice #-- # Workaround for Ruby 3.0.0 incompatible changes if ::RUBY_VERSION > "3" # Acts almost the same as String#split. def split(*arg) if block_given? super else ret = super(*arg) ret.collect! 
{ |x| self.class.new('').replace(x) } ret end end %w( * ljust rjust center ).each do |w| module_eval %Q{ def #{w}(*arg) self.class.new('').replace(super) end } end %w( chomp chop delete delete_prefix delete_suffix lstrip rstrip strip reverse squeeze succ next tr tr_s capitalize upcase downcase swapcase ).each do |w| module_eval %Q{ def #{w}(*arg) s = self.dup s.#{w}!(*arg) s end } end %w( sub gsub ).each do |w| module_eval %Q{ def #{w}(*arg, &block) s = self.dup s.#{w}!(*arg, &block) s end } end #Reference: https://nacl-ltd.github.io/2018/11/08/gsub-wrapper.html #(Title: Is it possible to implement gsub wrapper?) %w( sub! gsub! ).each do |w| module_eval %Q{ def #{w}(*arg, &block) if block_given? then super(*arg) do |m| b = Thread.current[:_backref] Thread.current[:_backref] = ::Regexp.last_match block.binding.eval("$~ = Thread.current[:_backref]") Thread.current[:_backref] = b block.call(self.class.new('').replace(m)) end else super end end } end %w( each_char each_grapheme_cluster each_line ).each do |w| module_eval %Q{ def #{w} if block_given? super { |c| yield(self.class.new('').replace(c)) } else enum_for(:#{w}) end end } end %w( slice [] slice! ).each do |w| module_eval %Q{ def #{w}(*arg) r = super r ? self.class.new('').replace(r) : r end } end %w( partition rpartition ).each do |w| module_eval %Q{ def #{w}(sep) r = super if r.kind_of?(Array) r[1] == sep ? [ self.class.new('').replace(r[0]), r[1], self.class.new('').replace(r[2]) ] : r.collect { |x| self.class.new('').replace(x) } else r end end } end #++ end # if ::RUBY_VERSION > "3" end # Common end # Sequence end # Bio bio-2.0.3/lib/bio/sequence/dblink.rb0000644000175000017500000000242014141516614016533 0ustar nileshnilesh# # = bio/sequence/dblink.rb - sequence ID with database name # # Copyright:: Copyright (C) 2008 # Naohisa Goto # License:: The Ruby License # module Bio require 'bio/sequence' unless const_defined?(:Sequence) # Bio::Sequence::DBLink stores IDs with the database name. 
# Its main purpose is to store database cross-reference information # for a sequence entry. class Sequence::DBLink # creates a new DBLink object def initialize(database, primary_id, *secondary_ids) @database = database @id = primary_id @secondary_ids = secondary_ids end # Database name, or namespace identifier (String). attr_reader :database # Primary identifier (String) attr_reader :id # Secondary identifiers (Array of String) attr_reader :secondary_ids #-- # class methods #++ # Parses DR line in EMBL entry, and returns a DBLink object. def self.parse_embl_DR_line(str) str = str.sub(/\.\s*\z/, '') str.sub!(/\ADR /, '') self.new(*(str.split(/\s*\;\s*/, 3))) end # Parses DR line in UniProt entry, and returns a DBLink object. def self.parse_uniprot_DR_line(str) str = str.sub(/\.\s*\z/, '') str.sub!(/\ADR /, '') self.new(*(str.split(/\s*\;\s*/))) end end #class Sequence::DBLink end #module Bio bio-2.0.3/lib/bio/sequence/format.rb0000644000175000017500000002465414141516614016575 0ustar nileshnilesh# # = bio/sequence/format.rb - various output format of the biological sequence # # Copyright:: Copyright (C) 2006-2008 # Toshiaki Katayama , # Naohisa Goto , # Ryan Raaum , # Jan Aerts # License:: The Ruby License # require 'erb' require 'date' module Bio class Sequence # = DESCRIPTION # A Mixin[http://www.rubycentral.com/book/tut_modules.html] # of methods used by Bio::Sequence#output to output sequences in # common bioinformatic formats. These are not called in isolation. 
# # = USAGE # # Given a Bio::Sequence object, # puts s.output(:fasta) # puts s.output(:genbank) # puts s.output(:embl) module Format # Repository of generic (or both nucleotide and protein) sequence # formatter classes module Formatter # Raw format generator autoload :Raw, 'bio/sequence/format_raw' # Fasta format generator autoload :Fasta, 'bio/db/fasta/format_fasta' # NCBI-style Fasta format generator # (resembles the EMBOSS "ncbi" format) autoload :Fasta_ncbi, 'bio/db/fasta/format_fasta' # FASTQ "fastq-sanger" format generator autoload :Fastq, 'bio/db/fastq/format_fastq' # FASTQ "fastq-sanger" format generator autoload :Fastq_sanger, 'bio/db/fastq/format_fastq' # FASTQ "fastq-solexa" format generator autoload :Fastq_solexa, 'bio/db/fastq/format_fastq' # FASTQ "fastq-illumina" format generator autoload :Fastq_illumina, 'bio/db/fastq/format_fastq' # FastaNumericFormat format generator autoload :Fasta_numeric, 'bio/db/fasta/format_qual' # Qual format generator. # Its format is the same as Fasta_numeric, but it can also convert # quality scores or generate scores from error probabilities. autoload :Qual, 'bio/db/fasta/format_qual' end #module Formatter # Repository of nucleotide sequence formatter classes module NucFormatter # GenBank format generator # Note that the name is 'Genbank' and NOT 'GenBank' autoload :Genbank, 'bio/db/genbank/format_genbank' # EMBL format generator # Note that the name is 'Embl' and NOT 'EMBL' autoload :Embl, 'bio/db/embl/format_embl' end #module NucFormatter # Repository of protein sequence formatter classes module AminoFormatter # currently no formats available end #module AminoFormatter # Formatter base class. # Any formatter class should inherit from this class.
class FormatterBase # Returns a formatted string of the given sequence # --- # *Arguments*: # * (required) _sequence_: Bio::Sequence object # * (optional) _options_: a Hash object # *Returns*:: String object def self.output(sequence, options = {}) self.new(sequence, options).output end # register new Erb template def self.erb_template(str) erb = ERB.new(str) erb.def_method(self, 'output') true end private_class_method :erb_template # generates output data # --- # *Returns*:: String object def output raise NotImplementedError, 'should be implemented in subclass' end # creates a new formatter object for output def initialize(sequence, options = {}) @sequence = sequence @options = options end private # any unknown methods are delegated to the sequence object def method_missing(sym, *args, &block) #:nodoc: begin @sequence.__send__(sym, *args, &block) rescue NoMethodError => evar lineno = __LINE__ - 2 file = __FILE__ bt_here = [ "#{file}:#{lineno}:in \`__send__\'", "#{file}:#{lineno}:in \`method_missing\'" ] if bt_here == evar.backtrace[0, 2] then bt = evar.backtrace[2..-1] evar = evar.class.new("undefined method \`#{sym.to_s}\' for #{self.inspect}") evar.set_backtrace(bt) end raise(evar) end end end #class FormatterBase # Using Bio::Sequence::Format, return a String with the Bio::Sequence # object formatted in the given style.
# # Formats currently implemented are: 'fasta', 'genbank', and 'embl' # # s = Bio::Sequence.new('atgc') # puts s.output(:fasta) #=> "> \natgc\n" # # The style argument is given as a Ruby # Symbol(http://www.ruby-doc.org/core/classes/Symbol.html) # --- # *Arguments*: # * (required) _format_: :fasta, :genbank, *or* :embl # *Returns*:: String object def output(format = :fasta, options = {}) formatter_const = format.to_s.capitalize.intern formatter_class = nil get_formatter_repositories.each do |mod| begin formatter_class = mod.const_get(formatter_const) rescue NameError end break if formatter_class end unless formatter_class then raise "unknown format name #{format.inspect}" end formatter_class.output(self, options) end # Returns a list of available output formats for the sequence # --- # *Arguments*: # *Returns*:: Array of Symbols def list_output_formats a = get_formatter_repositories.collect { |mod| mod.constants } a.flatten! a.collect! { |x| x.to_s.downcase.intern } a end # The same as output(:fasta, :header=>definition, :width=>width) # This method is intended to replace Bio::Sequence#to_fasta. # # s = Bio::Sequence.new('atgc') # puts s.output_fasta #=> "> \natgc\n" # --- # *Arguments*: # * (optional) _definition_: (String) definition line # * (optional) _width_: (Integer) width (default 70) # *Returns*:: String object def output_fasta(definition = nil, width = 70) output(:fasta, :header=> definition, :width => width) end private # returns formatter repository modules def get_formatter_repositories if self.moltype == Bio::Sequence::NA then [ NucFormatter, Formatter ] elsif self.moltype == Bio::Sequence::AA then [ AminoFormatter, Formatter ] else [ NucFormatter, AminoFormatter, Formatter ] end end #--- # Not yet implemented :) # Remove the nodoc command after implementation! 
# --- # *Returns*:: String object #def format_gff #:nodoc: # raise NotImplementedError #end #+++ # Formatting helper methods for INSD (NCBI, EMBL, DDBJ) feature table module INSDFeatureHelper private # INTERNAL USE ONLY, YOU SHOULD NOT CALL THIS METHOD. (And in any # case, it would be difficult to successfully call this method outside # its expected context). # # Output the Genbank feature format string of the sequence. # Used in Bio::Sequence#output. # --- # *Returns*:: String object def format_features_genbank(features) prefix = ' ' * 5 indent = prefix + ' ' * 16 fwidth = 79 - indent.length format_features(features, prefix, indent, fwidth) end # INTERNAL USE ONLY, YOU SHOULD NOT CALL THIS METHOD. (And in any # case, it would be difficult to successfully call this method outside # its expected context). # # Output the EMBL feature format string of the sequence. # Used in Bio::Sequence#output. # --- # *Returns*:: String object def format_features_embl(features) prefix = 'FT ' indent = prefix + ' ' * 16 fwidth = 80 - indent.length format_features(features, prefix, indent, fwidth) end # format INSD features def format_features(features, prefix, indent, width) result = [] features.each do |feature| result.push format_feature(feature, prefix, indent, width) end return result.join('') end # format an INSD feature def format_feature(feature, prefix, indent, width) result = prefix + sprintf("%-16s", feature.feature) position = feature.position #position = feature.locations.to_s result << wrap_and_split_lines(position, width).join("\n" + indent) result << "\n" result << format_qualifiers(feature.qualifiers, indent, width) return result end # format qualifiers def format_qualifiers(qualifiers, indent, width) qualifiers.collect do |qualifier| q = qualifier.qualifier v = qualifier.value.to_s if v == true lines = wrap_with_newline('/' + q, width) elsif q == 'translation' lines = fold("/#{q}=\"#{v}\"", width) else if v[/\D/] or q == 'chromosome' #v.delete!("\x00-\x1f\x7f-\xff")
v.gsub!(/"/, '""') v = '"' + v + '"' end lines = wrap_with_newline('/' + q + '=' + v, width) end lines.gsub!(/^/, indent) lines end.join end def fold(str, width) str.gsub(Regexp.new("(.{1,#{width}})"), "\\1\n") end def fold_and_split_lines(str, width) str.scan(Regexp.new(".{1,#{width}}")) end def wrap_and_split_lines(str, width) result = [] lefts = str.chomp.split(/(?:\r\n|\r|\n)/) lefts.each do |left| left.rstrip! while left and left.length > width line = nil width.downto(1) do |i| if left[i..i] == ' ' or /[\,\;]/ =~ left[(i-1)..(i-1)] then line = left[0..(i-1)].sub(/ +\z/, '') left = left[i..-1].sub(/\A +/, '') break end end if line.nil? then line = left[0..(width-1)] left = left[width..-1] end result << line left = nil if left.to_s.empty? end result << left if left end return result end def wrap_with_newline(str, width) result = wrap_and_split_lines(str, width) result_string = result.join("\n") result_string << "\n" unless result_string.empty? return result_string end def wrap(str, width = 80, prefix = '') actual_width = width - prefix.length result = wrap_and_split_lines(str, actual_width) result_string = result.join("\n#{prefix}") result_string = prefix + result_string unless result_string.empty? return result_string end #-- # internal use only MonthStr = [ nil, 'JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC' ].collect { |x| x.freeze }.freeze #++ # formats a date from Date, DateTime, or Time object, or String. 
def format_date(d) begin yy = d.year mm = d.month dd = d.day rescue NoMethodError, NameError, ArgumentError, TypeError return sprintf("%-11s", d) end sprintf("%02d-%-3s-%04d", dd, MonthStr[mm], yy) end # null date def null_date Date.new(0, 1, 1) end end #module INSDFeatureHelper end #module Format end #class Sequence end #module Bio bio-2.0.3/lib/bio/sequence/adapter.rb0000644000175000017500000000731414141516614016717 0ustar nileshnilesh# # = bio/sequence/adapter.rb - Bio::Sequence adapter helper module # # Copyright:: Copyright (C) 2008 # Naohisa Goto , # License:: The Ruby License # module Bio require 'bio/sequence' unless const_defined?(:Sequence) # Internal use only. Normal users should not use this module. # # Helper methods for defining adapters used when converting data classes to # Bio::Sequence class, with pseudo lazy evaluation and pseudo memoization. # # This module is used by using "extend", not "include". # module Sequence::Adapter autoload :GenBank, 'bio/db/genbank/genbank_to_biosequence' autoload :EMBL, 'bio/db/embl/embl_to_biosequence' autoload :FastaFormat, 'bio/db/fasta/fasta_to_biosequence' autoload :FastaNumericFormat, 'bio/db/fasta/qual_to_biosequence' autoload :BioSQL, 'bio/db/biosql/biosql_to_biosequence' autoload :SangerChromatogram, 'bio/db/sanger_chromatogram/chromatogram_to_biosequence' autoload :Fastq, 'bio/db/fastq/fastq_to_biosequence' private # Defines a reader attribute method with pseudo lazy evaluation/memoization. # # It defines a method like attr_reader, but the first time # the method is called, it acts as follows: # When instance variable @name is not defined, # calls __get__name(@source_data) and stores the returned # value to @name, and changes its behavior to the same as # attr_reader :name. # When instance variable @name is already defined, # its behavior is changed to the same as # attr_reader :name.
# When the object is frozen, storing to the instance variable and # changing the method's behavior do not occur, and the value of # __get__name(@source_data) is returned. # # Note that it assumes that the source data object is stored in # @source_data instance variable. def attr_reader_lazy(name) #$stderr.puts "attr_reader_lazy :#{name}" varname = "@#{name}".intern methodname = "__get__#{name}".intern # module to reset method's behavior to normal attr_reader reset = "Attr_#{name}".intern const_set(reset, Module.new { attr_reader name }) reset_module_name = "#{self}::#{reset}" # define attr method module_eval <<__END_OF_DEF__ def #{name} unless defined? #{varname} then #$stderr.puts "LAZY #{name}: calling #{methodname}" val = #{methodname}(@source_data) #{varname} = val unless frozen? else val = #{varname} end unless frozen? then #$stderr.puts "LAZY #{name}: finalize: attr_reader :#{name}" self.extend(#{reset_module_name}) end val end __END_OF_DEF__ end # Defines a Bio::Sequence to Bio::* adapter method with # pseudo lazy evaluation and pseudo memoization. # # Without block, defines a private method __get__name(orig) # which calls source_method for @source_data. # # def_biosequence_adapter(name, source_method) is the same as: # def __get__name(orig); orig.source_method; end # attr_reader_lazy name # # If block is given, __get__name(orig) is defined # with the block. The @source_data is given as an argument of the block, # i.e. the block must get an argument.
# def def_biosequence_adapter(name, source_method = name, &block) methodname = "__get__#{name}".intern if block then define_method(methodname, block) else module_eval <<__END_OF_DEF__ def #{methodname}(orig) orig.#{source_method} end __END_OF_DEF__ end private methodname attr_reader_lazy name true end end #module Sequence::Adapter end #module Bio bio-2.0.3/lib/bio/sequence/format_raw.rb0000644000175000017500000000066214141516614017437 0ustar nileshnilesh# # = bio/sequence/format_raw.rb - Raw sequence formatter # # Copyright:: Copyright (C) 2008 Naohisa Goto # License:: The Ruby License # module Bio::Sequence::Format::Formatter # Raw sequence output formatter class class Raw < Bio::Sequence::Format::FormatterBase # output raw sequence data def output "#{@sequence.seq}" end end #class Raw end #module Bio::Sequence::Format::Formatter bio-2.0.3/lib/bio/sequence/compat.rb0000644000175000017500000000713014141516614016556 0ustar nileshnilesh# # = bio/sequence/compat.rb - methods for backward compatibility # # Copyright:: Copyright (C) 2006 # Toshiaki Katayama , # Ryan Raaum # License:: The Ruby License # module Bio require 'bio/sequence' unless const_defined?(:Sequence) class Sequence # Return sequence as # String[http://corelib.rubyonrails.org/classes/String.html]. # The original sequence is unchanged. # # s = Bio::Sequence.new('atgc') # puts s.to_s #=> 'atgc' # puts s.to_s.class #=> String # puts s #=> 'atgc' # puts s.class #=> Bio::Sequence # --- # *Returns*:: String object def to_s String.new(self.seq) end alias to_str to_s module Common # Bio::Sequence#to_fasta is DEPRECATED # Do not use Bio::Sequence#to_fasta ! Use Bio::Sequence#output instead. # Note that Bio::Sequence::NA#to_fasta, Bio::Sequence::AA#to_fasta, # and Bio::Sequence::Generic#to_fasta can still be used, # because there are no alternative methods. # # Output the FASTA format string of the sequence. The 1st argument is # used as the comment string.
If the 2nd option is given, the output # sequence will be folded. # --- # *Arguments*: # * (optional) _header_: String object # * (optional) _width_: Fixnum object (default nil) # *Returns*:: String def to_fasta(header = '', width = nil) warn "Bio::Sequence#to_fasta is obsolete. Use Bio::Sequence#output(:fasta) instead" if $DEBUG ">#{header}\n" + if width self.to_s.gsub(Regexp.new(".{1,#{width}}"), "\\0\n") else self.to_s + "\n" end end end # Common class NA # Generate a new random sequence with the given frequency of bases. # The sequence length is determined by their cumulative sum. # (See also Bio::Sequence::Common#randomize which creates a new # randomized sequence object using the base composition of an existing # sequence instance). # # counts = {'a'=>1,'c'=>2,'g'=>3,'t'=>4} # puts Bio::Sequence::NA.randomize(counts) #=> "ggcttgttac" (for example) # # You may also feed the output of randomize into a block # # actual_counts = {'a'=>0, 'c'=>0, 'g'=>0, 't'=>0} # Bio::Sequence::NA.randomize(counts) {|x| actual_counts[x] += 1} # actual_counts #=> {"a"=>1, "c"=>2, "g"=>3, "t"=>4} # --- # *Arguments*: # * (optional) _hash_: Hash object # *Returns*:: Bio::Sequence::NA object def self.randomize(*arg, &block) self.new('').randomize(*arg, &block) end def pikachu #:nodoc: self.dna.tr("atgc", "pika") # joke, of course :-) end end # NA class AA # Generate a new random sequence with the given frequency of residues. # The sequence length is determined by their cumulative sum. # (See also Bio::Sequence::Common#randomize which creates a new # randomized sequence object using the residue composition of an existing # sequence instance).
# # counts = {'R'=>1,'L'=>2,'E'=>3,'A'=>4} # puts Bio::Sequence::AA.randomize(counts) #=> "AAEAELALRE" (for example) # # You may also feed the output of randomize into a block # # actual_counts = {'R'=>0,'L'=>0,'E'=>0,'A'=>0} # Bio::Sequence::AA.randomize(counts) {|x| actual_counts[x] += 1} # actual_counts #=> {"A"=>4, "L"=>2, "E"=>3, "R"=>1} # --- # *Arguments*: # * (optional) _hash_: Hash object # *Returns*:: Bio::Sequence::AA object def self.randomize(*arg, &block) self.new('').randomize(*arg, &block) end end # AA end # Sequence end # Bio bio-2.0.3/lib/bio/sequence/quality_score.rb0000644000175000017500000001353114141516614020160 0ustar nileshnilesh# # = bio/sequence/quality_score.rb - Sequence quality score manipulation modules # # Copyright:: Copyright (C) 2009 # Naohisa Goto # License:: The Ruby License # # == Description # # Sequence quality score manipulation modules, mainly used by Bio::Fastq # and related classes. # # == References # # * FASTQ format specification # http://maq.sourceforge.net/fastq.shtml # module Bio require 'bio/sequence' unless const_defined?(:Sequence) class Sequence # Bio::Sequence::QualityScore is a name space for quality score modules. # BioRuby internal use only (mainly from Bio::Fastq). module QualityScore # Converter methods between PHRED and Solexa quality scores. module Converter # Converts PHRED scores to Solexa scores. # # The values may be truncated or incorrect if overflows/underflows # occurred during the calculation. # --- # *Arguments*: # * (required) _scores_: (Array containing Integer) quality scores # *Returns*:: (Array containing Integer) quality scores def convert_scores_from_phred_to_solexa(scores) sc = scores.collect do |q| t = 10 ** (q / 10.0) - 1 t = Float::MIN if t < Float::MIN r = 10 * Math.log10(t) r.finite? ? r.round : r end sc end # Converts Solexa scores to PHRED scores. # # The values may be truncated if overflows/underflows occurred # during the calculation. 
# --- # *Arguments*: # * (required) _scores_: (Array containing Integer) quality scores # *Returns*:: (Array containing Integer) quality scores def convert_scores_from_solexa_to_phred(scores) sc = scores.collect do |q| r = 10 * Math.log10(10 ** (q / 10.0) + 1) r.finite? ? r.round : r end sc end # Does nothing and simply returns the given argument. # # --- # *Arguments*: # * (required) _scores_: (Array containing Integer) quality scores # *Returns*:: (Array containing Integer) quality scores def convert_nothing(scores) scores end end #module Converter # Bio::Sequence::QualityScore::Phred is a module having quality calculation # methods for the PHRED quality score. # # BioRuby internal use only (mainly from Bio::Fastq). module Phred include Converter # Type of quality scores. # --- # *Returns*:: (Symbol) the type of quality score. def quality_score_type :phred end # PHRED score to probability conversion. # --- # *Arguments*: # * (required) _scores_: (Array containing Integer) scores # *Returns*:: (Array containing Float) probabilities (0<=p<=1) def phred_q2p(scores) scores.collect do |q| r = 10 ** (- q / 10.0) if r > 1.0 then r = 1.0 #elsif r < 0.0 then # r = 0.0 end r end end alias q2p phred_q2p module_function :q2p public :q2p # Probability to PHRED score conversion. # # The values may be truncated or incorrect if overflows/underflows # occurred during the calculation. # --- # *Arguments*: # * (required) _probabilities_: (Array containing Float) probabilities # *Returns*:: (Array containing Float) scores def phred_p2q(probabilities) probabilities.collect do |p| p = Float::MIN if p < Float::MIN q = -10 * Math.log10(p) q.finite? ? 
q.round : q end end alias p2q phred_p2q module_function :p2q public :p2q alias convert_scores_from_phred convert_nothing alias convert_scores_to_phred convert_nothing alias convert_scores_from_solexa convert_scores_from_solexa_to_phred alias convert_scores_to_solexa convert_scores_from_phred_to_solexa module_function :convert_scores_to_solexa public :convert_scores_to_solexa end #module Phred # Bio::Sequence::QualityScore::Solexa is a module having quality # calculation methods for the Solexa quality score. # # BioRuby internal use only (mainly from Bio::Fastq). module Solexa include Converter # Type of quality scores. # --- # *Returns*:: (Symbol) the type of quality score. def quality_score_type :solexa end # Solexa score to probability conversion. # --- # *Arguments*: # * (required) _scores_: (Array containing Integer) scores # *Returns*:: (Array containing Float) probabilities def solexa_q2p(scores) scores.collect do |q| t = 10 ** (- q / 10.0) t /= (1.0 + t) if t > 1.0 then t = 1.0 #elsif t < 0.0 then # t = 0.0 end t end end alias q2p solexa_q2p module_function :q2p public :q2p # Probability to Solexa score conversion. # --- # *Arguments*: # * (required) _probabilities_: (Array containing Float) probabilities # *Returns*:: (Array containing Float) scores def solexa_p2q(probabilities) probabilities.collect do |p| t = p / (1.0 - p) t = Float::MIN if t < Float::MIN q = -10 * Math.log10(t) q.finite? ? 
q.round : q end end alias p2q solexa_p2q module_function :p2q public :p2q alias convert_scores_from_solexa convert_nothing alias convert_scores_to_solexa convert_nothing alias convert_scores_from_phred convert_scores_from_phred_to_solexa alias convert_scores_to_phred convert_scores_from_solexa_to_phred module_function :convert_scores_to_phred public :convert_scores_to_phred end #module Solexa end #module QualityScore end #class Sequence end #module Bio bio-2.0.3/lib/bio/sequence/na.rb0000644000175000017500000003733514141516614015703 0ustar nileshnilesh# # = bio/sequence/na.rb - nucleic acid sequence class # # Copyright:: Copyright (C) 2006 # Toshiaki Katayama , # Ryan Raaum # License:: The Ruby License # module Bio autoload :NucleicAcid, 'bio/data/na' unless const_defined?(:NucleicAcid) autoload :CodonTable, 'bio/data/codontable' unless const_defined?(:CodonTable) require 'bio/sequence' unless const_defined?(:Sequence) class Sequence # = DESCRIPTION # Bio::Sequence::NA represents a bare Nucleic Acid sequence in bioruby. # # = USAGE # # Create a Nucleic Acid sequence. # dna = Bio::Sequence.auto('atgcatgcATGCATGCAAAA') # rna = Bio::Sequence.auto('augcaugcaugcaugcaaaa') # # # What are the names of all the bases? # puts dna.names # puts rna.names # # # What is the GC percentage? # puts dna.gc_percent # puts rna.gc_percent # # # What is the molecular weight? # puts dna.molecular_weight # puts rna.molecular_weight # # # What is the reverse complement? # puts dna.reverse_complement # puts dna.complement # # # Is this sequence DNA or RNA? # puts dna.rna? # # # Translate my sequence (see method docs for many options) # puts dna.translate # puts rna.translate class NA < String include Bio::Sequence::Common # Generate a nucleic acid sequence object from a string.
# # s = Bio::Sequence::NA.new("aagcttggaccgttgaagt") # # or maybe (if you have a nucleic acid sequence in a file) # # s = Bio::Sequence::NA.new(File.open('dna.txt').read) # # Nucleic Acid sequences are *always* all lowercase in bioruby # # s = Bio::Sequence::NA.new("AAGcTtGG") # puts s #=> "aagcttgg" # # Whitespace is stripped from the sequence # # s = Bio::Sequence::NA.new("atg\nggg\ttt\r gc") # puts s #=> "atggggttgc" # --- # *Arguments*: # * (required) _str_: String # *Returns*:: Bio::Sequence::NA object def initialize(str) super self.downcase! self.tr!(" \t\n\r",'') end # Alias of Bio::Sequence::Common splice method, documented there. def splicing(position) #:nodoc: mRNA = super if mRNA.rna? mRNA.tr!('t', 'u') else mRNA.tr!('u', 't') end mRNA end # Returns a new complementary sequence object (without reversing). # The original sequence object is not modified. # # s = Bio::Sequence::NA.new('atgc') # puts s.forward_complement #=> 'tacg' # puts s #=> 'atgc' # --- # *Returns*:: new Bio::Sequence::NA object def forward_complement s = self.class.new(self) s.forward_complement! s end # Converts the current sequence into its complement (without reversing). # The original sequence object is modified. # # s = Bio::Sequence::NA.new('atgc') # puts s.forward_complement! #=> 'tacg' # puts s #=> 'tacg' # --- # *Returns*:: current Bio::Sequence::NA object (modified) def forward_complement! if self.rna? self.tr!('augcrymkdhvbswn', 'uacgyrkmhdbvswn') else self.tr!('atgcrymkdhvbswn', 'tacgyrkmhdbvswn') end self end # Returns a new sequence object with the reverse complement # sequence to the original. The original sequence is not modified. # # s = Bio::Sequence::NA.new('atgc') # puts s.reverse_complement #=> 'gcat' # puts s #=> 'atgc' # --- # *Returns*:: new Bio::Sequence::NA object def reverse_complement s = self.class.new(self) s.reverse_complement! s end # Converts the original sequence into its reverse complement. # The original sequence is modified.
# # s = Bio::Sequence::NA.new('atgc') # puts s.reverse_complement! #=> 'gcat' # puts s #=> 'gcat' # --- # *Returns*:: current Bio::Sequence::NA object (modified) def reverse_complement! self.reverse! self.forward_complement! end # Alias for Bio::Sequence::NA#reverse_complement alias complement reverse_complement # Alias for Bio::Sequence::NA#reverse_complement! alias complement! reverse_complement! # Translate into an amino acid sequence. # # s = Bio::Sequence::NA.new('atggcgtga') # puts s.translate #=> "MA*" # # By default, translate starts in reading frame position 1, but you # can start in either 2 or 3 as well, # # puts s.translate(2) #=> "WR" # puts s.translate(3) #=> "GV" # # You may also translate the reverse complement in one step by using frame # values of -1, -2, and -3 (or 4, 5, and 6) # # puts s.translate(-1) #=> "SRH" # puts s.translate(4) #=> "SRH" # puts s.reverse_complement.translate(1) #=> "SRH" # # The default codon table in the translate function is the Standard # Eukaryotic codon table. The translate function takes either a # number or a Bio::CodonTable object for its table argument. # The available tables are # (NCBI[http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=t]): # # 1. "Standard (Eukaryote)" # 2. "Vertebrate Mitochondrial" # 3. "Yeast Mitochondrial" # 4. "Mold, Protozoan, Coelenterate Mitochondrial and Mycoplasma/Spiroplasma" # 5. "Invertebrate Mitochondrial" # 6. "Ciliate Macronuclear and Dasycladacean" # 9. "Echinoderm Mitochondrial" # 10. "Euplotid Nuclear" # 11. "Bacteria" # 12. "Alternative Yeast Nuclear" # 13. "Ascidian Mitochondrial" # 14. "Flatworm Mitochondrial" # 15. "Blepharisma Macronuclear" # 16. "Chlorophycean Mitochondrial" # 21. "Trematode Mitochondrial" # 22. "Scenedesmus obliquus mitochondrial" # 23.
"Thraustochytrium Mitochondrial" # # If you are using anything other than the default table, you must specify # frame in the translate method call, # # puts s.translate #=> "MA*" (using defaults) # puts s.translate(1,1) #=> "MA*" (same as above, but explicit) # puts s.translate(1,2) #=> "MAW" (different codon table) # # and using a Bio::CodonTable instance in the translate method call, # # mt_table = Bio::CodonTable[2] # puts s.translate(1, mt_table) #=> "MAW" # # By default, any invalid or unknown codons (as could happen if the # sequence contains ambiguities) will be represented by 'X' in the # translated sequence. # You may change this to any character of your choice. # # s = Bio::Sequence::NA.new('atgcNNtga') # puts s.translate #=> "MX*" # puts s.translate(1,1,'9') #=> "M9*" # # The translate method considers gaps to be unknown characters and treats # them as such (i.e. does not collapse sequences prior to translation), so # # s = Bio::Sequence::NA.new('atgc--tga') # puts s.translate #=> "MX*" # --- # *Arguments*: # * (optional) _frame_: one of 1,2,3,4,5,6,-1,-2,-3 (default 1) # * (optional) _table_: Fixnum in range 1,23 or Bio::CodonTable object # (default 1) # * (optional) _unknown_: Character (default 'X') # *Returns*:: Bio::Sequence::AA object def translate(frame = 1, table = 1, unknown = 'X') if table.is_a?(Bio::CodonTable) ct = table else ct = Bio::CodonTable[table] end naseq = self.dna case frame when 1, 2, 3 from = frame - 1 when 4, 5, 6 from = frame - 4 naseq.complement! when -1, -2, -3 from = -1 - frame naseq.complement! else from = 0 end nalen = naseq.length - from nalen -= nalen % 3 aaseq = naseq[from, nalen].gsub(/.{3}/) {|codon| ct[codon] or unknown} return Bio::Sequence::AA.new(aaseq) end # Returns counts of each codon in the sequence in a hash. # # s = Bio::Sequence::NA.new('atggcgtga') # puts s.codon_usage #=> {"gcg"=>1, "tga"=>1, "atg"=>1} # # This method does not validate codons! Any three letter group is a 'codon'. 
So, # # s = Bio::Sequence::NA.new('atggNNtga') # puts s.codon_usage #=> {"tga"=>1, "gnn"=>1, "atg"=>1} # # s = Bio::Sequence::NA.new('atgg--tga') # puts s.codon_usage #=> {"tga"=>1, "g--"=>1, "atg"=>1} # # Also, there is no option to work in any frame other than the first. # --- # *Returns*:: Hash object def codon_usage hash = Hash.new(0) self.window_search(3, 3) do |codon| hash[codon] += 1 end return hash end # Calculate the ratio of GC / ATGC bases as a percentage, truncated to # a whole number by integer division. U is regarded as T. # # s = Bio::Sequence::NA.new('atggcgtga') # puts s.gc_percent #=> 55 # # Note that this method only returns an integer value. # When more digits after decimal points are needed, # use gc_content and sprintf like below: # # s = Bio::Sequence::NA.new('atggcgtga') # puts sprintf("%3.2f", s.gc_content * 100) #=> "55.56" # # --- # *Returns*:: Fixnum def gc_percent count = self.composition at = count['a'] + count['t'] + count['u'] gc = count['g'] + count['c'] return 0 if at + gc == 0 gc = 100 * gc / (at + gc) return gc end # Calculate the ratio of GC / ATGC bases. U is regarded as T. # # s = Bio::Sequence::NA.new('atggcgtga') # puts s.gc_content #=> (5/9) # puts s.gc_content.to_f #=> 0.5555555555555556 # # In older Ruby versions, Float is always returned. # # s = Bio::Sequence::NA.new('atggcgtga') # puts s.gc_content #=> 0.555555555555556 # # Note that "u" is regarded as "t". # If there are no ATGC bases in the sequence, 0.0 is returned. # # --- # *Returns*:: Rational or Float def gc_content count = self.composition at = count['a'] + count['t'] + count['u'] gc = count['g'] + count['c'] total = at + gc return 0.0 if total == 0 return gc.quo(total) end # Calculate the ratio of AT / ATGC bases. U is regarded as T. # # s = Bio::Sequence::NA.new('atggcgtga') # puts s.at_content #=> 4/9 # puts s.at_content.to_f #=> 0.444444444444444 # # In older Ruby versions, Float is always returned.
# # s = Bio::Sequence::NA.new('atggcgtga') # puts s.at_content #=> 0.444444444444444 # # Note that "u" is regarded as "t". # If there are no ATGC bases in the sequence, 0.0 is returned. # # --- # *Returns*:: Rational or Float def at_content count = self.composition at = count['a'] + count['t'] + count['u'] gc = count['g'] + count['c'] total = at + gc return 0.0 if total == 0 return at.quo(total) end # Calculate the ratio of (G - C) / (G + C) bases. # # s = Bio::Sequence::NA.new('atggcgtga') # puts s.gc_skew #=> 3/5 # puts s.gc_skew.to_f #=> 0.6 # # In older Ruby versions, Float is always returned. # # s = Bio::Sequence::NA.new('atggcgtga') # puts s.gc_skew #=> 0.6 # # If there are no GC bases in the sequence, 0.0 is returned. # # --- # *Returns*:: Rational or Float def gc_skew count = self.composition g = count['g'] c = count['c'] gc = g + c return 0.0 if gc == 0 return (g - c).quo(gc) end # Calculate the ratio of (A - T) / (A + T) bases. U is regarded as T. # # s = Bio::Sequence::NA.new('atgttgttgttc') # puts s.at_skew #=> (-3/4) # puts s.at_skew.to_f #=> -0.75 # # In older Ruby versions, Float is always returned. # # s = Bio::Sequence::NA.new('atgttgttgttc') # puts s.at_skew #=> -0.75 # # Note that "u" is regarded as "t". # If there are no AT bases in the sequence, 0.0 is returned. # # --- # *Returns*:: Rational or Float def at_skew count = self.composition a = count['a'] t = count['t'] + count['u'] at = a + t return 0.0 if at == 0 return (a - t).quo(at) end # Returns an alphabetically sorted array of any non-standard bases # (other than 'atgcu'). # # s = Bio::Sequence::NA.new('atgStgQccR') # puts s.illegal_bases #=> ["q", "r", "s"] # --- # *Returns*:: Array object def illegal_bases self.scan(/[^atgcu]/).sort.uniq end # Estimate molecular weight (using the values from BioPerl's # SeqStats.pm[http://doc.bioperl.org/releases/bioperl-1.0.1/Bio/Tools/SeqStats.html] module). 
# # s = Bio::Sequence::NA.new('atggcgtga') # puts s.molecular_weight #=> 2841.00708 # # RNA and DNA do not have the same molecular weights, # # s = Bio::Sequence::NA.new('auggcguga') # puts s.molecular_weight #=> 2956.94708 # --- # *Returns*:: Float object def molecular_weight if self.rna? Bio::NucleicAcid.weight(self, true) else Bio::NucleicAcid.weight(self) end end # Create a ruby regular expression instance # (Regexp)[http://corelib.rubyonrails.org/classes/Regexp.html] # # s = Bio::Sequence::NA.new('atggcgtga') # puts s.to_re #=> /atggcgtga/ # --- # *Returns*:: Regexp object def to_re if self.rna? Bio::NucleicAcid.to_re(self.dna, true) else Bio::NucleicAcid.to_re(self) end end # Generate the list of the names of each nucleotide along with the # sequence (full name). Names used in bioruby are found in the # hash returned by Bio::NucleicAcid.names. # # s = Bio::Sequence::NA.new('atg') # puts s.names #=> ["Adenine", "Thymine", "Guanine"] # --- # *Returns*:: Array object def names array = [] self.each_byte do |x| array.push(Bio::NucleicAcid.names[x.chr.upcase]) end return array end # Returns a new sequence object with any 'u' bases changed to 't'. # The original sequence is not modified. # # s = Bio::Sequence::NA.new('augc') # puts s.dna #=> 'atgc' # puts s #=> 'augc' # --- # *Returns*:: new Bio::Sequence::NA object def dna self.tr('u', 't') end # Changes any 'u' bases in the original sequence to 't'. # The original sequence is modified. # # s = Bio::Sequence::NA.new('augc') # puts s.dna! #=> 'atgc' # puts s #=> 'atgc' # --- # *Returns*:: current Bio::Sequence::NA object (modified) def dna! self.tr!('u', 't') end # Returns a new sequence object with any 't' bases changed to 'u'. # The original sequence is not modified. # # s = Bio::Sequence::NA.new('atgc') # puts s.rna #=> 'augc' # puts s #=> 'atgc' # --- # *Returns*:: new Bio::Sequence::NA object def rna self.tr('t', 'u') end # Changes any 't' bases in the original sequence to 'u'. # The original sequence is modified.
# # s = Bio::Sequence::NA.new('atgc') # puts s.rna! #=> 'augc' # puts s #=> 'augc' # --- # *Returns*:: current Bio::Sequence::NA object (modified) def rna! self.tr!('t', 'u') end def rna? self.index('u') end protected :rna? # Example: # # seq = Bio::Sequence::NA.new('gaattc') # cuts = seq.cut_with_enzyme('EcoRI') # # _or_ # # seq = Bio::Sequence::NA.new('gaattc') # cuts = seq.cut_with_enzyme('g^aattc') # --- # See Bio::RestrictionEnzyme::Analysis.cut def cut_with_enzyme(*args) Bio::RestrictionEnzyme::Analysis.cut(self, *args) end alias cut_with_enzymes cut_with_enzyme end # NA end # Sequence end # Bio bio-2.0.3/lib/bio/sequence/aa.rb0000644000175000017500000000612214141516614015654 0ustar nileshnilesh# # = bio/sequence/aa.rb - amino acid sequence class # # Copyright:: Copyright (C) 2006 # Toshiaki Katayama, # Ryan Raaum # License:: The Ruby License # module Bio autoload :AminoAcid, 'bio/data/aa' unless const_defined?(:AminoAcid) require 'bio/sequence' unless const_defined?(:Sequence) class Sequence # = DESCRIPTION # Bio::Sequence::AA represents a bare Amino Acid sequence in bioruby. # # = USAGE # # Create an Amino Acid sequence. # aa = Bio::Sequence::AA.new('ACDEFGHIKLMNPQRSTVWYU') # # # What are the three-letter codes for all the residues? # puts aa.codes # # # What are the names of all the residues? # puts aa.names # # # What is the molecular weight of this peptide? # puts aa.molecular_weight class AA < String include Bio::Sequence::Common # Generate an amino acid sequence object from a string.
# # s = Bio::Sequence::AA.new("RRLEHTFVFLRNFSLMLLRY") # # or maybe (if you have an amino acid sequence in a file) # # s = Bio::Sequence::AA.new(File.read('aa.txt')) # # Amino Acid sequences are *always* all uppercase in bioruby # # s = Bio::Sequence::AA.new("rrLeHtfV") # puts s #=> "RRLEHTFV" # # Whitespace is stripped from the sequence # # s = Bio::Sequence::AA.new("RRL\nELA\tRG\r RL") # puts s #=> "RRLELARGRL" # --- # *Arguments*: # * (required) _str_: String # *Returns*:: Bio::Sequence::AA object def initialize(str) super self.upcase! self.tr!(" \t\n\r",'') end # Estimate molecular weight based on # Fasman1976[http://www.genome.ad.jp/dbget-bin/www_bget?aaindex+FASG760101] # # s = Bio::Sequence::AA.new("RRLE") # puts s.molecular_weight #=> 572.655 # --- # *Returns*:: Float object def molecular_weight Bio::AminoAcid.weight(self) end # Create a ruby regular expression instance # (Regexp)[http://corelib.rubyonrails.org/classes/Regexp.html] # # s = Bio::Sequence::AA.new("RRLE") # puts s.to_re #=> /RRLE/ # --- # *Returns*:: Regexp object def to_re Bio::AminoAcid.to_re(self) end # Generate the list of the names of each residue along with the # sequence (three-letter code). Codes used in bioruby are found in the # Bio::AminoAcid::NAMES hash. # # s = Bio::Sequence::AA.new("RRLE") # puts s.codes #=> ["Arg", "Arg", "Leu", "Glu"] # --- # *Returns*:: Array object def codes array = [] self.each_byte do |x| array.push(Bio::AminoAcid.names[x.chr]) end return array end # Generate the list of the names of each residue along with the # sequence (full name). Names used in bioruby are found in the # Bio::AminoAcid::NAMES hash.
# # s = Bio::Sequence::AA.new("RRLE") # puts s.names # #=> ["arginine", "arginine", "leucine", "glutamic acid"] # --- # *Returns*:: Array object def names self.codes.map do |x| Bio::AminoAcid.names[x] end end end # AA end # Sequence end # Bio bio-2.0.3/lib/bio/sequence/generic.rb0000644000175000017500000000061414141516614016707 0ustar nileshnilesh# # = bio/sequence/generic.rb - generic sequence class to store an intact string # # Copyright:: Copyright (C) 2006 # Toshiaki Katayama # License:: The Ruby License # module Bio require 'bio/sequence' unless const_defined?(:Sequence) class Sequence class Generic < String #:nodoc: include Bio::Sequence::Common end # Generic end # Sequence end # Bio bio-2.0.3/lib/bio/pathway.rb0000644000175000017500000005407514141516614015152 0ustar nileshnilesh# # = bio/pathway.rb - Binary relations and Graph algorithms # # Copyright:: Copyright (C) 2001 # Toshiaki Katayama, # Shuichi Kawashima # License:: The Ruby License # # $Id:$ # require 'matrix' module Bio # Bio::Pathway is a general graph object initially constructed from a # list of Bio::Relation objects. The basic concept of the # Bio::Pathway object is to store a graph as an adjacency list (in the # instance variable @graph) and to convert the list into an adjacency # matrix on demand by calling the to_matrix method. However, because it is # sometimes convenient to have the original list of # Bio::Relation objects, the Bio::Pathway object also stores that list (as # the instance variable @relations) redundantly. # # Note: you can clear the @relations list by calling the clear_relations! # method to reduce memory usage, and the content of @relations # can be regenerated from @graph by the to_relations method. class Pathway # Initial graph (adjacency list) generation from the list of Relation. # # Generate Bio::Pathway object from the list of Bio::Relation objects. # If the second argument is true, undirected graph is generated.
# # r1 = Bio::Relation.new('a', 'b', 1) # r2 = Bio::Relation.new('a', 'c', 5) # r3 = Bio::Relation.new('b', 'c', 3) # list = [ r1, r2, r3 ] # g = Bio::Pathway.new(list, 'undirected') # def initialize(relations, undirected = false) @undirected = undirected @relations = relations @graph = {} # adjacency list expression of the graph @index = {} # numbering each node in matrix @label = {} # additional information on each node self.to_list # generate adjacency list end # Read-only accessor for the internal list of the Bio::Relation objects attr_reader :relations # Read-only accessor for the adjacency list of the graph. attr_reader :graph # Read-only accessor for the row/column index (@index) of the # adjacency matrix. Contents of the hash @index is created by # calling to_matrix method. attr_reader :index # Accessor for the hash of the label assigned to the each node. You can # label some of the nodes in the graph by passing a hash to the label # and select subgraphs which contain labeled nodes only by subgraph method. # # hash = { 1 => 'red', 2 => 'green', 5 => 'black' } # g.label = hash # g.label # g.subgraph # => new graph consists of the node 1, 2, 5 only # attr_accessor :label # Returns true or false respond to the internal state of the graph. def directed? @undirected ? false : true end # Returns true or false respond to the internal state of the graph. def undirected? @undirected ? true : false end # Changes the internal state of the graph from 'undirected' to # 'directed' and re-generate adjacency list. The undirected graph # can be converted to directed graph, however, the edge between two # nodes will be simply doubled to both ends. # # Note: this method can not be used without the list of the # Bio::Relation objects (internally stored in @relations variable). # Thus if you already called clear_relations! method, call # to_relations first. def directed if undirected? 
@undirected = false self.to_list end end # Changes the internal state of the graph from 'directed' to # 'undirected' and regenerates the adjacency list. # # Note: this method cannot be used without the list of the # Bio::Relation objects (internally stored in the @relations variable). # Thus, if you have already called the clear_relations! method, call # to_relations first. def undirected if directed? @undirected = true self.to_list end end # Clear @relations array to reduce the memory usage. def clear_relations! @relations.clear end # Reconstruct @relations from the adjacency list @graph. def to_relations @relations.clear @graph.each_key do |from| @graph[from].each do |to, w| @relations << Relation.new(from, to, w) end end return @relations end # Graph (adjacency list) generation from the Relations # # Generate the adjacency list @graph from @relations (called by # initialize and in some other cases when @relations has been changed). def to_list @graph.clear @relations.each do |rel| append(rel, false) # append to @graph without push to @relations end end # Add a Bio::Relation object 'rel' to the @graph and @relations. # If the second argument is false, @relations is not modified (only # useful when generating @graph from @relations internally). def append(rel, add_rel = true) @relations.push(rel) if add_rel if @graph[rel.from].nil? @graph[rel.from] = {} end if @graph[rel.to].nil? @graph[rel.to] = {} end @graph[rel.from][rel.to] = rel.relation @graph[rel.to][rel.from] = rel.relation if @undirected end # Remove an edge indicated by the Bio::Relation object 'rel' from the # @graph and the @relations. def delete(rel) @relations.delete_if do |x| x === rel end @graph[rel.from].delete(rel.to) @graph[rel.to].delete(rel.from) if @undirected end # Returns the number of the nodes in the graph. def nodes @graph.keys.length end # Returns the number of the edges in the graph.
def edges edges = 0 @graph.each_value do |v| edges += v.size end edges end # Convert adjacency list to adjacency matrix # # Returns the adjacency matrix expression of the graph as a Matrix # object. If the first argument is given, the matrix will be # filled with the given value. The second argument, if given, sets the # value of the diagonal elements of the matrix. # # The result of this method depends on the order of Hash#each # (and each_key, etc.), which may vary with the Ruby version # and the Ruby interpreter (JRuby, etc.). # As a workaround to remove this dependency, you can use @index # to set the order of Hash keys. Note that this behavior might be # changed in the future. Be careful that @index is overwritten by # this method. # def to_matrix(default_value = nil, diagonal_value = nil) #-- # Note: following code only fills the outer Array with the reference # to the same inner Array object. # # matrix = Array.new(nodes, Array.new(nodes)) # # so create a new Array object for each row as follows: #++ matrix = Array.new nodes.times do matrix.push(Array.new(nodes, default_value)) end if diagonal_value nodes.times do |i| matrix[i][i] = diagonal_value end end # assign index number if @index.empty? then # assign index number for each node @graph.keys.each_with_index do |k, i| @index[k] = i end else # begin workaround removing dependency on order of Hash#each # assign index number from the preset @index indices = @index.to_a indices.sort! { |i0, i1| i0[1] <=> i1[1] } indices.collect! { |i0| i0[0] } @index.clear v = 0 indices.each do |k, i| if @graph[k] and !@index[k] then @index[k] = v; v += 1 end end @graph.each_key do |k| unless @index[k] then @index[k] = v; v += 1 end end # end workaround removing dependency on order of Hash#each end if @relations.empty? # only used after clear_relations!
@graph.each do |from, hash| hash.each do |to, relation| x = @index[from] y = @index[to] matrix[x][y] = relation end end else @relations.each do |rel| x = @index[rel.from] y = @index[rel.to] matrix[x][y] = rel.relation matrix[y][x] = rel.relation if @undirected end end Matrix[*matrix] end # Pretty printer of the adjacency matrix. # # The dump_matrix method accepts the same arguments as to_matrix. # Useful when you want to check the internal state of the matrix # (for debugging purposes etc.) easily. # # This method internally calls the to_matrix method. # Read the documentation of to_matrix for important information. # def dump_matrix(*arg) matrix = self.to_matrix(*arg) sorted = @index.sort {|a,b| a[1] <=> b[1]} "[# " + sorted.collect{|x| x[0]}.join(", ") + "\n" + matrix.to_a.collect{|row| ' ' + row.inspect}.join(",\n") + "\n]" end # Pretty printer of the adjacency list. # # Useful when you want to check the internal state of the adjacency # list (for debugging purposes etc.) easily. # # The result of this method depends on the order of Hash#each # (and each_key, etc.), which may vary with the Ruby version # and the Ruby interpreter (JRuby, etc.). # As a workaround to remove this dependency, you can use @index # to set the order of Hash keys. Note that this behavior might be # changed in the future. # def dump_list # begin workaround removing dependency on order of Hash#each if @index.empty? then pref = nil enum = @graph else pref = {}.merge(@index) i = pref.values.max @graph.each_key do |node| pref[node] ||= (i += 1) end graph_to_a = @graph.to_a graph_to_a.sort! { |x, y| pref[x[0]] <=> pref[y[0]] } enum = graph_to_a end # end workaround removing dependency on order of Hash#each list = "" enum.each do |from, hash| list << "#{from} => " # begin workaround removing dependency on order of Hash#each if pref then ary = hash.to_a ary.sort!
{ |x,y| pref[x[0]] <=> pref[y[0]] } hash = ary end # end workaround removing dependency on order of Hash#each a = [] hash.each do |to, relation| a.push("#{to} (#{relation})") end list << a.join(", ") + "\n" end list end # Select labeled nodes and generate subgraph # # This method selects some nodes and returns a new Bio::Pathway object # consisting of the selected nodes only. If a list of nodes (as an # Array) is given as the argument, that list is used to select the # nodes from the graph; note that this overwrites @label. If no # argument is given, the @label property of the graph is used to # select the nodes. # # hash = { 'a' => 'secret', 'b' => 'important', 'c' => 'important' } # g.label = hash # g.subgraph # list = [ 'a', 'b', 'c' ] # g.subgraph(list) # def subgraph(list = nil) if list @label.clear list.each do |node| @label[node] = true end end sub_graph = Pathway.new([], @undirected) @graph.each do |from, hash| next unless @label[from] sub_graph.graph[from] ||= {} hash.each do |to, relation| next unless @label[to] sub_graph.append(Relation.new(from, to, relation)) end end return sub_graph end # Not implemented yet. def common_subgraph(graph) raise NotImplementedError end # Not implemented yet. def clique raise NotImplementedError end # Returns completeness of the edge density among the surrounding nodes. # # Calculates the value of cliquishness around the 'node'. This value # indicates completeness of the edge density among the surrounding nodes. # # Note: cliquishness (clustering coefficient) for a directed graph # is also calculated. # Reference: http://en.wikipedia.org/wiki/Clustering_coefficient # # Note: Cliquishness (clustering coefficient) for a node that has # only one neighbor node is undefined. Currently, it returns NaN, # but the behavior may be changed in the future.
# def cliquishness(node) neighbors = @graph[node].keys sg = subgraph(neighbors) if sg.graph.size != 0 edges = sg.edges nodes = neighbors.size complete = (nodes * (nodes - 1)) return edges.quo(complete) else return 0.0 end end # Returns the frequency of nodes having the same number of edges as a hash # # Calculates the frequency of the nodes having the same number of edges # and returns the value as Hash. def small_world freq = Hash.new(0) @graph.each_value do |v| freq[v.size] += 1 end return freq end # Breadth first search computes the number of steps and the path to each # node and forms a tree containing all vertices reachable from the root node. # This method returns the result in 2 hashes: the 1st shows the number of # steps from the root node and the 2nd shows the structure of the tree # (the predecessor of each node). # # Edge weights are not considered in this method. def breadth_first_search(root) visited = {} distance = {} predecessor = {} visited[root] = true distance[root] = 0 predecessor[root] = nil queue = [ root ] while from = queue.shift next unless @graph[from] @graph[from].each_key do |to| unless visited[to] visited[to] = true distance[to] = distance[from] + 1 predecessor[to] = from queue.push(to) end end end return distance, predecessor end # Alias for the breadth_first_search method. alias bfs breadth_first_search # Calculates the shortest path between two nodes by using the # breadth_first_search method and returns the number of steps and the # path as an Array. def bfs_shortest_path(node1, node2) distance, route = breadth_first_search(node1) step = distance[node2] node = node2 path = [ node2 ] while node != node1 and route[node] node = route[node] path.unshift(node) end return step, path end # Depth first search yields much information about the structure of # the graph, especially on the classification of the edges. This # method returns 5 hashes: the 1st shows the timestamps of each # node, containing the first discovered time and the search finished # time in an array.
The 2nd, 3rd, 4th, and 5th hashes contain 'tree # edges', 'back edges', 'cross edges', 'forward edges' respectively. # # If $DEBUG is true (e.g. ruby -d), this method prints the progression # of the search. # # Edge weights are not considered in this method. # # Note: The result of this method depends on the order of Hash#each # (and each_key, etc.), which may vary with the Ruby version # and the Ruby interpreter (JRuby, etc.). # As a workaround to remove this dependency, you can use @index # to set the order of Hash keys. Note that this behavior might be # changed in the future. def depth_first_search visited = {} timestamp = {} tree_edges = {} back_edges = {} cross_edges = {} forward_edges = {} count = 0 # begin workaround removing dependency on order of Hash#each if @index.empty? then preference_of_nodes = nil else preference_of_nodes = {}.merge(@index) i = preference_of_nodes.values.max @graph.each_key do |node0| preference_of_nodes[node0] ||= (i += 1) end end # end workaround removing dependency on order of Hash#each dfs_visit = Proc.new { |from| visited[from] = true timestamp[from] = [count += 1] ary = @graph[from].keys # begin workaround removing dependency on order of Hash#each if preference_of_nodes then ary = ary.sort_by { |node0| preference_of_nodes[node0] } end # end workaround removing dependency on order of Hash#each ary.each do |to| if visited[to] if timestamp[to].size > 1 if timestamp[from].first < timestamp[to].first # forward edge (black) p "#{from} -> #{to} : forward edge" if $DEBUG forward_edges[from] = to else # cross edge (black) p "#{from} -> #{to} : cross edge" if $DEBUG cross_edges[from] = to end else # back edge (gray) p "#{from} -> #{to} : back edge" if $DEBUG back_edges[from] = to end else # tree edge (white) p "#{from} -> #{to} : tree edge" if $DEBUG tree_edges[to] = from dfs_visit.call(to) end end timestamp[from].push(count += 1) } ary = @graph.keys # begin workaround removing dependency on order of Hash#each if
preference_of_nodes then ary = ary.sort_by { |node0| preference_of_nodes[node0] } end # end workaround removing dependency on order of Hash#each ary.each do |node| unless visited[node] dfs_visit.call(node) end end return timestamp, tree_edges, back_edges, cross_edges, forward_edges end # Alias for the depth_first_search method. alias dfs depth_first_search # Topological sort of the directed acyclic graphs ("dags") by using # depth_first_search. def dfs_topological_sort # sort by finished time in reverse order and collect node names only timestamp, = self.depth_first_search timestamp.sort {|a,b| b[1][1] <=> a[1][1]}.collect {|x| x.first } end # Dijkstra's algorithm for the single-source shortest-path problem in a # weighted graph with non-negative edge weights. def dijkstra(root) distance, predecessor = initialize_single_source(root) @graph[root].each do |k, v| distance[k] = v predecessor[k] = root end queue = distance.dup queue.delete(root) while queue.size != 0 min = queue.min {|a, b| a[1] <=> b[1]} u = min[0] # extract a node having minimal distance @graph[u].each do |k, v| # relaxing procedure of root -> 'u' -> 'k' if distance[k] > distance[u] + v distance[k] = distance[u] + v predecessor[k] = u queue[k] = distance[k] if queue.key?(k) # keep the queue in sync with the relaxed distance end end queue.delete(u) end return distance, predecessor end # Bellman-Ford method for solving the single-source shortest-paths # problem in a graph in which edge weights can be negative. # Returns false if a negative cycle is detected. def bellman_ford(root) distance, predecessor = initialize_single_source(root) (self.nodes - 1).times do @graph.each_key do |u| @graph[u].each do |v, w| # relaxing procedure of root -> 'u' -> 'v' if distance[v] > distance[u] + w distance[v] = distance[u] + w predecessor[v] = u end end end end # negative cycle check @graph.each_key do |u| @graph[u].each do |v, w| if distance[v] > distance[u] + w return false end end end return distance, predecessor end # Floyd-Warshall algorithm for solving the all-pairs shortest-paths # problem on a directed graph G = (V, E).
def floyd_warshall inf = 1 / 0.0 m = self.to_matrix(inf, 0) d = m.dup n = self.nodes for k in 0 .. n - 1 do for i in 0 .. n - 1 do for j in 0 .. n - 1 do if d[i, j] > d[i, k] + d[k, j] d[i, j] = d[i, k] + d[k, j] end end end end return d end # Alias for the floyd_warshall method. alias floyd floyd_warshall # Kruskal method for finding minimum spanning trees def kruskal # initialize rel = self.to_relations.sort{|a, b| a <=> b} index = [] for i in 0 .. (rel.size - 1) do for j in (i + 1) .. (rel.size - 1) do if rel[i] == rel[j] index << j end end end index.sort{|x, y| y<=>x}.each do |idx| rel[idx, 1] = [] end mst = [] seen = Hash.new() @graph.each_key do |x| seen[x] = nil end i = 1 # initialize end rel.each do |r| if seen[r.node[0]] == nil seen[r.node[0]] = 0 end if seen[r.node[1]] == nil seen[r.node[1]] = 0 end if seen[r.node[0]] == seen[r.node[1]] && seen[r.node[0]] == 0 mst << r seen[r.node[0]] = i seen[r.node[1]] = i elsif seen[r.node[0]] != seen[r.node[1]] mst << r v1 = seen[r.node[0]].dup v2 = seen[r.node[1]].dup seen.each do |k, v| if v == v1 || v == v2 seen[k] = i end end end i += 1 end return Pathway.new(mst) end private def initialize_single_source(root) inf = 1 / 0.0 # inf.infinite? -> true distance = {} predecessor = {} @graph.each_key do |k| distance[k] = inf predecessor[k] = nil end distance[root] = 0 return distance, predecessor end end # Pathway # Bio::Relation is a simple object storing two nodes and the relation between them. # The nodes and the edge (relation) can be any Ruby object. You can also # compare Bio::Relation objects if the edges are Comparable. class Relation # Create a new binary relation object consisting of the two objects 'node1' # and 'node2', with the 'edge' object as the relation between them. def initialize(node1, node2, edge) @node = [node1, node2] @edge = edge end attr_accessor :node, :edge # Returns one node. def from @node[0] end # Returns another node. def to @node[1] end def relation @edge end # Used by eql?
method def hash @node.sort.push(@edge).hash end # Compare with another Bio::Relation object to check whether it has the # same edge and the same nodes. The == method compares Bio::Relation object's id, # however this case equality === method compares the internal property # of the Bio::Relation object. def ===(rel) if self.edge == rel.edge if self.node[0] == rel.node[0] and self.node[1] == rel.node[1] return true elsif self.node[0] == rel.node[1] and self.node[1] == rel.node[0] return true else return false end else return false end end # Method eql? is an alias of the === method and is used with the hash method # to make a unique array of the Bio::Relation objects. # # a1 = Bio::Relation.new('a', 'b', 1) # a2 = Bio::Relation.new('b', 'a', 1) # a3 = Bio::Relation.new('b', 'c', 1) # p [ a1, a2, a3 ].uniq alias eql? === # Used when sorting (e.g. in kruskal) to compare with another Bio::Relation object. # This method is only usable when the edge objects have the property of # the module Comparable. def <=>(rel) unless self.edge.kind_of? Comparable raise "[Error] edges are not comparable" end if self.edge > rel.edge return 1 elsif self.edge < rel.edge return -1 elsif self.edge == rel.edge return 0 end end end # Relation end # Bio bio-2.0.3/lib/bio/reference.rb0000644000175000017500000004527114141516614015431 0ustar nileshnilesh# # = bio/reference.rb - Journal reference classes # # Copyright:: Copyright (C) 2001, 2006, 2008 # Toshiaki Katayama, # Ryan Raaum, # Jan Aerts # License:: The Ruby License # # $Id:$ # require 'enumerator' module Bio # = DESCRIPTION # # A class for journal reference information. # # = USAGE # # hash = {'authors' => [ "Hoge, J.P.", "Fuga, F.B." ], # 'title' => "Title of the study.", # 'journal' => "Theor. J. Hoge", # 'volume' => 12, # 'issue' => 3, # 'pages' => "123-145", # 'year' => 2001, # 'pubmed' => 12345678, # 'medline' => 98765432, # 'abstract' => "Hoge fuga.
...", # 'url' => "http://example.com", # 'mesh' => [], # 'affiliations' => []} # ref = Bio::Reference.new(hash) # # # Formats in the BiBTeX style. # ref.format("bibtex") # # # Short-cut for Bio::Reference#format("bibtex") # ref.bibtex # class Reference # Author names in an Array, [ "Hoge, J.P.", "Fuga, F.B." ]. attr_reader :authors # String with title of the study attr_reader :title # String with journal name attr_reader :journal # volume number (typically Fixnum) attr_reader :volume # issue number (typically Fixnum) attr_reader :issue # page range (typically String, e.g. "123-145") attr_reader :pages # year of publication (typically Fixnum) attr_reader :year # pubmed identifier (typically Fixnum) attr_reader :pubmed # medline identifier (typically Fixnum) attr_reader :medline # DOI identifier (typically String, e.g. "10.1126/science.1110418") attr_reader :doi # Abstract text in String. attr_reader :abstract # An URL String. attr_reader :url # MeSH terms in an Array. attr_reader :mesh # Affiliations in an Array. attr_reader :affiliations # Sequence number in EMBL/GenBank records attr_reader :embl_gb_record_number # Position in a sequence that this reference refers to attr_reader :sequence_position # Comments for the reference (typically Array of String, or nil) attr_reader :comments # Create a new Bio::Reference object from a Hash of values. 
# Data is extracted from the values for keys: # # * authors - expected value: Array of Strings # * title - expected value: String # * journal - expected value: String # * volume - expected value: Fixnum or String # * issue - expected value: Fixnum or String # * pages - expected value: String # * year - expected value: Fixnum or String # * pubmed - expected value: Fixnum or String # * medline - expected value: Fixnum or String # * abstract - expected value: String # * url - expected value: String # * mesh - expected value: Array of Strings # * affiliations - expected value: Array of Strings # # # hash = {'authors' => [ "Hoge, J.P.", "Fuga, F.B." ], # 'title' => "Title of the study.", # 'journal' => "Theor. J. Hoge", # 'volume' => 12, # 'issue' => 3, # 'pages' => "123-145", # 'year' => 2001, # 'pubmed' => 12345678, # 'medline' => 98765432, # 'abstract' => "Hoge fuga. ...", # 'url' => "http://example.com", # 'mesh' => [], # 'affiliations' => []} # ref = Bio::Reference.new(hash) # --- # *Arguments*: # * (required) _hash_: Hash # *Returns*:: Bio::Reference object def initialize(hash) @authors = hash['authors'] || [] # [ "Hoge, J.P.", "Fuga, F.B." ] @title = hash['title'] || '' # "Title of the study." @journal = hash['journal'] || '' # "Theor. J. Hoge" @volume = hash['volume'] || '' # 12 @issue = hash['issue'] || '' # 3 @pages = hash['pages'] || '' # 123-145 @year = hash['year'] || '' # 2001 @pubmed = hash['pubmed'] || '' # 12345678 @medline = hash['medline'] || '' # 98765432 @doi = hash['doi'] @abstract = hash['abstract'] || '' @url = hash['url'] @mesh = hash['mesh'] || [] @embl_gb_record_number = hash['embl_gb_record_number'] || nil @sequence_position = hash['sequence_position'] || nil @comments = hash['comments'] @affiliations = hash['affiliations'] || [] end # If _other_ is equal with the self, returns true. # Otherwise, returns false. 
# --- # *Arguments*: # * (required) _other_: any object # *Returns*:: true or false def ==(other) return true if super(other) return false unless other.instance_of?(self.class) flag = false [ :authors, :title, :journal, :volume, :issue, :pages, :year, :pubmed, :medline, :doi, :abstract, :url, :mesh, :embl_gb_record_number, :sequence_position, :comments, :affiliations ].each do |m| begin flag = (self.__send__(m) == other.__send__(m)) rescue NoMethodError, ArgumentError, NameError flag = false end break unless flag end flag end # Formats the reference in a given style. # # Styles: # 0. nil - general # 1. endnote - Endnote # 2. bibitem - Bibitem (option available) # 3. bibtex - BiBTeX (option available) # 4. rd - rd (option available) # 5. nature - Nature (option available) # 6. science - Science # 7. genome_biol - Genome Biology # 8. genome_res - Genome Research # 9. nar - Nucleic Acids Research # 10. current - Current Biology # 11. trends - Trends in * # 12. cell - Cell Press # # See individual methods for details. 
Basic usage is: # # # ref is Bio::Reference object # # using simplest possible call (for general style) # puts ref.format # # # output in Nature style # puts ref.format("nature") # alternatively, puts ref.nature # # # output in Nature short style (see Bio::Reference#nature) # puts ref.format("nature",true) # alternatively, puts ref.nature(true) # --- # *Arguments*: # * (optional) _style_: String with style identifier # * (optional) _options_: Options for styles accepting one # *Returns*:: String def format(style = nil, *options) case style when 'embl' return embl when 'endnote' return endnote when 'bibitem' return bibitem(*options) when 'bibtex' return bibtex(*options) when 'rd' return rd(*options) when /^nature$/i return nature(*options) when /^science$/i return science when /^genome\s*_*biol/i return genome_biol when /^genome\s*_*res/i return genome_res when /^nar$/i return nar when /^current/i return current when /^trends/i return trends when /^cell$/i return cell else return general end end # Returns reference formatted in the Endnote style. # # # ref is a Bio::Reference object # puts ref.endnote # # %0 Journal Article # %A Hoge, J.P. # %A Fuga, F.B. # %D 2001 # %T Title of the study. # %J Theor. J. Hoge # %V 12 # %N 3 # %P 123-145 # %M 12345678 # %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12345678 # %X Hoge fuga. ... # --- # *Returns*:: String def endnote lines = [] lines << "%0 Journal Article" @authors.each do |author| lines << "%A #{author}" end lines << "%D #{@year}" unless @year.to_s.empty? lines << "%T #{@title}" unless @title.empty? lines << "%J #{@journal}" unless @journal.empty? lines << "%V #{@volume}" unless @volume.to_s.empty? lines << "%N #{@issue}" unless @issue.to_s.empty? lines << "%P #{@pages}" unless @pages.empty? lines << "%M #{@pubmed}" unless @pubmed.to_s.empty? u = @url.to_s.empty? ? pubmed_url : @url lines << "%U #{u}" unless u.empty? lines << "%X #{@abstract}" unless @abstract.empty? 
@mesh.each do |term| lines << "%K #{term}" end lines << "%+ #{@affiliations.join(' ')}" unless @affiliations.empty? return lines.join("\n") end # Returns reference formatted in the EMBL style. # # # ref is a Bio::Reference object # puts ref.embl # # RP 1-1859 # RX PUBMED; 1907511. # RA Oxtoby E., Dunn M.A., Pancoro A., Hughes M.A.; # RT "Nucleotide and derived amino acid sequence of the cyanogenic # RT beta-glucosidase (linamarase) from white clover (Trifolium repens L.)"; # RL Plant Mol. Biol. 17(2):209-219(1991). def embl r = self Bio::Sequence::Format::NucFormatter::Embl.new('').instance_eval { reference_format_embl(r) } end # Returns reference formatted in the bibitem style # # # ref is a Bio::Reference object # puts ref.bibitem # # \bibitem{PMID:12345678} # Hoge, J.P., Fuga, F.B. # Title of the study., # {\em Theor. J. Hoge}, 12(3):123--145, 2001. # --- # *Arguments*: # * (optional) _item_: label string (default: "PMID:#{pubmed}"). # *Returns*:: String def bibitem(item = nil) item = "PMID:#{@pubmed}" unless item pages = @pages.sub('-', '--') return <<-"END".enum_for(:each_line).collect {|line| line.strip}.join("\n") \\bibitem{#{item}} #{@authors.join(', ')} #{@title}, {\\em #{@journal}}, #{@volume}(#{@issue}):#{pages}, #{@year}. END end # Returns reference formatted in the BiBTeX style. # # # ref is a Bio::Reference object # puts ref.bibtex # # @article{PMID:12345678, # author = {Hoge, J.P. and Fuga, F.B.}, # title = {Title of the study.}, # journal = {Theor. J. Hoge}, # year = {2001}, # volume = {12}, # number = {3}, # pages = {123--145}, # } # # # using a different section (e.g. "book") # # (but not really configured for anything other than articles) # puts ref.bibtex("book") # # @book{PMID:12345678, # author = {Hoge, J.P. and Fuga, F.B.}, # title = {Title of the study.}, # journal = {Theor. J. 
Hoge}, # year = {2001}, # volume = {12}, # number = {3}, # pages = {123--145}, # } # --- # *Arguments*: # * (optional) _section_: BiBTeX section as String # * (optional) _label_: Label string cited by LaTeX documents. # Default is "PMID:#{pubmed}". # * (optional) _keywords_: Hash of additional keywords, # e.g. { 'abstract' => 'This is abstract.' }. # You can also override default keywords. # To disable default keywords, specify false as # value, e.g. { 'url' => false, 'year' => false }. # *Returns*:: String def bibtex(section = nil, label = nil, keywords = {}) section = "article" unless section authors = authors_join(' and ', ' and ') thepages = pages.to_s.empty? ? nil : pages.sub(/\-/, '--') unless label then label = "PMID:#{pubmed}" end theurl = if !(url.to_s.empty?) then url elsif pmurl = pubmed_url and !(pmurl.to_s.empty?) then pmurl else nil end hash = { 'author' => authors.empty? ? nil : authors, 'title' => title.to_s.empty? ? nil : title, 'number' => issue.to_s.empty? ? nil : issue, 'pages' => thepages, 'url' => theurl } keys = %w( author title journal year volume number pages url ) keys.each do |k| hash[k] = self.__send__(k.intern) unless hash.has_key?(k) end hash.merge!(keywords) { |k, v1, v2| v2.nil? ? v1 : v2 } bib = [ "@#{section}{#{label}," ] keys.concat((hash.keys - keys).sort) keys.each do |kw| ref = hash[kw] bib.push " #{kw.ljust(12)} = {#{ref}}," if ref end bib.push "}\n" return bib.join("\n") end # Returns reference formatted in a general/generic style. # # # ref is a Bio::Reference object # puts ref.general # # Hoge, J.P., Fuga, F.B. (2001). "Title of the study." Theor. J. Hoge 12:123-145. # --- # *Returns*:: String def general authors = @authors.join(', ') "#{authors} (#{@year}). \"#{@title}\" #{@journal} #{@volume}:#{@pages}." end # Return reference formatted in the RD style. # # # ref is a Bio::Reference object # puts ref.rd # # == Title of the study. # # * Hoge, J.P. and Fuga, F.B. # # * Theor. J. 
Hoge 2001 12:123-145 [PMID:12345678] # # Hoge fuga. ... # # An optional string argument can be supplied, but does nothing. # --- # *Arguments*: # * (optional) str: String (default nil) # *Returns*:: String def rd(str = nil) @abstract ||= str lines = [] lines << "== " + @title lines << "* " + authors_join(' and ') lines << "* #{@journal} #{@year} #{@volume}:#{@pages} [PMID:#{@pubmed}]" lines << @abstract return lines.join("\n\n") end # Formats in the Nature Publishing Group # (http://www.nature.com) style. # # # ref is a Bio::Reference object # puts ref.nature # # Hoge, J.P. & Fuga, F.B. Title of the study. Theor. J. Hoge 12, 123-145 (2001). # # # optionally, output short version # puts ref.nature(true) # or puts ref.nature(short=true) # # Hoge, J.P. & Fuga, F.B. Theor. J. Hoge 12, 123-145 (2001). # --- # *Arguments*: # * (optional) _short_: Boolean (default false) # *Returns*:: String def nature(short = false) if short if @authors.size > 4 authors = "#{@authors[0]} et al." elsif @authors.size == 1 authors = "#{@authors[0]}" else authors = authors_join(' & ') end "#{authors} #{@journal} #{@volume}, #{@pages} (#{@year})." else authors = authors_join(' & ') "#{authors} #{@title} #{@journal} #{@volume}, #{@pages} (#{@year})." end end # Returns reference formatted in the # Science[http://www.sciencemag.org] style. # # # ref is a Bio::Reference object # puts ref.science # # J.P. Hoge, F.B. Fuga, Theor. J. Hoge 12 123 (2001). # --- # *Returns*:: String def science if @authors.size > 4 authors = rev_name(@authors[0]) + " et al." else authors = @authors.collect {|name| rev_name(name)}.join(', ') end page_from, = @pages.split('-') "#{authors}, #{@journal} #{@volume} #{page_from} (#{@year})." end # Returns reference formatted in the Genome Biology # (http://genomebiology.com) style. # # # ref is a Bio::Reference object # puts ref.genome_biol # # Hoge JP, Fuga FB: Title of the study. Theor J Hoge 2001, 12:123-145. 
    # ---
    # *Returns*:: String
    def genome_biol
      authors = @authors.collect {|name| strip_dots(name)}.join(', ')
      journal = strip_dots(@journal)
      "#{authors}: #{@title} #{journal} #{@year}, #{@volume}:#{@pages}."
    end

    # Returns reference formatted in the Current Biology
    # (http://current-biology.com) style. (Same as the Genome Biology style)
    #
    #   # ref is a Bio::Reference object
    #   puts ref.current
    #
    #     Hoge JP, Fuga FB: Title of the study. Theor J Hoge 2001, 12:123-145.
    # ---
    # *Returns*:: String
    def current
      self.genome_biol
    end

    # Returns reference formatted in the Genome Research
    # (http://genome.org) style.
    #
    #   # ref is a Bio::Reference object
    #   puts ref.genome_res
    #
    #     Hoge, J.P. and Fuga, F.B. 2001.
    #       Title of the study. Theor. J. Hoge 12: 123-145.
    # ---
    # *Returns*:: String
    def genome_res
      authors = authors_join(' and ')
      "#{authors} #{@year}.\n #{@title} #{@journal} #{@volume}: #{@pages}."
    end

    # Returns reference formatted in the Nucleic Acids Research
    # (http://nar.oxfordjournals.org) style.
    #
    #   # ref is a Bio::Reference object
    #   puts ref.nar
    #
    #     Hoge, J.P. and Fuga, F.B. (2001) Title of the study. Theor. J. Hoge, 12, 123-145.
    # ---
    # *Returns*:: String
    def nar
      authors = authors_join(' and ')
      "#{authors} (#{@year}) #{@title} #{@journal}, #{@volume}, #{@pages}."
    end

    # Returns reference formatted in the
    # CELL[http://www.cell.com] Press style.
    #
    #   # ref is a Bio::Reference object
    #   puts ref.cell
    #
    #     Hoge, J.P. and Fuga, F.B. (2001). Title of the study. Theor. J. Hoge 12, 123-145.
    # ---
    # *Returns*:: String
    def cell
      authors = authors_join(' and ')
      "#{authors} (#{@year}). #{@title} #{@journal} #{@volume}, #{pages}."
    end

    # Returns reference formatted in the
    # TRENDS[http://www.trends.com] style.
    #
    #   # ref is a Bio::Reference object
    #   puts ref.trends
    #
    #     Hoge, J.P. and Fuga, F.B. (2001) Title of the study. Theor. J. Hoge 12, 123-145
    # ---
    # *Returns*:: String
    def trends
      if @authors.size > 2
        authors = "#{@authors[0]} et al."
      elsif @authors.size == 1
        authors = "#{@authors[0]}"
      else
        authors = authors_join(' and ')
      end
      "#{authors} (#{@year}) #{@title} #{@journal} #{@volume}, #{@pages}"
    end

    # Returns a valid URL for pubmed records
    #
    # *Returns*:: String
    def pubmed_url
      unless @pubmed.to_s.empty?
        head = "http://www.ncbi.nlm.nih.gov/pubmed"
        return "#{head}/#{@pubmed}"
      end
      ''
    end

    private

    def strip_dots(data)
      data.tr(',.', '') if data
    end

    def authors_join(amp, sep = ', ')
      authors = @authors.clone
      if authors.length > 1
        last = authors.pop
        authors = authors.join(sep) + "#{amp}" + last
      elsif authors.length == 1
        authors = authors.pop
      else
        authors = ""
      end
    end

    def rev_name(name)
      if name =~ /,/
        name, initial = name.split(/,\s+/)
        name = "#{initial} #{name}"
      end
      return name
    end

  end

end
bio-2.0.3/lib/bio/version.rb0000644000175000017500000000152614141516614015153 0ustar nileshnilesh
#
# = bio/version.rb - BioRuby version information
#
# Copyright::  Copyright (C) 2001-2012
#              Toshiaki Katayama,
#              Naohisa Goto
# License::    The Ruby License
#

module Bio

  # BioRuby version (Array containing Integer)
  BIORUBY_VERSION = [2, 0, 3].extend(Comparable).freeze

  # Extra version specifier (String or nil).
  # Existence of the value indicates development version.
  #
  # nil    :: Release version.
  # ".pre" :: Pre-release version.
  #
  # References: https://guides.rubygems.org/patterns/#prerelease-gems
  BIORUBY_EXTRA_VERSION = nil
  #".pre"

  # Version identifier, including extra version string (String)
  # Unlike BIORUBY_VERSION, it is not comparable.
  BIORUBY_VERSION_ID =
    (BIORUBY_VERSION.join('.') + BIORUBY_EXTRA_VERSION.to_s).freeze

end #module Bio
bio-2.0.3/lib/bio/db.rb0000644000175000017500000002021614141516614014050 0ustar nileshnilesh
#
# = bio/db.rb - common API for database parsers
#
# Copyright::  Copyright (C) 2001, 2002, 2005
#              Toshiaki Katayama
# License::    The Ruby License
#
# $Id: db.rb,v 0.38 2007/05/08 17:02:13 nakao Exp $
#
# == On-demand parsing and cache
#
# The flatfile parsers (sub classes of the Bio::DB) split the original entry
# into a Hash and store the hash in the @orig instance variable.  Detailed
# parsing is delayed until a method is called that requires further
# parsing of the content of the @orig hash.  Fully parsed data is cached
# separately in another hash, @data.
#
# == Guidelines for developers creating a new database class
#
# --- Bio::DB.new(entry)
#
# The 'new' method should accept the entire entry in one String and
# return the parsed database object.
#
# --- Bio::DB#entry_id
#
# Database classes should implement the following methods if appropriate:
#
# * entry_id
# * definition
#
# Every sub class should define the following constants if appropriate:
#
# * DELIMITER (RS)
#   * entry separator of the flatfile of the database.
#   * RS (= record separator) is an alias for the DELIMITER in short.
#
# * TAGSIZE
#   * length of the tag field in the FORTRAN-like format.
#
#       |<- tag  ->||<- data                           ---->|
#       ENTRY_ID    A12345
#       DEFINITION  Hoge gene of the Pokemonia pikachuae
#
# === Template of the sub class
#
#   module Bio
#     class Hoge < DB
#
#       DELIMITER = RS = "\n//\n"
#       TAGSIZE   = 12          # You can omit this line if not needed
#
#       def initialize(entry)
#       end
#
#       def entry_id
#       end
#
#     end # class Hoge
#   end # module Bio
#
# === Recommended method names for sub classes
#
# In general, the method name should be in the singular form when it returns
# an Object (including the case when the Object is a String), and should be
# in the plural form when it returns the same kind of Objects in an Array.
It depends on the # database classes that which form of the method name can be use. # # For example, GenBank has several REFERENCE fields in one entry, so define # Bio::GenBank#references and this method should return an Array of the # Reference objects. On the other hand, MEDLINE has one REFERENCE information # per one entry, so define Bio::MEDLINE#reference method and this should # return a Reference object. # # The method names used in the sub classes should be taken from the following # list if appropriate: # # --- entry_id #=> String # # The entry identifier. # # --- definition #=> String # # The description of the entry. # # --- reference #=> Bio::Reference # --- references #=> Array of Bio::Reference # # The reference field(s) of the entry. # # --- dblink #=> String # --- dblinks #=> Array of String # # The link(s) to the other database entry. # # --- naseq #=> Bio::Sequence::NA # # The DNA/RNA sequence of the entry. # # --- nalen #=> Integer # # The length of the DNA/RNA sequence of the entry. # # --- aaseq #=> Bio::Sequence::AA # # The amino acid sequence of the entry. # # --- aalen #=> Integer # # The length of the amino acid sequence of the entry. # # --- seq #=> Bio::Sequence::NA or Bio::Sequence::AA # # Returns an appropriate sequence object. # # --- position #=> String # # The position of the sequence in the entry or in the genome (depends on # the database). # # --- locations #=> Bio::Locations # # Returns Bio::Locations.new(position). # # --- division #=> String # # The sub division name of the database. # # * Example: # * EST, VRL etc. for GenBank # * PATTERN, RULE etc. for PROSITE # # --- date #=> String # # The date of the entry. # Should we use Date (by ParseDate) instead of String? # # --- gene #=> String # --- genes #=> Array of String # # The name(s) of the gene. # # --- organism #=> String # # The name of the organism. 
#

require 'bio/sequence'
require 'bio/reference'
require 'bio/feature'

module Bio

class DB

  def self.open(filename, *mode, &block)
    Bio::FlatFile.open(self, filename, *mode, &block)
  end

  # Returns an entry identifier as a String.  This method must be
  # implemented in every database class by overriding this method.
  def entry_id
    raise NotImplementedError
  end

  # Returns a list of the top level tags of the entry as an Array of String.
  def tags
    @orig.keys
  end

  # Returns true or false - whether the entry contains the field of the
  # given tag name.
  def exists?(tag)
    @orig.include?(tag)
  end

  # Returns an intact field of the tag as a String.
  def get(tag)
    @orig[tag]
  end

  # Similar to the get method, however, fetch returns the content of the
  # field without its tag and any extra white spaces stripped.
  def fetch(tag, skip = 0)
    field = @orig[tag].split(/\n/, skip + 1).last.to_s
    truncate(field.gsub(/^.{0,#{@tagsize}}/,''))
  end


  private

  # Returns a String in which successive white spaces are replaced by one
  # space and leading and trailing white spaces are stripped.
  def truncate(str)
    str ||= ""
    return str.gsub(/\s+/, ' ').strip
  end

  # Returns a tag name of the field as a String.
  def tag_get(str)
    str ||= ""
    return str[0,@tagsize].strip
  end

  # Returns a String of the field without a tag name.
  def tag_cut(str)
    str ||= ""
    str[0,@tagsize] = ''
    return str
  end

  # Returns the content of the field as a String like the fetch method.
  # Furthermore, field_fetch stores the result in the @data hash.
  def field_fetch(tag, skip = 0)
    unless @data[tag]
      @data[tag] = fetch(tag, skip)
    end
    return @data[tag]
  end

  # Returns an Array containing each line of the field without a tag.
  # lines_fetch also stores the result in the @data hash.
  def lines_fetch(tag)
    unless @data[tag]
      list = []
      lines = get(tag).split(/\n/)
      lines.each do |line|
        data = tag_cut(line)
        if data[/^\S/]                  # next sub field
          list << data
        else                            # continued sub field
          data.strip!
if list.last[/\-$/] # folded list[-1] += data else list[-1] += " #{data}" # rest of list end end end @data[tag] = list end @data[tag] end end # class DB # Stores a NCBI style (GenBank, KEGG etc.) entry. class NCBIDB < DB autoload :Common, 'bio/db/genbank/common' # The entire entry is passed as a String. The length of the tag field is # passed as an Integer. Parses the entry roughly by the entry2hash method # and returns a database object. def initialize(entry, tagsize) @tagsize = tagsize @orig = entry2hash(entry.strip) # Hash of the original entry @data = {} # Hash of the parsed entry end private # Splits an entry into an Array of Strings at the level of top tags. def toptag2array(str) sep = "\001" str.gsub(/\n([A-Za-z\/\*])/, "\n#{sep}\\1").split(sep) end # Splits a field into an Array of Strings at the level of sub tags. def subtag2array(str) sep = "\001" str.gsub(/\n(\s{1,#{@tagsize-1}}\S)/, "\n#{sep}\\1").split(sep) end # Returns the contents of the entry as a Hash with the top level tags as # its keys. def entry2hash(entry) hash = Hash.new('') fields = toptag2array(entry) fields.each do |field| tag = tag_get(field) hash[tag] += field end return hash end end # class NCBIDB # Class for KEGG databases. Inherits a NCBIDB class. class KEGGDB < NCBIDB end # Stores an EMBL style (EMBL, TrEMBL, Swiss-Prot etc.) entry. class EMBLDB < DB autoload :Common, 'bio/db/embl/common' # The entire entry is passed as a String. The length of the tag field is # passed as an Integer. Parses the entry roughly by the entry2hash method # and returns a database object. def initialize(entry, tagsize) @tagsize = tagsize @orig = entry2hash(entry.strip) # Hash of the original entry @data = {} # Hash of the parsed entry end private # Returns the contents of the entry as a Hash. 
def entry2hash(entry) hash = Hash.new { |h,k| h[k] = '' } entry.each_line do |line| tag = tag_get(line) next if tag == 'XX' tag = 'R' if tag =~ /^R./ # Reference lines hash[tag].concat line end return hash end end # class EMBLDB end # module Bio bio-2.0.3/lib/bio/sequence.rb0000644000175000017500000003760114141516614015301 0ustar nileshnilesh# # = bio/sequence.rb - biological sequence class # # Copyright:: Copyright (C) 2000-2006 # Toshiaki Katayama , # Yoshinori K. Okuji , # Naohisa Goto , # Ryan Raaum , # Jan Aerts # License:: The Ruby License # module Bio # = DESCRIPTION # Bio::Sequence objects represent annotated sequences in bioruby. # A Bio::Sequence object is a wrapper around the actual sequence, # represented as either a Bio::Sequence::NA or a Bio::Sequence::AA object. # For most users, this encapsulation will be completely transparent. # Bio::Sequence responds to all methods defined for Bio::Sequence::NA/AA # objects using the same arguments and returning the same values (even though # these methods are not documented specifically for Bio::Sequence). 
#
# = USAGE
#   # Create a nucleic or amino acid sequence
#   dna = Bio::Sequence.auto('atgcatgcATGCATGCAAAA')
#   rna = Bio::Sequence.auto('augcaugcaugcaugcaaaa')
#   aa = Bio::Sequence.auto('ACDEFGHIKLMNPQRSTVWYU')
#
#   # Print it out
#   puts dna.to_s
#   puts aa.to_s
#
#   # Get a subsequence, bioinformatics style (first nucleotide is '1')
#   puts dna.subseq(2,6)
#
#   # Get a subsequence, informatics style (first nucleotide is '0')
#   puts dna[2,6]
#
#   # Print in FASTA format
#   puts dna.output(:fasta)
#
#   # Print all codons
#   dna.window_search(3,3) do |codon|
#     puts codon
#   end
#
#   # Splice or otherwise mangle your sequence
#   puts dna.splicing("complement(join(1..5,16..20))")
#   puts rna.splicing("complement(join(1..5,16..20))")
#
#   # Convert a sequence containing ambiguity codes into a
#   # regular expression you can use for subsequent searching
#   puts aa.to_re
#
#   # These should speak for themselves
#   puts dna.complement
#   puts dna.composition
#   puts dna.molecular_weight
#   puts dna.translate
#   puts dna.gc_percent
class Sequence

  autoload :Common,  'bio/sequence/common'
  autoload :NA,      'bio/sequence/na'
  autoload :AA,      'bio/sequence/aa'
  autoload :Generic, 'bio/sequence/generic'
  autoload :Format,  'bio/sequence/format'
  autoload :Adapter, 'bio/sequence/adapter'
  autoload :QualityScore, 'bio/sequence/quality_score'
  autoload :SequenceMasker, 'bio/sequence/sequence_masker'

  #--
  # require "bio/sequence/compat.rb" here to avoid circular require and
  # possible superclass mismatch of AA class
  #++
  require 'bio/sequence/compat'

  include Format
  include SequenceMasker

  # Create a new Bio::Sequence object
  #
  #   s = Bio::Sequence.new('atgc')
  #   puts s                                  #=> 'atgc'
  #
  # Note that this method does not initialize the contained sequence
  # as any kind of bioruby object, only as a simple string
  #
  #   puts s.seq.class                        #=> String
  #
  # See Bio::Sequence#na, Bio::Sequence#aa, and Bio::Sequence#auto
  # for methods to transform the basic String of a just created
  # Bio::Sequence object to a proper bioruby object
  # ---
  #
*Arguments*: # * (required) _str_: String or Bio::Sequence::NA/AA object # *Returns*:: Bio::Sequence object def initialize(str) @seq = str end # Pass any unknown method calls to the wrapped sequence object. see # http://www.rubycentral.com/book/ref_c_object.html#Object.method_missing def method_missing(sym, *args, &block) #:nodoc: begin seq.__send__(sym, *args, &block) rescue NoMethodError => evar lineno = __LINE__ - 2 file = __FILE__ bt_here = [ "#{file}:#{lineno}:in \`__send__\'", "#{file}:#{lineno}:in \`method_missing\'" ] if bt_here == evar.backtrace[0, 2] then bt = evar.backtrace[2..-1] evar = evar.class.new("undefined method \`#{sym.to_s}\' for #{self.inspect}") evar.set_backtrace(bt) end #p lineno #p file #p bt_here #p evar.backtrace raise(evar) end end # The sequence identifier (String). For example, for a sequence # of Genbank origin, this is the locus name. # For a sequence of EMBL origin, this is the primary accession number. attr_accessor :entry_id # A String with a description of the sequence (String) attr_accessor :definition # Features (An Array of Bio::Feature objects) attr_accessor :features # References (An Array of Bio::Reference objects) attr_accessor :references # Comments (String or an Array of String) attr_accessor :comments # Keywords (An Array of String) attr_accessor :keywords # Links to other database entries. # (An Array of Bio::Sequence::DBLink objects) attr_accessor :dblinks # Bio::Sequence::NA/AA attr_accessor :moltype # The sequence object, usually Bio::Sequence::NA/AA, # but could be a simple String attr_accessor :seq # Quality scores of the bases/residues in the sequence. # (Array containing Integer, or nil) attr_accessor :quality_scores # The meaning (calculation method) of the quality scores stored in # the quality_scores attribute. # Maybe one of :phred, :solexa, or nil. # # Note that if it is nil, and error_probabilities is empty, # some methods implicitly assumes that it is :phred (PHRED score). 
  attr_accessor :quality_score_type

  # Error probabilities of the bases/residues in the sequence.
  # (Array containing Float, or nil)
  attr_accessor :error_probabilities

  #---
  # Attributes below have been added during BioHackathon2008
  #+++

  # Version number of the sequence (String or Integer).
  # Unlike entry_version, sequence_version will be changed
  # when the submitter of the sequence updates the entry.
  # Normally, the same entry taken from different databases (EMBL, GenBank,
  # and DDBJ) may have the same sequence_version.
  attr_accessor :sequence_version

  # Topology (String). "circular", "linear", or nil.
  attr_accessor :topology

  # Strandedness (String). "single" (single-stranded),
  # "double" (double-stranded), "mixed" (mixed-stranded), or nil.
  attr_accessor :strandedness

  # molecular type (String). "DNA" or "RNA" for nucleotide sequence.
  attr_accessor :molecule_type

  # Data Class defined by EMBL (String)
  # See http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_1
  attr_accessor :data_class

  # Taxonomic Division defined by EMBL/GenBank/DDBJ (String)
  # See http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_2
  attr_accessor :division

  # Primary accession number (String)
  attr_accessor :primary_accession

  # Secondary accession numbers (Array of String)
  attr_accessor :secondary_accessions

  # Created date of the sequence entry (Date, DateTime, Time, or String)
  attr_accessor :date_created

  # Last modified date of the sequence entry (Date, DateTime, Time, or String)
  attr_accessor :date_modified

  # Release information when created (String)
  attr_accessor :release_created

  # Release information when last-modified (String)
  attr_accessor :release_modified

  # Version of the entry (String or Integer).
  # Unlike sequence_version, entry_version is a database
  # maintainer's internal version number.
  # The version number will be changed when the database maintainer
  # modifies the entry.
  # The same entry in EMBL, GenBank, and DDBJ may have different
  # entry_version.
attr_accessor :entry_version # Organism species (String). For example, "Escherichia coli". attr_accessor :species # Organism classification, taxonomic classification of the source organism. # (Array of String) attr_accessor :classification alias taxonomy classification # (not well supported) Organelle information (String). attr_accessor :organelle # Namespace of the sequence IDs described in entry_id, primary_accession, # and secondary_accessions methods (String). # For example, 'EMBL', 'GenBank', 'DDBJ', 'RefSeq'. attr_accessor :id_namespace # Sequence identifiers which are not described in entry_id, # primary_accession,and secondary_accessions methods # (Array of Bio::Sequence::DBLink objects). # For example, NCBI GI number can be stored. # Note that only identifiers of the entry itself should be stored. # For database cross references, dblinks should be used. attr_accessor :other_seqids # Guess the type of sequence, Amino Acid or Nucleic Acid, and create a # new sequence object (Bio::Sequence::AA or Bio::Sequence::NA) on the basis # of this guess. This method will change the current Bio::Sequence object. # # s = Bio::Sequence.new('atgc') # puts s.seq.class #=> String # s.auto # puts s.seq.class #=> Bio::Sequence::NA # --- # *Returns*:: Bio::Sequence::NA/AA object def auto @moltype = guess if @moltype == NA @seq = NA.new(seq) else @seq = AA.new(seq) end end # Given a sequence String, guess its type, Amino Acid or Nucleic Acid, and # return a new Bio::Sequence object wrapping a sequence of the guessed type # (either Bio::Sequence::AA or Bio::Sequence::NA) # # s = Bio::Sequence.auto('atgc') # puts s.seq.class #=> Bio::Sequence::NA # --- # *Arguments*: # * (required) _str_: String *or* Bio::Sequence::NA/AA object # *Returns*:: Bio::Sequence object def self.auto(str) seq = self.new(str) seq.auto return seq end # Guess the class of the current sequence. Returns the class # (Bio::Sequence::AA or Bio::Sequence::NA) guessed. 
  # In general, used by developers only, but if you know what you are doing,
  # feel free.
  #
  #   s = Bio::Sequence.new('atgc')
  #   puts s.guess                            #=> Bio::Sequence::NA
  #
  # There are three parameters: `threshold`, `length`, and `index`.
  #
  # The `threshold` value (defaults to 0.9) is the frequency of
  # nucleic acid bases [AGCTUagctu] required in the sequence for this method
  # to produce a Bio::Sequence::NA "guess".  In the default case, if the
  # fraction of bases (after excluding [Nn]) in the set [AGCTUagctu] does
  # not exceed 90%, then the guess is Bio::Sequence::AA.
  #
  #   s = Bio::Sequence.new('atgcatgcqq')
  #   puts s.guess                            #=> Bio::Sequence::AA
  #   puts s.guess(0.8)                       #=> Bio::Sequence::AA
  #   puts s.guess(0.7)                       #=> Bio::Sequence::NA
  #
  # The `length` value is how much of the total sequence to use in the
  # guess (default 10000).  If your sequence is very long, you may
  # want to use a smaller amount to reduce the computational burden.
  #
  #   s = Bio::Sequence.new('A VERY LONG SEQUENCE')
  #   puts s.guess(0.9, 1000)  # limit the guess to the first 1000 positions
  #
  # The `index` value is where to start the guess.  Perhaps you know there
  # are a lot of gaps at the start...
  #
  #   s = Bio::Sequence.new('-----atgcc')
  #   puts s.guess                            #=> Bio::Sequence::AA
  #   puts s.guess(0.9,10000,5)               #=> Bio::Sequence::NA
  # ---
  # *Arguments*:
  # * (optional) _threshold_: Float in range 0,1 (default 0.9)
  # * (optional) _length_: Fixnum (default 10000)
  # * (optional) _index_: Fixnum (default 0)
  # *Returns*:: Bio::Sequence::NA/AA
  def guess(threshold = 0.9, length = 10000, index = 0)
    str = seq.to_s[index,length].to_s.extend Bio::Sequence::Common
    cmp = str.composition

    bases = cmp['A'] + cmp['T'] + cmp['G'] + cmp['C'] + cmp['U'] +
            cmp['a'] + cmp['t'] + cmp['g'] + cmp['c'] + cmp['u']

    total = str.length - cmp['N'] - cmp['n']

    if bases.to_f / total > threshold
      return NA
    else
      return AA
    end
  end

  # Guess the class of a given sequence.  Returns the class
  # (Bio::Sequence::AA or Bio::Sequence::NA) guessed.
  # In general, used by developers only, but if you know what you are doing,
  # feel free.
  #
  #   puts Bio::Sequence.guess('atgc')        #=> Bio::Sequence::NA
  #
  # There are three optional parameters: `threshold`, `length`, and `index`.
  #
  # The `threshold` value (defaults to 0.9) is the frequency of
  # nucleic acid bases [AGCTUagctu] required in the sequence for this method
  # to produce a Bio::Sequence::NA "guess".  In the default case, if the
  # fraction of bases (after excluding [Nn]) in the set [AGCTUagctu] does
  # not exceed 90%, then the guess is Bio::Sequence::AA.
  #
  #   puts Bio::Sequence.guess('atgcatgcqq')      #=> Bio::Sequence::AA
  #   puts Bio::Sequence.guess('atgcatgcqq', 0.8) #=> Bio::Sequence::AA
  #   puts Bio::Sequence.guess('atgcatgcqq', 0.7) #=> Bio::Sequence::NA
  #
  # The `length` value is how much of the total sequence to use in the
  # guess (default 10000).  If your sequence is very long, you may
  # want to use a smaller amount to reduce the computational burden.
  #
  #   # limit the guess to the first 1000 positions
  #   puts Bio::Sequence.guess('A VERY LONG SEQUENCE', 0.9, 1000)
  #
  # The `index` value is where to start the guess.  Perhaps you know there
  # are a lot of gaps at the start...
  #
  #   puts Bio::Sequence.guess('-----atgcc')                #=> Bio::Sequence::AA
  #   puts Bio::Sequence.guess('-----atgcc',0.9,10000,5)    #=> Bio::Sequence::NA
  # ---
  # *Arguments*:
  # * (required) _str_: String *or* Bio::Sequence::NA/AA object
  # * (optional) _threshold_: Float in range 0,1 (default 0.9)
  # * (optional) _length_: Fixnum (default 10000)
  # * (optional) _index_: Fixnum (default 0)
  # *Returns*:: Bio::Sequence::NA/AA
  def self.guess(str, *args)
    self.new(str).guess(*args)
  end

  # Transform the sequence wrapped in the current Bio::Sequence object
  # into a Bio::Sequence::NA object.  This method will change the current
  # object.  This method does not validate your choice, so be careful!
  #
  #   s = Bio::Sequence.new('RRLE')
  #   puts s.seq.class                        #=> String
  #   s.na
  #   puts s.seq.class                        #=> Bio::Sequence::NA !!!
  #
  # However, if you know your sequence type, this method may be
  # constructively used after initialization,
  #
  #   s = Bio::Sequence.new('atgc')
  #   s.na
  # ---
  # *Returns*:: Bio::Sequence::NA
  def na
    @seq = NA.new(seq)
    @moltype = NA
  end

  # Transform the sequence wrapped in the current Bio::Sequence object
  # into a Bio::Sequence::AA object.  This method will change the current
  # object.  This method does not validate your choice, so be careful!
  #
  #   s = Bio::Sequence.new('atgc')
  #   puts s.seq.class                        #=> String
  #   s.aa
  #   puts s.seq.class                        #=> Bio::Sequence::AA !!!
  #
  # However, if you know your sequence type, this method may be
  # constructively used after initialization,
  #
  #   s = Bio::Sequence.new('RRLE')
  #   s.aa
  # ---
  # *Returns*:: Bio::Sequence::AA
  def aa
    @seq = AA.new(seq)
    @moltype = AA
  end

  # Create a new Bio::Sequence object from a formatted string
  # (GenBank, EMBL, fasta format, etc.)
  #
  #   s = Bio::Sequence.input(str)
  # ---
  # *Arguments*:
  # * (required) _str_: string
  # * (optional) _format_: format specification (class or nil)
  # *Returns*:: Bio::Sequence object
  def self.input(str, format = nil)
    if format then
      klass = format
    else
      klass = Bio::FlatFile::AutoDetect.default.autodetect(str)
    end
    obj = klass.new(str)
    obj.to_biosequence
  end

  # alias of Bio::Sequence.input
  def self.read(str, format = nil)
    input(str, format)
  end

  # accession numbers of the sequence
  #
  # *Returns*:: Array of String
  def accessions
    [ primary_accession, secondary_accessions ].flatten.compact
  end

  # Normally, users should not call this method directly.
  # Use Bio::*#to_biosequence (e.g. Bio::GenBank#to_biosequence).
  #
  # Creates a new Bio::Sequence object from database data with an
  # adapter module.
def self.adapter(source_data, adapter_module) biosequence = self.new(nil) biosequence.instance_eval { remove_instance_variable(:@seq) @source_data = source_data } biosequence.extend(adapter_module) biosequence end end # Sequence end # Bio bio-2.0.3/lib/bio/tree.rb0000644000175000017500000006366314141516614014437 0ustar nileshnilesh# # = bio/tree.rb - phylogenetic tree data structure class # # Copyright:: Copyright (C) 2006 # Naohisa Goto # License:: The Ruby License # # require 'matrix' require 'bio/pathway' module Bio # This is the class for phylogenetic tree. # It stores a phylogenetic tree. # # Internally, it is based on Bio::Pathway class. # However, users cannot handle Bio::Pathway object directly. # # This is alpha version. Incompatible changes may be made frequently. class Tree # Error when there are no path between specified nodes class NoPathError < RuntimeError; end # Edge object of each node. # By default, the object doesn't contain any node information. class Edge # creates a new edge. def initialize(distance = nil) if distance.kind_of?(Numeric) self.distance = distance elsif distance self.distance_string = distance end end # evolutionary distance attr_reader :distance # evolutionary distance represented as a string attr_reader :distance_string # set evolutionary distance value def distance=(num) @distance = num @distance_string = (num ? num.to_s : num) end # set evolutionary distance value from a string def distance_string=(str) if str.to_s.strip.empty? @distance = nil @distance_string = str else @distance = str.to_f @distance_string = str end end # visualization of this object def inspect "" end # string representation of this object def to_s @distance_string.to_s end #--- # methods for NHX (New Hampshire eXtended) and/or PhyloXML #+++ # log likelihood value (:L in NHX) attr_accessor :log_likelihood # width of the edge # ( of PhyloXML, or :W="w" in NHX) attr_accessor :width # Other NHX parameters. Returns a Hash. 
# Note that :L and :W # are not stored here but stored in the proper attributes in this class. # However, if you force to set these parameters in this hash, # the parameters in this hash are preferred when generating NHX. # In addition, If the same parameters are defined at Node object, # the parameters in the node are preferred. def nhx_parameters @nhx_parameters ||= {} @nhx_parameters end end #class Edge # Gets distance value from the given edge. # Returns float or any other numeric value or nil. def get_edge_distance(edge) begin dist = edge.distance rescue NoMethodError dist = edge end dist end # Gets distance string from the given edge. # Returns a string or nil. def get_edge_distance_string(edge) begin dist = edge.distance_string rescue NoMethodError dist = (edge ? edge.to_s : nil) end dist end # Returns edge1 + edge2 def get_edge_merged(edge1, edge2) dist1 = get_edge_distance(edge1) dist2 = get_edge_distance(edge2) if dist1 and dist2 then Edge.new(dist1 + dist2) elsif dist1 then Edge.new(dist1) elsif dist2 then Edge.new(dist2) else Edge.new end end # Node object. class Node # Creates a new node. def initialize(name = nil) @name = name if name end # name of the node attr_accessor :name # bootstrap value attr_reader :bootstrap # bootstrap value as a string attr_reader :bootstrap_string # sets a bootstrap value def bootstrap=(num) @bootstrap_string = (num ? num.to_s : num) @bootstrap = num end # sets a bootstrap value from a string def bootstrap_string=(str) if str.to_s.strip.empty? @bootstrap = nil @bootstrap_string = str else i = str.to_i f = str.to_f @bootstrap = (i == f ? i : f) @bootstrap_string = str end end # visualization of this object def inspect if @name and !@name.empty? 
then str = "(Node:#{@name.inspect}" else str = sprintf('(Node:%x', (self.__id__ << 1) & 0xffffffff) end if defined?(@bootstrap) and @bootstrap then str += " bootstrap=#{@bootstrap.inspect}" end str += ")" str end # string representation of this object def to_s @name.to_s end # the order of the node # (lower value, high priority) attr_accessor :order_number #--- # methods for NHX (New Hampshire eXtended) and/or PhyloXML #+++ # Phylogenetic events. # Returns an Array of one (or more?) of the following symbols # :gene_duplication # :speciation def events @events ||= [] @events end # EC number (EC_number in PhyloXML, or :E in NHX) attr_accessor :ec_number # scientific name (scientific_name in PhyloXML, or :S in NHX) attr_accessor :scientific_name # taxonomy identifier (taxonomy_identifier in PhyloXML, or :T in NHX) attr_accessor :taxonomy_id # Other NHX parameters. Returns a Hash. # Note that :D, :E, :S, and :T # are not stored here but stored in the proper attributes in this class. # However, if you force to set these parameters in this hash, # the parameters in this hash are preferred when generating NHX. def nhx_parameters @nhx_parameters ||= {} @nhx_parameters end end #class Node # Gets node name def get_node_name(node) begin node.name rescue NoMethodError node.to_s end end def get_node_bootstrap(node) begin node.bootstrap rescue NoMethodError nil end end def get_node_bootstrap_string(node) begin node.bootstrap_string rescue NoMethodError nil end end # Creates a new phylogenetic tree. # When no arguments are given, it creates a new empty tree. # When a Tree object is given, it copies the tree. # Note that the new tree shares Node and Edge objects # with the given tree. 
def initialize(tree = nil) # creates an undirected adjacency list graph @pathway = Bio::Pathway.new([], true) @root = nil @options = {} _init_cache self.concat(tree) if tree end # (private) clear internal cache def _init_cache @cache_parent = {} end private :_init_cache # (private) clear internal cache def _clear_cache @cache_parent.clear end private :_clear_cache # root node of this tree # (even if unrooted tree, it is used by some methods) attr_accessor :root # tree options; mainly used for tree output attr_accessor :options # Clears all nodes and edges. # Returns self. # Note that options and root are also cleared. def clear initialize self end # Returns all nodes as an array. def nodes @pathway.graph.keys end # Number of nodes. def number_of_nodes @pathway.nodes end # Iterates over each node of this tree. def each_node(&x) #:yields: node @pathway.graph.each_key(&x) self end # Iterates over each edges of this tree. def each_edge #:yields: source, target, edge @pathway.relations.each do |rel| yield rel.node[0], rel.node[1], rel.relation end self end # Returns all edges an array of [ node0, node1, edge ] def edges @pathway.relations.collect do |rel| [ rel.node[0], rel.node[1], rel.relation ] end end # Returns number of edges in the tree. def number_of_edges @pathway.relations.size end # Returns an array of adjacent nodes of the given node. def adjacent_nodes(node) h = @pathway.graph[node] h ? h.keys : [] end # Returns all connected edges with adjacent nodes. # Returns an array of the array [ source, target, edge ]. # # The reason why the method name is "out_edges" is that # it comes from the Boost Graph Library. def out_edges(source) h = @pathway.graph[source] if h h.collect { |key, val| [ source, key, val ] } else [] end end # Iterates over each connected edges of the given node. # Returns self. # # The reason why the method name is "each_out_edge" is that # it comes from the Boost Graph Library. 
def each_out_edge(source) #:yields: source, target, edge h = @pathway.graph[source] h.each { |key, val| yield source, key, val } if h self end # Returns number of edges in the given node. # # The reason why the method name is "out_degree" is that # it comes from the Boost Graph Library. def out_degree(source) h = @pathway.graph[source] h ? h.size : 0 end # Returns an edge from source to target. # If source and target are not adjacent nodes, returns nil. def get_edge(source, target) h = @pathway.graph[source] h ? h[target] : nil end # Adds a new edge to the tree. # Returns the newly added edge. # If the edge already exists, it is overwritten with new one. def add_edge(source, target, edge = Edge.new) _clear_cache @pathway.append(Bio::Relation.new(source, target, edge)) edge end # Finds a node in the tree by given name and returns the node. # If the node does not found, returns nil. # If multiple nodes with the same name exist, # the result would be one of those (unspecified). def get_node_by_name(str) self.each_node do |node| if get_node_name(node) == str return node end end nil end # Adds a node to the tree. # Returns self. # If the node already exists, it does nothing. def add_node(node) _clear_cache @pathway.graph[node] ||= {} self end # If the node exists, returns true. # Otherwise, returns false. def include?(node) @pathway.graph[node] ? true : false end # Removes all edges connected with the node. # Returns self. # If the node does not exist, raises IndexError. def clear_node(node) unless self.include?(node) raise IndexError, 'the node does not exist' end _clear_cache @pathway.relations.delete_if do |rel| rel.node.include?(node) end @pathway.graph[node].each_key do |k| @pathway.graph[k].delete(node) end @pathway.graph[node].clear self end # Removes the given node from the tree. # All edges connected with the node are also removed. # Returns self. # If the node does not exist, raises IndexError. 
def remove_node(node) #_clear_cache #done in clear_node(node) self.clear_node(node) @pathway.graph.delete(node) self end # Removes each node if the block returns not nil. # All edges connected with the removed nodes are also removed. # Returns self. def remove_node_if #_clear_cache #done in clear_node(node) all = self.nodes all.each do |node| if yield node then self.clear_node(node) @pathway.graph.delete(node) end end self end # Removes an edge between source and target. # Returns self. # If the edge does not exist, raises IndexError. #--- # If two or more edges exists between source and target, # all of them are removed. #+++ def remove_edge(source, target) unless self.get_edge(source, target) then raise IndexError, 'edge not found' end _clear_cache fwd = [ source, target ] rev = [ target, source ] @pathway.relations.delete_if do |rel| rel.node == fwd or rel.node == rev end h = @pathway.graph[source] h.delete(target) if h h = @pathway.graph[target] h.delete(source) if h self end # Removes each edge if the block returns not nil. # Returns self. def remove_edge_if #:yields: source, target, edge _clear_cache removed_rel = [] @pathway.relations.delete_if do |rel| if yield rel.node[0], rel.node[1], rel.edge then removed_rel << rel true end end removed_rel.each do |rel| source = rel.node[0] target = rel.node[1] h = @pathway.graph[source] h.delete(target) if h h = @pathway.graph[target] h.delete(source) if h end self end # Replaces each node by each block's return value. # Returns self. def collect_node! #:yields: node _clear_cache tr = {} self.each_node do |node| tr[node] = yield node end # replaces nodes in @pathway.relations @pathway.relations.each do |rel| rel.node.collect! { |node| tr[node] } end # re-generates @pathway from relations @pathway.to_list # adds orphan nodes tr.each_value do |newnode| @pathway.graph[newnode] ||= {} end self end # Replaces each edge by each block's return value. # Returns self. def collect_edge! 
#:yields: source, target, edge _clear_cache @pathway.relations.each do |rel| newedge = yield rel.node[0], rel.node[1], rel.relation rel.edge = newedge @pathway.append(rel, false) end self end # Gets the sub-tree consisted of given nodes. # _nodes_ must be an array of nodes. # Nodes that do not exist in the original tree are ignored. # Returns a Tree object. # Note that the sub-tree shares Node and Edge objects # with the original tree. def subtree(nodes) nodes = nodes.find_all do |x| @pathway.graph[x] end return self.class.new if nodes.empty? # creates subtree new_tree = self.class.new nodes.each do |x| new_tree.add_node(x) end self.each_edge do |node1, node2, edge| if new_tree.include?(node1) and new_tree.include?(node2) then new_tree.add_edge(node1, node2, edge) end end return new_tree end # Gets the sub-tree consisted of given nodes and # all internal nodes connected between given nodes. # _nodes_ must be an array of nodes. # Nodes that do not exist in the original tree are ignored. # Returns a Tree object. # The result is unspecified for cyclic trees. # Note that the sub-tree shares Node and Edge objects # with the original tree. def subtree_with_all_paths(nodes) hash = {} nodes.each { |x| hash[x] = true } nodes.each_index do |i| node1 = nodes[i] (0...i).each do |j| node2 = nodes[j] unless node1 == node2 then begin path = self.path(node1, node2) rescue IndexError, NoPathError path = [] end path.each { |x| hash[x] = true } end end end self.subtree(hash.keys) end # Concatenates the other tree. # If the same edge exists, the edge in _other_ is used. # Returns self. # The result is unspecified if _other_ isn't a Tree object. # Note that the Node and Edge objects in the _other_ tree are # shared in the concatinated tree. 
def concat(other) #raise TypeError unless other.kind_of?(self.class) _clear_cache other.each_node do |node| self.add_node(node) end other.each_edge do |node1, node2, edge| self.add_edge(node1, node2, edge) end self end # Gets path from node1 to node2. # Returns an array of nodes, including node1 and node2. # If node1 and/or node2 do not exist, IndexError is raised. # If node1 and node2 are not connected, NoPathError is raised. # The result is unspecified for cyclic trees. def path(node1, node2) raise IndexError, 'node1 not found' unless @pathway.graph[node1] raise IndexError, 'node2 not found' unless @pathway.graph[node2] return [ node1 ] if node1 == node2 return [ node1, node2 ] if @pathway.graph[node1][node2] _, path = @pathway.bfs_shortest_path(node1, node2) unless path[0] == node1 and path[-1] == node2 then raise NoPathError, 'node1 and node2 are not connected' end path end # Iterates over each edge from node1 to node2. # The result is unspecified for cyclic trees. def each_edge_in_path(node1, node2) path = self.path(node1, node2) source = path.shift path.each do |target| edge = self.get_edge(source, target) yield source, target, edge source = target end self end # Returns distance between node1 and node2. # It would raise error if the edges didn't contain distance values. # The result is unspecified for cyclic trees. 
def distance(node1, node2) distance = 0 self.each_edge_in_path(node1, node2) do |source, target, edge| distance += get_edge_distance(edge) end distance end # (private) get parent only by using cache def _get_cached_parent(node, root) @cache_parent[root] ||= Hash.new cache = @cache_parent[root] if node == root then unless cache.has_key?(root) then self.adjacent_nodes(root).each do |n| cache[n] ||= root if n != root end cache[root] = nil end parent = nil else unless parent = cache[node] then parent = self.adjacent_nodes(node).find { |n| (m = cache[n]) && (m != node) } _cache_parent(node, parent, root) if parent end parent end end private :_get_cached_parent # (private) set parent cache def _cache_parent(node, parent, root) return unless parent cache = @cache_parent[root] cache[node] = parent self.adjacent_nodes(node).each do |n| cache[n] ||= node if n != parent end end private :_cache_parent # Gets the parent node of the _node_. # If _root_ isn't specified or _root_ is nil, @root is used. # Returns an Node object or nil. # The result is unspecified for cyclic trees. def parent(node, root = nil) root ||= @root raise IndexError, 'can not get parent for unrooted tree' unless root unless ret = _get_cached_parent(node, root) then ret = self.path(root, node)[-2] _cache_parent(node, ret, root) end ret end # Gets the adjacent children nodes of the _node_. # If _root_ isn't specified or _root_ is nil, @root is used. # Returns an array of Nodes. # The result is unspecified for cyclic trees. def children(node, root = nil) root ||= @root c = self.adjacent_nodes(node) c.delete(self.parent(node, root)) c end # Gets all descendent nodes of the _node_. # If _root_ isn't specified or _root_ is nil, @root is used. # Returns an array of Nodes. # The result is unspecified for cyclic trees. 
def descendents(node, root = nil) root ||= @root distance, route = @pathway.breadth_first_search(root) d = distance[node] result = [] distance.each do |key, val| if val > d then x = key while x = route[x] if x == node then result << key break end break if distance[x] <= d end end end result end # If _node_ is nil, returns an array of # all leaves (nodes connected with one edge). # Otherwise, gets all descendent leaf nodes of the _node_. # If _root_ isn't specified or _root_ is nil, @root is used. # Returns an array of Nodes. # The result is unspecified for cyclic trees. def leaves(node = nil, root = nil) unless node then nodes = [] self.each_node do |x| nodes << x if self.out_degree(x) == 1 end return nodes else root ||= @root self.descendents(node, root).find_all do |x| self.adjacent_nodes(x).size == 1 end end end # Gets all ancestral nodes of the _node_. # If _root_ isn't specified or _root_ is nil, @root is used. # Returns an array of Nodes. # The result is unspecified for cyclic trees. def ancestors(node, root = nil) root ||= @root (self.path(root, node) - [ node ]).reverse end # Gets the lowest common ancestor of the two nodes. # If _root_ isn't specified or _root_ is nil, @root is used. # Returns a Node object or nil. # The result is unspecified for cyclic trees. def lowest_common_ancestor(node1, node2, root = nil) root ||= @root _, route = @pathway.breadth_first_search(root) x = node1; r1 = [] begin; r1 << x; end while x = route[x] x = node2; r2 = [] begin; r2 << x; end while x = route[x] return (r1 & r2).first end # Returns total distance of all edges. # It would raise error if some edges didn't contain distance values. def total_distance distance = 0 self.each_edge do |source, target, edge| distance += get_edge_distance(edge) end distance end # Calculates distance matrix of given nodes. # If _nodes_ is nil, or is ommited, it acts the same as # tree.distance_matrix(tree.leaves). # Returns a matrix object. # The result is unspecified for cyclic trees. 
# Note 1: The diagonal values of the matrix are 0. # Note 2: If the distance cannot be calculated, nil will be set. def distance_matrix(nodes = nil) nodes ||= self.leaves matrix = [] nodes.each_index do |i| row = [] nodes.each_index do |j| if i == j then distance = 0 elsif r = matrix[j] and val = r[i] then distance = val else distance = (self.distance(nodes[i], nodes[j]) rescue nil) end row << distance end matrix << row end Matrix.rows(matrix, false) end # Shows the adjacency matrix representation of the tree. # It shows matrix only for given nodes. # If _nodes_ is nil or is omitted, # it acts the same as tree.adjacency_matrix(tree.nodes). # If a block is given, for each edge, # it yields _source_, _target_, and _edge_, and # uses the returned value of the block. # Without blocks, it uses edge. # Returns a matrix object. def adjacency_matrix(nodes = nil, default_value = nil, diagonal_value = nil) #:yields: source, target, edge nodes ||= self.nodes size = nodes.size hash = {} nodes.each_with_index { |x, i| hash[x] = i } # prepares a matrix matrix = Array.new(size, nil) matrix.collect! { |x| Array.new(size, default_value) } (0...size).each { |i| matrix[i][i] = diagonal_value } # fills the matrix from each edge self.each_edge do |source, target, edge| i_source = hash[source] i_target = hash[target] if i_source and i_target then val = block_given? ? (yield source, target, edge) : edge matrix[i_source][i_target] = val matrix[i_target][i_source] = val end end Matrix.rows(matrix, false) end # Removes all nodes that are neither branches nor leaves. # That is, removes nodes connected with exactly two edges. # For each removed node, the two adjacent edges are merged and # a new edge is created. # Returns removed nodes. # Note that orphan nodes are still kept unchanged.
def remove_nonsense_nodes _clear_cache hash = {} self.each_node do |node| hash[node] = true if @pathway.graph[node].size == 2 end hash.each_key do |node| adjs = @pathway.graph[node].keys edges = @pathway.graph[node].values new_edge = get_edge_merged(edges[0], edges[1]) @pathway.graph[adjs[0]].delete(node) @pathway.graph[adjs[1]].delete(node) @pathway.graph.delete(node) @pathway.append(Bio::Relation.new(adjs[0], adjs[1], new_edge)) end #@pathway.to_relations @pathway.relations.reject! do |rel| hash[rel.node[0]] or hash[rel.node[1]] end return hash.keys end # Insert a new node between adjacent nodes node1 and node2. # The old edge between node1 and node2 are changed to the edge # between new_node and node2. # The edge between node1 and new_node is newly created. # # If new_distance is specified, the distance between # node1 and new_node is set to new_distance, and # distance between new_node and node2 is set to # tree.get_edge(node1, node2).distance - new_distance. # # Returns self. # If node1 and node2 are not adjacent, raises IndexError. # # If new_node already exists in the tree, the tree would become # circular. In addition, if the edge between new_node and # node1 (or node2) already exists, it will be erased. 
def insert_node(node1, node2, new_node, new_distance = nil) unless edge = self.get_edge(node1, node2) then raise IndexError, 'nodes not found or two nodes are not adjacent' end _clear_cache new_edge = Edge.new(new_distance) self.remove_edge(node1, node2) self.add_edge(node1, new_node, new_edge) if new_distance and old_distance = get_edge_distance(edge) then old_distance -= new_distance begin edge.distance = old_distance rescue NoMethodError edge = old_distance end end self.add_edge(new_node, node2, edge) self end end #class Tree end #module Bio #--- # temporary added #+++ require 'bio/tree/output' bio-2.0.3/lib/bio/map.rb0000644000175000017500000003145514141516614014247 0ustar nileshnilesh# # = bio/map.rb - biological mapping class # # Copyright:: Copyright (C) 2006 Jan Aerts # License:: The Ruby License # # $Id: map.rb,v 1.11 2007/04/12 12:19:16 aerts Exp $ require 'bio/location' module Bio # == Description # # The Bio::Map contains classes that describe mapping information # and can be used to contain linkage maps, radiation-hybrid maps, # etc. As the same marker can be mapped to more than one map, and a # single map typically contains more than one marker, the link # between the markers and maps is handled by Bio::Map::Mapping # objects. Therefore, to link a map to a marker, a Bio::Map::Mapping # object is added to that Bio::Map. See usage below. # # Not only maps in the strict sense have map-like features (and # similarly not only markers in the strict sense have marker-like # features). For example, a microsatellite is something that can be # mapped on a linkage map (and hence becomes a 'marker'), but a # clone can also be mapped to a cytogenetic map. In that case, the # clone acts as a marker and has marker-like properties. That same # clone can also be considered a 'map' when BAC-end sequences are # mapped to it. 
To reflect this flexibility, the modules # Bio::Map::ActsLikeMap and Bio::Map::ActsLikeMarker define methods # that are typical for maps and markers. # #-- # In a certain sense, a biological sequence also has map- and # marker-like properties: things can be mapped to it at certain # locations, and the sequence itself can be mapped to something else # (e.g. the BAC-end sequence example above, or a BLAST-result). #++ # # == Usage # # my_marker1 = Bio::Map::Marker.new('marker1') # my_marker2 = Bio::Map::Marker.new('marker2') # my_marker3 = Bio::Map::Marker.new('marker3') # # # arguments are name, type, length, units # my_map1 = Bio::Map::SimpleMap.new('RH_map_ABC (2006)', 'RH', nil, 'cR') # my_map2 = Bio::Map::SimpleMap.new('consensus', 'linkage', nil, 'cM') # # my_map1.add_mapping_as_map(my_marker1, '17') # my_map1.add_mapping_as_map(Bio::Map::Marker.new('marker2'), '5') # my_marker3.add_mapping_as_marker(my_map1, '9') # # print "Does my_map1 contain marker3? => " # puts my_map1.contains_marker?(my_marker3).to_s # print "Does my_map2 contain marker3? => " # puts my_map2.contains_marker?(my_marker3).to_s # # my_map1.mappings_as_map.sort.each do |mapping| # puts [ mapping.map.name, # mapping.marker.name, # mapping.location.from.to_s, # mapping.location.to.to_s ].join("\t") # end # puts my_map1.mappings_as_map.min.marker.name # # my_map2.mappings_as_map.each do |mapping| # puts [ mapping.map.name, # mapping.marker.name, # mapping.location.from.to_s, # mapping.location.to.to_s ].join("\t") # end # module Map # == Description # # The Bio::Map::ActsLikeMap module contains methods that are typical for # map-like things: # # * add markers with their locations (through Bio::Map::Mapping objects) # * check if a given marker is mapped to it, # and can be mixed into other classes (e.g. Bio::Map::SimpleMap) # # Classes that include this mixin should provide an array property # called mappings_as_map.
# # For example: # # class MyMapThing # include Bio::Map::ActsLikeMap # # def initialize(name) # @name = name # @mappings_as_map = Array.new # end # attr_accessor :name, :mappings_as_map # end # module ActsLikeMap # == Description # # Adds a Bio::Map::Mapping object to its array of mappings. # # == Usage # # # suppose we have a Bio::Map::SimpleMap object called my_map # my_map.add_mapping_as_map(Bio::Map::Marker.new('marker_a'), '5') # # --- # *Arguments*: # * _marker_ (required): Bio::Map::Marker object # * _location_: location of mapping. Should be a _string_, not a _number_. # *Returns*:: itself def add_mapping_as_map(marker, location = nil) unless marker.class.include?(Bio::Map::ActsLikeMarker) raise "[Error] marker is not object that implements Bio::Map::ActsLikeMarker" end my_mapping = ( location.nil? ) ? Bio::Map::Mapping.new(self, marker, nil) : Bio::Map::Mapping.new(self, marker, Bio::Locations.new(location)) if ! marker.mapped_to?(self) self.mappings_as_map.push(my_mapping) marker.mappings_as_marker.push(my_mapping) else already_mapped = false marker.positions_on(self).each do |loc| if loc.equals?(Bio::Locations.new(location)) already_mapped = true end end if ! already_mapped self.mappings_as_map.push(my_mapping) marker.mappings_as_marker.push(my_mapping) end end return self end # Checks whether a Bio::Map::Marker is mapped to this # Bio::Map::SimpleMap.
# # --- # *Arguments*: # * _marker_: a Bio::Map::Marker object # *Returns*:: true or false def contains_marker?(marker) unless marker.class.include?(Bio::Map::ActsLikeMarker) raise "[Error] marker is not object that implements Bio::Map::ActsLikeMarker" end contains = false self.mappings_as_map.each do |mapping| if mapping.marker == marker contains = true return contains end end return contains end end # ActsLikeMap # == Description # # The Bio::Map::ActsLikeMarker module contains methods that are # typical for marker-like things: # # * map it to one or more maps # * check if it's mapped to a given map # and can be mixed into other classes (e.g. Bio::Map::Marker) # # Classes that include this mixin should provide an array property # called mappings_as_marker. # # For example: # # class MyMarkerThing # include Bio::Map::ActsLikeMarker # # def initialize(name) # @name = name # @mappings_as_marker = Array.new # end # attr_accessor :name, :mappings_as_marker # end # module ActsLikeMarker # == Description # # Adds a Bio::Map::Mapping object to its array of mappings. # # == Usage # # # suppose we have a Bio::Map::Marker object called marker_a # marker_a.add_mapping_as_marker(Bio::Map::SimpleMap.new('my_map'), '5') # # --- # *Arguments*: # * _map_ (required): Bio::Map::SimpleMap object # * _location_: location of mapping. Should be a _string_, not a _number_. # *Returns*:: itself def add_mapping_as_marker(map, location = nil) unless map.class.include?(Bio::Map::ActsLikeMap) raise "[Error] map is not object that implements Bio::Map::ActsLikeMap" end my_mapping = (location.nil?) ? Bio::Map::Mapping.new(map, self, nil) : Bio::Map::Mapping.new(map, self, Bio::Locations.new(location)) if ! self.mapped_to?(map) self.mappings_as_marker.push(my_mapping) map.mappings_as_map.push(my_mapping) else already_mapped = false self.positions_on(map).each do |loc| if loc.equals?(Bio::Locations.new(location)) already_mapped = true end end if !
already_mapped self.mappings_as_marker.push(my_mapping) map.mappings_as_map.push(my_mapping) end end return self end # Check whether this marker is mapped to a given Bio::Map::SimpleMap. # --- # *Arguments*: # * _map_: a Bio::Map::SimpleMap object # *Returns*:: true or false def mapped_to?(map) unless map.class.include?(Bio::Map::ActsLikeMap) raise "[Error] map is not object that implements Bio::Map::ActsLikeMap" end mapped = false self.mappings_as_marker.each do |mapping| if mapping.map == map mapped = true return mapped end end return mapped end # Return all positions of this marker on a given map. # --- # *Arguments*: # * _map_: an object that mixes in Bio::Map::ActsLikeMap # *Returns*:: array of Bio::Location objects def positions_on(map) unless map.class.include?(Bio::Map::ActsLikeMap) raise "[Error] map is not object that implements Bio::Map::ActsLikeMap" end positions = Array.new self.mappings_as_marker.each do |mapping| if mapping.map == map positions.push(mapping.location) end end return positions end # Return all mappings of this marker on a given map. # --- # *Arguments*: # * _map_: an object that mixes in Bio::Map::ActsLikeMap # *Returns*:: array of Bio::Map::Mapping objects def mappings_on(map) unless map.class.include?(Bio::Map::ActsLikeMap) raise "[Error] map is not object that implements Bio::Map::ActsLikeMap" end m = Array.new self.mappings_as_marker.each do |mapping| if mapping.map == map m.push(mapping) end end return m end end # ActsLikeMarker # == Description # # Creates a new Bio::Map::Mapping object, which links Bio::Map::ActsLikeMap- # and Bio::Map::ActsLikeMarker-like objects. This class is typically not # accessed directly, but through map- or marker-like objects.
class Mapping include Comparable # Creates a new Bio::Map::Mapping object # --- # *Arguments*: # * _map_: a Bio::Map::SimpleMap object # * _marker_: a Bio::Map::Marker object # * _location_: a Bio::Locations object def initialize(map, marker, location = nil) @map, @marker, @location = map, marker, location end attr_accessor :map, :marker, :location # Compares the location of this mapping to another mapping. # Raises RuntimeError if the argument is not a Bio::Map::Mapping object # or if the two mappings belong to different maps. # --- # *Arguments*: # * other_mapping: Bio::Map::Mapping object # *Returns*:: # * -1 if self < other location # * 1 if self > other location # * 0 if both locations are the same def <=>(other) unless other.kind_of?(Bio::Map::Mapping) raise "[Error] markers are not comparable" end unless @map.equal?(other.map) raise "[Error] maps have to be the same" end return self.location[0].<=>(other.location[0]) end end # Mapping # == Description # # This class handles the essential storage of name, type, length and units # of a map. It includes Bio::Map::ActsLikeMap, and therefore # supports the methods of that module. # # == Usage # # my_map1 = Bio::Map::SimpleMap.new('RH_map_ABC (2006)', 'RH', nil, 'cR') # my_map1.add_mapping_as_map(Bio::Map::Marker.new('marker_a'), '17') # my_map1.add_mapping_as_map(Bio::Map::Marker.new('marker_b'), '5') # class SimpleMap include Bio::Map::ActsLikeMap # Builds a new Bio::Map::SimpleMap object # --- # *Arguments*: # * name: name of the map # * type: type of the map (e.g. linkage, radiation_hybrid, cytogenetic, ...) # * length: length of the map # * units: unit of the map (e.g. cM, cR, ...)
# *Returns*:: new Bio::Map::SimpleMap object def initialize(name = nil, type = nil, length = nil, units = nil) @name, @type, @length, @units = name, type, length, units @mappings_as_map = Array.new end # Name of the map attr_accessor :name # Type of the map attr_accessor :type # Length of the map attr_accessor :length # Units of the map attr_accessor :units # Mappings attr_accessor :mappings_as_map end # SimpleMap # == Description # # This class handles markers that are anchored to a Bio::Map::SimpleMap. # It includes Bio::Map::ActsLikeMarker, and therefore supports the # methods of that module. # # == Usage # # marker_a = Bio::Map::Marker.new('marker_a') # marker_b = Bio::Map::Marker.new('marker_b') # class Marker include Bio::Map::ActsLikeMarker # Builds a new Bio::Map::Marker object # --- # *Arguments*: # * name: name of the marker # *Returns*:: new Bio::Map::Marker object def initialize(name) @name = name @mappings_as_marker = Array.new end # Name of the marker attr_accessor :name # Mappings attr_accessor :mappings_as_marker end # Marker end # Map end # Bio bio-2.0.3/lib/bio/db/0000755000175000017500000000000014141516614013522 5ustar nileshnileshbio-2.0.3/lib/bio/db/litdb.rb0000644000175000017500000000270614141516614015152 0ustar nileshnilesh# # = bio/db/litdb.rb - LITDB database class # # Copyright:: Copyright (C) 2001 Toshiaki Katayama # License:: The Ruby License # # $Id:$ # require 'bio/db' module Bio # = LITDB class class LITDB < NCBIDB # Delimiter DELIMITER = "\nEND\n" # Delimiter RS = DELIMITER # TAGSIZE = 12 # def initialize(entry) super(entry, TAGSIZE) end # Returns def reference hash = Hash.new('') hash['authors'] = author.split(/;/).map {|x| x.sub(/,/, ', ')} hash['title'] = title hash['journal'] = journal.gsub(/\./, '. ').strip vol = volume.split(/,\s+/) if vol.size > 1 hash['volume'] = vol.shift.sub(/Vol\./, '') hash['pages'], hash['year'] = vol.pop.split(' ') hash['issue'] = vol.shift.sub(/No\./, '') unless vol.empty? 
end return Reference.new(hash) end # CODE def entry_id field_fetch('CODE') end # TITLE def title field_fetch('TITLE') end # FIELD def field field_fetch('FIELD') end # JOURNAL def journal field_fetch('JOURNAL') end # VOLUME def volume field_fetch('VOLUME') end # KEYWORD ';;' def keyword unless @data['KEYWORD'] @data['KEYWORD'] = fetch('KEYWORD').split(/;;\s*/) end @data['KEYWORD'] end # AUTHOR def author field_fetch('AUTHOR') end end end bio-2.0.3/lib/bio/db/fasta.rb0000644000175000017500000002225014141516614015146 0ustar nileshnilesh# # = bio/db/fasta.rb - FASTA format class # # Copyright:: Copyright (C) 2001, 2002 # Naohisa Goto , # Toshiaki Katayama # License:: The Ruby License # # $Id:$ # # == Description # # FASTA format class. # # == Examples # # See documents of Bio::FastaFormat class. # # == References # # * FASTA format (WikiPedia) # http://en.wikipedia.org/wiki/FASTA_format # # * Fasta format description (NCBI) # http://www.ncbi.nlm.nih.gov/BLAST/fasta.shtml # require 'bio/db' require 'bio/sequence' require 'bio/sequence/dblink' require 'bio/db/fasta/defline' module Bio # Treats a FASTA formatted entry, such as: # # >id and/or some comments <== definition line # ATGCATGCATGCATGCATGCATGCATGCATGCATGC <== sequence lines # ATGCATGCATGCATGCATGCATGCATGCATGCATGC # ATGCATGCATGC # # The precedent '>' can be omitted and the trailing '>' will be removed # automatically. 
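#
# A minimal illustrative sketch of the rule above. The entry text is made up
# for illustration; only FastaFormat.new, #definition, #data and #seq, all
# defined in this file, are used. The leading '>' is omitted and a trailing
# '>' (left over from the "\n>" delimiter) is present:
#
#   f = Bio::FastaFormat.new("abc 123 456\nAS\nDF\n>")
#   f.definition  #=> "abc 123 456"
#   f.data        #=> "\nAS\nDF\n"
#   f.seq         #=> "ASDF"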
# # === Examples # # fasta_string = <<END_OF_STRING # >gi|398365175|ref|NP_009718.3| Cdc28p [Saccharomyces cerevisiae S288c] # MSGELANYKRLEKVGEGTYGVVYKALDLRPGQGQRVVALKKIRLESEDEGVPSTAIREISLLKELKDDNI # VRLYDIVHSDAHKLYLVFEFLDLDLKRYMEGIPKDQPLGADIVKKFMMQLCKGIAYCHSHRILHRDLKPQ # NLLINKDGNLKLGDFGLARAFGVPLRAYTHEIVTLWYRAPEVLLGGKQYSTGVDTWSIGCIFAEMCNRKP # IFSGDSEIDQIFKIFRVLGTPNEAIWPDIVYLPDFKPSFPQWRRKDLSQVVPSLDPRGIDLLDKLLAYDP # INRISARRAAIHPYFQES # END_OF_STRING # # f = Bio::FastaFormat.new(fasta_string) # # f.entry #=> ">gi|398365175|ref|NP_009718.3| Cdc28p [Saccharomyces cerevisiae S288c]\n"+ # # MSGELANYKRLEKVGEGTYGVVYKALDLRPGQGQRVVALKKIRLESEDEGVPSTAIREISLLKELKDDNI\n"+ # # VRLYDIVHSDAHKLYLVFEFLDLDLKRYMEGIPKDQPLGADIVKKFMMQLCKGIAYCHSHRILHRDLKPQ\n"+ # # NLLINKDGNLKLGDFGLARAFGVPLRAYTHEIVTLWYRAPEVLLGGKQYSTGVDTWSIGCIFAEMCNRKP\n"+ # # IFSGDSEIDQIFKIFRVLGTPNEAIWPDIVYLPDFKPSFPQWRRKDLSQVVPSLDPRGIDLLDKLLAYDP\n"+ # # INRISARRAAIHPYFQES" # # ==== Methods related to the name of the sequence # # A larger range of methods for dealing with Fasta definition lines can be found in FastaDefline, accessed through the FastaFormat#identifiers method.
# # f.entry_id #=> "gi|398365175" # f.first_name #=> "gi|398365175|ref|NP_009718.3|" # f.definition #=> "gi|398365175|ref|NP_009718.3| Cdc28p [Saccharomyces cerevisiae S288c]" # f.identifiers #=> Bio::FastaDefline instance # f.accession #=> "NP_009718" # f.accessions #=> ["NP_009718"] # f.acc_version #=> "NP_009718.3" # f.comment #=> nil # # ==== Methods related to the actual sequence # # f.seq #=> "MSGELANYKRLEKVGEGTYGVVYKALDLRPGQGQRVVALKKIRLESEDEGVPSTAIREISLLKELKDDNIVRLYDIVHSDAHKLYLVFEFLDLDLKRYMEGIPKDQPLGADIVKKFMMQLCKGIAYCHSHRILHRDLKPQNLLINKDGNLKLGDFGLARAFGVPLRAYTHEIVTLWYRAPEVLLGGKQYSTGVDTWSIGCIFAEMCNRKPIFSGDSEIDQIFKIFRVLGTPNEAIWPDIVYLPDFKPSFPQWRRKDLSQVVPSLDPRGIDLLDKLLAYDPINRISARRAAIHPYFQES" # f.data #=> "\nMSGELANYKRLEKVGEGTYGVVYKALDLRPGQGQRVVALKKIRLESEDEGVPSTAIREISLLKELKDDNI\nVRLYDIVHSDAHKLYLVFEFLDLDLKRYMEGIPKDQPLGADIVKKFMMQLCKGIAYCHSHRILHRDLKPQ\nNLLINKDGNLKLGDFGLARAFGVPLRAYTHEIVTLWYRAPEVLLGGKQYSTGVDTWSIGCIFAEMCNRKP\nIFSGDSEIDQIFKIFRVLGTPNEAIWPDIVYLPDFKPSFPQWRRKDLSQVVPSLDPRGIDLLDKLLAYDP\nINRISARRAAIHPYFQES\n" # f.length #=> 298 # f.aaseq #=> "MSGELANYKRLEKVGEGTYGVVYKALDLRPGQGQRVVALKKIRLESEDEGVPSTAIREISLLKELKDDNIVRLYDIVHSDAHKLYLVFEFLDLDLKRYMEGIPKDQPLGADIVKKFMMQLCKGIAYCHSHRILHRDLKPQNLLINKDGNLKLGDFGLARAFGVPLRAYTHEIVTLWYRAPEVLLGGKQYSTGVDTWSIGCIFAEMCNRKPIFSGDSEIDQIFKIFRVLGTPNEAIWPDIVYLPDFKPSFPQWRRKDLSQVVPSLDPRGIDLLDKLLAYDPINRISARRAAIHPYFQES" # f.aaseq.composition #=> {"M"=>5, "S"=>15, "G"=>21, "E"=>16, "L"=>36, "A"=>17, "N"=>8, "Y"=>13, "K"=>22, "R"=>20, "V"=>18, "T"=>7, "D"=>23, "P"=>17, "Q"=>10, "I"=>23, "H"=>7, "F"=>12, "C"=>4, "W"=>4} # f.aalen #=> 298 # # # === A less structured fasta entry # # f.entry #=> ">abc 123 456\nASDF" # # f.entry_id #=> "abc" # f.first_name #=> "abc" # f.definition #=> "abc 123 456" # f.comment #=> nil # f.accession #=> nil # f.accessions #=> [] # f.acc_version #=> nil # # f.seq #=> "ASDF" # f.data #=> "\nASDF\n" # f.length #=> 4 # f.aaseq #=> "ASDF" # f.aaseq.composition #=> {"A"=>1, "S"=>1, "D"=>1, "F"=>1} # f.aalen #=> 4 # # # === 
References # # * FASTA format (WikiPedia) # http://en.wikipedia.org/wiki/FASTA_format # class FastaFormat < DB # Entry delimiter in flatfile text. DELIMITER = RS = "\n>" # (Integer) excess read size included in DELIMITER. DELIMITER_OVERRUN = 1 # '>' # The comment line of the FASTA formatted data. attr_accessor :definition # The sequence lines in text. attr_accessor :data attr_reader :entry_overrun # Stores the comment and sequence information from one entry of the # FASTA format string. If the argument contains more than one # entry, only the first entry is used. def initialize(str) @definition = str[/.*/].sub(/^>/, '').strip # 1st line @data = str.sub(/.*/, '') # the rest @data.sub!(/^>.*/m, '') # remove trailing entries for sure @entry_overrun = $& end # Returns the stored entry in FASTA format. (same as to_s) def entry @entry = ">#{@definition}\n#{@data.strip}\n" end alias to_s entry # Executes FASTA/BLAST search by using a Bio::Fasta or a Bio::Blast # factory object. # # #!/usr/bin/env ruby # require 'bio' # # factory = Bio::Fasta.local('fasta34', 'db/swissprot.f') # flatfile = Bio::FlatFile.open(Bio::FastaFormat, 'queries.f') # flatfile.each do |entry| # p entry.definition # result = entry.fasta(factory) # result.each do |hit| # print "#{hit.query_id} : #{hit.evalue}\t#{hit.target_id} at " # p hit.lap_at # end # end # def query(factory) factory.query(entry) end alias fasta query alias blast query # Returns a joined sequence line as a String. def seq unless defined?(@seq) unless /\A\s*^\#/ =~ @data then @seq = Sequence::Generic.new(@data.tr(" \t\r\n0-9", '')) # lazy clean up else a = @data.split(/(^\#.*$)/) i = 0 cmnt = {} s = [] a.each do |x| if /^# ?(.*)$/ =~ x then cmnt[i] ? cmnt[i] << "\n" << $1 : cmnt[i] = $1 else x.tr!(" \t\r\n0-9", '') # lazy clean up i += x.length s << x end end @comment = cmnt @seq = Bio::Sequence::Generic.new(s.join('')) end end @seq end # Returns comments. def comment seq @comment end # Returns sequence length.
def length seq.length end # Returns the sequence as a Bio::Sequence::NA object. def naseq Sequence::NA.new(seq) end # Returns the length of the Bio::Sequence::NA sequence. def nalen self.naseq.length end # Returns the sequence as a Bio::Sequence::AA object. def aaseq Sequence::AA.new(seq) end # Returns the length of the Bio::Sequence::AA sequence. def aalen self.aaseq.length end # Returns sequence as a Bio::Sequence object. # # Note: If you modify the returned Bio::Sequence object, # the sequence or definition in this FastaFormat object # might also be changed (but not always) # for efficiency reasons. # def to_biosequence Bio::Sequence.adapter(self, Bio::Sequence::Adapter::FastaFormat) end alias to_seq to_biosequence # Parses the FASTA defline and extracts IDs. # IDs are NSIDs (NCBI standard FASTA sequence identifiers) # or ":"-separated IDs. # It returns a Bio::FastaDefline instance. def identifiers unless defined?(@ids) then @ids = FastaDefline.new(@definition) end @ids end # Parses the FASTA defline (using #identifiers method), and # shows a possibly unique identifier. # It returns a string. def entry_id identifiers.entry_id end # Parses the FASTA defline (using #identifiers method), and # shows GI/locus/accession/accession with version number. # If an entry has two or more such IDs, # only the first ID is shown. # It returns a string or nil. def gi identifiers.gi end # Returns an accession number. def accession identifiers.accession end # Parses the FASTA defline (using #identifiers method), and # shows accession numbers. # It returns an array of strings. def accessions identifiers.accessions end # Returns accession number with version. def acc_version identifiers.acc_version end # Returns locus. def locus identifiers.locus end # Returns the first name (word) of the definition line - everything # before the first whitespace.
# # >abc def #=> 'abc' # >gi|398365175|ref|NP_009718.3| Cdc28p [Saccharomyces cerevisiae S288c] #=> 'gi|398365175|ref|NP_009718.3|' # >abc #=> 'abc' def first_name index = definition.index(/\s/) if index.nil? return @definition else return @definition[0...index] end end end #class FastaFormat end #module Bio bio-2.0.3/lib/bio/db/sanger_chromatogram/0000755000175000017500000000000014141516614017544 5ustar nileshnileshbio-2.0.3/lib/bio/db/sanger_chromatogram/abif.rb0000644000175000017500000001225014141516614020772 0ustar nileshnilesh# # = bio/db/sanger_chromatogram/abif.rb - Abif class # # Copyright:: Copyright (C) 2009 Anthony Underwood , # License:: The Ruby License # require 'bio/db/sanger_chromatogram/chromatogram' module Bio # == Description # # This class inherits from the SangerChromatogram superclass. It captures the information contained # within an ABIF format chromatogram file generated by DNA sequencing. See the SangerChromatogram class # for usage. class Abif < SangerChromatogram DATA_TYPES = { 1 => 'byte', 2 => 'char', 3 => 'word', 4 => 'short', 5 => 'long', 7 => 'float', 8 => 'double', 10 => 'date', 11 => 'time', 18 => 'pString', 19 => 'cString', 12 => 'thumb', 13 => 'bool', 6 => 'rational', 9 => 'BCD', 14 => 'point', 15 => 'rect', 16 => 'vPoint', 17 => 'vRect', 20 => 'tag', 128 => 'deltaComp', 256 => 'LZWComp', 384 => 'deltaLZW', 1024 => 'user'} # User defined data types have tags numbers >= 1024 PACK_TYPES = { 'byte' => 'C', 'char' => 'c', 'word' => 'n', 'short' => 'n', 'long' => 'N', 'date' => 'nCC', 'time' => 'CCCC', 'pString' => 'CA*', 'cString' => 'Z*', 'float' => 'g', 'double' => 'G', 'bool' => 'C', 'thumb' => 'NNCC', 'rational' => 'NN', 'point' => 'nn', 'rect' => 'nnnn', 'vPoint' => 'NN', 'vRect' => 'NNNN', 'tag' => 'NN'} # Specifies how to pack each data type #sequence attributes # The sample title as entered when sequencing the sample (String) attr_accessor :sample_title # The chemistry used when sequencing e.g Dye terminators => 'term.' 
(String) attr_accessor :chemistry # see SangerChromatogram class for how to create an Abif object and its usage def initialize(string) header = string.slice(0,128) # read in header info @chromatogram_type, @version, @directory_tag_name, @directory_tag_number, @directory_element_type, @directory_element_size, @directory_number_of_elements, @directory_data_size, @directory_data_offset, @directory_data_handle= header.unpack("a4 n a4 N n n N N N N") @version = @version/100.to_f get_directory_entries(string) # get sequence @sequence = @directory_entries["PBAS"][1].data.map{|char| char.chr.downcase}.join("") #get peak indices @peak_indices = @directory_entries["PLOC"][1].data #get qualities @qualities = @directory_entries["PCON"][1].data # get sample title @sample_title = @directory_entries["SMPL"][1].data @directory_entries["PDMF"].size > 2 ? @dye_mobility = @directory_entries["PDMF"][2].data : @dye_mobility = @directory_entries["PDMF"][1].data #get trace data @chemistry = @directory_entries["phCH"][1].data base_order = @directory_entries["FWO_"][1].data.map{|char| char.chr.downcase} (9..12).each do |data_index| self.instance_variable_set("@#{base_order[data_index-9]}trace", @directory_entries["DATA"][data_index].data) end end # Returns the data for the name. # If not found, returns nil. # --- # *Arguments*: # * (required) _name_: (String) name of the data # * (required) tag_number: (Integer) tag number (default 1) # *Returns*:: any data type or nil def data(name, tag_number = 1) d = @directory_entries[name] d ? 
d[tag_number].data : nil end private def get_directory_entries(string) @directory_entries = Hash.new offset = @directory_data_offset @directory_number_of_elements.times do entry = DirectoryEntry.new entry_fields = string.slice(offset, @directory_element_size) entry.name, entry.tag_number, entry.element_type, entry.element_size, entry.number_of_elements, entry.data_size, entry.data_offset = entry_fields.unpack("a4 N n n N N N") # populate the entry with the data it refers to if entry.data_size > 4 get_entry_data(entry, string) else get_entry_data(entry, entry_fields) end if @directory_entries.has_key?(entry.name) @directory_entries[entry.name][entry.tag_number] = entry else @directory_entries[entry.name] = Array.new @directory_entries[entry.name][entry.tag_number] = entry end offset += @directory_element_size end end def get_entry_data(entry, string) if entry.data_size > 4 raw_data = string.slice(entry.data_offset, entry.data_size) else raw_data = string.slice(20,4) end if entry.element_type > 1023 # user defined data: not processed as yet by this bioruby module entry.data = raw_data else pack_type = PACK_TYPES[DATA_TYPES[entry.element_type]] pack_type.match(/\*/) ? 
unpack_string = pack_type : unpack_string = "#{pack_type}#{entry.number_of_elements}" entry.data = raw_data.unpack(unpack_string) if pack_type == "CA*" # pascal string where the first byte is a character count and should therefore be removed entry.data.shift end end end class DirectoryEntry attr_accessor :name, :tag_number, :element_type, :element_size, :number_of_elements, :data_size, :data_offset attr_accessor :data end end end bio-2.0.3/lib/bio/db/sanger_chromatogram/scf.rb0000644000175000017500000001661614141516614020656 0ustar nileshnilesh# # = bio/db/sanger_chromatogram/scf.rb - Scf class # # Copyright:: Copyright (C) 2009 Anthony Underwood , # License:: The Ruby License # require 'bio/db/sanger_chromatogram/chromatogram' module Bio # == Description # # This class inherits from the SangerChromatogram superclass. It captures the information contained # within an scf format chromatogram file generated by DNA sequencing. See the SangerChromatogram class # for usage class Scf < SangerChromatogram # sequence attributes # The quality of each base at each position along the length of the sequence is captured # by the nqual attributes where n is one of a, c, g or t. Generally the quality will be # high for the base that is called at a particular position and low for all the other bases. # However at positions of poor sequence quality, more than one base may have similar top scores. # By analysing the nqual attributes it may be possible to determine if the base calling was # correct or not.
# The quality of the A base at each sequence position attr_accessor :aqual # The quality of the C base at each sequence position attr_accessor :cqual # The quality of the G base at each sequence position attr_accessor :gqual # The quality of the T base at each sequence position attr_accessor :tqual # A hash of extra information extracted from the chromatogram file attr_accessor :comments # see SangerChromatogram class for how to create an Scf object and its usage def initialize(string) header = string.slice(0,128) # read in header info @chromatogram_type, @samples, @sample_offset, @bases, @bases_left_clip, @bases_right_clip, @bases_offset, @comment_size, @comments_offset, @version, @sample_size, @code_set, @header_spare = header.unpack("a4 NNNNNNNN a4 NN N20") get_traces(string) get_bases_peakIndices_and_qualities(string) get_comments(string) if @comments["DYEP"] @dye_mobility = @comments["DYEP"] else @dye_mobility = "Unnown" end end private def get_traces(string) if @version == "3.00" # read in trace info offset = @sample_offset length = @samples * @sample_size # determine whether the data is stored in 1 byte as an unsigned byte or 2 bytes as an unsigned short @sample_size == 2 ? byte = "n" : byte = "c" for base in ["a" , "c" , "g" , "t"] trace_read = string.slice(offset,length).unpack("#{byte}#{@samples}") # convert offsets for sample_num in (0..trace_read.size-1) if trace_read[sample_num] > 30000 trace_read[sample_num] = trace_read[sample_num] - 65536 end end # For 8-bit data we need to emulate a signed/unsigned # cast that is implicit in the C implementations..... 
if @sample_size == 1 for sample_num in (0..trace_read.size-1) trace_read[sample_num] += 256 if trace_read[sample_num] < 0 end end trace_read = convert_deltas_to_values(trace_read) self.instance_variable_set("@#{base}trace", trace_read) offset += length end elsif @version == "2.00" @atrace = [] @ctrace = [] @gtrace = [] @ttrace = [] # read in trace info offset = @sample_offset length = @samples * @sample_size * 4 # determine whether the data is stored in 1 byte as an unsigned byte or 2 bytes as an unsigned short @sample_size == 2 ? byte = "n" : byte = "c" trace_read = string.slice(offset,length).unpack("#{byte}#{@samples*4}") (0..(@samples-1)*4).step(4) do |offset2| @atrace << trace_read[offset2] @ctrace << trace_read[offset2+1] @gtrace << trace_read[offset2+2] @ttrace << trace_read[offset2+3] end end end def get_bases_peakIndices_and_qualities(string) if @version == "3.00" # now go and get the peak index information offset = @bases_offset length = @bases * 4 get_v3_peak_indices(string,offset,length) # now go and get the accuracy information offset += length; get_v3_accuracies(string,offset,length) # OK, now go and get the base information. 
offset += length; length = @bases; get_v3_sequence(string,offset,length) #combine accuracies to get quality scores @qualities= convert_accuracies_to_qualities elsif @version == "2.00" @peak_indices = [] @aqual = [] @cqual = [] @gqual = [] @tqual = [] @qualities = [] @sequence = "" # now go and get the base information offset = @bases_offset length = @bases * 12 all_bases_info = string.slice(offset,length) (0..length-1).step(12) do |offset2| base_info = all_bases_info.slice(offset2,12).unpack("N C C C C a C3") @peak_indices << base_info[0] @aqual << base_info[1] @cqual << base_info[2] @gqual << base_info[3] @tqual << base_info[4] @sequence += base_info[5].downcase case base_info[5].downcase when "a" @qualities << base_info[1] when "c" @qualities << base_info[2] when "g" @qualities << base_info[3] when "t" @qualities << base_info[4] else @qualities << 0 end end end end def get_v3_peak_indices(string,offset,length) @peak_indices = string.slice(offset,length).unpack("N#{length/4}") end def get_v3_accuracies(string,offset,length) qualities = string.slice(offset,length) qual_length = length/4; qual_offset = 0; for base in ["a" , "c" , "g" , "t"] self.instance_variable_set("@#{base}qual",qualities.slice(qual_offset,qual_length).unpack("C#{qual_length}")) qual_offset += qual_length end end def get_v3_sequence(string,offset,length) @sequence = string.slice(offset,length).unpack("a#{length}").join('').downcase end def convert_deltas_to_values(trace_read) p_sample = 0; for sample_num in (0..trace_read.size-1) trace_read[sample_num] = trace_read[sample_num] + p_sample p_sample = trace_read[sample_num]; end p_sample = 0; for sample_num in (0..trace_read.size-1) trace_read[sample_num] = trace_read[sample_num] + p_sample p_sample = trace_read[sample_num]; end return trace_read end def convert_accuracies_to_qualities qualities = Array.new for base_pos in (0..@sequence.length-1) case sequence.slice(base_pos,1) when "a" qualities << @aqual[base_pos] when "c" qualities << 
@cqual[base_pos] when "g" qualities << @gqual[base_pos] when "t" qualities << @tqual[base_pos] else qualities << 0 end end return qualities end def get_comments(string) @comments = Hash.new comment_string = string.slice(@comments_offset,@comment_size) comment_string.gsub!(/\0/, "") comment_array = comment_string.split("\n") comment_array.each do |comment| comment =~ /(\w+)=(.*)/ @comments[$1] = $2 end end end end bio-2.0.3/lib/bio/db/sanger_chromatogram/chromatogram_to_biosequence.rb0000644000175000017500000000144314141516614025642 0ustar nileshnilesh# # = bio/db/sanger_chromatogram/chromatogram_to_biosequence.rb - Bio::SangerChromatogram to Bio::Sequence adapter module # # Copyright:: Copyright (C) 2009 Anthony Underwood , # License:: The Ruby License # # $Id:$ # require 'bio/sequence' require 'bio/sequence/adapter' # Internal use only. Normal users should not use this module. # # Bio::SangerChromatogram to Bio::Sequence adapter module. # It is internally used in Bio::SangerChromatogram#to_biosequence. 
# module Bio::Sequence::Adapter::SangerChromatogram extend Bio::Sequence::Adapter private def_biosequence_adapter :seq # primary accession def_biosequence_adapter :primary_accession do |orig| orig.version end end #module Bio::Sequence::Adapter::SangerChromatogram bio-2.0.3/lib/bio/db/sanger_chromatogram/chromatogram.rb0000644000175000017500000001274714141516614022567 0ustar nileshnilesh# # = bio/db/sanger_chromatogram/chromatogram.rb - Sanger Chromatogram class # # Copyright:: Copyright (C) 2009 Anthony Underwood , # License:: The Ruby License # # require 'bio/sequence/adapter' module Bio # == Description # # This is the Superclass for the Abif and Scf classes that allow importing of the common scf # and abi sequence chromatogram formats # The following attributes are Common to both the Abif and Scf subclasses # # * *chromatogram_type* (String): This is extracted from the chromatogram file itself and will # probably be either .scf or ABIF for Scf and Abif files respectively. # * *version* (String): The version of the Scf or Abif file # * *sequence* (String): the sequence contained within the chromatogram as a string. # * *qualities* (Array): the quality scores of each base as an array of integers. These will # probably be phred scores. # * *peak_indices* (Array): if the sequence traces contained within the chromatogram are imagined # as being plotted on an x,y graph, the peak indices are the x positions of the peaks that # represent the nucleotides bases found in the sequence from the chromatogram. For example if # the peak_indices are [16,24,37,49 ....] 
and the sequence is AGGT...., at position 16 the # traces in the chromatogram were base-called as an A, position 24 a G, position 37 a G, # position 49 a T etc. # * *atrace*, *ctrace*, *gtrace*, *ttrace* (Array): If the sequence traces contained within # the chromatogram are imagined as being plotted on an x,y graph, these attributes are arrays of # y positions for each of the 4 nucleotide bases along the length of the x axis. If these were # plotted joined by lines of different colours then the resulting graph should look like the # original chromatogram file when viewed in a chromatogram viewer such as Chromas, 4Peaks or # FinchTV. # * *dye_mobility* (String): The mobility of the dye used when sequencing. This can influence the # base calling # # == Usage # filename = "path/to/sequence_chromatogram_file" # # for Abif files # chromatogram_ff = Bio::Abif.open(filename) # for Scf files # chromatogram_ff = Bio::Scf.open(filename) # # chromatogram = chromatogram_ff.next_entry # chromatogram.to_seq # => returns a Bio::Sequence object # chromatogram.sequence # => returns the sequence contained within the chromatogram as a string # chromatogram.qualities # => returns an array of quality values for each base # chromatogram.atrace # => returns an array of the a trace y positions # class SangerChromatogram # The type of chromatogram file: .scf for Scf files and ABIF for Abif files attr_accessor :chromatogram_type # The Version of the Scf or Abif file (String) attr_accessor :version # The sequence contained within the chromatogram (String) attr_accessor :sequence # An array of quality scores for each base in the sequence (Array) attr_accessor :qualities # An array of 'x' positions (see description) on the trace where the bases occur/have been called (Array) attr_accessor :peak_indices # An array of 'y' positions (see description) for the 'A' trace from the chromatogram (Array) attr_accessor :atrace # An array of 'y' positions (see description) for the 'C' trace from the
chromatogram (Array) attr_accessor :ctrace # An array of 'y' positions (see description) for the 'G' trace from the chromatogram (Array) attr_accessor :gtrace # An array of 'y' positions (see description) for the 'T' trace from the chromatogram (Array) attr_accessor :ttrace # The mobility of the dye used when sequencing (String) attr_accessor :dye_mobility def self.open(filename) Bio::FlatFile.open(self, filename) end # Returns a Bio::Sequence::NA object based on the sequence from the chromatogram def seq Bio::Sequence::NA.new(@sequence) end # Returns a Bio::Sequence object based on the sequence from the chromatogram def to_biosequence Bio::Sequence.adapter(self, Bio::Sequence::Adapter::SangerChromatogram) end alias :to_seq :to_biosequence # Returns the sequence from the chromatogram as a string def sequence_string @sequence end # Reverses and complements the current chromatogram object including its sequence, traces # and qualities def complement! # reverse traces tmp_trace = @atrace @atrace = @ttrace.reverse @ttrace = tmp_trace.reverse tmp_trace = @ctrace @ctrace = @gtrace.reverse @gtrace = tmp_trace.reverse # reverse base qualities if (defined? @aqual) && @aqual # if qualities exist tmp_qual = @aqual @aqual = @tqual.reverse @tqual = tmp_qual.reverse tmp_qual = @cqual @cqual = @gqual.reverse @gqual = tmp_qual.reverse end #reverse qualities @qualities = @qualities.reverse #reverse peak indices @peak_indices = @peak_indices.map{|index| @atrace.size - index} @peak_indices.reverse! # reverse sequence @sequence = @sequence.reverse.tr('atgcnrykmswbvdh','tacgnyrmkswvbhd') end # Returns a new chromatogram object of the appropriate subclass (scf or abi) where the # sequence, traces and qualities have all been reversed and complemented def complement chromatogram = self.dup chromatogram.complement!
return chromatogram end end end bio-2.0.3/lib/bio/db/fastq/0000755000175000017500000000000014141516614014640 5ustar nileshnileshbio-2.0.3/lib/bio/db/fastq/fastq_to_biosequence.rb0000644000175000017500000000160414141516614021370 0ustar nileshnilesh# # = bio/db/fastq/fastq_to_biosequence.rb - Bio::Fastq to Bio::Sequence adapter module # # Copyright:: Copyright (C) 2009 # Naohisa Goto # License:: The Ruby License # require 'bio/sequence' require 'bio/sequence/adapter' # Internal use only. Normal users should not use this module. # # Bio::Fastq to Bio::Sequence adapter module. # It is internally used in Bio::Fastq#to_biosequence. # module Bio::Sequence::Adapter::Fastq extend Bio::Sequence::Adapter private def_biosequence_adapter :seq def_biosequence_adapter :entry_id # primary accession def_biosequence_adapter :primary_accession do |orig| orig.entry_id end def_biosequence_adapter :definition def_biosequence_adapter :quality_scores def_biosequence_adapter :quality_score_type def_biosequence_adapter :error_probabilities end #module Bio::Sequence::Adapter::Fastq bio-2.0.3/lib/bio/db/fastq/format_fastq.rb0000644000175000017500000001254114141516614017656 0ustar nileshnilesh# # = bio/db/fastq/format_fastq.rb - FASTQ format generator # # Copyright:: Copyright (C) 2009 # Naohisa Goto # License:: The Ruby License # require 'bio/db/fastq' module Bio::Sequence::Format::Formatter # INTERNAL USE ONLY, YOU SHOULD NOT USE THIS CLASS. # # FASTQ format output class for Bio::Sequence. # # The default FASTQ format is fastq-sanger. class Fastq < Bio::Sequence::Format::FormatterBase # INTERNAL USE ONLY, YOU SHOULD NOT CALL THIS METHOD. # # Creates a new FASTQ format generator object from the sequence.
# # --- # *Arguments*: # * _sequence_: Bio::Sequence object # * (optional) :repeat_title => (true or false) if true, repeating title in the "+" line; if not true, "+" only (default false) # * (optional) :width => _width_: (Fixnum) width to wrap sequence and quality lines; nil to prevent wrapping (default nil) # * (optional) :title => _title_: (String) completely replaces title line with the _title_ (default nil) # * (optional) :default_score => _score_: (Integer) default score for bases that have no valid quality scores or error probabilities; false or nil means the lowest score, true means the highest score (default nil) def initialize; end if false # dummy for RDoc # INTERNAL USE ONLY, YOU SHOULD NOT CALL THIS METHOD. # # Output the FASTQ format string of the sequence. # # Currently, this method is used in Bio::Sequence#output like so, # # s = Bio::Sequence.new('atgc') # puts s.output(:fastq_sanger) # --- # *Returns*:: String object def output title = @options[:title] width = @options.has_key?(:width) ? @options[:width] : nil seq = @sequence.seq.to_s entry_id = @sequence.entry_id || "#{@sequence.primary_accession}.#{@sequence.sequence_version}" definition = @sequence.definition unless title then title = definition.to_s unless title[0, entry_id.length] == entry_id and /\s/ =~ title[entry_id.length, 1].to_s then title = "#{entry_id} #{title}" end end title2 = @options[:repeat_title] ? 
title : '' qstr = fastq_quality_string(seq, @options[:default_score]) "@#{title}\n" + if width then seq.gsub(Regexp.new(".{1,#{width}}"), "\\0\n") else seq + "\n" end + "+#{title2}\n" + if width then qstr.gsub(Regexp.new(".{1,#{width}}"), "\\0\n") else qstr + "\n" end end private def fastq_format_data Bio::Fastq::FormatData::FASTQ_SANGER.instance end def fastq_quality_string(seq, default_score) sc = fastq_quality_scores(seq) if sc.size < seq.length then if default_score == true then # when true, the highest score default_score = fastq_format_data.score_range.end else # when false or nil, the lowest score default_score ||= fastq_format_data.score_range.begin end sc = sc + ([ default_score ] * (seq.length - sc.size)) end fastq_format_data.scores2str(sc) end def fastq_quality_scores(seq) return [] if seq.length <= 0 fmt = fastq_format_data # checks quality_scores qsc = @sequence.quality_scores qsc_type = @sequence.quality_score_type if qsc and qsc_type and qsc_type == fmt.quality_score_type and qsc.size >= seq.length then return qsc end # checks error_probabilities ep = @sequence.error_probabilities if ep and ep.size >= seq.length then return fmt.p2q(ep[0, seq.length]) end # If quality score type of the sequence is nil, regarded as :phred. qsc_type ||= :phred # checks if scores can be converted if qsc and qsc.size >= seq.length then case [ qsc_type, fmt.quality_score_type ] when [ :phred, :solexa ] return fmt.convert_scores_from_phred_to_solexa(qsc[0, seq.length]) when [ :solexa, :phred ] return fmt.convert_scores_from_solexa_to_phred(qsc[0, seq.length]) end end # checks quality scores type case qsc_type when :phred, :solexa #does nothing else qsc_type = nil qsc = nil end # collects piece of information qsc_cov = qsc ? qsc.size.quo(seq.length) : 0 ep_cov = ep ? 
ep.size.quo(seq.length) : 0 if qsc_cov > ep_cov then case [ qsc_type, fmt.quality_score_type ] when [ :phred, :phred ], [ :solexa, :solexa ] return qsc when [ :phred, :solexa ] return fmt.convert_scores_from_phred_to_solexa(qsc) when [ :solexa, :phred ] return fmt.convert_scores_from_solexa_to_phred(qsc) end elsif ep_cov > qsc_cov then return fmt.p2q(ep) end # if no information, returns empty array return [] end end #class Fastq # class Fastq_sanger is the same as the Fastq class. Fastq_sanger = Fastq class Fastq_solexa < Fastq private def fastq_format_data Bio::Fastq::FormatData::FASTQ_SOLEXA.instance end end #class Fastq_solexa class Fastq_illumina < Fastq private def fastq_format_data Bio::Fastq::FormatData::FASTQ_ILLUMINA.instance end end #class Fastq_illumina end #module Bio::Sequence::Format::Formatter bio-2.0.3/lib/bio/db/lasergene.rb0000644000175000017500000002147514141516614016025 0ustar nileshnilesh# # bio/db/lasergene.rb - Interface for DNAStar Lasergene sequence file format # # Author:: Trevor Wennblom # Copyright:: Copyright (c) 2007 Center for Biomedical Research Informatics, University of Minnesota (http://cbri.umn.edu) # License:: The Ruby License # # $Id:$ # module Bio # # bio/db/lasergene.rb - Interface for DNAStar Lasergene sequence file format # # Author:: Trevor Wennblom # Copyright:: Copyright (c) 2007 Center for Biomedical Research Informatics, University of Minnesota (http://cbri.umn.edu) # License:: The Ruby License # # = Description # # Bio::Lasergene reads DNAStar Lasergene formatted sequence files, or +.seq+ # files. It only expects to find one sequence per file. # # = Usage # # require 'bio' # filename = 'MyFile.seq' # lseq = Bio::Lasergene.new( IO.readlines(filename) ) # lseq.entry_id # => "Contig 1" # lseq.seq # => ATGACGTATCCAAAGAGGCGTTACC # # = Comments # # I'm only aware of the following three kinds of Lasergene file formats. Feel # free to send me other examples that may not currently be accounted for. 
# # File format 1: # # ## begin ## # "Contig 1" (1,934) # Contig Length: 934 bases # Average Length/Sequence: 467 bases # Total Sequence Length: 1869 bases # Top Strand: 2 sequences # Bottom Strand: 2 sequences # Total: 4 sequences # ^^ # ATGACGTATCCAAAGAGGCGTTACCGGAGAAGAAGACACCGCCCCCGCAGTCCTCTTGGCCAGATCCTCCGCCGCCGCCCCTGGCTCGTCCACCCCCGCCACAGTTACCGCTGGAGAAGGAAAAATGGCATCTTCAWCACCCGCCTATCCCGCAYCTTCGGAWRTACTATCAAGCGAACCACAGTCAGAACGCCCTCCTGGGCGGTGGACATGATGAGATTCAATATTAATGACTTTCTTCCCCCAGGAGGGGGCTCAAACCCCCGCTCTGTGCCCTTTGAATACTACAGAATAAGAAAGGTTAAGGTTGAATTCTGGCCCTGCTCCCCGATCACCCAGGGTGACAGGGGAATGGGCTCCAGTGCTGWTATTCTAGMTGATRRCTTKGTAACAAAGRCCACAGCCCTCACCTATGACCCCTATGTAAACTTCTCCTCCCGCCATACCATAACCCAGCCCTTCTCCTACCRCTCCCGYTACTTTACCCCCAAACCTGTCCTWGATKCCACTATKGATKACTKCCAACCAAACAACAAAAGAAACCAGCTGTGGSTGAGACTACAWACTGCTGGAAATGTAGACCWCGTAGGCCTSGGCACTGCGTKCGAAAACAGTATATACGACCAGGAATACAATATCCGTGTMACCATGTATGTACAATTCAGAGAATTTAATCTTAAAGACCCCCCRCTTMACCCKTAATGAATAATAAMAACCATTACGAAGTGATAAAAWAGWCTCAGTAATTTATTYCATATGGAAATTCWSGGCATGGGGGGGAAAGGGTGACGAACKKGCCCCCTTCCTCCSTSGMYTKTTCYGTAGCATTCYTCCAMAAYACCWAGGCAGYAMTCCTCCSATCAAGAGcYTSYACAGCTGGGACAGCAGTTGAGGAGGACCATTCAAAGGGGGTCGGATTGCTGGTAATCAGA # ## end ## # # # File format 2: # # ## begin ## # ^^: 350,935 # Contig 1 (1,935) # Contig Length: 935 bases # Average Length/Sequence: 580 bases # Total Sequence Length: 2323 bases # Top Strand: 2 sequences # Bottom Strand: 2 sequences # Total: 4 sequences # ^^ # 
ATGTCGGGGAAATGCTTGACCGCGGGCTACTGCTCATCATTGCTTTCTTTGTGGTATATCGTGCCGTTCTGTTTTGCTGTGCTCGTCAACGCCAGCGGCGACAGCAGCTCTCATTTTCAGTCGATTTATAACTTGACGTTATGTGAGCTGAATGGCACGAACTGGCTGGCAGACAACTTTAACTGGGCTGTGGAGACTTTTGTCATCTTCCCCGTGTTGACTCACATTGTTTCCTATGGTGCACTCACTACCAGTCATTTTCTTGACACAGTTGGTCTAGTTACTGTGTCTACCGCCGGGTTTTATCACGGGCGGTACGTCTTGAGTAGCATCTACGCGGTCTGTGCTCTGGCTGCGTTGATTTGCTTCGCCATCAGGTTTGCGAAGAACTGCATGTCCTGGCGCTACTCTTGCACTAGATACACCAACTTCCTCCTGGACACCAAGGGCAGACTCTATCGTTGGCGGTCGCCTGTCATCATAGAGAAAGGGGGTAAGGTTGAGGTCGAAGGTCATCTGATCGATCTCAAAAGAGTTGTGCTTGATGGCTCTGTGGCGACACCTTTAACCAGAGTTTCAGCGGAACAATGGGGTCGTCCCTAGACGACTTTTGCCATGATAGTACAGCCCCACAGAAGGTGCTCTTGGCGTTTTCCATCACCTACACGCCAGTGATGATATATGCCCTAAAGGTAAGCCGCGGCCGACTTTTGGGGCTTCTGCACCTTTTGATTTTTTTGAACTGTGCCTTTACTTTCGGGTACATGACATTCGTGCACTTTCGGAGCACGAACAAGGTCGCGCTCACTATGGGAGCAGTAGTCGCACTCCTTTGGGGGGTGTACTCAGCCATAGAAACCTGGAAATTCATCACCTCCAGATGCCGTTGTGCTTGCTAGGCCGCAAGTACATTCTGGCCCCTGCCCACCACGTTG # ## end ## # # File format 3 (non-standard Lasergene header): # # ## begin ## # LOCUS PRU87392 15411 bp RNA linear VRL 17-NOV-2000 # DEFINITION Porcine reproductive and respiratory syndrome virus strain VR-2332, # complete genome. # ACCESSION U87392 AF030244 U00153 # VERSION U87392.3 GI:11192298 # [...cut...] 
# 3'UTR 15261..15411 # polyA_site 15409 # ORIGIN # ^^ # atgacgtataggtgttggctctatgccttggcatttgtattgtcaggagctgtgaccattggcacagcccaaaacttgctgcacagaaacacccttctgtgatagcctccttcaggggagcttagggtttgtccctagcaccttgcttccggagttgcactgctttacggtctctccacccctttaaccatgtctgggatacttgatcggtgcacgtgtacccccaatgccagggtgtttatggcggagggccaagtctactgcacacgatgcctcagtgcacggtctctccttcccctgaacctccaagtttctgagctcggggtgctaggcctattctacaggcccgaagagccactccggtggacgttgccacgtgcattccccactgttgagtgctcccccgccggggcctgctggctttctgcaatctttccaatcgcacgaatgaccagtggaaacctgaacttccaacaaagaatggtacgggtcgcagctgagctttacagagccggccagctcacccctgcagtcttgaaggctctacaagtttatgaacggggttgccgctggtaccccattgttggacctgtccctggagtggccgttttcgccaattccctacatgtgagtgataaacctttcccgggagcaactcacgtgttgaccaacctgccgctcccgcagagacccaagcctgaagacttttgcccctttgagtgtgctatggctactgtctatgacattggtcatgacgccgtcatgtatgtggccgaaaggaaagtctcctgggcccctcgtggcggggatgaagtgaaatttgaagctgtccccggggagttgaagttgattgcgaaccggctccgcacctccttcccgccccaccacacagtggacatgtctaagttcgccttcacagcccctgggtgtggtgtttctatgcgggtcgaacgccaacacggctgccttcccgctgacactgtccctgaaggcaactgctggtggagcttgtttgacttgcttccactggaagttcagaacaaagaaattcgccatgctaaccaatttggctaccagaccaagcatggtgtctctggcaagtacctacagcggaggctgca[...cut...] 
# ## end ## # class Lasergene # Entire header before the sequence attr_reader :comments # Sequence # # Bio::Sequence::NA or Bio::Sequence::AA object attr_reader :sequence # Name of sequence # * Parsed from standard Lasergene header attr_reader :name # Contig length, length of present sequence # * Parsed from standard Lasergene header attr_reader :contig_length # Average length per sequence # * Parsed from standard Lasergene header attr_reader :average_length # Length of parent sequence # * Parsed from standard Lasergene header attr_reader :total_length # Number of top strand sequences # * Parsed from standard Lasergene header attr_reader :top_strand_sequences # Number of bottom strand sequences # * Parsed from standard Lasergene header attr_reader :bottom_strand_sequences # Number of sequences # * Parsed from standard Lasergene header attr_reader :total_sequences DELIMITER_1 = '^\^\^:' # Match '^^:' at the beginning of a line DELIMITER_2 = '^\^\^' # Match '^^' at the beginning of a line def initialize(lines) process(lines) end # Is the comment header recognized as standard Lasergene format? # # --- # *Arguments* # * _none_ # *Returns*:: +true+ _or_ +false+ def standard_comment? @standard_comment end # Sequence # # Bio::Sequence::NA or Bio::Sequence::AA object def seq @sequence end # Name of sequence # * Parsed from standard Lasergene header def entry_id @name end ######### protected ######### def process(lines) delimiter_1_indices = [] delimiter_2_indices = [] # If the data from the file is passed as one big String instead of # broken into an Array, convert lines to an Array if lines.kind_of? 
String lines = lines.tr("\r", '').split("\n") end lines.each_with_index do |line, index| if line.match DELIMITER_1 delimiter_1_indices << index elsif line.match DELIMITER_2 delimiter_2_indices << index end end raise InputError, "More than one delimiter of type '#{DELIMITER_1}'" if delimiter_1_indices.size > 1 raise InputError, "More than one delimiter of type '#{DELIMITER_2}'" if delimiter_2_indices.size > 1 raise InputError, "No comment to data separator of type '#{DELIMITER_2}'" if delimiter_2_indices.size < 1 if !delimiter_1_indices.empty? # toss out DELIMETER_1 and anything preceding it @comments = lines[ (delimiter_1_indices[0] + 1) .. (delimiter_2_indices[0] - 1) ] else @comments = lines[ 0 .. (delimiter_2_indices[0] - 1) ] end @standard_comment = false if @comments[0] =~ %r{(.+)\s+\(\d+,\d+\)} # if we have a standard Lasergene comment @standard_comment = true @name = $1 comments.each do |comment| if comment.match('Contig Length:\s+(\d+)') @contig_length = $1.to_i elsif comment.match('Average Length/Sequence:\s+(\d+)') @average_length = $1.to_i elsif comment.match('Total Sequence Length:\s+(\d+)') @total_length = $1.to_i elsif comment.match('Top Strand:\s+(\d+)') @top_strand_sequences = $1.to_i elsif comment.match('Bottom Strand:\s+(\d+)') @bottom_strand_sequences = $1.to_i elsif comment.match('Total:\s+(\d+)') @total_sequences = $1.to_i end end end @comments = @comments.join('') @sequence = Bio::Sequence.auto( lines[ (delimiter_2_indices[0] + 1) .. -1 ].join('') ) end end # Lasergene end # Bio bio-2.0.3/lib/bio/db/embl/0000755000175000017500000000000014141516614014441 5ustar nileshnileshbio-2.0.3/lib/bio/db/embl/embl.rb0000644000175000017500000003222614141516614015712 0ustar nileshnilesh# # = bio/db/embl/embl.rb - EMBL database class # # # Copyright:: Copyright (C) 2001-2007 # Mitsuteru C. Nakao # Jan Aerts # License:: The Ruby License # # $Id: embl.rb,v 1.29.2.7 2008/06/17 16:04:36 ngoto Exp $ # # == Description # # Parser class for EMBL database entry. 
# # == Examples # # emb = Bio::EMBL.new($<.read) # emb.entry_id # emb.each_cds do |cds| # cds # A CDS in feature table. # end # emb.seq #=> "ACGT..." # # == References # # * The EMBL Nucleotide Sequence Database # http://www.ebi.ac.uk/embl/ # # * The EMBL Nucleotide Sequence Database: Users Manual # http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html # require 'date' require 'bio/db' require 'bio/db/embl/common' require 'bio/compat/features' require 'bio/compat/references' require 'bio/sequence' require 'bio/sequence/dblink' module Bio class EMBL < EMBLDB include Bio::EMBLDB::Common # returns contents in the ID line. # * Bio::EMBL#id_line -> # where is: # {'ENTRY_NAME' => String, 'MOLECULE_TYPE' => String, 'DIVISION' => String, # 'SEQUENCE_LENGTH' => Int, 'SEQUENCE_VERSION' => Int} # # ID Line # "ID ENTRY_NAME DATA_CLASS; MOLECULE_TYPE; DIVISION; SEQUENCE_LENGTH BP." # # DATA_CLASS = ['standard'] # # MOLECULE_TYPE: DNA RNA XXX # # Code ( DIVISION ) # EST (ESTs) # PHG (Bacteriophage) # FUN (Fungi) # GSS (Genome survey) # HTC (High Throughput cDNAs) # HTG (HTGs) # HUM (Human) # INV (Invertebrates) # ORG (Organelles) # MAM (Other Mammals) # VRT (Other Vertebrates) # PLN (Plants) # PRO (Prokaryotes) # ROD (Rodents) # SYN (Synthetic) # STS (STSs) # UNC (Unclassified) # VRL (Viruses) # # Rel 89- # ID CD789012; SV 4; linear; genomic DNA; HTG; MAM; 500 BP. # ID <1>; SV <2>; <3>; <4>; <5>; <6>; <7> BP. # 1. Primary accession number # 2. Sequence version number # 3. Topology: 'circular' or 'linear' # 4. Molecule type (see note 1 below) # 5. Data class (see section 3.1) # 6. Taxonomic division (see section 3.2) # 7. 
Sequence length (see note 2 below) def id_line(key=nil) unless @data['ID'] tmp = Hash.new idline = fetch('ID').split(/; +/) tmp['ENTRY_NAME'], tmp['DATA_CLASS'] = idline.shift.split(/ +/) if idline.first =~ /^SV/ tmp['SEQUENCE_VERSION'] = idline.shift.split(' ').last tmp['TOPOLOGY'] = idline.shift tmp['MOLECULE_TYPE'] = idline.shift tmp['DATA_CLASS'] = idline.shift else tmp['MOLECULE_TYPE'] = idline.shift end tmp['DIVISION'] = idline.shift tmp['SEQUENCE_LENGTH'] = idline.shift.strip.split(' ').first.to_i @data['ID'] = tmp end if key @data['ID'][key] else @data['ID'] end end # returns ENTRY_NAME in the ID line. # * Bio::EMBL#entry -> String def entry id_line('ENTRY_NAME') end alias entry_name entry alias entry_id entry # returns MOLECULE_TYPE in the ID line. # * Bio::EMBL#molecule -> String def molecule id_line('MOLECULE_TYPE') end alias molecule_type molecule def data_class id_line('DATA_CLASS') end def topology id_line('TOPOLOGY') end # returns DIVISION in the ID line. # * Bio::EMBL#division -> String def division id_line('DIVISION') end # returns SEQUENCE_LENGTH in the ID line. # * Bio::EMBL#sequencelength -> String def sequence_length id_line('SEQUENCE_LENGTH') end alias seqlen sequence_length # AC Line # "AC A12345; B23456;" # returns the version information in the sequence version (SV) line. # * Bio::EMBL#sv -> Accession.Version in String # * Bio::EMBL#version -> accession in Int # # SV Line; sequence version (1/entry) # SV Accession.Version def sv if (v = field_fetch('SV').sub(/;/,'')) == "" [id_line['ENTRY_NAME'], id_line['SEQUENCE_VERSION']].join('.') else v end end def version (sv.split(".")[1] || id_line['SEQUENCE_VERSION']).to_i end # returns contents in the date (DT) line. # * Bio::EMBL#dt ->
Hash # where Hash
is: # {} # * Bio::EMBL#dt(key) -> String # keys: 'created' and 'updated' # # DT Line; date (2/entry) def dt(key=nil) unless @data['DT'] tmp = Hash.new dt_line = self.get('DT').split(/\n/) tmp['created'] = dt_line[0].sub(/\w{2} /,'').strip tmp['updated'] = dt_line[1].sub(/\w{2} /,'').strip @data['DT'] = tmp end if key @data['DT'][key] else @data['DT'] end end #-- ## # DE Line; description (>=1) # #++ #-- ## # KW Line; keyword (>=1) # KW [Keyword;]+ # # Bio::EMBLDB#kw -> Array # #keywords -> Array #++ #-- ## # OS Line; organism species (>=1) # OS Genus species (name) # "OS Trifolium repens (white clover)" # # Bio::EMBLDB#os -> Array #++ # returns contents in the OS line. # * Bio::EMBL#os -> Array of # where is: # [{'name'=>'Human', 'os'=>'Homo sapiens'}, # {'name'=>'Rat', 'os'=>'Rattus norveticus'}] # * Bio::EMBL#os[0]['name'] => "Human" # * Bio::EMBL#os[0] => {'name'=>"Human", 'os'=>'Homo sapiens'} #-- # * Bio::EMBL#os(0) => "Homo sapiens (Human)" #++ # # OS Line; organism species (>=1) # OS Trifolium repens (white clover) # # Typically, OS line shows "Genus species (name)" style: # OS Genus species (name) # # Other examples: # OS uncultured bacterium # OS xxxxxx metagenome # OS Cloning vector xxxxxxxx # Complicated examples: # OS Poeciliopsis gracilis (Poeciliopsis gracilis (Heckel, 1848)) # OS Etmopterus sp. B Last & Stevens, 1994 (bristled lanternshark) # OS Galaxias sp. D (Allibone et al., 1996) (Pool Burn galaxias) # OS Sicydiinae sp. 'Keith et al., 2010' # OS Acanthopagrus sp. 'Jean & Lee, 2008' # OS Gaussia princeps (T. Scott, 1894) # OS Rana sp. 8 Hillis & Wilcox, 2005 # OS Contracaecum rudolphii C D'Amelio et al., 2007 # OS Partula sp. 'Mt. Marau, Tahiti' # OS Leptocephalus sp. 'type II larva' (Smith, 1989) # OS Tayloria grandis (D.G.Long) Goffinet & A.J.Shaw, 2002 # OS Non-A, non-B hepatitis virus # OS Canidae (dog, coyote, wolf, fox) # OS Salmonella enterica subsp. 
enterica serovar 4,[5],12:i:- # OS Yersinia enterocolitica (type O:5,27) # OS Influenza A virus (A/green-winged teal/OH/72/99(H6N1,4)) # OS Influenza A virus (A/Beijing/352/1989,(highgrowth reassortant NIB26)(H3N2)) # OS Recombinant Hepatitis C virus H77(5'UTR-NS2)/JFH1_V787A,Q1247L # def os(num = nil) unless @data['OS'] os = Array.new tmp = fetch('OS') if /([A-Z][a-z]* *[\w \:\'\+\-]+\w) *\(([\w ]+)\)\s*\z/ =~ tmp org = $1 name = $2 os.push({'name' => name, 'os' => org}) else os.push({'name' => nil, 'os' => tmp}) end @data['OS'] = os end if num # EX. "Trifolium repens (white clover)" return "#{@data['OS'][num]['os']} (#{@data['OS'][num]['name']})" end @data['OS'] end #-- ## # OC Line; organism classification (>=1) # # Bio::EMBLDB#oc -> Array #++ #-- ## # OG Line; organelle (0 or 1/entry) # ["Mitochondrion", "Chloroplast","Kinetoplast", "Cyanelle", "Plastid"] # or a plasmid name (e.g. "Plasmid pBR322"). # # Bio::EMBLDB#og -> String #++ #-- ## # R Lines # RN RC RP RX RA RT RL # # Bio::EMBLDB#ref #++ #-- ## # DR Line; database cross-reference (>=0) # "DR database_identifier; primary_identifier; secondary_identifier." # # Bio::EMBLDB#dr #++ # returns feature table header (String) in the feature header (FH) line. # # FH Line; feature table header (0 or 2) def fh fetch('FH') end # returns contents in the feature table (FT) lines. # * Bio::EMBL#ft -> Bio::Features # * Bio::EMBL#ft {} -> {|Bio::Feature| } # # same as features method in bio/db/genbank.rb # # FT Line; feature table data (>=0) def ft unless @data['FT'] ary = Array.new in_quote = false @orig['FT'].each_line do |line| next if line =~ /^FEATURES/ #head = line[0,20].strip # feature key (source, CDS, ...) body = line[20,60].chomp # feature value (position, /qualifier=) if line =~ /^FT {3}(\S+)/ ary.push([ $1, body ]) # [ feature, position, /q="data", ...
] elsif body =~ /^ \// and not in_quote ary.last.push(body) # /q="data..., /q=data, /q if body =~ /=" / and body !~ /"$/ in_quote = true end else ary.last.last << body # ...data..., ...data..." if body =~ /"$/ in_quote = false end end end ary.map! do |subary| parse_qualifiers(subary) end @data['FT'] = ary.extend(Bio::Features::BackwardCompatibility) end if block_given? @data['FT'].each do |feature| yield feature end else @data['FT'] end end alias features ft # iterates on CDS features in the FT lines. def each_cds ft.each do |cds_feature| if cds_feature.feature == 'CDS' yield cds_feature end end end # iterates on gene features in the FT lines. def each_gene ft.each do |gene_feature| if gene_feature.feature == 'gene' yield gene_feature end end end # returns comment text in the comments (CC) line. # # CC Line; comments of notes (>=0) def cc get('CC').to_s.gsub(/^CC /, '') end alias comment cc ## # XX Line; spacer line (many) # def nxx # end # returns sequence header information in the sequence header (SQ) line. # * Bio::EMBL#sq -> # where is: # {'ntlen' => Int, 'other' => Int, # 'a' => Int, 'c' => Int, 'g' => Int, 't' => Int} # * Bio::EMBL#sq(base) -> # * Bio::EMBL#sq[base] -> # # SQ Line; sequence header (1/entry) # SQ Sequence 1859 BP; 609 A; 314 C; 355 G; 581 T; 0 other; def sq(base = nil) unless @data['SQ'] fetch('SQ') =~ \ /(\d+) BP\; (\d+) A; (\d+) C; (\d+) G; (\d+) T; (\d+) other;/ @data['SQ'] = {'ntlen' => $1.to_i, 'other' => $6.to_i, 'a' => $2.to_i, 'c' => $3.to_i , 'g' => $4.to_i, 't' => $5.to_i} else @data['SQ'] end if base @data['SQ'][base.downcase] else @data['SQ'] end end # returns the nucleotie sequence in this entry. # * Bio::EMBL#seq -> Bio::Sequence::NA # # @orig[''] as sequence # bb Line; (blanks) sequence data (>=1) def seq Bio::Sequence::NA.new( fetch('').gsub(/ /,'').gsub(/\d+/,'') ) end alias naseq seq alias ntseq seq #-- # // Line; termination line (end; 1/entry) #++ # modified date. Returns Date object, String or nil. 
def date_modified parse_date(self.dt['updated']) end # created date. Returns Date object, String or nil. def date_created parse_date(self.dt['created']) end # release number when last updated def release_modified parse_release_version(self.dt['updated'])[0] end # release number when created def release_created parse_release_version(self.dt['created'])[0] end # entry version number numbered by EMBL def entry_version parse_release_version(self.dt['updated'])[1] end # parses a date string. Returns a Date object, or the given value unchanged if it cannot be parsed. def parse_date(str) begin Date.parse(str) rescue ArgumentError, TypeError, NoMethodError, NameError str end end private :parse_date # extracts release and version numbers from DT line def parse_release_version(str) return [ nil, nil ] unless str a = str.split(/[\(\,\)]/) a.shift #date string e.g. "14-OCT-2006" rel = nil ver = nil a.each do |x| case x when /Rel\.\s*(.+)/ rel = $1.strip when /Version\s*(.+)/ ver = $1.strip end end [ rel, ver ] end private :parse_release_version # database references (DR). # Returns an array of Bio::Sequence::DBLink objects. def dblinks get('DR').split(/\n/).collect { |x| Bio::Sequence::DBLink.parse_embl_DR_line(x) } end # species def species self.fetch('OS') end # taxonomy classification alias classification oc # converts the entry to Bio::Sequence object # --- # *Arguments*:: # *Returns*:: Bio::Sequence object def to_biosequence Bio::Sequence.adapter(self, Bio::Sequence::Adapter::EMBL) end ### private methods private ## # same as Bio::GenBank#parse_qualifiers(feature) def parse_qualifiers(ary) feature = Feature.new feature.feature = ary.shift feature.position = ary.shift.gsub(/\s/, '') ary.each do |f| if f =~ %r{/([^=]+)=?"?([^"]*)"?} qualifier, value = $1, $2 if value.empty?
value = true end case qualifier when 'translation' value = Sequence::AA.new(value.gsub(/\s/, '')) when 'codon_start' value = value.to_i end feature.append(Feature::Qualifier.new(qualifier, value)) end end return feature end end # class EMBL end # module Bio bio-2.0.3/lib/bio/db/embl/common.rb0000644000175000017500000002305414141516614016262 0ustar nileshnilesh# # = bio/db/embl.rb - Common methods for EMBL style database classes # # Copyright:: Copyright (C) 2001-2006 # Mitsuteru C. Nakao # License:: The Ruby License # # $Id: common.rb,v 1.12.2.5 2008/05/07 12:22:10 ngoto Exp $ # # == Description # # EMBL style databases class # # This module defines a common framework among EMBL, UniProtKB, SWISS-PROT, # TrEMBL. For more details, see the documentations in each embl/*.rb # libraries. # # EMBL style format: # ID - identification (begins each entry; 1 per entry) # AC - accession number (>=1 per entry) # SV - sequence version (1 per entry) # DT - date (2 per entry) # DE - description (>=1 per entry) # KW - keyword (>=1 per entry) # OS - organism species (>=1 per entry) # OC - organism classification (>=1 per entry) # OG - organelle (0 or 1 per entry) # RN - reference number (>=1 per entry) # RC - reference comment (>=0 per entry) # RP - reference positions (>=1 per entry) # RX - reference cross-reference (>=0 per entry) # RA - reference author(s) (>=1 per entry) # RG - reference group (>=0 per entry) # RT - reference title (>=1 per entry) # RL - reference location (>=1 per entry) # DR - database cross-reference (>=0 per entry) # FH - feature table header (0 or 2 per entry) # FT - feature table data (>=0 per entry) # CC - comments or notes (>=0 per entry) # XX - spacer line (many per entry) # SQ - sequence header (1 per entry) # bb - (blanks) sequence data (>=1 per entry) # // - termination line (ends each entry; 1 per entry) # # == Examples # # # Make a new parser class for EMBL style database entry. 
# require 'bio/db/embl/common' # module Bio # class NEWDB < EMBLDB # include Bio::EMBLDB::Common # end # end # # == References # # * The EMBL Nucleotide Sequence Database # http://www.ebi.ac.uk/embl/ # # * The EMBL Nucleotide Sequence Database: Users Manual # http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html # # * Swiss-Prot Protein knowledgebase. TrEMBL Computer-annotated supplement # to Swiss-Prot # http://au.expasy.org/sprot/ # # * UniProt # http://uniprot.org/ # # * The UniProtKB/SwissProt/TrEMBL User Manual # http://www.expasy.org/sprot/userman.html # require 'bio/db' require 'bio/reference' require 'bio/compat/references' module Bio class EMBLDB module Common DELIMITER = "\n//\n" RS = DELIMITER TAGSIZE = 5 def initialize(entry) super(entry, TAGSIZE) end # returns an Array of accession numbers in the AC lines. # # AC Line # "AC A12345; B23456;" # AC [AC1;]+ # # Accession numbers format: # 1 2 3 4 5 6 # [O,P,Q] [0-9] [A-Z, 0-9] [A-Z, 0-9] [A-Z, 0-9] [0-9] def ac unless @data['AC'] tmp = Array.new field_fetch('AC').split(/ /).each do |e| tmp.push(e.sub(/;/,'')) end @data['AC'] = tmp end @data['AC'] end alias accessions ac # returns the first accession number in the AC lines def accession ac[0] end # returns a String in the DE line. # # DE Line def de unless @data['DE'] @data['DE'] = fetch('DE') end @data['DE'] end alias description de alias definition de # API # returns contents in the OS line. # * Bio::EMBLDB#os -> Array of Hash # where the Array is: # [{'name'=>'Human', 'os'=>'Homo sapiens'}, # {'name'=>'Rat', 'os'=>'Rattus norvegicus'}] # * Bio::SPTR#os[0]['name'] => "Human" # * Bio::SPTR#os[0] => {'name'=>"Human", 'os'=>'Homo sapiens'} # * Bio::SPTR#os(0) => "Homo sapiens (Human)" # # OS Line; organism species (>=1) # "OS Trifolium repens (white clover)" # # OS Genus species (name). # OS Genus species (name0) (name1). # OS Genus species (name0) (name1). # OS Genus species (name0), G s0 (name0), and G s (name1).
def os(num = nil) unless @data['OS'] os = Array.new fetch('OS').split(/, and|, /).each do |tmp| if tmp =~ /([A-Z][a-z]* *[\w \:\'\+\-]+\w)/ org = $1 tmp =~ /(\(.+\))/ os.push({'name' => $1, 'os' => org}) else raise "Error: OS Line. #{$!}\n#{fetch('OS')}\n" end end @data['OS'] = os end if num # EX. "Trifolium repens (white clover)" # (the stored name already includes its parentheses) return "#{@data['OS'][num]['os']} #{@data['OS'][num]['name']}" end @data['OS'] end # returns contents in the OG line. # * Bio::EMBLDB::Common#og -> [ * ] # # OG Line; organelle (0 or 1/entry) # OG Plastid; Chloroplast. # OG Mitochondrion. # OG Plasmid sym pNGR234a. # OG Plastid; Cyanelle. # OG Plasmid pSymA (megaplasmid 1). # OG Plasmid pNRC100, Plasmid pNRC200, and Plasmid pHH1. def og unless @data['OG'] og = Array.new if get('OG').size > 0 ogstr = fetch('OG') ogstr.sub!(/\.$/,'') ogstr.sub!(/ and/,'') ogstr.sub!(/;/, ',') ogstr.split(',').each do |tmp| og.push(tmp.strip) end end @data['OG'] = og end @data['OG'] end # returns contents in the OC line. # * Bio::EMBLDB::Common#oc -> [ * ] # OC Line; organism classification (>=1) # OC Eukaryota; Alveolata; Apicomplexa; Piroplasmida; Theileriidae; # OC Theileria. def oc unless @data['OC'] begin @data['OC'] = fetch('OC').sub(/.$/,'').split(/;/).map {|e| e.strip } rescue NameError nil end end @data['OC'] end # returns keywords in the KW line. # * Bio::EMBLDB::Common#kw -> [ * ] # KW Line; keyword (>=1) # KW [Keyword;]+ def kw unless @data['KW'] if get('KW').size > 0 tmp = fetch('KW').sub(/.$/,'') @data['KW'] = tmp.split(/;/).map {|e| e.strip } else @data['KW'] = [] end end @data['KW'] end alias keywords kw # returns contents in the R lines.
# * Bio::EMBLDB::Common#ref -> [ * ] # where is: # {'RN' => '', 'RC' => '', 'RP' => '', 'RX' => '', # 'RA' => '', 'RT' => '', 'RL' => '', 'RG' => ''} # # R Lines # * RN RC RP RX RA RT RL RG def ref unless @data['R'] ary = Array.new get('R').split(/\nRN /).each do |str| raw = {'RN' => '', 'RC' => '', 'RP' => '', 'RX' => '', 'RA' => '', 'RT' => '', 'RL' => '', 'RG' => ''} str = 'RN ' + str unless /^RN / =~ str str.split("\n").each do |line| if /^(R[NPXARLCTG]) (.+)/ =~ line raw[$1] += $2 + ' ' else raise "Invalid format in R lines, \n[#{line}]\n" end end raw.each_value {|v| v.strip! v.sub!(/^"/,'') v.sub!(/;$/,'') v.sub!(/"$/,'') } ary.push(raw) end @data['R'] = ary end @data['R'] end # returns Bio::Reference object from Bio::EMBLDB::Common#ref. # * Bio::EMBLDB::Common#ref -> Bio::References def references unless @data['references'] ary = self.ref.map {|ent| hash = Hash.new ent.each {|key, value| case key when 'RN' if /\[(\d+)\]/ =~ value.to_s hash['embl_gb_record_number'] = $1.to_i end when 'RC' unless value.to_s.strip.empty? hash['comments'] ||= [] hash['comments'].push value end when 'RP' hash['sequence_position'] = value when 'RA' a = value.split(/\, /) a.each do |x| x.sub!(/( [^ ]+)\z/, ",\\1") end hash['authors'] = a when 'RT' hash['title'] = value when 'RL' if /(.*) (\d+) *(\(([^\)]+)\))?(\, |\:)([a-zA-Z\d]+\-[a-zA-Z\d]+) *\((\d+)\)\.?\z/ =~ value.to_s hash['journal'] = $1.rstrip hash['volume'] = $2 hash['issue'] = $4 hash['pages'] = $6 hash['year'] = $7 else hash['journal'] = value end when 'RX' # PUBMED, DOI, (AGRICOLA) value.split(/\. /).each {|item| tag, xref = item.split(/\; /).map {|i| i.strip.sub(/\.\z/, '') } hash[ tag.downcase ] = xref } end } Reference.new(hash) } @data['references'] = ary.extend(Bio::References::BackwardCompatibility) end @data['references'] end # returns contents in the DR line. 
# * Bio::EMBLDB::Common#dr -> [ * ] # where is: # * Bio::EMBLDB::Common#dr {|k,v| } # # DR Line; defabases cross-reference (>=0) # a cross_ref pre one line # "DR database_identifier; primary_identifier; secondary_identifier." def dr unless @data['DR'] tmp = Hash.new self.get('DR').split(/\n/).each do |db| a = db.sub(/^DR /,'').sub(/.$/,'').strip.split(/;[ ]/) dbname = a.shift tmp[dbname] = Array.new unless tmp[dbname] tmp[dbname].push(a) end @data['DR'] = tmp end if block_given? @data['DR'].each do |k,v| yield(k, v) end else @data['DR'] end end end # module Common end # class EMBLDB end # module Bio bio-2.0.3/lib/bio/db/embl/format_embl.rb0000644000175000017500000001325414141516614017262 0ustar nileshnilesh# # = bio/db/embl/format_embl.rb - EMBL format generater # # Copyright:: Copyright (C) 2008 # Jan Aerts , # Naohisa Goto # License:: The Ruby License # module Bio::Sequence::Format::NucFormatter # INTERNAL USE ONLY, YOU SHOULD NOT USE THIS CLASS. # Embl format output class for Bio::Sequence. class Embl < Bio::Sequence::Format::FormatterBase # helper methods include Bio::Sequence::Format::INSDFeatureHelper private # wrapping with EMBL style def embl_wrap(prefix, str) wrap(str.to_s, 80, prefix) end # Given words (an Array of String) are wrapping with EMBL style. # Each word is never splitted inside the word. 
def embl_wrap_words(prefix, array) width = 80 result = [] str = nil array.each do |x| if str then if str.length + 1 + x.length > width then str = nil else str.concat ' ' str.concat x end end unless str then str = prefix + x result.push str end end result.join("\n") end # format reference # ref:: Bio::Reference object # hash:: (optional) a hash for RN (reference number) administration def reference_format_embl(ref, hash = nil) lines = Array.new if ref.embl_gb_record_number or hash then refno = ref.embl_gb_record_number.to_i hash ||= {} if refno <= 0 or hash[refno] then refno = hash.keys.sort[-1].to_i + 1 hash[refno] = true end lines << embl_wrap("RN ", "[#{refno}]") end if ref.comments then ref.comments.each do |cmnt| lines << embl_wrap("RC ", cmnt) end end unless ref.sequence_position.to_s.empty? then lines << embl_wrap("RP ", "#{ref.sequence_position}") end unless ref.doi.to_s.empty? then lines << embl_wrap("RX ", "DOI; #{ref.doi}.") end unless ref.pubmed.to_s.empty? then lines << embl_wrap("RX ", "PUBMED; #{ref.pubmed}.") end unless ref.authors.empty? then auth = ref.authors.collect do |x| y = x.to_s.strip.split(/\, *([^\,]+)\z/) y[1].gsub!(/\. +/, '.') if y[1] y.join(' ') end lastauth = auth.pop auth.each { |x| x.concat ',' } auth.push(lastauth.to_s + ';') lines << embl_wrap_words('RA ', auth) end lines << embl_wrap('RT ', (ref.title.to_s.empty? ? '' : "\"#{ref.title}\"") + ';') unless ref.journal.to_s.empty? then volissue = "#{ref.volume.to_s}" volissue = "#{volissue}(#{ref.issue})" unless ref.issue.to_s.empty? rl = "#{ref.journal}" rl += " #{volissue}" unless volissue.empty? rl += ":#{ref.pages}" unless ref.pages.to_s.empty? rl += "(#{ref.year})" unless ref.year.to_s.empty? rl += '.' lines << embl_wrap('RL ', rl) end lines << "XX" return lines.join("\n") end def seq_format_embl(seq) counter = 0 result = seq.gsub(/.{1,60}/) do |x| counter += x.length x = x.gsub(/.{10}/, '\0 ') sprintf(" %-66s%9d\n", x, counter) end result.chomp! 
result end def seq_composition(seq) { :a => seq.count('aA'), :c => seq.count('cC'), :g => seq.count('gG'), :t => seq.count('tTuU'), :other => seq.count('^aAcCgGtTuU') } end # moleculue type def mol_type_embl if mt = molecule_type then mt elsif fe = (features or []).find { |f| f.feature == 'source' } and qu = fe.qualifiers.find { |q| q.qualifier == 'mol_type' } then qu.value else 'NA' end end # CC line. Comments. def comments_format_embl(cmnts) return '' if !cmnts or cmnts.empty? cmnts = [ cmnts ] unless cmnts.kind_of?(Array) a = [] cmnts.each do |str| a.push embl_wrap('CC ', str) end unless a.empty? then a.push "XX " a.push '' # dummy to put "\n" at the end of the string end a.join("\n") end # Erb template of EMBL format for Bio::Sequence erb_template <<'__END_OF_TEMPLATE__' ID <%= primary_accession || entry_id %>; SV <%= sequence_version %>; <%= topology %>; <%= mol_type_embl %>; <%= data_class %>; <%= division %>; <%= seq.length %> BP. XX <%= embl_wrap('AC ', accessions.reject{|a| a.nil?}.join('; ') + ';') %> XX DT <%= format_date(date_created || null_date) %> (Rel. <%= release_created || 0 %>, Created) DT <%= format_date(date_modified || null_date) %> (Rel. <%= release_modified || 0 %>, Last updated, Version <%= entry_version || 0 %>) XX <%= embl_wrap('DE ', definition) %> XX <%= embl_wrap('KW ', (keywords || []).join('; ') + '.') %> XX OS <%= species %> <%= embl_wrap('OC ', (classification || []).join('; ') + '.') %> XX <% hash = {}; (references || []).each do |ref| %><%= reference_format_embl(ref, hash) %> <% end %><% (dblinks || []).each do |r| %>DR <%= r.database %>; <%= r.id %><% unless r.secondary_ids.empty? %>; <%= r.secondary_ids[0] %><% end %>. <% end %><% if dblinks and !dblinks.empty? 
then %>XX <% end %><%= comments_format_embl(comments) %>FH Key Location/Qualifiers FH <%= format_features_embl(features || []) %>XX SQ Sequence <%= seq.length %> BP; <% c = seq_composition(seq) %><%= c[:a] %> A; <%= c[:c] %> C; <%= c[:g] %> G; <%= c[:t] %> T; <%= c[:other] %> other; <%= seq_format_embl(seq) %> // __END_OF_TEMPLATE__ end #class Embl end #module Bio::Sequence::Format::NucFormatter bio-2.0.3/lib/bio/db/embl/uniprotkb.rb0000644000175000017500000012122214141516614017003 0ustar nileshnilesh# # = bio/db/embl/uniprotkb.rb - UniProtKB data parser class # # Copyright:: Copyright (C) 2001-2006 Mitsuteru C. Nakao # License:: The Ruby License # # # == Description # # See Bio::UniProtKB documents. # require 'bio/db' require 'bio/db/embl/common' module Bio # == Description # # Parser class for UniProtKB/SwissProt and TrEMBL database entry. # # See the UniProtKB document files and manuals. # # == Examples # # str = File.read("p53_human.swiss") # obj = Bio::UniProtKB.new(str) # obj.entry_id #=> "P53_HUMAN" # # == References # # * The UniProt Knowledgebase (UniProtKB) # http://www.uniprot.org/help/uniprotkb # # * The Universal Protein Resource (UniProt) # http://uniprot.org/ # # * The UniProtKB/SwissProt/TrEMBL User Manual # http://www.uniprot.org/docs/userman.html # class UniProtKB < EMBLDB include Bio::EMBLDB::Common @@entry_regrexp = /[A-Z0-9]{1,4}_[A-Z0-9]{1,5}/ @@data_class = ["STANDARD", "PRELIMINARY"] # returns a Hash of the ID line. # # returns a content (Int or String) of the ID line by a given key. # Hash keys: ['ENTRY_NAME', 'DATA_CLASS', 'MODECULE_TYPE', 'SEQUENCE_LENGTH'] # # === ID Line (since UniProtKB release 9.0 of 31-Oct-2006) # ID P53_HUMAN Reviewed; 393 AA. # #"ID #{ENTRY_NAME} #{DATA_CLASS}; #{SEQUENCE_LENGTH}." 
  #
  # === Examples
  #   obj.id_line  #=> {"ENTRY_NAME"=>"P53_HUMAN", "DATA_CLASS"=>"Reviewed",
  #                     "SEQUENCE_LENGTH"=>393, "MOLECULE_TYPE"=>nil}
  #
  #   obj.id_line('ENTRY_NAME') #=> "P53_HUMAN"
  #
  #
  # === ID Line (older style)
  #   ID   P53_HUMAN      STANDARD;      PRT;   393 AA.
  #   #"ID  #{ENTRY_NAME} #{DATA_CLASS}; #{MOLECULE_TYPE}; #{SEQUENCE_LENGTH}."
  #
  # === Examples
  #   obj.id_line  #=> {"ENTRY_NAME"=>"P53_HUMAN", "DATA_CLASS"=>"STANDARD",
  #                     "SEQUENCE_LENGTH"=>393, "MOLECULE_TYPE"=>"PRT"}
  #
  #   obj.id_line('ENTRY_NAME') #=> "P53_HUMAN"
  #
  def id_line(key = nil)
    return id_line[key] if key
    return @data['ID'] if @data['ID']

    part = @orig['ID'].split(/ +/)
    if part[4].to_s.chomp == 'AA.' then
      # after UniProtKB release 9.0 of 31-Oct-2006
      # (http://www.uniprot.org/docs/sp_news.htm)
      molecule_type   = nil
      sequence_length = part[3].to_i
    else
      molecule_type   = part[3].sub(/;/,'')
      sequence_length = part[4].to_i
    end
    @data['ID'] = {
      'ENTRY_NAME'      => part[1],
      'DATA_CLASS'      => part[2].sub(/;/,''),
      'MOLECULE_TYPE'   => molecule_type,
      'SEQUENCE_LENGTH' => sequence_length
    }
  end

  # returns the ENTRY_NAME in the ID line.
  #
  def entry_id
    id_line('ENTRY_NAME')
  end
  alias entry_name entry_id
  alias entry entry_id

  # returns the MOLECULE_TYPE in the ID line.
  #
  # A short-cut for Bio::UniProtKB#id_line('MOLECULE_TYPE').
  def molecule
    id_line('MOLECULE_TYPE')
  end
  alias molecule_type molecule

  # returns the SEQUENCE_LENGTH in the ID line.
  #
  # A short-cut for Bio::UniProtKB#id_line('SEQUENCE_LENGTH').
  def sequence_length
    id_line('SEQUENCE_LENGTH')
  end
  alias aalen sequence_length

  # Bio::EMBLDB::Common#ac  -> ary
  #                  #accessions  -> ary
  #                  #accession  -> String (accessions.first)
  @@ac_regrexp = /[OPQ][0-9][A-Z0-9]{3}[0-9]/

  # returns a Hash of information in the DT lines.
  #  hash keys:
  #    ['created', 'sequence', 'annotation']
  #--
  #  also Symbols acceptable (ASAP):
  #    [:created, :sequence, :annotation]
  #++
  #
  # Since UniProtKB release 7.0 of 07-Feb-2006, the DT line format has
  # changed, and the word "annotation" is no longer used in DT lines.
# Despite the change, the word "annotation" is still used for keeping # compatibility. # # returns a String of information in the DT lines by a given key. # # === DT Line; date (3/entry) # DT DD-MMM-YYY (integrated into UniProtKB/XXXXX.) # DT DD-MMM-YYY (sequence version NN) # DT DD-MMM-YYY (entry version NN) # # The format have been changed in UniProtKB release 7.0 of 07-Feb-2006. # Below is the older format. # # === Old format of DT Line; date (3/entry) # DT DD-MMM-YYY (rel. NN, Created) # DT DD-MMM-YYY (rel. NN, Last sequence update) # DT DD-MMM-YYY (rel. NN, Last annotation update) def dt(key = nil) return dt[key] if key return @data['DT'] if @data['DT'] part = self.get('DT').split(/\n/) @data['DT'] = { 'created' => part[0].sub(/\w{2} /,'').strip, 'sequence' => part[1].sub(/\w{2} /,'').strip, 'annotation' => part[2].sub(/\w{2} /,'').strip } end # (private) parses DE line (description lines) # since UniProtKB release 14.0 of 22-Jul-2008 # # Return array containing array. # # http://www.uniprot.org/docs/sp_news.htm def parse_DE_line_rel14(str) # Returns if it is not the new format since Rel.14 return nil unless /^DE (RecName|AltName|SubName)\: / =~ str ret = [] cur = nil str.each_line do |line| case line when /^DE (Includes|Contains)\: *$/ cur = [ $1 ] ret.push cur cur = nil #subcat_and_desc = nil next when /^DE *(RecName|AltName|SubName)\: +(.*)/ category = $1 subcat_and_desc = $2 cur = [ category ] ret.push cur when /^DE *(Flags)\: +(.*)/ category = $1 desc = $2 flags = desc.strip.split(/\s*\;\s*/) || [] cur = [ category, flags ] ret.push cur cur = nil #subcat_and_desc = nil next when /^DE *(.*)/ subcat_and_desc = $1 else warn "Warning: skipped DE line in unknown format: #{line.inspect}" #subcat_and_desc = nil next end case subcat_and_desc when nil # does nothing when /\A([^\=]+)\=(.*)/ subcat = $1 desc = $2 desc.sub!(/\;\s*\z/, '') unless cur warn "Warning: unknown category in DE line: #{line.inspect}" cur = [ '' ] ret.push cur end cur.push [ subcat, desc ] 
      else
        warn "Warning: skipped DE line description in unknown format: #{line.inspect}"
      end
    end
    ret
  end
  private :parse_DE_line_rel14

  # returns the proposed official name of the protein.
  # Returns a String.
  #
  # Since UniProtKB release 14.0 of 22-Jul-2008, the DE line format has
  # changed. The method returns the full name, which is taken from the
  # "RecName: Full=" or "SubName: Full=" line, normally at the beginning of
  # the DE lines.
  # Unlike the parser for the old format, fragments and precursors get no
  # special treatment.
  #
  # For the old format, the method parses the DE lines and returns the
  # protein name as a String.
  #
  # === DE Line; description (>=1)
  #  "DE #{OFFICIAL_NAME} (#{SYNONYM})"
  #  "DE #{OFFICIAL_NAME} (#{SYNONYM}) [CONTAINS: #1; #2]."
  #  OFFICIAL_NAME  1/entry
  #  SYNONYM        >=0
  #  CONTAINS       >=0
  def protein_name
    @data['DE'] ||= parse_DE_line_rel14(get('DE'))
    parsed_de_line = @data['DE']
    if parsed_de_line then
      # since UniProtKB release 14.0 of 22-Jul-2008
      name = nil
      parsed_de_line.each do |a|
        case a[0]
        when 'RecName', 'SubName'
          if name_pair = a[1..-1].find { |b| b[0] == 'Full' } then
            name = name_pair[1]
            break
          end
        end
      end
      name = name.to_s
    else
      # old format (before Rel. 13.x)
      name = ""
      if de_line = fetch('DE') then
        str = de_line[/^[^\[]*/] # everything preceding the first [ (the "contains" part)
        name = str[/^[^(]*/].strip
        name << ' (Fragment)' if str =~ /fragment/i
      end
    end
    return name
  end

  # returns synonyms (unofficial and/or alternative names).
  # Returns an Array containing String objects.
  #
  # Since UniProtKB release 14.0 of 22-Jul-2008, the DE line format has
  # changed. The method returns the full or short names which are
  # taken from "RecName: Short=", "RecName: EC=", and AltName lines,
  # except those after "Contains:" or "Includes:".
  # To keep compatibility with the old format parser, "RecName: EC=N.N.N.N"
  # is reported as "EC N.N.N.N".
  # In addition, to prevent confusion, "Allergen=" and "CD_antigen="
  # prefixes are added for the corresponding fields.
# # For old format, the method parses the DE lines and returns synonyms. # synonyms are each placed in () following the official name on the DE line. def synonyms ary = Array.new @data['DE'] ||= parse_DE_line_rel14(get('DE')) parsed_de_line = @data['DE'] if parsed_de_line then # since UniProtKB release 14.0 of 22-Jul-2008 parsed_de_line.each do |a| case a[0] when 'Includes', 'Contains' break #the each loop when 'RecName', 'SubName', 'AltName' a[1..-1].each do |b| if name = b[1] and b[1] != self.protein_name then case b[0] when 'EC' name = "EC " + b[1] when 'Allergen', 'CD_antigen' name = b[0] + '=' + b[1] else name = b[1] end ary.push name end end end #case a[0] end #parsed_de_line.each else # old format (before Rel. 13.x) if de_line = fetch('DE') then line = de_line.sub(/\[.*\]/,'') # ignore stuff between [ and ]. That's the "contains" part line.scan(/\([^)]+/) do |synonym| unless synonym =~ /fragment/i then ary << synonym[1..-1].strip # index to remove the leading ( end end end end return ary end # returns gene names in the GN line. # # New UniProt/SwissProt format: # * Bio::UniProtKB#gn -> [ * ] # where is: # { :name => '...', # :synonyms => [ 's1', 's2', ... ], # :loci => [ 'l1', 'l2', ... ], # :orfs => [ 'o1', 'o2', ... ] # } # # Old format: # * Bio::UniProtKB#gn -> Array # AND # * Bio::UniProtKB#gn[0] -> Array # OR # # === GN Line: Gene name(s) (>=0, optional) def gn unless @data['GN'] case fetch('GN') when /Name=/,/ORFNames=/,/OrderedLocusNames=/,/Synonyms=/ @data['GN'] = gn_uniprot_parser else @data['GN'] = gn_old_parser end end @data['GN'] end # returns contents in the old style GN line. # === GN Line: Gene name(s) (>=0, optional) # GN HNS OR DRDX OR OSMZ OR BGLY. # GN CECA1 AND CECA2. # GN CECA1 AND (HOGE OR FUGA). # # GN NAME1 [(AND|OR) NAME]+. # # Bio::UniProtKB#gn -> Array # AND # #gn[0] -> Array # OR # #gene_names -> Array def gn_old_parser names = Array.new if get('GN').size > 0 names = fetch('GN').sub(/\.$/,'').split(/ AND /) names.map! 
{ |synonyms| synonyms = synonyms.gsub(/\(|\)/,'').split(/ OR /).map { |e| e.strip } } end @data['GN'] = names end private :gn_old_parser # returns contents in the structured GN line. # The new format of the GN line is: # GN Name=; Synonyms=[, ...]; OrderedLocusNames=[, ...]; # GN ORFNames=[, ...]; # # * Bio::UniProtKB#gn -> [ * ] # where is: # { :name => '...', # :synonyms => [ 's1', 's2', ... ], # :loci => [ 'l1', 'l2', ... ], # :orfs => [ 'o1', 'o2', ... ] # } def gn_uniprot_parser @data['GN'] = Array.new gn_line = fetch('GN').strip records = gn_line.split(/\s*and\s*/) records.each do |record| gene_hash = {:name => '', :synonyms => [], :loci => [], :orfs => []} record.each_line(';') do |element| case element when /Name=/ then gene_hash[:name] = $'[0..-2] when /Synonyms=/ then gene_hash[:synonyms] = $'[0..-2].split(/\s*,\s*/) when /OrderedLocusNames=/ then gene_hash[:loci] = $'[0..-2].split(/\s*,\s*/) when /ORFNames=/ then gene_hash[:orfs] = $'[0..-2].split(/\s*,\s*/) end end @data['GN'] << gene_hash end return @data['GN'] end private :gn_uniprot_parser # returns a Array of gene names in the GN line. def gene_names gn # set @data['GN'] if it hasn't been already done if @data['GN'].first.class == Hash then @data['GN'].collect { |element| element[:name] } else @data['GN'].first end end # returns a String of the first gene name in the GN line. def gene_name (x = self.gene_names) ? x.first : nil end # returns a Array of Hashs or a String of the OS line when a key given. # * Bio::EMBLDB#os -> Array # [{'name' => '(Human)', 'os' => 'Homo sapiens'}, # {'name' => '(Rat)', 'os' => 'Rattus norveticus'}] # * Bio::EPTR#os[0] -> Hash # {'name' => "(Human)", 'os' => 'Homo sapiens'} # * Bio::UniProtKB#os[0]['name'] -> "(Human)" # * Bio::EPTR#os(0) -> "Homo sapiens (Human)" # # === OS Line; organism species (>=1) # OS Genus species (name). # OS Genus species (name0) (name1). # OS Genus species (name0) (name1). # OS Genus species (name0), G s0 (name0), and G s (name0) (name1). 
  #   OS   Homo sapiens (Human), and Rattus norvegicus (Rat)
  #   OS   Hippotis sp. Clark and Watts 825.
  #   OS   unknown cyperaceous sp.
  def os(num = nil)
    unless @data['OS']
      os = Array.new
      fetch('OS').split(/, and|, /).each do |tmp|
        if tmp =~ /(\w+ *[\w \:\'\+\-\.]+[\w\.])/
          org = $1
          tmp =~ /(\(.+\))/
          os.push({'name' => $1, 'os' => org})
        else
          raise "Error: OS Line. #{$!}\n#{fetch('OS')}\n"
        end
      end
      @data['OS'] = os
    end

    if num
      # EX. "Trifolium repens (white clover)"
      return "#{@data['OS'][num]['os']} #{@data['OS'][num]['name']}"
    else
      return @data['OS']
    end
  end

  # Bio::EMBLDB::Common#og -> Array
  # OG Line; organelle (0 or 1/entry)
  # ["MITOCHONDRION", "CHLOROPLAST", "Cyanelle", "Plasmid"]
  #  or a plasmid name (e.g. "Plasmid pBR322").

  # Bio::EMBLDB::Common#oc -> Array
  # OC Line; organism classification (>=1)
  # "OC   Eukaryota; Alveolata; Apicomplexa; Piroplasmida; Theileriidae;"
  # "OC   Theileria."

  # returns a Hash of organism taxonomy cross-references.
  # * Bio::UniProtKB#ox -> Hash
  #    {'NCBI_TaxID' => ['1234','2345','3456','4567'], ...}
  #
  # === OX Line; organism taxonomy cross-reference (>=1 per entry)
  #  OX   NCBI_TaxID=1234;
  #  OX   NCBI_TaxID=1234, 2345, 3456, 4567;
  def ox
    unless @data['OX']
      tmp = fetch('OX').sub(/\.$/,'').split(/;/).map { |e| e.strip }
      hsh = Hash.new
      tmp.each do |e|
        db,refs = e.split(/=/)
        hsh[db] = refs.split(/, */)
      end
      @data['OX'] = hsh
    end
    return @data['OX']
  end

  # === The OH Line;
  #
  # OH   NCBI_TaxID=TaxID; HostName.
  # http://br.expasy.org/sprot/userman.html#OH_line
  def oh
    unless @data['OH']
      @data['OH'] = fetch('OH').split("\. ").map {|x|
        if x =~ /NCBI_TaxID=(\d+);/
          taxid = $1
        else
          raise ArgumentError, ["Error: Invalid OH line format (#{self.entry_id}):",
                                $!, "\n", get('OH'), "\n"].join
        end
        if x =~ /NCBI_TaxID=\d+; (.+)/
          host_name = $1
          host_name.sub!(/\.$/, '')
        else
          host_name = nil
        end
        {'NCBI_TaxID' => taxid, 'HostName' => host_name}
      }
    end
    @data['OH']
  end

  # Bio::EMBLDB::Common#ref -> Array
  # R Lines
  # RN RC RP RX RA RT RL

  # returns contents in the R lines.
# * Bio::EMBLDB::Common#ref -> [ * ] # where is: # {'RN' => '', 'RC' => '', 'RP' => '', 'RX' => '', # 'RA' => '', 'RT' => '', 'RL' => '', 'RG' => ''} # # R Lines # * RN RC RP RX RA RT RL RG def ref unless @data['R'] @data['R'] = [get('R').split(/\nRN /)].flatten.map { |str| hash = {'RN' => '', 'RC' => '', 'RP' => '', 'RX' => '', 'RA' => '', 'RT' => '', 'RL' => '', 'RG' => ''} str = 'RN ' + str unless /^RN / =~ str str.split("\n").each do |line| if /^(R[NPXARLCTG]) (.+)/ =~ line hash[$1] += $2 + ' ' else raise "Invalid format in R lines, \n[#{line}]\n" end end hash['RN'] = set_RN(hash['RN']) hash['RC'] = set_RC(hash['RC']) hash['RP'] = set_RP(hash['RP']) hash['RX'] = set_RX(hash['RX']) hash['RA'] = set_RA(hash['RA']) hash['RT'] = set_RT(hash['RT']) hash['RL'] = set_RL(hash['RL']) hash['RG'] = set_RG(hash['RG']) hash } end @data['R'] end def set_RN(data) data.strip end def set_RC(data) data.scan(/([STP]\w+)=(.+);/).map { |comment| [comment[1].split(/, and |, /)].flatten.map { |text| {'Token' => comment[0], 'Text' => text} } }.flatten end private :set_RC def set_RP(data) data = data.strip data = data.sub(/\.$/, '') data.split(/, AND |, /i).map {|x| x = x.strip x = x.gsub(' ', ' ') } end private :set_RP def set_RX(data) rx = {'MEDLINE' => nil, 'PubMed' => nil, 'DOI' => nil} if data =~ /MEDLINE=(.+?);/ rx['MEDLINE'] = $1 end if data =~ /PubMed=(.+?);/ rx['PubMed'] = $1 end if data =~ /DOI=(.+?);/ rx['DOI'] = $1 end rx end private :set_RX def set_RA(data) data = data.sub(/; *$/, '') end private :set_RA def set_RT(data) data = data.sub(/; *$/, '') data = data.gsub(/(^"|"$)/, '') end private :set_RT def set_RL(data) data = data.strip end private :set_RL def set_RG(data) data = data.split('; ') end private :set_RG # returns Bio::Reference object from Bio::EMBLDB::Common#ref. 
# * Bio::EMBLDB::Common#ref -> Bio::References def references unless @data['references'] ary = self.ref.map {|ent| hash = Hash.new('') ent.each {|key, value| case key when 'RA' hash['authors'] = value.split(/, /) when 'RT' hash['title'] = value when 'RL' if value =~ /(.*) (\d+) \((\d+)\), (\d+-\d+) \((\d+)\)$/ hash['journal'] = $1 hash['volume'] = $2 hash['issue'] = $3 hash['pages'] = $4 hash['year'] = $5 else hash['journal'] = value end when 'RX' # PUBMED, MEDLINE, DOI value.each do |tag, xref| hash[ tag.downcase ] = xref end end } Reference.new(hash) } @data['references'] = References.new(ary) end @data['references'] end # === The HI line # Bio::UniProtKB#hi #=> hash def hi unless @data['HI'] @data['HI'] = [] fetch('HI').split(/\. /).each do |hlist| hash = {'Category' => '', 'Keywords' => [], 'Keyword' => ''} hash['Category'], hash['Keywords'] = hlist.split(': ') hash['Keywords'] = hash['Keywords'].split('; ') hash['Keyword'] = hash['Keywords'].pop hash['Keyword'].sub!(/\.$/, '') @data['HI'] << hash end end @data['HI'] end @@cc_topics = ['PHARMACEUTICAL', 'BIOTECHNOLOGY', 'TOXIC DOSE', 'ALLERGEN', 'RNA EDITING', 'POLYMORPHISM', 'BIOPHYSICOCHEMICAL PROPERTIES', 'MASS SPECTROMETRY', 'WEB RESOURCE', 'ENZYME REGULATION', 'DISEASE', 'INTERACTION', 'DEVELOPMENTAL STAGE', 'INDUCTION', 'CAUTION', 'ALTERNATIVE PRODUCTS', 'DOMAIN', 'PTM', 'MISCELLANEOUS', 'TISSUE SPECIFICITY', 'COFACTOR', 'PATHWAY', 'SUBUNIT', 'CATALYTIC ACTIVITY', 'SUBCELLULAR LOCATION', 'FUNCTION', 'SIMILARITY'] # returns contents in the CC lines. # * Bio::UniProtKB#cc -> Hash # # returns an object of contents in the TOPIC. # * Bio::UniProtKB#cc(TOPIC) -> Array w/in Hash, Hash # # returns contents of the "ALTERNATIVE PRODUCTS". 
# * Bio::UniProtKB#cc('ALTERNATIVE PRODUCTS') -> Hash # {'Event' => str, # 'Named isoforms' => int, # 'Comment' => str, # 'Variants'=>[{'Name' => str, 'Synonyms' => str, 'IsoId' => str, 'Sequence' => []}]} # # CC -!- ALTERNATIVE PRODUCTS: # CC Event=Alternative splicing; Named isoforms=15; # ... # CC placentae isoforms. All tissues differentially splice exon 13; # CC Name=A; Synonyms=no del; # CC IsoId=P15529-1; Sequence=Displayed; # # returns contents of the "DATABASE". # * Bio::UniProtKB#cc('DATABASE') -> Array # [{'NAME'=>str,'NOTE'=>str, 'WWW'=>URI,'FTP'=>URI}, ...] # # CC -!- DATABASE: NAME=Text[; NOTE=Text][; WWW="Address"][; FTP="Address"]. # # returns contents of the "MASS SPECTROMETRY". # * Bio::UniProtKB#cc('MASS SPECTROMETRY') -> Array # [{'MW"=>float,'MW_ERR'=>float, 'METHOD'=>str,'RANGE'=>str}, ...] # # CC -!- MASS SPECTROMETRY: MW=XXX[; MW_ERR=XX][; METHOD=XX][;RANGE=XX-XX]. # # === CC lines (>=0, optional) # CC -!- TISSUE SPECIFICITY: HIGHEST LEVELS FOUND IN TESTIS. ALSO PRESENT # CC IN LIVER, KIDNEY, LUNG AND BRAIN. # # CC -!- TOPIC: FIRST LINE OF A COMMENT BLOCK; # CC SECOND AND SUBSEQUENT LINES OF A COMMENT BLOCK. # # See also http://www.expasy.org/sprot/userman.html#CC_line # def cc(topic = nil) unless @data['CC'] cc = Hash.new comment_border= '-' * (77 - 4 + 1) dlm = /-!- / # 12KD_MYCSM has no CC lines. return cc if get('CC').size == 0 cc_raw = fetch('CC') # Removing the copyright statement. cc_raw.sub!(/ *---.+---/m, '') # Not any CC Lines without the copyright statement. return cc if cc_raw == '' begin cc_raw, copyright = cc_raw.split(/#{comment_border}/)[0] _ = copyright #dummy for suppress "assigned but unused variable" cc_raw = cc_raw.sub(dlm,'') cc_raw.split(dlm).each do |tmp| tmp = tmp.strip if /(^[A-Z ]+[A-Z]): (.+)/ =~ tmp key = $1 body = $2 body.gsub!(/- (?!AND)/,'-') body.strip! 
unless cc[key] cc[key] = [body] else cc[key].push(body) end else raise ["Error: [#{entry_id}]: CC Lines", '"', tmp, '"', '', get('CC'),''].join("\n") end end rescue NameError if fetch('CC') == '' return {} else raise ["Error: Invalid CC Lines: [#{entry_id}]: ", "\n'#{self.get('CC')}'\n", "(#{$!})"].join end rescue NoMethodError end @data['CC'] = cc end case topic when 'ALLERGEN' return @data['CC'][topic] when 'ALTERNATIVE PRODUCTS' return cc_alternative_products(@data['CC'][topic]) when 'BIOPHYSICOCHEMICAL PROPERTIES' return cc_biophysiochemical_properties(@data['CC'][topic]) when 'BIOTECHNOLOGY' return @data['CC'][topic] when 'CATALITIC ACTIVITY' return cc_catalytic_activity(@data['CC'][topic]) when 'CAUTION' return cc_caution(@data['CC'][topic]) when 'COFACTOR' return @data['CC'][topic] when 'DEVELOPMENTAL STAGE' return @data['CC'][topic].join('') when 'DISEASE' return @data['CC'][topic].join('') when 'DOMAIN' return @data['CC'][topic] when 'ENZYME REGULATION' return @data['CC'][topic].join('') when 'FUNCTION' return @data['CC'][topic].join('') when 'INDUCTION' return @data['CC'][topic].join('') when 'INTERACTION' return cc_interaction(@data['CC'][topic]) when 'MASS SPECTROMETRY' return cc_mass_spectrometry(@data['CC'][topic]) when 'MISCELLANEOUS' return @data['CC'][topic] when 'PATHWAY' return cc_pathway(@data['CC'][topic]) when 'PHARMACEUTICAL' return @data['CC'][topic] when 'POLYMORPHISM' return @data['CC'][topic] when 'PTM' return @data['CC'][topic] when 'RNA EDITING' return cc_rna_editing(@data['CC'][topic]) when 'SIMILARITY' return @data['CC'][topic] when 'SUBCELLULAR LOCATION' return cc_subcellular_location(@data['CC'][topic]) when 'SUBUNIT' return @data['CC'][topic] when 'TISSUE SPECIFICITY' return @data['CC'][topic] when 'TOXIC DOSE' return @data['CC'][topic] when 'WEB RESOURCE' return cc_web_resource(@data['CC'][topic]) when 'DATABASE' # DATABASE: NAME=Text[; NOTE=Text][; WWW="Address"][; FTP="Address"]. 
tmp = Array.new db = @data['CC']['DATABASE'] return db unless db db.each do |e| db = {'NAME' => nil, 'NOTE' => nil, 'WWW' => nil, 'FTP' => nil} e.sub(/.$/,'').split(/;/).each do |line| case line when /NAME=(.+)/ db['NAME'] = $1 when /NOTE=(.+)/ db['NOTE'] = $1 when /WWW="(.+)"/ db['WWW'] = $1 when /FTP="(.+)"/ db['FTP'] = $1 end end tmp.push(db) end return tmp when nil return @data['CC'] else return @data['CC'][topic] end end def cc_alternative_products(data) ap = data.join('') return ap unless ap # Event, Named isoforms, Comment, [Name, Synonyms, IsoId, Sequnce]+ tmp = {'Event' => "", 'Named isoforms' => "", 'Comment' => "", 'Variants' => []} if /Event=(.+?);/ =~ ap tmp['Event'] = $1 tmp['Event'] = tmp['Event'].sub(/;/,'').split(/, /) end if /Named isoforms=(\S+?);/ =~ ap tmp['Named isoforms'] = $1 end if /Comment=(.+?);/m =~ ap tmp['Comment'] = $1 end ap.scan(/Name=.+?Sequence=.+?;/).each do |ent| tmp['Variants'] << cc_alternative_products_variants(ent) end return tmp end private :cc_alternative_products def cc_alternative_products_variants(data) variant = {'Name' => '', 'Synonyms' => [], 'IsoId' => [], 'Sequence' => []} data.split(/; /).map {|x| x.split(/=/) }.each do |e| case e[0] when 'Sequence', 'Synonyms', 'IsoId' e[1] = e[1].sub(/;/,'').split(/, /) end variant[e[0]] = e[1] end variant end private :cc_alternative_products_variants def cc_biophysiochemical_properties(data) data = data[0] hash = {'Absorption' => {}, 'Kinetic parameters' => {}, 'pH dependence' => "", 'Redox potential' => "", 'Temperature dependence' => ""} if data =~ /Absorption: Abs\(max\)=(.+?);/ hash['Absorption']['Abs(max)'] = $1 end if data =~ /Absorption: Abs\(max\)=.+; Note=(.+?);/ hash['Absorption']['Note'] = $1 end if data =~ /Kinetic parameters: KM=(.+?); Vmax=(.+?);/ hash['Kinetic parameters']['KM'] = $1 hash['Kinetic parameters']['Vmax'] = $2 end if data =~ /Kinetic parameters: KM=.+; Vmax=.+; Note=(.+?);/ hash['Kinetic parameters']['Note'] = $1 end if data =~ /pH dependence: 
(.+?);/ hash['pH dependence'] = $1 end if data =~ /Redox potential: (.+?);/ hash['Redox potential'] = $1 end if data =~ /Temperature dependence: (.+?);/ hash['Temperature dependence'] = $1 end hash end private :cc_biophysiochemical_properties def cc_caution(data) data.join('') end private :cc_caution # returns conteins in a line of the CC INTERACTION section. # # CC P46527:CDKN1B; NbExp=1; IntAct=EBI-359815, EBI-519280; def cc_interaction(data) str = data.join('') it = str.scan(/(.+?); NbExp=(.+?); IntAct=(.+?);/) it.map {|ent| ent.map! {|x| x.strip } if ent[0] =~ /^(.+):(.+)/ spac = $1 spid = $2.split(' ')[0] optid = nil elsif ent[0] =~ /Self/ spac = self.entry_id spid = self.entry_id optid = nil end if ent[0] =~ /^.+:.+ (.+)/ optid = $1 end {'SP_Ac' => spac, 'identifier' => spid, 'NbExp' => ent[1], 'IntAct' => ent[2].split(', '), 'optional_identifier' => optid} } end private :cc_interaction def cc_mass_spectrometry(data) # MASS SPECTROMETRY: MW=XXX[; MW_ERR=XX][; METHOD=XX][;RANGE=XX-XX]. return data unless data data.map { |m| mass = {'MW' => nil, 'MW_ERR' => nil, 'METHOD' => nil, 'RANGE' => nil, 'NOTE' => nil} m.sub(/.$/,'').split(/;/).each do |line| case line when /MW=(.+)/ mass['MW'] = $1 when /MW_ERR=(.+)/ mass['MW_ERR'] = $1 when /METHOD=(.+)/ mass['METHOD'] = $1 when /RANGE=(\d+-\d+)/ mass['RANGE'] = $1 # RANGE class ? when /NOTE=(.+)/ mass['NOTE'] = $1 end end mass } end private :cc_mass_spectrometry def cc_pathway(data) data.map {|x| x.sub(/\.$/, '') }.map {|x| x.split(/; | and |: /) }[0] end private :cc_pathway def cc_rna_editing(data) data = data.join('') entry = {'Modified_positions' => [], 'Note' => ""} if data =~ /Modified_positions=(.+?)(\.|;)/ entry['Modified_positions'] = $1.sub(/\.$/, '').split(', ') else raise ArgumentError, "Invarid CC RNA Editing lines (#{self.entry_id}):#{$!}\n#{get('CC')}" end if data =~ /Note=(.+)/ entry['Note'] = $1 end entry end private :cc_rna_editing def cc_subcellular_location(data) data.map {|x| x.split('. 
').map {|y|
        y.split('; ').map {|z|
          z.sub(/\.$/, '')
        }
      }
    }[0]
  end
  private :cc_subcellular_location

  #--
  # Since UniProtKB release 12.2 of 11-Sep-2007:
  # CC   -!- WEB RESOURCE: Name=ResourceName[; Note=FreeText][; URL=WWWAddress].
  # Old format:
  # CC   -!- WEB RESOURCE: NAME=ResourceName[; NOTE=FreeText][; URL=WWWAddress].
  #++

  def cc_web_resource(data)
    data.map {|x|
      entry = {'Name' => nil, 'Note' => nil, 'URL' => nil}
      x.split(';').each do |y|
        case y
        when /(Name|Note)\=(.+)/
          key = $1
          val = $2.strip
          entry[key] = val
        when /(NAME|NOTE)\=(.+)/
          key = $1.downcase.capitalize
          val = $2.strip
          entry[key] = val
        when /URL\=\"(.+)\"/
          entry['URL'] = $1.strip
        end
      end
      entry
    }
  end
  private :cc_web_resource

  # returns database cross-references in the DR lines.
  # * Bio::UniProtKB#dr -> Hash of Arrays
  #
  # === DR Line; database cross-reference (>=0)
  #    DR  database_identifier; primary_identifier; secondary_identifier.
  #  one cross-reference per line
  @@dr_database_identifier = ['EMBL','CARBBANK','DICTYDB','ECO2DBASE',
    'ECOGENE',
    'FLYBASE','GCRDB','HIV','HSC-2DPAGE','HSSP','INTERPRO','MAIZEDB',
    'MAIZE-2DPAGE','MENDEL','MGD','MIM','PDB','PFAM','PIR','PRINTS',
    'PROSITE','REBASE','AARHUS/GHENT-2DPAGE','SGD','STYGENE','SUBTILIST',
    'SWISS-2DPAGE','TIGR','TRANSFAC','TUBERCULIST','WORMPEP','YEPD','ZFIN']

  # Backup Bio::EMBLDB#dr as embl_dr
  alias :embl_dr :dr

  # Bio::UniProtKB#dr
  def dr(key = nil)
    unless key
      embl_dr
    else
      (embl_dr[key] or []).map {|x|
        {'Accession' => x[0],
         'Version' => x[1],
         ' ' => x[2],
         'Molecular Type' => x[3]}
      }
    end
  end

  # Bio::EMBLDB::Common#kw - Array
  #                    #keywords  -> Array
  #
  # KW Line; keyword (>=1)
  # KW   [Keyword;]+

  # returns contents in the feature table.
# # == Examples # # sp = Bio::UniProtKB.new(entry) # ft = sp.ft # ft.class #=> Hash # ft.keys.each do |feature_key| # ft[feature_key].each do |feature| # feature['From'] #=> '1' # feature['To'] #=> '21' # feature['Description'] #=> '' # feature['FTId'] #=> '' # feature['diff'] #=> [] # feature['original'] #=> [feature_key, '1', '21', '', ''] # end # end # # * Bio::UniProtKB#ft -> Hash # {FEATURE_KEY => [{'From' => int, 'To' => int, # 'Description' => aStr, 'FTId' => aStr, # 'diff' => [original_residues, changed_residues], # 'original' => aAry }],...} # # returns an Array of the information about the feature_name in the feature table. # * Bio::UniProtKB#ft(feature_name) -> Array of Hash # [{'From' => str, 'To' => str, 'Description' => str, 'FTId' => str},...] # # == FT Line; feature table data (>=0, optional) # # Col Data item # ----- ----------------- # 1- 2 FT # 6-13 Feature name # 15-20 `FROM' endpoint # 22-27 `TO' endpoint # 35-75 Description (>=0 per key) # ----- ----------------- # # Note: 'FROM' and 'TO' endopoints are allowed to use non-numerial charactors # including '<', '>' or '?'. (c.f. '<1', '?42') # # See also http://www.expasy.org/sprot/userman.html#FT_line # def ft(feature_key = nil) return ft[feature_key] if feature_key return @data['FT'] if @data['FT'] table = [] begin get('FT').split("\n").each do |line| if line =~ /^FT \w/ feature = line.chomp.ljust(74) table << [feature[ 5..12].strip, # Feature Name feature[14..19].strip, # From feature[21..26].strip, # To feature[34..74].strip ] # Description else table.last << line.chomp.sub!(/^FT +/, '') end end # Joining Description lines table = table.map { |feature| ftid = feature.pop if feature.last =~ /FTId=/ if feature.size > 4 feature = [feature[0], feature[1], feature[2], feature[3, feature.size - 3].join(" ")] end feature << if ftid then ftid else '' end } hash = {} table.each do |feature| hash[feature[0]] = [] unless hash[feature[0]] hash[feature[0]] << { # Removing '<', '>' or '?' 
          # in FROM/TO endpoint.
          'From' => feature[1].sub(/\D/, '').to_i,
          'To'   => feature[2].sub(/\D/, '').to_i,
          'Description' => feature[3],
          'FTId' => feature[4].to_s.sub(/\/FTId=/, '').sub(/\.$/, ''),
          'diff' => [],
          'original' => feature
        }

        case feature[0]
        when 'VARSPLIC', 'VARIANT', 'VAR_SEQ', 'CONFLICT'
          case hash[feature[0]].last['Description']
          when /(\w[\w ]*\w*) - ?> (\w[\w ]*\w*)/
            original_res = $1
            changed_res = $2
            original_res = original_res.gsub(/ /,'').strip
            changed_res = changed_res.gsub(/ /,'').strip
          when /Missing/i
            original_res = seq.subseq(hash[feature[0]].last['From'],
                                      hash[feature[0]].last['To'])
            changed_res = ''
          end
          # The same variable is used in both branches, so a "Missing"
          # feature reports '' (not nil) as the changed residues.
          hash[feature[0]].last['diff'] = [original_res, changed_res]
        end
      end
    rescue
      raise "Invalid FT Lines(#{$!}) in #{entry_id}:, \n'#{self.get('FT')}'\n"
    end

    @data['FT'] = hash
  end

  # returns a Hash of the contents of the SQ lines.
  # * Bio::UniProtKB#sq  -> hsh
  #
  # returns the value for a given key in the SQ lines.
  # * Bio::UniProtKB#sq(key)  -> int or str
  # * Keys: ['MW', 'mw', 'molecular', 'weight', 'aalen', 'len', 'length',
  #          'CRC64']
  #
  # === SQ Line; sequence header (1/entry)
  #    SQ   SEQUENCE   233 AA;  25630 MW;  146A1B48A1475C86 CRC64;
  #    SQ   SEQUENCE  \d+ AA; \d+ MW;  [0-9A-Z]+ CRC64;
  #
  # MW, Dalton unit.
  # CRC64 (64-bit Cyclic Redundancy Check, ISO 3309).
  def sq(key = nil)
    unless @data['SQ']
      if fetch('SQ') =~ /(\d+) AA\; (\d+) MW; (.+) CRC64;/
        @data['SQ'] = { 'aalen' => $1.to_i, 'MW' => $2.to_i, 'CRC64' => $3 }
      else
        raise "Invalid SQ Line: \n'#{fetch('SQ')}'"
      end
    end

    if key
      case key
      when /mw/, /molecular/, /weight/
        @data['SQ']['MW']
      when /len/, /length/, /AA/
        @data['SQ']['aalen']
      else
        @data['SQ'][key]
      end
    else
      @data['SQ']
    end
  end

  # returns a Bio::Sequence::AA of the amino acid sequence.
  # * Bio::UniProtKB#seq -> Bio::Sequence::AA
  #
  # blank Line; sequence data (>=1)
  def seq
    unless @data['']
      @data[''] = Sequence::AA.new( fetch('').gsub(/ |\d+/,'') )
    end
    return @data['']
  end
  alias aaseq seq

end # class UniProtKB

end # module Bio



=begin

= Bio::UniProtKB < Bio::DB

Class for an entry in the SWISS-PROT/TrEMBL database.

--- Bio::UniProtKB.new(a_sp_entry)

=== ID line (Identification)

--- Bio::UniProtKB#id_line -> {'ENTRY_NAME' => str, 'DATA_CLASS' => str,
                               'MOLECULE_TYPE' => str, 'SEQUENCE_LENGTH' => int }
--- Bio::UniProtKB#id_line(key) -> str

       key = (ENTRY_NAME|MOLECULE_TYPE|DATA_CLASS|SEQUENCE_LENGTH)

--- Bio::UniProtKB#entry_id -> str
--- Bio::UniProtKB#molecule -> str
--- Bio::UniProtKB#sequence_length -> int


=== AC lines (Accession number)

--- Bio::UniProtKB#ac -> ary
--- Bio::UniProtKB#accessions -> ary
--- Bio::UniProtKB#accession -> accessions.first


=== GN line (Gene name(s))

--- Bio::UniProtKB#gn -> [ary, ...] or [{:name => str, :synonyms => [], :loci => [], :orfs => []}]
--- Bio::UniProtKB#gene_name -> str
--- Bio::UniProtKB#gene_names -> [str]


=== DT lines (Date)

--- Bio::UniProtKB#dt -> {'created' => str, 'sequence' => str, 'annotation' => str}
--- Bio::UniProtKB#dt(key) -> str

      key := (created|annotation|sequence)


=== DE lines (Description)

--- Bio::UniProtKB#de -> str
             #definition -> str

--- Bio::UniProtKB#protein_name

      Returns the proposed official name of the protein

--- Bio::UniProtKB#synonyms

      Returns an array of synonyms (unofficial names)

=== KW lines (Keyword)

--- Bio::UniProtKB#kw -> ary

=== OS lines (Organism species)

--- Bio::UniProtKB#os -> [{'name' => str, 'os' => str}, ...]
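
The following is only an illustrative sketch of how an OS line is turned
into the list of {'name', 'os'} hashes described above. It reuses the
split pattern and the two regular expressions that appear in
Bio::UniProtKB#os, but parse_os_line is a hypothetical stand-alone helper
written for this note. It is not part of BioRuby, and it takes the OS text
with the leading "OS" tag already removed (an assumption about the input).

```ruby
# Hypothetical stand-alone helper (not a BioRuby method). It mirrors the
# splitting and matching done inside Bio::UniProtKB#os.
def parse_os_line(os_line)
  # Several organisms are separated by ", and" or ", ".
  os_line.split(/, and|, /).map do |part|
    # Scientific name: the words before any parenthesised common name.
    unless part =~ /(\w+ *[\w \:\'\+\-\.]+[\w\.])/
      raise ArgumentError, "unparsable OS fragment: #{part.inspect}"
    end
    organism = $1
    # Common name with its parentheses, or nil when there is none.
    common_name = part[/\(.+\)/]
    { 'name' => common_name, 'os' => organism }
  end
end

records = parse_os_line('Homo sapiens (Human), and Rattus norvegicus (Rat).')
records.each { |r| puts "#{r['os']} #{r['name']}" }
```

Keeping the parentheses in 'name' matches the library behaviour, where
os(0) simply joins 'os' and 'name' with a space.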

=== OC lines (organism classification)

--- Bio::UniProtKB#oc -> ary

=== OG line (Organelle)

--- Bio::UniProtKB#og -> ary

=== OX line (Organism taxonomy cross-reference)

--- Bio::UniProtKB#ox -> {'NCBI_TaxID' => [], ...}

=== RN RC RP RX RA RT RL RG lines (Reference)

--- Bio::UniProtKB#ref -> [{'RN' => int, 'RP' => str, 'RC' => str, 'RX' => str,
                            'RT' => str, 'RL' => str, 'RA' => str, 'RG' => str},...]

=== DR lines (Database cross-reference)

--- Bio::UniProtKB#dr -> {'EMBL' => ary, ...}

=== FT lines (Feature table data)

--- Bio::UniProtKB#ft -> hsh

=== SQ lines (Sequence header and data)

--- Bio::UniProtKB#sq -> {'CRC64' => str, 'MW' => int, 'aalen' => int}
--- Bio::UniProtKB#sq(key) -> int or str

          key := (aalen|MW|CRC64)

--- Bio::UniProtKB#seq -> Bio::Sequence::AA
             #aaseq -> Bio::Sequence::AA

=end

#      Content                      Occurrence in an entry
# ---- ---------------------------  --------------------------------
# ID - identification               (begins each entry; 1 per entry)
# AC - accession number(s)          (>=1 per entry)
# DT - date                         (3 per entry)
# DE - description                  (>=1 per entry)
# GN - gene name(s)                 (>=0 per entry; optional)
# OS - organism species             (>=1 per entry)
# OG - organelle                    (0 or 1 per entry; optional)
# OC - organism classification      (>=1 per entry)
# OX - organism taxonomy x-ref      (>=1 per entry)
# OH - Organism Host
# RN - reference number             (>=1 per entry)
# RP - reference positions          (>=1 per entry)
# RC - reference comment(s)         (>=0 per entry; optional)
# RX - reference cross-reference(s) (>=0 per entry; optional)
# RA - reference author(s)          (>=1 per entry)
# RT - reference title              (>=0 per entry; optional)
# RL - reference location           (>=1 per entry)
# RG - reference group(s)
# CC - comments or notes            (>=0 per entry; optional)
# DR - database cross-references    (>=0 per entry; optional)
# KW - keywords                     (>=1 per entry)
# FT - feature table data           (>=0 per entry; optional)
# SQ - sequence header              (1 per entry)
#    - (blanks) The sequence data   (>=1 per entry)
# // - termination line             (ends each
entry; 1 per entry) # ---- --------------------------- -------------------------------- bio-2.0.3/lib/bio/db/embl/sptr.rb0000644000175000017500000000105014141516614015752 0ustar nileshnilesh# # = bio/db/embl/sptr.rb - Bio::SPTR is an alias of Bio::UniProtKB # # Copyright:: Copyright (C) 2013 BioRuby Project # License:: The Ruby License # warn "Bio::SPTR is changed to an alias of Bio::UniProtKB. Please use Bio::UniProtKB. Bio::SPTR may be deprecated in the future." if $VERBOSE module Bio require "bio/db/embl/uniprotkb" unless const_defined?(:UniProtKB) # Bio::SPTR is changed to an alias of Bio::UniProtKB. # Please use Bio::UniProtKB. # Bio::SPTR may be deprecated in the future. SPTR = UniProtKB end #module Bio bio-2.0.3/lib/bio/db/embl/uniprot.rb0000644000175000017500000000120014141516614016457 0ustar nileshnilesh# # = bio/db/embl/uniprot.rb - UniProt database class # # Copyright:: Copyright (C) 2013 BioRuby Project # License:: The Ruby License # # warn "Bio::UniProt is an alias of Bio::UniProtKB. Please use Bio::UniProtKB. Bio::UniProt may be deprecated in the future." if $VERBOSE module Bio require 'bio/db/embl/uniprotkb' unless const_defined?(:UniProtKB) # Bio::UniProt is changed to an alias of Bio::UniProtKB. # Please use Bio::UniProtKB. # Bio::UniProt may be deprecated in the future. # # Note that Bio::SPTR have been renamed to Bio::UniProtKB and # is also an alias of Bio::UniProtKB. # UniProt = UniProtKB end bio-2.0.3/lib/bio/db/embl/trembl.rb0000644000175000017500000000101314141516614016246 0ustar nileshnilesh# # = bio/db/embl/trembl.rb - (deprecated) TrEMBL database class # # Copyright:: Copyright (C) 2013 BioRuby Project # License:: The Ruby License # warn "Bio::TrEMBL is deprecated. Use Bio::UniProtKB." module Bio require 'bio/db/embl/uniprotkb' unless const_defined?(:UniProtKB) # Bio::TrEMBL is deprecated. Use Bio::UniProtKB. class TrEMBL < UniProtKB # Bio::TrEMBL is deprecated. Use Bio::UniProtKB. def initialize(str) warn "Bio::TrEMBL is deprecated. 
Use Bio::UniProtKB." super(str) end end end bio-2.0.3/lib/bio/db/embl/swissprot.rb0000644000175000017500000000103614141516614017043 0ustar nileshnilesh# # = bio/db/embl/swissprot.rb - (deprecated) SwissProt database class # # Copyright:: Copyright (C) 2013 BioRuby Project # License:: The Ruby License # warn "Bio::SwissProt is deprecated. Use Bio::UniProtKB." module Bio require 'bio/db/embl/uniprotkb' unless const_defined?(:UniProtKB) # Bio::SwissProt is deprecated. Use Bio::UniProtKB. class SwissProt < SPTR # Bio::SwissProt is deprecated. Use Bio::UniProtKB. def initialize(str) warn "Bio::SwissProt is deprecated. Use Bio::UniProtKB." super(str) end end end bio-2.0.3/lib/bio/db/embl/embl_to_biosequence.rb0000644000175000017500000000331314141516614020771 0ustar nileshnilesh# # = bio/db/embl/embl_to_biosequence.rb - Bio::EMBL to Bio::Sequence adapter module # # Copyright:: Copyright (C) 2008 # Naohisa Goto , # License:: The Ruby License # # $Id:$ # require 'bio/sequence' require 'bio/sequence/adapter' # Internal use only. Normal users should not use this module. # # Bio::EMBL to Bio::Sequence adapter module. # It is internally used in Bio::EMBL#to_biosequence. 
# module Bio::Sequence::Adapter::EMBL extend Bio::Sequence::Adapter private def_biosequence_adapter :seq def_biosequence_adapter :id_namespace do |orig| 'EMBL' end def_biosequence_adapter :entry_id def_biosequence_adapter :primary_accession do |orig| orig.accessions[0] end def_biosequence_adapter :secondary_accessions do |orig| orig.accessions[1..-1] || [] end def_biosequence_adapter :molecule_type def_biosequence_adapter :data_class def_biosequence_adapter :definition, :description def_biosequence_adapter :topology def_biosequence_adapter :date_created def_biosequence_adapter :date_modified def_biosequence_adapter :release_created def_biosequence_adapter :release_modified def_biosequence_adapter :entry_version def_biosequence_adapter :division def_biosequence_adapter :sequence_version, :version def_biosequence_adapter :keywords def_biosequence_adapter :species def_biosequence_adapter :classification #-- # unsupported yet # def_biosequence_adapter :organelle do |orig| # orig.fetch('OG') # end #++ def_biosequence_adapter :references def_biosequence_adapter :features def_biosequence_adapter :comments, :cc def_biosequence_adapter :dblinks end #module Bio::Sequence::Adapter::EMBL bio-2.0.3/lib/bio/db/prosite.rb0000644000175000017500000002574114141516614015545 0ustar nileshnilesh# # = bio/db/prosite.rb - PROSITE database class # # Copyright:: Copyright (C) 2001 Toshiaki Katayama # License:: The Ruby License # # $Id:$ # require 'bio/db' module Bio class PROSITE < EMBLDB # Delimiter DELIMITER = "\n//\n" # Delimiter RS = DELIMITER # Bio::DB API TAGSIZE = 5 def initialize(entry) super(entry, TAGSIZE) end # ID Identification (Begins each entry; 1 per entry) # # ID ENTRY_NAME; ENTRY_TYPE. 
(ENTRY_TYPE : PATTERN, MATRIX, RULE) # # Returns def name unless @data['ID'] @data['ID'], @data['TYPE'] = fetch('ID').chomp('.').split('; ') end @data['ID'] end # Returns def division unless @data['TYPE'] name end @data['TYPE'] end # AC Accession number (1 per entry) # # AC PSnnnnn; # # Returns def ac unless @data['AC'] @data['AC'] = fetch('AC').chomp(';') end @data['AC'] end alias entry_id ac # DT Date (1 per entry) # # DT MMM-YYYY (CREATED); MMM-YYYY (DATA UPDATE); MMM-YYYY (INFO UPDATE). # # Returns def dt field_fetch('DT') end alias date dt # DE Short description (1 per entry) # # DE Description. # # Returns def de field_fetch('DE') end alias definition de # PA Pattern (>=0 per entry) # # see - pa2re method # # Returns def pa field_fetch('PA') @data['PA'] = fetch('PA') unless @data['PA'] @data['PA'].gsub!(/\s+/, '') if @data['PA'] @data['PA'] end alias pattern pa # MA Matrix/profile (>=0 per entry) # # see - ma2re method # # Returns def ma field_fetch('MA') end alias profile ma # RU Rule (>=0 per entry) # # RU Rule_Description. # # The rule is described in ordinary English and is free-format. # # Returns def ru field_fetch('RU') end alias rule ru # NR Numerical results (>=0 per entry) # # - SWISS-PROT scan statistics of true and false positives/negatives # # /RELEASE SWISS-PROT release number and total number of sequence # entries in that release. # /TOTAL Total number of hits in SWISS-PROT. # /POSITIVE Number of hits on proteins that are known to belong to the # set in consideration. # /UNKNOWN Number of hits on proteins that could possibly belong to # the set in consideration. # /FALSE_POS Number of false hits (on unrelated proteins). # /FALSE_NEG Number of known missed hits. # /PARTIAL Number of partial sequences which belong to the set in # consideration, but which are not hit by the pattern or # profile because they are partial (fragment) sequences. 
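=begin
A minimal standalone sketch of how the NR qualifiers described above can be
split into a hash. It is an illustration only and not part of Bio::PROSITE:
the helper name parse_nr and the sample NR text are made up for this example,
and the text is assumed to have had its leading "NR" tags removed already.

```ruby
# Hypothetical NR field text (values invented for illustration).
nr_text = '/RELEASE=40.7,103373; /TOTAL=24(24); /POSITIVE=24(24); ' \
          '/UNKNOWN=0(0); /FALSE_POS=0(0); /FALSE_NEG=1; /PARTIAL=0;'

# Turn "/KEY=value;" qualifiers into a Hash.
#   "hits(seqs)"      becomes [hits, seqs] as Integers
#   "release,entries" becomes [release String, entries Integer]
#   anything else     becomes a single Integer
def parse_nr(text)
  stats = {}
  text.scan(%r{/(\S+)=([^;]+);}) do |key, value|
    stats[key] =
      case value
      when /\A(\d+)\((\d+)\)\z/ then [$1.to_i, $2.to_i]
      when /\A([\d.]+),(\d+)\z/ then [$1, $2.to_i]
      else value.to_i
      end
  end
  stats
end

stats = parse_nr(nr_text)
p stats['RELEASE']
p stats['TOTAL']
p stats['FALSE_NEG']
```

The release number is kept as a String because values such as "40.7" are
version labels rather than quantities to compute with.
=end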
# # Returns def nr unless @data['NR'] hash = {} # temporal hash fetch('NR').scan(%r{/(\S+)=([^;]+);}).each do |k, v| if v =~ /^(\d+)\((\d+)\)$/ hits = $1.to_i # the number of hits seqs = $2.to_i # the number of sequences v = [hits, seqs] elsif v =~ /([\d\.]+),(\d+)/ sprel = $1 # the number of SWISS-PROT release spseq = $2.to_i # the number of SWISS-PROT sequences v = [sprel, spseq] else v = v.to_i end hash[k] = v end @data['NR'] = hash end @data['NR'] end alias statistics nr # Returns def release statistics['RELEASE'] end # Returns def swissprot_release_number release.first end # Returns def swissprot_release_sequences release.last end # Returns def total statistics['TOTAL'] end # Returns def total_hits total.first end # Returns def total_sequences total.last end # Returns def positive statistics['POSITIVE'] end # Returns def positive_hits positive.first end # Returns def positive_sequences positive.last end # Returns def unknown statistics['UNKNOWN'] end # Returns def unknown_hits unknown.first end # Returns def unknown_sequences unknown.last end # Returns def false_pos statistics['FALSE_POS'] end # Returns def false_positive_hits false_pos.first end # Returns def false_positive_sequences false_pos.last end # Returns def false_neg statistics['FALSE_NEG'] end alias false_negative_hits false_neg # Returns def partial statistics['PARTIAL'] end # CC Comments (>=0 per entry) # # CC /QUALIFIER=data; /QUALIFIER=data; ....... # # /TAXO-RANGE Taxonomic range. # /MAX-REPEAT Maximum known number of repetitions of the pattern in a # single protein. # /SITE Indication of an `interesting' site in the pattern. # /SKIP-FLAG Indication of an entry that can be, in some cases, ignored # by a program (because it is too unspecific). 
# # Returns def cc unless @data['CC'] hash = {} # temporal hash fetch('CC').scan(%r{/(\S+)=([^;]+);}).each do |k, v| hash[k] = v end @data['CC'] = hash end @data['CC'] end alias comment cc # Returns def taxon_range(expand = nil) range = comment['TAXO-RANGE'] if range and expand expand = [] range.scan(/./) do |x| case x when 'A'; expand.push('archaebacteria') when 'B'; expand.push('bacteriophages') when 'E'; expand.push('eukaryotes') when 'P'; expand.push('prokaryotes') when 'V'; expand.push('eukaryotic viruses') end end range = expand end return range end # Returns def max_repeat comment['MAX-REPEAT'].to_i end # Returns def site if comment['SITE'] num, desc = comment['SITE'].split(',') end return [num.to_i, desc] end # Returns def skip_flag if comment['SKIP-FLAG'] == 'TRUE' return true end end # DR Cross-references to SWISS-PROT (>=0 per entry) # # DR AC_NB, ENTRY_NAME, C; AC_NB, ENTRY_NAME, C; AC_NB, ENTRY_NAME, C; # # - `AC_NB' is the SWISS-PROT primary accession number of the entry to # which reference is being made. # - `ENTRY_NAME' is the SWISS-PROT entry name. # - `C' is a one character flag that can be one of the following: # # T For a true positive. # N For a false negative; a sequence which belongs to the set under # consideration, but which has not been picked up by the pattern or # profile. # P For a `potential' hit; a sequence that belongs to the set under # consideration, but which was not picked up because the region(s) that # are used as a 'fingerprint' (pattern or profile) is not yet available # in the data bank (partial sequence). # ? For an unknown; a sequence which possibly could belong to the set under # consideration. # F For a false positive; a sequence which does not belong to the set in # consideration. 
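=begin
The flag table above maps naturally onto a small lookup. The following
standalone sketch parses a DR field and lists entries by flag. It is not part
of Bio::PROSITE: parse_dr, list_by_flag and the accession / entry names in
dr_text are hypothetical and exist only for this illustration.

```ruby
# Hypothetical DR field text: "accession, entry name, flag;" triples.
dr_text = 'P11111, ABC_HUMAN , T; P22222, ABC_MOUSE , T; P33333, XYZ_YEAST , N; ' \
          'P44444, FOO_ECOLI , F; P55555, BAR_RAT   , ?;'

# Build { accession => [entry_name, flag] } from the DR text.
def parse_dr(text)
  xref = {}
  text.scan(/(\w+)\s*, (\w+)\s*, (.);/) do |acc, name, flag|
    xref[acc] = [name, flag]
  end
  xref
end

# Collect accessions (or entry names when by_name is true) carrying the flag:
#   'T' true positive, 'N' false negative, 'P' potential hit,
#   '?' unknown,       'F' false positive
def list_by_flag(xref, flag, by_name = false)
  xref.each_with_object([]) do |(acc, (name, f)), list|
    list << (by_name ? name : acc) if f == flag
  end
end

xref = parse_dr(dr_text)
p list_by_flag(xref, 'T')
p list_by_flag(xref, 'N')
p list_by_flag(xref, 'F', true)
```

Keeping the flag letters next to their meanings in one place makes it easier
to check that each convenience method asks for the letter it is named after.
=end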
  #
  # Returns
  def dr
    unless @data['DR']
      hash = {}			# temporary hash
      if fetch('DR')
        fetch('DR').scan(/(\w+)\s*, (\w+)\s*, (.);/).each do |a, e, c|
          hash[a] = [e, c]	# SWISS-PROT : accession, entry, flag
        end
      end
      @data['DR'] = hash
    end
    @data['DR']
  end
  alias sp_xref dr

  # Returns
  def list_xref(flag, by_name = nil)
    ary = []
    sp_xref.each do |sp_acc, value|
      if value[1] == flag
        if by_name
          sp_name = value[0]
          ary.push(sp_name)
        else
          ary.push(sp_acc)
        end
      end
    end
    return ary
  end

  # Returns
  def list_truepositive(by_name = nil)
    list_xref('T', by_name)
  end

  # Returns
  def list_falsenegative(by_name = nil)
    list_xref('N', by_name)
  end

  # Returns
  def list_falsepositive(by_name = nil)
    list_xref('F', by_name)
  end

  # Returns
  def list_potentialhit(by_name = nil)
    list_xref('P', by_name)
  end

  # Returns
  def list_unknown(by_name = nil)
    list_xref('?', by_name)
  end

  # 3D Cross-references to PDB (>=0 per entry)
  #
  #  3D  name; [name2;...]
  #
  # Returns
  def pdb_xref
    unless @data['3D']
      @data['3D'] = fetch('3D').split(/; */)
    end
    @data['3D']
  end

  # DO Pointer to the documentation file (1 per entry)
  #
  #  DO  PDOCnnnnn;
  #
  # Returns
  def pdoc_xref
    @data['DO'] = fetch('DO').chomp(';')
  end

  ### prosite pattern to regular expression
  #
  # prosite/prosuser.txt:
  #
  #   The PA (PAttern) lines contains the definition of a PROSITE pattern. The
  #   patterns are described using the following conventions:
  #
  #   0) The standard IUPAC one-letter codes for the amino acids are used.
  #   0) Ambiguities are indicated by listing the acceptable amino acids for a
  #      given position, between square parentheses `[ ]'. For example: [ALT]
  #      stands for Ala or Leu or Thr.
  #   1) A period ends the pattern.
  #   2) When a pattern is restricted to either the N- or C-terminal of a
  #      sequence, that pattern either starts with a `<' symbol or respectively
  #      ends with a `>' symbol.
  #   3) Ambiguities are also indicated by listing between a pair of curly
  #      brackets `{ }' the amino acids that are not accepted at a given
  #      position.
For example: {AM} stands for any amino acid except Ala and
  #      Met.
  #   4) Repetition of an element of the pattern can be indicated by following
  #      that element with a numerical value or a numerical range between
  #      parenthesis. Examples: x(3) corresponds to x-x-x, x(2,4) corresponds to
  #      x-x or x-x-x or x-x-x-x.
  #   5) The symbol `x' is used for a position where any amino acid is accepted.
  #   6) Each element in a pattern is separated from its neighbor by a `-'.
  #
  #   Examples:
  #
  #   PA   [AC]-x-V-x(4)-{ED}.
  #
  #   This pattern is translated as: [Ala or Cys]-any-Val-any-any-any-any-{any
  #   but Glu or Asp}
  #
  #   PA   <A-x-[ST](2)-x(0,1)-V.
  #
  #   This pattern, which must be in the N-terminal of the sequence (`<'), is
  #   translated as: Ala-any-[Ser or Thr]-[Ser or Thr]-(any or none)-Val
  #
  def self.pa2re(pattern)
    pattern = pattern.dup
    pattern.gsub!(/\s/, '')	# remove white spaces
    pattern.sub!(/\.$/, '')	# (1) remove trailing '.'
    pattern.sub!(/^</, '^')	# (2) restricted to the N-terminal : `<'
    pattern.sub!(/>$/, '$')	# (2) restricted to the C-terminal : `>'
    pattern.gsub!(/\{(\w+)\}/) { |m|
      '[^' + $1 + ']'		# (3) not accepted at a given position : '{}'
    }
    pattern.gsub!(/\(([\d,]+)\)/) { |m|
      '{' + $1 + '}'		# (4) repetition of an element : (n), (n,m)
    }
    pattern.tr!('x', '.')	# (5) any amino acid is accepted : 'x'
    pattern.tr!('-', '')	# (6) each element is separated by a '-'
    Regexp.new(pattern, Regexp::IGNORECASE)
  end

  def pa2re(pattern)
    self.class.pa2re(pattern)
  end

  def re
    self.class.pa2re(self.pa)
  end


  ### prosite profile to regular expression
  #
  # prosite/profile.txt:
  #
  # Returns
  def ma2re(matrix)
    raise NotImplementedError
  end

end # PROSITE

end # Bio

bio-2.0.3/lib/bio/db/rebase.rb0000644000175000017500000003346414141516614015322 0ustar nileshnilesh
#
# bio/db/rebase.rb - Interface for EMBOSS formatted REBASE files
#
# Author::    Trevor Wennblom
# Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com)
# License::   The Ruby License
#
#  $Id:$
#

require 'yaml'
require 'bio/reference'

module Bio

#
# bio/db/rebase.rb - Interface for EMBOSS formatted REBASE files
#
# Author::    Trevor Wennblom
# Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com)
# License::   The Ruby License
#
#
# = Description
#
# Bio::REBASE provides utilities for interacting with REBASE data in EMBOSS
# format.
REBASE is the Restriction Enzyme Database, more information # can be found here: # # * http://rebase.neb.com # # EMBOSS formatted files located at: # # * http://rebase.neb.com/rebase/rebase.f37.html # # These files are the same as the "emboss_?.???" files located at: # # * ftp://ftp.neb.com/pub/rebase/ # # To easily get started with the data you can simply type this command # at your shell prompt: # # % wget "ftp://ftp.neb.com/pub/rebase/emboss_*" # # # = Usage # # require 'bio' # require 'pp' # # enz = File.read('emboss_e') # ref = File.read('emboss_r') # sup = File.read('emboss_s') # # # When creating a new instance of Bio::REBASE # # the contents of the enzyme file must be passed. # # The references and suppiers file contents # # may also be passed. # rebase = Bio::REBASE.new( enz ) # rebase = Bio::REBASE.new( enz, ref ) # rebase = Bio::REBASE.new( enz, ref, sup ) # # # The 'read' class method allows you to read in files # # that are REBASE EMBOSS formatted # rebase = Bio::REBASE.read( 'emboss_e' ) # rebase = Bio::REBASE.read( 'emboss_e', 'emboss_r' ) # rebase = Bio::REBASE.read( 'emboss_e', 'emboss_r', 'emboss_s' ) # # # The data loaded may be saved in YAML format # rebase.save_yaml( 'enz.yaml' ) # rebase.save_yaml( 'enz.yaml', 'ref.yaml' ) # rebase.save_yaml( 'enz.yaml', 'ref.yaml', 'sup.yaml' ) # # # YAML formatted files can also be read with the # # class method 'load_yaml' # rebase = Bio::REBASE.load_yaml( 'enz.yaml' ) # rebase = Bio::REBASE.load_yaml( 'enz.yaml', 'ref.yaml' ) # rebase = Bio::REBASE.load_yaml( 'enz.yaml', 'ref.yaml', 'sup.yaml' ) # # pp rebase.enzymes[0..4] # ["AarI", "AasI", "AatI", "AatII", "Acc16I"] # pp rebase.enzyme_name?('aasi') # true # pp rebase['AarI'].pattern # "CACCTGC" # pp rebase['AarI'].blunt? # false # pp rebase['AarI'].organism # "Arthrobacter aurescens SS2-322" # pp rebase['AarI'].source # "A. 
Janulaitis"
#   pp rebase['AarI'].primary_strand_cut1           # 11
#   pp rebase['AarI'].primary_strand_cut2           # 0
#   pp rebase['AarI'].complementary_strand_cut1     # 15
#   pp rebase['AarI'].complementary_strand_cut2     # 0
#   pp rebase['AarI'].suppliers                     # ["F"]
#   pp rebase['AarI'].supplier_names                # ["Fermentas International Inc."]
#
#   pp rebase['AarI'].isoschizomers                 # Currently none stored in the references file
#   pp rebase['AarI'].methylation                   # ""
#
#   pp rebase['EcoRII'].methylation                 # "2(5)"
#   pp rebase['EcoRII'].suppliers                   # ["F", "J", "M", "O", "S"]
#   pp rebase['EcoRII'].supplier_names              # ["Fermentas International Inc.", "Nippon Gene Co., Ltd.",
#                                                   # "Roche Applied Science", "Toyobo Biochemicals",
#                                                   # "Sigma Chemical Corporation"]
#
#   # Number of enzymes in the database
#   pp rebase.size                                  # 673
#   pp rebase.enzymes.size                          # 673
#
#   rebase.each do |name, info|
#     pp "#{name}: #{info.methylation}" unless info.methylation.empty?
#   end
#

class REBASE

  class DynamicMethod_Hash < Hash #:nodoc:
    # Define a writer or reader
    # * Allows hash[:key]= to be accessed like hash.key=
    # * Allows hash[:key] to be accessed like hash.key
    def method_missing(method_id, *args)
      k = self.class
      if method_id.to_s[-1].chr == '='
        k.class_eval do
          define_method(method_id) { |s| self[ method_id.to_s[0..-2].to_sym ] = s }
        end
        k.instance_method(method_id).bind(self).call(args[0])
      else
        k.class_eval do
          define_method(method_id) { self[method_id] }
        end
        k.instance_method(method_id).bind(self).call
      end
    end
  end

  class EnzymeEntry < DynamicMethod_Hash #:nodoc:
    @@supplier_data = {}
    def self.supplier_data=(d); @@supplier_data = d; end

    def supplier_names
      ret = []
      self.suppliers.each { |s| ret << @@supplier_data[s] }
      ret
    end
  end

  # Calls _block_ once for each element in <tt>@data</tt> hash, passing that element as a parameter.
# # --- # *Arguments* # * Accepts a block # *Returns*:: results of _block_ operations def each @data.each { |item| yield item } end # Make the instantiated class act like a Hash on @data # Does the equivalent and more of this: # def []( key ); @data[ key ]; end # def size; @data.size; end def method_missing(method_id, *args) #:nodoc: self.class.class_eval do define_method(method_id) { |a| Hash.instance_method(method_id).bind(@data).call(a) } end Hash.instance_method(method_id).bind(@data).call(*args) end # Constructor # # --- # *Arguments* # * +enzyme_lines+: (_required_) contents of EMBOSS formatted enzymes file # * +reference_lines+: (_optional_) contents of EMBOSS formatted references file # * +supplier_lines+: (_optional_) contents of EMBOSS formatted suppliers files # * +yaml+: (_optional_, _default_ +false+) enzyme_lines, reference_lines, and supplier_lines are read as YAML if set to true # *Returns*:: Bio::REBASE def initialize( enzyme_lines, reference_lines = nil, supplier_lines = nil, yaml = false ) # All your REBASE are belong to us. 
    if yaml
      @enzyme_data = enzyme_lines
      @reference_data = reference_lines
      @supplier_data = supplier_lines
    else
      @enzyme_data = parse_enzymes(enzyme_lines)
      @reference_data = parse_references(reference_lines)
      @supplier_data = parse_suppliers(supplier_lines)
    end

    EnzymeEntry.supplier_data = @supplier_data
    setup_enzyme_data
  end

  # List the enzymes available
  #
  # ---
  # *Arguments*
  # * _none_
  # *Returns*:: +Array+ sorted enzyme names
  def enzymes
    @enzyme_names
  end

  # Check if supplied name is the name of an available enzyme
  #
  # ---
  # *Arguments*
  # * +name+: Enzyme name
  # *Returns*:: +true/false+
  def enzyme_name?(name)
    @enzyme_names_downcased.include?(name.downcase)
  end

  # Save the current data
  #  rebase.save_yaml( 'enz.yaml' )
  #  rebase.save_yaml( 'enz.yaml', 'ref.yaml' )
  #  rebase.save_yaml( 'enz.yaml', 'ref.yaml', 'sup.yaml' )
  #
  # ---
  # *Arguments*
  # * +f_enzyme+: (_required_) Filename to save YAML formatted output of enzyme data
  # * +f_reference+: (_optional_) Filename to save YAML formatted output of reference data
  # * +f_supplier+: (_optional_) Filename to save YAML formatted output of supplier data
  # *Returns*:: nothing
  def save_yaml( f_enzyme, f_reference=nil, f_supplier=nil )
    File.open(f_enzyme, 'w') { |f| f.puts YAML.dump(@enzyme_data) }
    File.open(f_reference, 'w') { |f| f.puts YAML.dump(@reference_data) } if f_reference
    File.open(f_supplier, 'w') { |f| f.puts YAML.dump(@supplier_data) } if f_supplier
    return
  end

  # Read REBASE EMBOSS-formatted files
  #  rebase = Bio::REBASE.read( 'emboss_e' )
  #  rebase = Bio::REBASE.read( 'emboss_e', 'emboss_r' )
  #  rebase = Bio::REBASE.read( 'emboss_e', 'emboss_r', 'emboss_s' )
  #
  # ---
  # *Arguments*
  # * +f_enzyme+: (_required_) Filename to read enzyme data
  # * +f_reference+: (_optional_) Filename to read reference data
  # * +f_supplier+: (_optional_) Filename to read supplier data
  # *Returns*:: Bio::REBASE object
  def self.read( f_enzyme, f_reference=nil, f_supplier=nil )
    # The parse_* methods iterate with String#each_line, so each file is
    # read as a single String rather than as an Array of lines.
    e = File.read(f_enzyme)
    r = f_reference ? File.read(f_reference) : nil
    s = f_supplier ? File.read(f_supplier) : nil
    self.new(e,r,s)
  end

  # Read YAML formatted files
  #  rebase = Bio::REBASE.load_yaml( 'enz.yaml' )
  #  rebase = Bio::REBASE.load_yaml( 'enz.yaml', 'ref.yaml' )
  #  rebase = Bio::REBASE.load_yaml( 'enz.yaml', 'ref.yaml', 'sup.yaml' )
  #
  # ---
  # *Arguments*
  # * +f_enzyme+: (_required_) Filename to read YAML-formatted enzyme data
  # * +f_reference+: (_optional_) Filename to read YAML-formatted reference data
  # * +f_supplier+: (_optional_) Filename to read YAML-formatted supplier data
  # *Returns*:: Bio::REBASE object
  def self.load_yaml( f_enzyme, f_reference=nil, f_supplier=nil )
    e = YAML.load_file(f_enzyme)
    r = f_reference ? YAML.load_file(f_reference) : nil
    s = f_supplier ? YAML.load_file(f_supplier) : nil
    self.new(e,r,s,true)
  end

  #########
  protected
  #########

  def setup_enzyme_data
    @data = {}

    @enzyme_data.each do |name, hash|
      @data[name] = EnzymeEntry.new
      d = @data[name]
      d.pattern = hash[:pattern]
      # d.blunt?= is a syntax error
      d[:blunt?] = (hash[:blunt].to_i == 1 ?
true : false)
      d.primary_strand_cut1 = hash[:c1].to_i
      d.complementary_strand_cut1 = hash[:c2].to_i
      d.primary_strand_cut2 = hash[:c3].to_i
      d.complementary_strand_cut2 = hash[:c4].to_i

      # Set up keys just in case there's no reference data supplied
      [:organism, :isoschizomers, :methylation, :source].each { |k| d[k] = '' }
      d.suppliers = []
      d.references = []
    end

    @enzyme_names = @data.keys.sort
    @enzyme_names_downcased = @enzyme_names.map{|a| a.downcase}
    setup_enzyme_and_reference_association
  end

  def setup_enzyme_and_reference_association
    return unless @reference_data
    @reference_data.each do |name, hash|
      d = @data[name]
      [:organism, :isoschizomers, :methylation, :source].each { |k| d[k] = hash[k] }
      d.suppliers = hash[:suppliers].split('')
      d.references = []
      hash[:references].each { |k| d.references << raw_to_reference(k) }
    end
  end

  # data is a hash indexed by the :name of each entry which is also a hash
  # * data[enzyme_name] has the following keys:
  #   :name, :pattern, :len, :ncuts, :blunt, :c1, :c2, :c3, :c4
  #   :c1 => First 5' cut
  #   :c2 => First 3' cut
  #   :c3 => Second 5' cut
  #   :c4 => Second 3' cut
  def parse_enzymes( lines )
    data = {}
    return data if lines == nil
    lines.each_line do |line|
      next if line[0].chr == '#'
      line.chomp!

      a = line.split("\s")

      data[ a[0] ] = {
        :name => a[0],
        :pattern => a[1],
        :len => a[2],
        :ncuts => a[3],
        :blunt => a[4],
        :c1 => a[5],
        :c2 => a[6],
        :c3 => a[7],
        :c4 => a[8]
      }
    end  # lines.each_line
    data
  end

  # data is a hash indexed by the :name of each entry which is also a hash
  # * data[enzyme_name] has the following keys:
  #   :organism, :isoschizomers, :references, :source, :methylation, :suppliers, :name, :number_of_references
  def parse_references( lines )
    data = {}
    return data if lines == nil
    index = 1
    h = {}
    references_left = 0

    lines.each_line do |line|
      next if line[0].chr == '#'     # Comment
      next if line[0..1] == '//'     # End of entry marker
      line.chomp!
if (1..7).include?( index ) h[index] = line references_left = h[index].to_i if index == 7 index += 1 next end if index == 8 h[index] ||= [] h[index] << line references_left -= 1 end if references_left == 0 data[ h[1] ] = { :name => h[1], :organism => h[2], :isoschizomers => h[3], :methylation => h[4], :source => h[5], :suppliers => h[6], :number_of_references => h[7], :references => h[8] } index = 1 h = {} end end # lines.each data end # data is a hash indexed by the supplier code # data[supplier_code] # returns the suppliers name def parse_suppliers( lines ) data = {} return data if lines == nil lines.each_line do |line| next if line[0].chr == '#' data[$1] = $2 if line =~ %r{(.+?)\s(.+)} end data end # Takes a string in one of the three formats listed below and returns a # Bio::Reference object # * Possible input styles: # a = 'Inagaki, K., Hikita, T., Yanagidani, S., Nomura, Y., Kishimoto, N., Tano, T., Tanaka, H., (1993) Biosci. Biotechnol. Biochem., vol. 57, pp. 1716-1721.' # b = 'Nekrasiene, D., Lapcinskaja, S., Kiuduliene, L., Vitkute, J., Janulaitis, A., Unpublished observations.' # c = "Grigaite, R., Maneliene, Z., Janulaitis, A., (2002) Nucleic Acids Res., vol. 30." def raw_to_reference( line ) a = line.split(', ') if a[-1] == 'Unpublished observations.' title = a.pop.chop pages = volume = year = journal = '' else title = '' pages_or_volume = a.pop.chop if pages_or_volume =~ %r{pp\.\s} pages = pages_or_volume pages.gsub!('pp. ', '') volume = a.pop else pages = '' volume = pages_or_volume end volume.gsub!('vol. 
', '') year_and_journal = a.pop year_and_journal =~ %r{\((\d+)\)\s(.+)} year = $1 journal = $2 end authors = [] last_name = nil a.each do |e| if last_name authors << "#{last_name}, #{e}" last_name = nil else last_name = e end end ref = { 'title' => title, 'pages' => pages, 'volume' => volume, 'year' => year, 'journal' => journal, 'authors' => authors, } Bio::Reference.new(ref) end end # REBASE end # Bio bio-2.0.3/lib/bio/db/kegg/0000755000175000017500000000000014141516614014437 5ustar nileshnileshbio-2.0.3/lib/bio/db/kegg/keggtab.rb0000644000175000017500000002155514141516614016400 0ustar nileshnilesh# # = bio/db/kegg/keggtab.rb - KEGG keggtab class # # Copyright:: Copyright (C) 2001 Mitsuteru C. Nakao # Copyright (C) 2003, 2006 Toshiaki Katayama # License:: The Ruby License # # $Id: keggtab.rb,v 1.10 2007/04/05 23:35:41 trevor Exp $ # module Bio class KEGG # == Description # # Parse 'keggtab' KEGG database definition file which also includes # Taxonomic category of the KEGG organisms. # # == References # # The 'keggtab' file is included in # # * ftp://ftp.genome.jp/pub/kegg/tarfiles/genes.tar.gz # * ftp://ftp.genome.jp/pub/kegg/tarfiles/genes.weekly.last.tar.Z # # == Format # # File format is something like # # # KEGGTAB # # # # name type directory abbreviation # # # enzyme enzyme $BIOROOT/db/ideas/ligand ec # ec alias enzyme # (snip) # # Human # h.sapiens genes $BIOROOT/db/kegg/genes hsa # H.sapiens alias h.sapiens # hsa alias h.sapiens # (snip) # # # # Taxonomy # # # (snip) # animals alias hsa+mmu+rno+dre+dme+cel # eukaryotes alias animals+plants+protists+fungi # genes alias eubacteria+archaea+eukaryotes # class Keggtab # Path for keggtab file and optionally set bioroot top directory. # Environmental variable BIOROOT overrides bioroot. 
    def initialize(file_path, bioroot = nil)
      @bioroot = ENV['BIOROOT'] || bioroot
      @db_names = Hash.new
      @database = Hash.new
      @taxonomy = Hash.new
      File.open(file_path) do |f|
        parse_keggtab(f.read)
      end
    end

    # Returns a string of the BIOROOT path prefix.
    attr_reader :bioroot

    attr_reader :db_names


    # Bio::KEGG::Keggtab::DB
    class DB
      # Create a container object for database definitions.
      def initialize(db_name, db_type, db_path, db_abbrev)
        @name = db_name
        @type = db_type
        @path = db_path
        @abbrev = db_abbrev
        @aliases = Array.new
      end

      # Database name. (e.g. 'enzyme', 'h.sapiens', 'e.coli', ...)
      attr_reader :name

      # Definition type. (e.g. 'enzyme', 'alias', 'genes', ...)
      attr_reader :type

      # Database flat file path. (e.g. '$BIOROOT/db/kegg/genes', ...)
      attr_reader :path

      # Short name for the database. (e.g. 'ec', 'hsa', 'eco', ...)
      # korg and keggorg are aliases for the abbrev method.
      attr_reader :abbrev

      # Array containing all alias names for the database.
      # (e.g. ["H.sapiens", "hsa"], ["E.coli", "eco"], ...)
      attr_reader :aliases

      alias korg abbrev
      alias keggorg abbrev
    end


    # DB section

    # Returns a hash containing DB definition section of the keggtab file.
    # If database name is given as an argument, returns a Keggtab::DB object.
    def database(db_abbrev = nil)
      if db_abbrev
        @database[db_abbrev]
      else
        @database
      end
    end

    # Returns an Array containing all alias names for the database.
    # (e.g. 'hsa' -> ["H.sapiens", "hsa"], 'hpj' -> ["H.pylori_J99", "hpj"])
    def aliases(db_abbrev)
      if @database[db_abbrev]
        @database[db_abbrev].aliases
      end
    end

    # Returns a canonical database name for the abbreviation.
    # (e.g. 'ec' -> 'enzyme', 'hsa' -> 'h.sapiens', ...)
    def name(db_abbrev)
      if @database[db_abbrev]
        @database[db_abbrev].name
      end
    end

    # Returns an absolute path for the flat file database.
    # (e.g. '/bio/db/kegg/genes', ...)
def path(db_abbrev) if @database[db_abbrev] file = @database[db_abbrev].name if @bioroot "#{@database[db_abbrev].path.sub(/\$BIOROOT/,@bioroot)}/#{file}" else "#{@database[db_abbrev].path}/#{file}" end end end # deprecated def alias_list(db_name) if @db_names[db_name] @db_names[db_name].aliases end end # deprecated def db_path(db_name) if @bioroot "#{@db_names[db_name].path.sub(/\$BIOROOT/,@bioroot)}/#{db_name}" else "#{@db_names[db_name].path}/#{db_name}" end end # deprecated def db_by_abbrev(db_abbrev) @db_names.each do |k, db| return db if db.abbrev == db_abbrev end return nil end # deprecated def name_by_abbrev(db_abbrev) db_by_abbrev(db_abbrev).name end # deprecated def db_path_by_abbrev(db_abbrev) db_name = name_by_abbrev(db_abbrev) db_path(db_name) end # Taxonomy section # Returns a hash containing Taxonomy section of the keggtab file. # If argument is given, returns a List of all child nodes belongs # to the label node. # (e.g. "eukaryotes" -> ["animals", "plants", "protists", "fungi"], ...) def taxonomy(node = nil) if node @taxonomy[node] else @taxonomy end end # List of all node labels from Taxonomy section. # (e.g. ["actinobacteria", "animals", "archaea", "bacillales", ...) def taxa_list @taxonomy.keys.sort end def child_nodes(node = 'genes') return @taxonomy[node] end # Returns an array of organism names included in the specified taxon # label. (e.g. 'proteobeta' -> ["nme", "nma", "rso"]) # This method has taxo2keggorgs, taxon2korgs, and taxon2keggorgs aliases. def taxo2korgs(node = 'genes') if node.length == 3 return node else if @taxonomy[node] tmp = Array.new @taxonomy[node].each do |x| tmp.push(taxo2korgs(x)) end return tmp else return nil end end end alias taxo2keggorgs taxo2korgs alias taxon2korgs taxo2korgs alias taxon2keggorgs taxo2korgs # Returns an array of taxonomy names the organism belongs. # (e.g. 'eco' -> ['proteogamma','proteobacteria','eubacteria','genes']) # This method has aliases as keggorg2taxo, korg2taxonomy, keggorg2taxonomy. 
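=begin
The two taxonomy lookups described in this section (taxon label to organisms,
and organism to enclosing taxa) can be sketched on a plain Hash. This is a
standalone illustration, not the Keggtab implementation: the taxonomy table
below is a tiny made-up excerpt, and the helper names ancestors_of and
organisms_under are hypothetical. Unlike taxo2korgs, the sketch treats any
label without children as an organism and returns a flat list.

```ruby
# Hypothetical, heavily trimmed table in the shape { label => [children] }.
taxonomy = {
  'genes'       => %w[eubacteria eukaryotes],
  'eubacteria'  => %w[proteogamma],
  'proteogamma' => %w[eco],
  'eukaryotes'  => %w[animals],
  'animals'     => %w[hsa mmu]
}

# Walk upwards: repeatedly find the entry whose children include the label.
def ancestors_of(taxonomy, label)
  path = []
  while (parent = taxonomy.find { |_name, children| children.include?(label) })
    path << parent.first
    label = parent.first
  end
  path
end

# Walk downwards: expand a label until only leaf (organism) codes remain.
def organisms_under(taxonomy, label)
  children = taxonomy[label]
  return [label] unless children
  children.flat_map { |child| organisms_under(taxonomy, child) }
end

p ancestors_of(taxonomy, 'eco')
p organisms_under(taxonomy, 'animals')
p organisms_under(taxonomy, 'genes')
```

Both helpers assume the table is a tree, so that every label has at most one
parent and no label is its own ancestor.
=end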
    def korg2taxo(keggorg)
      tmp = Array.new
      traverse = Proc.new {|node|
        @taxonomy.each do |k,v|
          if v.include?(node)
            tmp.push(k)
            traverse.call(k)
            break
          end
        end
      }
      traverse.call(keggorg)
      return tmp
    end
    alias keggorg2taxo korg2taxo
    alias korg2taxonomy korg2taxo
    alias keggorg2taxonomy korg2taxo


    private

    def parse_keggtab(keggtab)
      in_taxonomy = nil
      # String#each was removed in Ruby 1.9; iterate line by line explicitly.
      keggtab.each_line do |line|
        case line
        when /^# Taxonomy/		# beginning of the taxonomy section
          in_taxonomy = true
        when /^#|^$/
          next
        when /(^\w\S+)\s+(\w+)\s+(\$\S+)\s+(\w+)/	# db
          db_name = $1
          db_type = $2
          db_path = $3
          db_abbrev = $4
          @db_names[db_name] =
            Bio::KEGG::Keggtab::DB.new(db_name, db_type, db_path, db_abbrev)
        when /(^\w\S+)\s+alias\s+(\w.+\w)/		# alias
          db_alias = $1
          db_name = $2#.downcase
          if in_taxonomy
            @taxonomy.update(db_alias => db_name.split('+'))
          elsif @db_names[db_name]
            @db_names[db_name].aliases.push(db_alias)
          end
        end
      end
      # convert keys-by-names hash @db_names to keys-by-abbrev hash @database
      @db_names.each do |k,v|
        @database[v.abbrev] = v
      end
    end

  end # Keggtab

end # KEGG

end # Bio


if __FILE__ == $0

  begin
    require 'pp'
    alias p pp
  rescue LoadError
  end

  if ARGV.empty?
prefix = ENV['BIOROOT'] || '/bio' keggtab_file = "#{prefix}/etc/keggtab" else keggtab_file = ARGV.shift end puts "= Initialize: keggtab = Bio::KEGG::Keggtab.new(file)" keggtab = Bio::KEGG::Keggtab.new(keggtab_file) puts "\n--- Bio::KEGG::Keggtab#bioroot # -> String" p keggtab.bioroot puts "\n== Methods for DB section" puts "\n--- Bio::KEGG::Keggtab#database # -> Hash" p keggtab.database puts "\n--- Bio::KEGG::Keggtab#database('eco') # -> Keggtab::DB" p keggtab.database('eco') puts "\n--- Bio::KEGG::Keggtab#name('eco') # -> String" p keggtab.name('eco') puts "\n--- Bio::KEGG::Keggtab#path('eco') # -> String" p keggtab.path('eco') puts "\n--- Bio::KEGG::Keggtab#aliases(abbrev) # -> Array" puts "\n++ keggtab.aliases('eco')" p keggtab.aliases('eco') puts "\n++ keggtab.aliases('vg')" p keggtab.aliases('vg') puts "\n== Methods for Taxonomy section" puts "\n--- Bio::KEGG::Keggtab#taxonomy # -> Hash" p keggtab.taxonomy puts "\n--- Bio::KEGG::Keggtab#taxonomy('archaea') # -> Hash" p keggtab.taxonomy('archaea') puts "\n--- Bio::KEGG::Keggtab#taxa_list # -> Array" p keggtab.taxa_list puts "\n--- Bio::KEGG::Keggtab#taxo2korgs(node) # -> Array" puts "\n++ keggtab.taxo2korgs('proteobeta')" p keggtab.taxo2korgs('proteobeta') puts "\n++ keggtab.taxo2korgs('eubacteria')" p keggtab.taxo2korgs('eubacteria') puts "\n++ keggtab.taxo2korgs('archaea')" p keggtab.taxo2korgs('archaea') puts "\n++ keggtab.taxo2korgs('eukaryotes')" p keggtab.taxo2korgs('eukaryotes') puts "\n--- Bio::KEGG::Keggtab#korg2taxo(keggorg) # -> Array" puts "\n++ keggtab.korg2taxo('eco')" p keggtab.korg2taxo('eco') puts "\n++ keggtab.korg2taxo('plants')" p keggtab.korg2taxo('plants') end bio-2.0.3/lib/bio/db/kegg/common.rb0000644000175000017500000001604614141516614016263 0ustar nileshnilesh# # = bio/db/kegg/common.rb - Common methods for KEGG database classes # # Copyright:: Copyright (C) 2001-2007 Toshiaki Katayama # Copyright:: Copyright (C) 2003 Masumi Itoh # Copyright:: Copyright (C) 2009 Kozo Nishida # License:: 
The Ruby License # # # # == Description # # Note that the modules in this file are intended to be Bio::KEGG::* # internal use only. # # This file contains modules that implement methods commonly used from # KEGG database parser classes. # module Bio class KEGG # Namespace for methods commonly used in the Bio::KEGG::* classes. module Common # The module provides references method. module References # REFERENCE -- Returns contents of the REFERENCE records as an Array of # Bio::Reference objects. def references unless @data['REFERENCE'] ary = [] toptag2array(get('REFERENCE')).each do |ref| hash = Hash.new subtag2array(ref).each do |field| case tag_get(field) when /REFERENCE/ cmnt = tag_cut(field).chomp if /^\s*PMID\:(\d+)\s*/ =~ cmnt then hash['pubmed'] = $1 cmnt = $' end if cmnt and !cmnt.empty? then hash['comments'] ||= [] hash['comments'].push(cmnt) end when /AUTHORS/ authors = truncate(tag_cut(field)) authors = authors.split(/\, /) authors[-1] = authors[-1].split(/\s+and\s+/) if authors[-1] authors = authors.flatten.map { |a| a.sub(',', ', ') } hash['authors'] = authors when /TITLE/ hash['title'] = truncate(tag_cut(field)) when /JOURNAL/ journal = truncate(tag_cut(field)) case journal # KEGG style when /(.*) (\d*(?:\([^\)]+\))?)\:(\d+\-\d+) \((\d+)\)$/ hash['journal'] = $1 hash['volume'] = $2 hash['pages'] = $3 hash['year'] = $4 # old KEGG style when /(.*) (\d+):(\d+\-\d+) \((\d+)\) \[UI:(\d+)\]$/ hash['journal'] = $1 hash['volume'] = $2 hash['pages'] = $3 hash['year'] = $4 hash['medline'] = $5 # Only journal name and year are available when /(.*) \((\d+)\)$/ hash['journal'] = $1 hash['year'] = $2 else hash['journal'] = journal end end end ary.push(Reference.new(hash)) end @data['REFERENCE'] = ary #.extend(Bio::References::BackwardCompatibility) end @data['REFERENCE'] end end #module References # The module providing dblinks_as_hash methods. # # Bio::KEGG::* internal use only. 
module DblinksAsHash # Returns a Hash of the DB name and an Array of entry IDs in # DBLINKS field. def dblinks_as_hash unless defined? @dblinks_as_hash hash = {} dblinks_as_strings.each do |line| db, ids = line.split(/\:\s*/, 2) list = ids.split(/\s+/) hash[db] = list end @dblinks_as_hash = hash end @dblinks_as_hash end end #module DblinksAsHash # The module providing pathways_as_hash method. # # Bio::KEGG::* internal use only. module PathwaysAsHash # Returns a Hash of the pathway ID and name in PATHWAY field. def pathways_as_hash unless defined? @pathways_as_hash then hash = {} pathways_as_strings.each do |line| line = line.sub(/\APATH\:\s+/, '') entry_id, name = line.split(/\s+/, 2) hash[entry_id] = name end @pathways_as_hash = hash end @pathways_as_hash end end #module PathwaysAsHash # This module provides orthologs_as_hash method. # # Bio::KEGG::* internal use only. module OrthologsAsHash # Returns a Hash of the orthology ID and definition in ORTHOLOGY field. def orthologs_as_hash unless defined? @orthologs_as_hash kos = {} orthologs_as_strings.each do |line| ko = line.sub(/\AKO\:\s+/, '') entry_id, definition = ko.split(/\s+/, 2) kos[entry_id] = definition end @orthologs_as_hash = kos end @orthologs_as_hash end end #module OrthologsAsHash # This module provides genes_as_hash method. # # Bio::KEGG::* internal use only. module GenesAsHash # Returns a Hash of the organism ID and an Array of entry IDs in # GENES field. def genes_as_hash unless defined? @genes_as_hash hash = {} genes_as_strings.each do |line| name, *list = line.split(/\s+/) org = name.downcase.sub(/:/, '') genes = list.map {|x| x.sub(/\(.*\)/, '')} #names = list.map {|x| x.scan(/.*\((.*)\)/)} hash[org] = genes end @genes_as_hash = hash end @genes_as_hash end end #module GenesAsHash # This module provides modules_as_hash method. # # Bio::KEGG::* internal use only. module ModulesAsHash # Returns MODULE field as a Hash. 
# Each key of the hash is KEGG MODULE ID, # and each value is the name of the Pathway Module. # --- # *Returns*:: Hash def modules_as_hash unless defined? @modules_as_hash then hash = {} modules_as_strings.each do |line| entry_id, name = line.split(/\s+/, 2) hash[entry_id] = name end @modules_as_hash = hash end @modules_as_hash end end #module ModulesAsHash # This module provides strings_as_hash private method. # # Bio::KEGG::* internal use only. module StringsAsHash # (Private) Creates a hash from lines. # Each line consists of two components, ID and description, # separated by spaces. IDs must be unique. def strings_as_hash(lines) hash = {} lines.each do |line| entry_id, definition = line.split(/\s+/, 2) hash[entry_id] = definition end return hash end private :strings_as_hash end #module StringsAsHash # This module provides diseases_as_hash method. # # Bio::KEGG::* internal use only. module DiseasesAsHash include StringsAsHash # Returns a Hash of the disease ID and its definition def diseases_as_hash unless (defined? @diseases_as_hash) && @diseases_as_hash @diseases_as_hash = strings_as_hash(diseases_as_strings) end @diseases_as_hash end end #module DiseasesAsHash end #module Common end #class KEGG end #module Bio bio-2.0.3/lib/bio/db/kegg/pathway.rb0000644000175000017500000001406014141516614016442 0ustar nileshnilesh# # = bio/db/kegg/pathway.rb - KEGG PATHWAY database class # # Copyright:: Copyright (C) 2010 Kozo Nishida # Copyright:: Copyright (C) 2010 Toshiaki Katayama # License:: The Ruby License # # require 'bio/db' require 'bio/db/kegg/common' module Bio class KEGG # == Description # # Bio::KEGG::PATHWAY is a parser class for the KEGG PATHWAY database entry.
# # == References # # * http://www.genome.jp/kegg/pathway.html # * ftp://ftp.genome.jp/pub/kegg/pathway/pathway # class PATHWAY < KEGGDB DELIMITER = RS = "\n///\n" TAGSIZE = 12 include Common::DblinksAsHash # Returns a Hash of the DB name and an Array of entry IDs in DBLINKS field. def dblinks_as_hash; super; end if false #dummy for RDoc alias dblinks dblinks_as_hash include Common::PathwaysAsHash # Returns a Hash of the pathway ID and name in PATHWAY field. def pathways_as_hash; super; end if false #dummy for RDoc alias pathways pathways_as_hash include Common::OrthologsAsHash # Returns a Hash of the orthology ID and definition in ORTHOLOGY field. def orthologs_as_hash; super; end if false #dummy for RDoc alias orthologs orthologs_as_hash include Common::DiseasesAsHash # Returns a Hash of the disease ID and its definition def diseases_as_hash; super; end if false #dummy for RDoc alias diseases diseases_as_hash include Common::References # REFERENCE -- Returns contents of the REFERENCE records as an Array of # Bio::Reference objects. # --- # *Returns*:: an Array containing Bio::Reference objects def references; super; end if false #dummy for RDoc include Common::ModulesAsHash # Returns MODULE field as a Hash. # Each key of the hash is KEGG MODULE ID, # and each value is the name of the Pathway Module. # --- # *Returns*:: Hash def modules_as_hash; super; end if false #dummy for RDoc alias modules modules_as_hash #-- # for a private method strings_as_hash. #++ include Common::StringsAsHash # Creates a new Bio::KEGG::PATHWAY object. # --- # *Arguments*: # * (required) _entry_: (String) single entry as a string # *Returns*:: Bio::KEGG::PATHWAY object def initialize(entry) super(entry, TAGSIZE) end # Return the ID of the pathway, described in the ENTRY line. # --- # *Returns*:: String def entry_id field_fetch('ENTRY')[/\S+/] end # Name of the pathway, described in the NAME line. 
# --- # *Returns*:: String def name field_fetch('NAME') end # Description of the pathway, described in the DESCRIPTION line. # --- # *Returns*:: String def description field_fetch('DESCRIPTION') end alias definition description # Return the name of the KEGG class, described in the CLASS line. # --- # *Returns*:: String def keggclass field_fetch('CLASS') end # Pathways described in the PATHWAY_MAP lines. # --- # *Returns*:: Array containing String def pathways_as_strings lines_fetch('PATHWAY_MAP') end # Returns MODULE field of the entry. # --- # *Returns*:: Array containing String objects def modules_as_strings lines_fetch('MODULE') end # Disease described in the DISEASE lines. # --- # *Returns*:: Array containing String def diseases_as_strings lines_fetch('DISEASE') end # Returns an Array of a database name and entry IDs in DBLINKS field. # --- # *Returns*:: Array containing String def dblinks_as_strings lines_fetch('DBLINKS') end # Orthologs described in the ORTHOLOGY lines. # --- # *Returns*:: Array containing String def orthologs_as_strings lines_fetch('ORTHOLOGY') end # Organism described in the ORGANISM line. # --- # *Returns*:: String def organism field_fetch('ORGANISM') end # Genes described in the GENE lines. # --- # *Returns*:: Array containing String def genes_as_strings lines_fetch('GENE') end # Genes described in the GENE lines. # --- # *Returns*:: Hash of gene ID and its definition def genes_as_hash unless (defined? @genes_as_hash) && @genes_as_hash @genes_as_hash = strings_as_hash(genes_as_strings) end @genes_as_hash end alias genes genes_as_hash # Enzymes described in the ENZYME lines. # --- # *Returns*:: Array containing String def enzymes_as_strings lines_fetch('ENZYME') end alias enzymes enzymes_as_strings # Reactions described in the REACTION lines. # --- # *Returns*:: Array containing String def reactions_as_strings lines_fetch('REACTION') end # Reactions described in the REACTION lines. 
# --- # *Returns*:: Hash of reaction ID and its definition def reactions_as_hash unless (defined? @reactions_as_hash) && @reactions_as_hash @reactions_as_hash = strings_as_hash(reactions_as_strings) end @reactions_as_hash end alias reactions reactions_as_hash # Compounds described in the COMPOUND lines. # --- # *Returns*:: Array containing String def compounds_as_strings lines_fetch('COMPOUND') end # Compounds described in the COMPOUND lines. # --- # *Returns*:: Hash of compound ID and its definition def compounds_as_hash unless (defined? @compounds_as_hash) && @compounds_as_hash @compounds_as_hash = strings_as_hash(compounds_as_strings) end @compounds_as_hash end alias compounds compounds_as_hash # Returns REL_PATHWAY field of the entry. # --- # *Returns*:: Array containing String objects def rel_pathways_as_strings lines_fetch('REL_PATHWAY') end # Returns REL_PATHWAY field as a Hash. Each key of the hash is # Pathway ID, and each value is the name of the pathway. # --- # *Returns*:: Hash def rel_pathways_as_hash unless defined? @rel_pathways_as_hash then hash = {} rel_pathways_as_strings.each do |line| entry_id, name = line.split(/\s+/, 2) hash[entry_id] = name end @rel_pathways_as_hash = hash end @rel_pathways_as_hash end alias rel_pathways rel_pathways_as_hash # KO pathway described in the KO_PATHWAY line. # --- # *Returns*:: String def ko_pathway field_fetch('KO_PATHWAY') end end # PATHWAY end # KEGG end # Bio bio-2.0.3/lib/bio/db/kegg/reaction.rb0000644000175000017500000000620014141516614016566 0ustar nileshnilesh# # = bio/db/kegg/reaction.rb - KEGG REACTION database class # # Copyright:: Copyright (C) 2004 Toshiaki Katayama # Copyright:: Copyright (C) 2009 Kozo Nishida # License:: The Ruby License # # $Id:$ # require 'bio/db' require 'bio/db/kegg/common' require 'enumerator' module Bio class KEGG class REACTION < KEGGDB DELIMITER = RS = "\n///\n" TAGSIZE = 12 include Common::PathwaysAsHash # Returns a Hash of the pathway ID and name in PATHWAY field. 
def pathways_as_hash; super; end if false #dummy for RDoc alias pathways pathways_as_hash include Common::OrthologsAsHash # Returns a Hash of the orthology ID and definition in ORTHOLOGY field. def orthologs_as_hash; super; end if false #dummy for RDoc alias orthologs orthologs_as_hash # Creates a new Bio::KEGG::REACTION object. # --- # *Arguments*: # * (required) _entry_: (String) single entry as a string # *Returns*:: Bio::KEGG::REACTION object def initialize(entry) super(entry, TAGSIZE) end # ID of the entry, described in the ENTRY line. # --- # *Returns*:: String def entry_id field_fetch('ENTRY')[/\S+/] end # Name of the reaction, described in the NAME line. # --- # *Returns*:: String def name field_fetch('NAME') end # Definition of the reaction, described in the DEFINITION line. # --- # *Returns*:: String def definition field_fetch('DEFINITION') end # Chemical equation, described in the EQUATION line. # --- # *Returns*:: String def equation field_fetch('EQUATION') end # KEGG RPAIR (ReactantPair) information, described in the RPAIR lines. # --- # *Returns*:: Array containing String def rpairs_as_strings lines_fetch('RPAIR') end # KEGG RPAIR (ReactantPair) information, described in the RPAIR lines. # Returns a hash mapping each RPair ID to its [ name, type ] pair, for example, # { "RP12733" => [ "C00022_C00900", "trans" ], # "RP05698" => [ "C00011_C00022", "leave" ], # "RP00440" => [ "C00022_C00900", "main" ] # } # --- # *Returns*:: Hash def rpairs_as_hash unless defined? @rpairs_as_hash rps = {} rpairs_as_strings.each do |line| _, entry_id, name, rptype = line.split(/\s+/) rps[entry_id] = [ name, rptype ] end @rpairs_as_hash = rps end @rpairs_as_hash end alias rpairs rpairs_as_hash # Returns the content of the RPAIR entry as tokens # (RPair signature, RPair ID, , RPair type). # --- # *Returns*:: Array containing String def rpairs_as_tokens fetch('RPAIR').split(/\s+/) end # Pathway information, described in the PATHWAY lines.
# --- # *Returns*:: Array containing String def pathways_as_strings lines_fetch('PATHWAY') end # Enzymes described in the ENZYME line. # --- # *Returns*:: Array containing String def enzymes unless @data['ENZYME'] @data['ENZYME'] = fetch('ENZYME').scan(/\S+/) end @data['ENZYME'] end # Orthologs described in the ORTHOLOGY lines. # --- # *Returns*:: Array containing String def orthologs_as_strings lines_fetch('ORTHOLOGY') end end # REACTION end # KEGG end # Bio bio-2.0.3/lib/bio/db/kegg/kgml.rb0000644000175000017500000004144314141516614015724 0ustar nileshnilesh# # = bio/db/kegg/kgml.rb - KEGG KGML parser class # # Copyright:: Copyright (C) 2005 # Toshiaki Katayama # License:: The Ruby License # # require 'rexml/document' module Bio class KEGG # == KGML (KEGG XML) parser # # See http://www.genome.jp/kegg/xml/ for more details on KGML. # # === Note for older version users # * Most of incompatible attribute names with KGML tags are now deprecated. # Use the names of KGML tags instead of old incompatible names that will # be removed in the future. # * Bio::KGML::Entry#id (entry_id is deprecated) # * Bio::KGML::Entry#type (category is deprecated) # * Bio::KGML::Relation#entry1 (node1 is deprecated) # * Bio::KGML::Relation#entry2 (node2 is deprecated) # * Bio::KGML::Relation#type (rel is deprecated) # * Bio::KGML::Reaction#name (entry_id is deprecated) # * Bio::KGML::Reaction#type (direction is deprecated) # * New class Bio::KGML::Graphics and new method Bio::KGML::Entry#graphics. # Because two or more graphics elements may exist, following attribute # methods in Bio::KGML::Entry are now deprecated and will be removed # in the future. See rdoc of these methods for details. 
# * Bio::KEGG::KGML::Entry#label # * Bio::KEGG::KGML::Entry#shape # * Bio::KEGG::KGML::Entry#x # * Bio::KEGG::KGML::Entry#y # * Bio::KEGG::KGML::Entry#width # * Bio::KEGG::KGML::Entry#height # * Bio::KEGG::KGML::Entry#fgcolor # * Bio::KEGG::KGML::Entry#bgcolor # * Incompatible changes: Bio::KEGG::KGML::Reaction#substrates now returns # an array containing Bio::KEGG::KGML::Substrate objects, and # Bio::KEGG::KGML::Reaction#products now returns an array containing # Bio::KEGG::KGML::Product objects. These changes make it possible to get # the IDs of substrates and products. # # === Incompatible attribute names with KGML tags # # <entry> # :map -> :pathway # names() # <subtype> # edge() # # === Examples # # file = File.read("kgml/hsa/hsa00010.xml") # kgml = Bio::KEGG::KGML.new(file) # # # <pathway> attributes # puts kgml.name # puts kgml.org # puts kgml.number # puts kgml.title # puts kgml.image # puts kgml.link # # kgml.entries.each do |entry| # # <entry> attributes # puts entry.id # puts entry.name # puts entry.type # puts entry.link # puts entry.reaction # # <graphics> attributes # entry.graphics.each do |graphics| # puts graphics.name # puts graphics.type # puts graphics.x # puts graphics.y # puts graphics.width # puts graphics.height # puts graphics.fgcolor # puts graphics.bgcolor # end # # <component> attributes # puts entry.components # # method # puts entry.names # end # # kgml.relations.each do |relation| # # <relation> attributes # puts relation.entry1 # puts relation.entry2 # puts relation.type # # <subtype> attributes # puts relation.name # puts relation.value # end # # kgml.reactions.each do |reaction| # # <reaction> attributes # puts reaction.name # puts reaction.type # # <substrate> attributes # reaction.substrates.each do |substrate| # puts substrate.id # puts substrate.name # # <alt> attributes (keyed by substrate name; absent if no alt element) # altnames = reaction.alt[substrate.name] || [] # altnames.each do |name| # puts name # end # end # # <product> attributes # reaction.products.each do |product| # puts product.id # puts product.name # # <alt> attributes (keyed by product name; absent if no alt element) # altnames = reaction.alt[product.name] || [] # altnames.each do |name| # puts name # end # end # 
end # # === References # # * http://www.genome.jp/kegg/xml/docs/ # class KGML # Creates a new KGML object. # # --- # *Arguments*: # * (required) _xml_: String containing XML data # *Returns*:: Bio::KEGG::KGML object def initialize(xml) dom = REXML::Document.new(xml) parse_root(dom) parse_entry(dom) parse_relation(dom) parse_reaction(dom) end # KEGG-style ID string of this pathway map (String or nil) # ('pathway' element) attr_reader :name # "ko" (KEGG Orthology), "ec" (KEGG ENZYME), # or the KEGG 3-letter organism code (String or nil) # ('pathway' element) attr_reader :org # map number (String or nil) # ('pathway' element) attr_reader :number # title (String or nil) # ('pathway' element) attr_reader :title # image URL of this pathway map (String or nil) # ('pathway' element) attr_reader :image # information URL of this pathway map (String or nil) # ('pathway' element) attr_reader :link # entry elements (Array containing KGML::Entry objects, or nil) attr_accessor :entries # relation elements (Array containing KGML::Relation objects, or nil) attr_accessor :relations # reaction elements (Array containing KGML::Reaction objects, or nil) attr_accessor :reactions # Bio::KEGG::KGML::Entry contains an entry element in the KGML. class Entry # ID of this entry in this pathway map (Integer or nil). # ('id' attribute in 'entry' element) attr_accessor :id alias entry_id id alias entry_id= id= # KEGG-style ID string of this entry (String or nil) attr_accessor :name # type of this entry (String or nil). # Normally one of the following: # * "ortholog" # * "enzyme" # * "reaction" # * "gene" # * "group" # * "compound" # * "map" # See http://www.genome.jp/kegg/xml/docs/ for details. # ('type' attribute in 'entry' element) attr_accessor :type alias category type alias category= type= # URL pointing to information about this entry (String or nil) attr_accessor :link # KEGG-style ID string of this reaction (String or nil) attr_accessor :reaction # (Deprecated?)
('map' attribute in 'entry' element) attr_accessor :pathway # (private) get an attribute value in the graphics[-1] object def _graphics_attr(attr) if self.graphics then g = self.graphics[-1] g ? g.__send__(attr) : nil else nil end end private :_graphics_attr # (private) set an attribute value in the graphics[-1] object, # creating a new Graphics object if none exists yet def _graphics_set_attr(attr, val) self.graphics ||= [] unless g = self.graphics[-1] then g = Graphics.new self.graphics.push(g) end g.__send__(attr, val) end private :_graphics_set_attr # Deprecated. # Same as self.graphics[-1].name (additional nil checks may be needed). # # label of the 'graphics' element (String or nil) # ('name' attribute in 'graphics' element) def label _graphics_attr(:name) end # Deprecated. # Same as self.graphics[-1].name= (additional nil checks may be needed). # def label=(val) _graphics_set_attr(:name=, val) end # Deprecated. # Same as self.graphics[-1].type (additional nil checks may be needed). # # shape of the 'graphics' element (String or nil) # Normally one of the following: # * "rectangle" # * "circle" # * "roundrectangle" # * "line" # If not specified, "rectangle" is the default value. # ('type' attribute in 'graphics' element) def shape _graphics_attr(:type) end # Deprecated. # Same as self.graphics[-1].type= (additional nil checks may be needed). # def shape=(val) _graphics_set_attr(:type=, val) end # Deprecated. # Same as self.graphics[-1].x (additional nil checks may be needed). # # X axis position (Integer or nil) ('graphics' element) def x _graphics_attr(:x) end # Deprecated. # Same as self.graphics[-1].x= (additional nil checks may be needed). # def x=(val) _graphics_set_attr(:x=, val) end # Deprecated. # Same as self.graphics[-1].y (additional nil checks may be needed). # # Y axis position (Integer or nil) ('graphics' element) def y _graphics_attr(:y) end # Deprecated. # Same as self.graphics[-1].y= (additional nil checks may be needed). # def y=(val) _graphics_set_attr(:y=, val) end # Deprecated.
# Same as self.graphics[-1].width (additional nil checks may be needed). # # width (Integer or nil) ('graphics' element) def width _graphics_attr(:width) end # Deprecated. # Same as self.graphics[-1].width= (additional nil checks may be needed). # def width=(val) _graphics_set_attr(:width=, val) end # Deprecated. # Same as self.graphics[-1].height (additional nil checks may be needed). # # height (Integer or nil) ('graphics' element) def height _graphics_attr(:height) end # Deprecated. # Same as self.graphics[-1].height= (additional nil checks may be needed). # def height=(val) _graphics_set_attr(:height=, val) end # Deprecated. # Same as self.graphics[-1].fgcolor (additional nil checks may be needed). # # foreground color (String or nil) ('graphics' element) def fgcolor _graphics_attr(:fgcolor) end # Deprecated. # Same as self.graphics[-1].fgcolor= (additional nil checks may be needed). # def fgcolor=(val) _graphics_set_attr(:fgcolor=, val) end # Deprecated. # Same as self.graphics[-1].bgcolor (additional nil checks may be needed). # # background color (String or nil) ('graphics' element) def bgcolor _graphics_attr(:bgcolor) end # Deprecated. # Same as self.graphics[-1].bgcolor= (additional nil checks may be needed). # def bgcolor=(val) _graphics_set_attr(:bgcolor=, val) end # graphics elements included in this entry # (Array containing Graphics objects, or nil) attr_accessor :graphics # component elements included in this entry # (Array containing Integer objects, or nil) attr_accessor :components # the "name" attribute may contain multiple names separated # with space characters. This method returns the names # as an array. (Array containing String objects) def names @name.split(/\s+/) end end # Bio::KEGG::KGML::Graphics contains a 'graphics' element in the KGML. 
class Graphics # label of the 'graphics' element (String or nil) attr_accessor :name # shape of the 'graphics' element (String or nil) # Normally one of the following: # * "rectangle" # * "circle" # * "roundrectangle" # * "line" # If not specified, "rectangle" is the default value. attr_accessor :type # X axis position (Integer or nil) attr_accessor :x # Y axis position (Integer or nil) attr_accessor :y # polyline coordinates # (Array containing Array of [ x, y ] pair of Integer values) attr_accessor :coords # width (Integer or nil) attr_accessor :width # height (Integer or nil) attr_accessor :height # foreground color (String or nil) attr_accessor :fgcolor # background color (String or nil) attr_accessor :bgcolor end #class Graphics # Bio::KEGG::KGML::Relation contains a relation element in the KGML. class Relation # the first entry of the relation (Integer or nil) # ('entry1' attribute in 'relation' element) attr_accessor :entry1 alias node1 entry1 alias node1= entry1= # the second entry of the relation (Integer or nil) # ('entry2' attribute in 'relation' element) attr_accessor :entry2 alias node2 entry2 alias node2= entry2= # type of this relation (String or nil). # Normally one of the following: # * "ECrel" # * "PPrel" # * "GErel" # * "PCrel" # * "maplink" # ('type' attribute in 'relation' element) attr_accessor :type alias rel type alias rel= type= # interaction and/or relation type (String or nil). # See http://www.genome.jp/kegg/xml/docs/ for details. # ('name' attribute in 'subtype' element) attr_accessor :name # interaction and/or relation information (String or nil). # See http://www.genome.jp/kegg/xml/docs/ for details. # ('value' attribute in 'subtype' element) attr_accessor :value # (Deprecated?) def edge @value.to_i end end # Bio::KEGG::KGML::Reaction contains a reaction element in the KGML. 
class Reaction # ID of this reaction (Integer or nil) attr_accessor :id # KEGG-style ID string of this reaction (String or nil) # ('name' attribute in 'reaction' element) attr_accessor :name alias entry_id name alias entry_id= name= # type of this reaction (String or nil). # Normally "reversible" or "irreversible". # ('type' attribute in 'reaction' element) attr_accessor :type alias direction type alias direction= type= # Substrates. Each substrate name is the KEGG-style ID string. # (Array containing Substrate objects, or nil) attr_accessor :substrates # Products. Each product name is the KEGG-style ID string. # (Array containing Product objects, or nil) attr_accessor :products # alt elements (Hash mapping a substrate or product name to an # Array of alternative names) attr_accessor :alt end # Bio::KEGG::KGML::SubstrateProduct contains a substrate element # or a product element in the KGML. # # Please do not use SubstrateProduct directly. # Instead, please use the Substrate or Product class. class SubstrateProduct # ID of this substrate or product (Integer or nil) attr_accessor :id # name of this substrate or product (String or nil) attr_accessor :name # Creates a new object def initialize(id = nil, name = nil) @id ||= id @name ||= name end end #class SubstrateProduct # Bio::KEGG::KGML::Substrate contains a substrate element in the KGML. class Substrate < SubstrateProduct end # Bio::KEGG::KGML::Product contains a product element in the KGML.
class Product < SubstrateProduct end private def parse_root(dom) root = dom.root.attributes @name = root["name"] @org = root["org"] @number = root["number"] @title = root["title"] @image = root["image"] @link = root["link"] end def parse_entry(dom) @entries = Array.new dom.elements.each("/pathway/entry") { |node| attr = node.attributes entry = Entry.new entry.id = attr["id"].to_i entry.name = attr["name"] entry.type = attr["type"] # implied entry.link = attr["link"] entry.reaction = attr["reaction"] entry.pathway = attr["map"] node.elements.each("graphics") { |graphics| g = Graphics.new attr = graphics.attributes g.x = attr["x"].to_i g.y = attr["y"].to_i g.type = attr["type"] g.name = attr["name"] g.width = attr["width"].to_i g.height = attr["height"].to_i g.fgcolor = attr["fgcolor"] g.bgcolor = attr["bgcolor"] if str = attr["coords"] then coords = [] tmp = str.split(',') tmp.collect! { |n| n.to_i } while xx = tmp.shift yy = tmp.shift coords.push [ xx, yy ] end g.coords = coords else g.coords = nil end entry.graphics ||= [] entry.graphics.push g } node.elements.each("component") { |component| attr = component.attributes entry.components ||= [] entry.components << attr["id"].to_i } @entries << entry } end def parse_relation(dom) @relations = Array.new dom.elements.each("/pathway/relation") { |node| attr = node.attributes relation = Relation.new relation.entry1 = attr["entry1"].to_i relation.entry2 = attr["entry2"].to_i relation.type = attr["type"] node.elements.each("subtype") { |subtype| attr = subtype.attributes relation.name = attr["name"] relation.value = attr["value"] } @relations << relation } end def parse_reaction(dom) @reactions = Array.new dom.elements.each("/pathway/reaction") { |node| attr = node.attributes reaction = Reaction.new reaction.id = attr["id"].to_i reaction.name = attr["name"] reaction.type = attr["type"] substrates = Array.new products = Array.new hash = Hash.new node.elements.each("substrate") { |substrate| id = 
substrate.attributes["id"].to_i name = substrate.attributes["name"] substrates << Substrate.new(id, name) substrate.elements.each("alt") { |alt| hash[name] ||= Array.new hash[name] << alt.attributes["name"] } } node.elements.each("product") { |product| id = product.attributes["id"].to_i name = product.attributes["name"] products << Product.new(id, name) product.elements.each("alt") { |alt| hash[name] ||= Array.new hash[name] << alt.attributes["name"] } } reaction.substrates = substrates reaction.products = products reaction.alt = hash @reactions << reaction } end end # KGML end # KEGG end # Bio bio-2.0.3/lib/bio/db/kegg/glycan.rb0000644000175000017500000000602514141516614016244 0ustar nileshnilesh# # = bio/db/kegg/glycan.rb - KEGG GLYCAN database class # # Copyright:: Copyright (C) 2004 Toshiaki Katayama # License:: The Ruby License # # $Id:$ # require 'bio/db' require 'bio/db/kegg/common' module Bio class KEGG class GLYCAN < KEGGDB DELIMITER = RS = "\n///\n" TAGSIZE = 12 include Common::DblinksAsHash # Returns a Hash of the DB name and an Array of entry IDs in DBLINKS field. def dblinks_as_hash; super; end if false #dummy for RDoc alias dblinks dblinks_as_hash include Common::PathwaysAsHash # Returns a Hash of the pathway ID and name in PATHWAY field. def pathways_as_hash; super; end if false #dummy for RDoc alias pathways pathways_as_hash include Common::OrthologsAsHash # Returns a Hash of the orthology ID and definition in ORTHOLOGY field. 
def orthologs_as_hash; super; end if false #dummy for RDoc alias orthologs orthologs_as_hash def initialize(entry) super(entry, TAGSIZE) end # ENTRY def entry_id field_fetch('ENTRY')[/\S+/] end # NAME def name field_fetch('NAME') end # COMPOSITION def composition unless @data['COMPOSITION'] hash = Hash.new(0) fetch('COMPOSITION').scan(/\((\S+)\)(\d+)/).each do |key, val| hash[key] = val.to_i end @data['COMPOSITION'] = hash end @data['COMPOSITION'] end # MASS def mass unless @data['MASS'] @data['MASS'] = field_fetch('MASS')[/[\d\.]+/].to_f end @data['MASS'] end # CLASS def keggclass field_fetch('CLASS') end # COMPOUND def compounds unless @data['COMPOUND'] @data['COMPOUND'] = fetch('COMPOUND').split(/\s+/) end @data['COMPOUND'] end # REACTION def reactions unless @data['REACTION'] @data['REACTION'] = fetch('REACTION').split(/\s+/) end @data['REACTION'] end # PATHWAY def pathways_as_strings lines_fetch('PATHWAY') end # ENZYME def enzymes unless @data['ENZYME'] field = fetch('ENZYME') if /\(/.match(field) # old version @data['ENZYME'] = field.scan(/\S+ \(\S+\)/) else @data['ENZYME'] = field.scan(/\S+/) end end @data['ENZYME'] end # ORTHOLOGY def orthologs_as_strings unless @data['ORTHOLOGY'] @data['ORTHOLOGY'] = lines_fetch('ORTHOLOGY') end @data['ORTHOLOGY'] end # COMMENT def comment field_fetch('COMMENT') end # REMARK def remark field_fetch('REMARK') end # REFERENCE def references unless @data['REFERENCE'] ary = Array.new lines = lines_fetch('REFERENCE') lines.each do |line| if /^\d+\s+\[PMID/.match(line) ary << line else ary.last << " #{line.strip}" end end @data['REFERENCE'] = ary end @data['REFERENCE'] end # DBLINKS def dblinks_as_strings unless @data['DBLINKS'] @data['DBLINKS'] = lines_fetch('DBLINKS') end @data['DBLINKS'] end # ATOM, BOND def kcf return "#{get('NODE')}#{get('EDGE')}" end end # GLYCAN end # KEGG end # Bio bio-2.0.3/lib/bio/db/kegg/orthology.rb0000644000175000017500000000634214141516614017017 0ustar nileshnilesh# # = bio/db/kegg/orthology.rb - 
KEGG ORTHOLOGY database class # # Copyright:: Copyright (C) 2003-2007 Toshiaki Katayama # Copyright:: Copyright (C) 2003 Masumi Itoh # License:: The Ruby License # # $Id:$ # require 'bio/db' require 'bio/db/kegg/common' module Bio class KEGG # == Description # # KO (KEGG Orthology) entry parser. # # == References # # * http://www.genome.jp/dbget-bin/get_htext?KO # * ftp://ftp.genome.jp/pub/kegg/genes/ko # class ORTHOLOGY < KEGGDB DELIMITER = RS = "\n///\n" TAGSIZE = 12 include Common::DblinksAsHash # Returns a Hash of the DB name and an Array of entry IDs in DBLINKS field. def dblinks_as_hash; super; end if false #dummy for RDoc alias dblinks dblinks_as_hash include Common::GenesAsHash # Returns a Hash of the organism ID and an Array of entry IDs in GENES field. def genes_as_hash; super; end if false #dummy for RDoc alias genes genes_as_hash include Common::PathwaysAsHash # Returns a Hash of the pathway ID and name in PATHWAY field. def pathways_as_hash; super; end if false #dummy for RDoc alias pathways pathways_as_hash include Common::ModulesAsHash # Returns MODULE field as a Hash. # Each key of the hash is KEGG MODULE ID, # and each value is the name of the Pathway Module. # --- # *Returns*:: Hash def modules_as_hash; super; end if false #dummy for RDoc alias modules modules_as_hash include Common::References # REFERENCE -- Returns contents of the REFERENCE records as an Array of # Bio::Reference objects. # --- # *Returns*:: an Array containing Bio::Reference objects def references; super; end if false #dummy for RDoc # Reads a flat file format entry of the KO database. def initialize(entry) super(entry, TAGSIZE) end # Returns ID of the entry. def entry_id field_fetch('ENTRY')[/\S+/] end # Returns NAME field of the entry. def name field_fetch('NAME') end # Returns an Array of names in NAME field. def names name.split(', ') end # Returns DEFINITION field of the entry. def definition field_fetch('DEFINITION') end # Returns CLASS field of the entry. 
def keggclass field_fetch('CLASS') end # Returns an Array of biological classes in CLASS field. def keggclasses keggclass.gsub(/ \[[^\]]+/, '').split(/\] ?/) end # Pathways described in the PATHWAY field. # --- # *Returns*:: Array containing String def pathways_as_strings lines_fetch('PATHWAY') end # *OBSOLETE* Do not use this method. # Because the KEGG ORTHOLOGY format has changed and a PATHWAY field was added, # the older "pathways" method was renamed and remains only for compatibility. # # Returns an Array of KEGG/PATHWAY ID in CLASS field. def pathways_in_keggclass keggclass.scan(/\[PATH:(.*?)\]/).flatten end # Returns MODULE field of the entry. # --- # *Returns*:: Array containing String objects def modules_as_strings lines_fetch('MODULE') end # Returns an Array of a database name and entry IDs in DBLINKS field. def dblinks_as_strings lines_fetch('DBLINKS') end # Returns an Array of the organism ID and entry IDs in GENES field. def genes_as_strings lines_fetch('GENES') end end # ORTHOLOGY end # KEGG end # Bio bio-2.0.3/lib/bio/db/kegg/enzyme.rb0000644000175000017500000000560314141516614016277 0ustar nileshnilesh# # = bio/db/kegg/enzyme.rb - KEGG/ENZYME database class # # Copyright:: Copyright (C) 2001, 2002, 2007 Toshiaki Katayama # License:: The Ruby License # # $Id:$ # require 'bio/db' require 'bio/db/kegg/common' module Bio class KEGG class ENZYME < KEGGDB DELIMITER = RS = "\n///\n" TAGSIZE = 12 include Common::DblinksAsHash # Returns a Hash of the DB name and an Array of entry IDs in DBLINKS field. def dblinks_as_hash; super; end if false #dummy for RDoc alias dblinks dblinks_as_hash include Common::PathwaysAsHash # Returns a Hash of the pathway ID and name in PATHWAY field. def pathways_as_hash; super; end if false #dummy for RDoc alias pathways pathways_as_hash include Common::OrthologsAsHash # Returns a Hash of the orthology ID and definition in ORTHOLOGY field.
def orthologs_as_hash; super; end if false #dummy for RDoc alias orthologs orthologs_as_hash include Common::GenesAsHash # Returns a Hash of the organism ID and an Array of entry IDs in GENES field. def genes_as_hash; super; end if false #dummy for RDoc alias genes genes_as_hash def initialize(entry) super(entry, TAGSIZE) end # ENTRY def entry field_fetch('ENTRY') end def entry_id entry[/EC (\S+)/, 1] end def obsolete? entry[/Obsolete/] ? true : false end # NAME def names field_fetch('NAME').split(/\s*;\s*/) end def name names.first end # CLASS def classes lines_fetch('CLASS') end # SYSNAME def sysname field_fetch('SYSNAME') end # REACTION def reaction field_fetch('REACTION') end # ALL_REAC ';' def all_reac field_fetch('ALL_REAC') end def iubmb_reactions all_reac.sub(/;\s*\(other\).*/, '').split(/\s*;\s*/) end def kegg_reactions reac = all_reac if reac[/\(other\)/] reac.sub(/.*\(other\)\s*/, '').split(/\s*;\s*/) else [] end end # SUBSTRATE def substrates field_fetch('SUBSTRATE').split(/\s*;\s*/) end # PRODUCT def products field_fetch('PRODUCT').split(/\s*;\s*/) end # INHIBITOR def inhibitors field_fetch('INHIBITOR').split(/\s*;\s*/) end # COFACTOR def cofactors field_fetch('COFACTOR').split(/\s*;\s*/) end # COMMENT def comment field_fetch('COMMENT') end # PATHWAY def pathways_as_strings lines_fetch('PATHWAY') end # ORTHOLOGY def orthologs_as_strings lines_fetch('ORTHOLOGY') end # GENES def genes_as_strings lines_fetch('GENES') end # DISEASE def diseases lines_fetch('DISEASE') end # MOTIF def motifs lines_fetch('MOTIF') end # STRUCTURES def structures unless @data['STRUCTURES'] @data['STRUCTURES'] = fetch('STRUCTURES').sub(/(PDB: )*/,'').split(/\s+/) end @data['STRUCTURES'] end # REFERENCE # DBLINKS def dblinks_as_strings lines_fetch('DBLINKS') end end # ENZYME end # KEGG end # Bio bio-2.0.3/lib/bio/db/kegg/genes.rb0000644000175000017500000002375614141516614016102 0ustar nileshnilesh# # = bio/db/kegg/genes.rb - KEGG/GENES database class # # Copyright:: Copyright (C) 
2001, 2002, 2006, 2010 # Toshiaki Katayama # License:: The Ruby License # # $Id:$ # # # == KEGG GENES parser # # See http://www.genome.jp/kegg/genes.html # # # === Examples # # require 'bio/io/fetch' # entry_string = Bio::Fetch.query('genes', 'b0002') # # entry = Bio::KEGG::GENES.new(entry_string) # # # ENTRY # p entry.entry # => Hash # # p entry.entry_id # => String # p entry.division # => String # p entry.organism # => String # # # NAME # p entry.name # => String # p entry.names # => Array # # # DEFINITION # p entry.definition # => String # p entry.eclinks # => Array # # # PATHWAY # p entry.pathway # => String # p entry.pathways # => Hash # # # POSITION # p entry.position # => String # p entry.chromosome # => String # p entry.gbposition # => String # p entry.locations # => Bio::Locations # # # MOTIF # p entry.motifs # => Hash of Array # # # DBLINKS # p entry.dblinks # => Hash of Array # # # STRUCTURE # p entry.structure # => Array # # # CODON_USAGE # p entry.codon_usage # => Hash # p entry.cu_list # => Array # # # AASEQ # p entry.aaseq # => Bio::Sequence::AA # p entry.aalen # => Fixnum # # # NTSEQ # p entry.ntseq # => Bio::Sequence::NA # p entry.naseq # => Bio::Sequence::NA # p entry.ntlen # => Fixnum # p entry.nalen # => Fixnum # module Bio autoload :Locations, 'bio/location' unless const_defined?(:Locations) autoload :Sequence, 'bio/sequence' unless const_defined?(:Sequence) require 'bio/db' require 'bio/db/kegg/common' class KEGG # == Description # # KEGG GENES entry parser. # # == References # # * http://www.genome.jp/kegg/genes.html # class GENES < KEGGDB DELIMITER = RS = "\n///\n" TAGSIZE = 12 include Common::DblinksAsHash # Returns a Hash of the DB name and an Array of entry IDs in DBLINKS field. def dblinks_as_hash; super; end if false #dummy for RDoc alias dblinks dblinks_as_hash include Common::PathwaysAsHash # Returns a Hash of the pathway ID and name in PATHWAY field. 
def pathways_as_hash; super; end if false #dummy for RDoc alias pathways pathways_as_hash include Common::OrthologsAsHash # Returns a Hash of the orthology ID and definition in ORTHOLOGY field. def orthologs_as_hash; super; end if false #dummy for RDoc alias orthologs orthologs_as_hash include Common::DiseasesAsHash # Returns a Hash of the disease ID and its definition def diseases_as_hash; super; end if false #dummy for RDoc alias diseases diseases_as_hash # Creates a new Bio::KEGG::GENES object. # --- # *Arguments*: # * (required) _entry_: (String) single entry as a string # *Returns*:: Bio::KEGG::GENES object def initialize(entry) super(entry, TAGSIZE) end # Returns the "ENTRY" line content as a Hash. # For example, # {"organism"=>"E.coli", "division"=>"CDS", "id"=>"b0356"} # # --- # *Returns*:: Hash def entry unless @data['ENTRY'] hash = Hash.new('') if get('ENTRY').length > 30 e = get('ENTRY') hash['id'] = e[12..29].strip hash['division'] = e[30..39].strip hash['organism'] = e[40..80].strip end @data['ENTRY'] = hash end @data['ENTRY'] end # ID of the entry, described in the ENTRY line. # --- # *Returns*:: String def entry_id entry['id'] end # Division of the entry, described in the ENTRY line. # --- # *Returns*:: String def division entry['division'] # CDS, tRNA etc. end # Organism name of the entry, described in the ENTRY line. # --- # *Returns*:: String def organism entry['organism'] # H.sapiens etc. end # Returns the NAME line. # --- # *Returns*:: String def name field_fetch('NAME') end # Names of the entry as an Array, described in the NAME line. # # --- # *Returns*:: Array containing String def names_as_array name.split(', ') end alias names names_as_array # The method will be deprecated. Use Bio::KEGG::GENES#names. # # Names of the entry as an Array, described in the NAME line. # # --- # *Returns*:: Array containing String def genes names_as_array end # The method will be deprecated. # Use entry.names.first instead. 
# # Returns the first gene name described in the NAME line. # --- # *Returns*:: String def gene genes.first end # Definition of the entry, described in the DEFINITION line. # --- # *Returns*:: String def definition field_fetch('DEFINITION') end # Enzyme's EC numbers shown in the DEFINITION line. # --- # *Returns*:: Array containing String def eclinks unless defined? @eclinks ec_list = definition.slice(/\[EC\:([^\]]+)\]/, 1) || definition.slice(/\(EC\:([^\)]+)\)/, 1) ary = ec_list ? ec_list.strip.split(/\s+/) : [] @eclinks = ary end @eclinks end # Orthologs described in the ORTHOLOGY lines. # --- # *Returns*:: Array containing String def orthologs_as_strings lines_fetch('ORTHOLOGY') end # Returns the PATHWAY lines as a String. # --- # *Returns*:: String def pathway unless defined? @pathway @pathway = fetch('PATHWAY') end @pathway end # Pathways described in the PATHWAY lines. # --- # *Returns*:: Array containing String def pathways_as_strings lines_fetch('PATHWAY') end # Networks described in the NETWORK lines. # --- # *Returns*:: Array containing String def networks_as_strings lines_fetch('NETWORK') end # Diseases described in the DISEASE lines. # --- # *Returns*:: Array containing String def diseases_as_strings lines_fetch('DISEASE') end # Drug targets described in the DRUG_TARGET lines. # --- # *Returns*:: Array containing String def drug_targets_as_strings lines_fetch('DRUG_TARGET') end # Returns CLASS field of the entry. def keggclass field_fetch('CLASS') end # Returns an Array of biological classes in CLASS field. def keggclasses keggclass.gsub(/ \[[^\]]+/, '').split(/\] ?/) end # The position in the genome described in the POSITION line. # --- # *Returns*:: String def position unless @data['POSITION'] @data['POSITION'] = fetch('POSITION').gsub(/\s/, '') end @data['POSITION'] end # Chromosome described in the POSITION line. # --- # *Returns*:: String or nil def chromosome if position[/:/] position.sub(/:.*/, '') elsif ! 
position[/\.\./] position else nil end end # The position in the genome described in the POSITION line # as GenBank feature table location formatted string. # --- # *Returns*:: String def gbposition position.sub(/.*?:/, '') end # The position in the genome described in the POSITION line # as Bio::Locations object. # --- # *Returns*:: Bio::Locations object def locations Bio::Locations.new(gbposition) end # Motif information described in the MOTIF lines. # --- # *Returns*:: Strings def motifs_as_strings lines_fetch('MOTIF') end # Motif information described in the MOTIF lines. # --- # *Returns*:: Hash def motifs_as_hash unless @data['MOTIF'] hash = {} db = nil motifs_as_strings.each do |line| if line[/^\S+:/] db, str = line.split(/:/, 2) else str = line end hash[db] ||= [] hash[db] += str.strip.split(/\s+/) end @data['MOTIF'] = hash end @data['MOTIF'] # Hash of Array of IDs in MOTIF end alias motifs motifs_as_hash # The specification of the method will be changed in the future. # Please use Bio::KEGG::GENES#motifs. # # Motif information described in the MOTIF lines. # --- # *Returns*:: Hash def motif motifs end # Links to other databases described in the DBLINKS lines. # --- # *Returns*:: Array containing String objects def dblinks_as_strings lines_fetch('DBLINKS') end # Returns structure ID information described in the STRUCTURE lines. # --- # *Returns*:: Array containing String def structure unless @data['STRUCTURE'] @data['STRUCTURE'] = fetch('STRUCTURE').sub(/(PDB: )*/,'').split(/\s+/) end @data['STRUCTURE'] # ['PDB:1A9X', ...] end alias structures structure # Codon usage data described in the CODON_USAGE lines. 
(Deprecated: this field no longer exists.) # --- # *Returns*:: Hash def codon_usage(codon = nil) unless @data['CODON_USAGE'] hash = Hash.new list = cu_list base = %w(t c a g) base.each_with_index do |x, i| base.each_with_index do |y, j| base.each_with_index do |z, k| hash["#{x}#{y}#{z}"] = list[i*16 + j*4 + k] end end end @data['CODON_USAGE'] = hash end @data['CODON_USAGE'] end # Codon usage data described in the CODON_USAGE lines as an array. # --- # *Returns*:: Array def cu_list ary = [] get('CODON_USAGE').sub(/.*/,'').each_line do |line| # cut 1st line line.chomp.sub(/^.{11}/, '').scan(/..../) do |cu| ary.push(cu.to_i) end end return ary end # Returns amino acid sequence described in the AASEQ lines. # --- # *Returns*:: Bio::Sequence::AA object def aaseq unless @data['AASEQ'] @data['AASEQ'] = Bio::Sequence::AA.new(fetch('AASEQ').gsub(/\d+/, '')) end @data['AASEQ'] end # Returns length of the amino acid sequence described in the AASEQ lines. # --- # *Returns*:: Integer def aalen fetch('AASEQ')[/\d+/].to_i end # Returns nucleic acid sequence described in the NTSEQ lines. # --- # *Returns*:: Bio::Sequence::NA object def ntseq unless @data['NTSEQ'] @data['NTSEQ'] = Bio::Sequence::NA.new(fetch('NTSEQ').gsub(/\d+/, '')) end @data['NTSEQ'] end alias naseq ntseq # Returns nucleic acid sequence length. # --- # *Returns*:: Integer def ntlen fetch('NTSEQ')[/\d+/].to_i end alias nalen ntlen end end # KEGG end # Bio bio-2.0.3/lib/bio/db/kegg/compound.rb0000644000175000017500000000501514141516614016611 0ustar nileshnilesh# # = bio/db/kegg/compound.rb - KEGG COMPOUND database class # # Copyright:: Copyright (C) 2001, 2002, 2004, 2007 Toshiaki Katayama # License:: The Ruby License # # $Id:$ # require 'bio/db' require 'bio/db/kegg/common' module Bio class KEGG # == Description # # Bio::KEGG::COMPOUND is a parser class for the KEGG COMPOUND database entry. # KEGG COMPOUND is a chemical structure database.
# # == References # # * http://www.genome.jp/kegg/compound/ # class COMPOUND < KEGGDB DELIMITER = RS = "\n///\n" TAGSIZE = 12 include Common::DblinksAsHash # Returns a Hash of the DB name and an Array of entry IDs in DBLINKS field. def dblinks_as_hash; super; end if false #dummy for RDoc alias dblinks dblinks_as_hash include Common::PathwaysAsHash # Returns a Hash of the pathway ID and name in PATHWAY field. def pathways_as_hash; super; end if false #dummy for RDoc alias pathways pathways_as_hash # Creates a new Bio::KEGG::COMPOUND object. # --- # *Arguments*: # * (required) _entry_: (String) single entry as a string # *Returns*:: Bio::KEGG::COMPOUND object def initialize(entry) super(entry, TAGSIZE) end # ENTRY def entry_id field_fetch('ENTRY')[/\S+/] end # NAME def names field_fetch('NAME').split(/\s*;\s*/) end # The first name recorded in the NAME field. def name names.first end # FORMULA def formula field_fetch('FORMULA') end # MASS def mass field_fetch('MASS').to_f end # REMARK def remark field_fetch('REMARK') end # GLYCAN def glycans unless @data['GLYCAN'] @data['GLYCAN'] = fetch('GLYCAN').split(/\s+/) end @data['GLYCAN'] end # REACTION def reactions unless @data['REACTION'] @data['REACTION'] = fetch('REACTION').split(/\s+/) end @data['REACTION'] end # RPAIR def rpairs unless @data['RPAIR'] @data['RPAIR'] = fetch('RPAIR').split(/\s+/) end @data['RPAIR'] end # PATHWAY def pathways_as_strings lines_fetch('PATHWAY') end # ENZYME def enzymes unless @data['ENZYME'] field = fetch('ENZYME') if /\(/.match(field) # old version @data['ENZYME'] = field.scan(/\S+ \(\S+\)/) else @data['ENZYME'] = field.scan(/\S+/) end end @data['ENZYME'] end # DBLINKS def dblinks_as_strings lines_fetch('DBLINKS') end # ATOM, BOND def kcf return "#{get('ATOM')}#{get('BOND')}" end # COMMENT def comment field_fetch('COMMENT') end end # COMPOUND end # KEGG end # Bio bio-2.0.3/lib/bio/db/kegg/expression.rb0000644000175000017500000000575314141516614017175 0ustar nileshnilesh# # = 
bio/db/kegg/expression.rb - KEGG EXPRESSION database class # # Copyright:: Copyright (C) 2001-2003, 2005 # Shuichi Kawashima , # Toshiaki Katayama # License:: The Ruby License # # $Id: expression.rb,v 1.11 2007/04/05 23:35:41 trevor Exp $ # require "bio/db" module Bio class KEGG class EXPRESSION def initialize(entry) @orf2val = Hash.new('') @orf2rgb = Hash.new('') @orf2ratio = Hash.new('') @max_intensity = 10000 entry.split("\n").each do |line| unless /^#/ =~ line ary = line.split("\t") orf = ary.shift val = ary[2, 4].collect {|x| x.to_f} @orf2val[orf] = val end end end attr_reader :orf2val attr_reader :orf2rgb attr_reader :orf2ratio attr_reader :max_intensity def control_avg sum = 0.0 @orf2val.values.each do |v| sum += v[0] - v[1] end sum/orf2val.size end def target_avg sum = 0.0 @orf2val.values.each do |v| sum += v[2] - v[3] end sum/orf2val.size end def control_var sum = 0.0 avg = self.control_avg @orf2val.values.each do |v| tmp = v[0] - v[1] sum += (tmp - avg)*(tmp - avg) end sum/orf2val.size end def target_var sum = 0.0 avg = self.target_avg @orf2val.values.each do |v| tmp = v[2] - v[3] sum += (tmp - avg)*(tmp - avg) end sum/orf2val.size end def control_sd var = self.control_var Math.sqrt(var) end def target_sd var = self.target_var Math.sqrt(var) end def up_regulated(num=20, threshold=nil) logy_minus_logx ary = @orf2ratio.to_a.sort{|a, b| b[1] <=> a[1]} if threshold != nil i = 0 while ary[i][1] > threshold i += 1 end return ary[0..i] else return ary[0..num-1] end end def down_regulated(num=20, threshold=nil) logy_minus_logx ary = @orf2ratio.to_a.sort{|a, b| a[1] <=> b[1]} if threshold != nil i = 0 while ary[i][1] < threshold i += 1 end return ary[0..i] else return ary[0..num-1] end end def regulated(num=20, threshold=nil) logy_minus_logx ary = @orf2ratio.to_a.sort{|a, b| b[1].abs <=> a[1].abs} if threshold != nil i = 0 while ary[i][1].abs > threshold i += 1 end return ary[0..i] else return ary[0..num-1] end end def logy_minus_logx @orf2val.each do |k, v| 
@orf2ratio[k] = (1.0/Math.log10(2))*(Math.log10(v[2]-v[3]) - Math.log10(v[0]-v[1])) end end def val2rgb col_unit = @max_intensity/255 @orf2val.each do |k, v| tmp_val = ((v[0] - v[1])/col_unit).to_i if tmp_val > 255 g = "ff" else g = format("%02x", tmp_val) end tmp_val = ((v[2] - v[3])/col_unit).to_i if tmp_val > 255 r = "ff" else r = format("%02x", tmp_val) end @orf2rgb[k] = r + g + "00" end end end # class EXPRESSION end # class KEGG end # module Bio bio-2.0.3/lib/bio/db/kegg/genome.rb0000644000175000017500000001153714141516614016245 0ustar nileshnilesh# # = bio/db/kegg/genome.rb - KEGG/GENOME database class # # Copyright:: Copyright (C) 2001, 2002, 2007 Toshiaki Katayama # License:: The Ruby License # # $Id:$ # require 'bio/db' require 'bio/reference' require 'bio/db/kegg/common' module Bio class KEGG # == Description # # Parser for the KEGG GENOME database # # == References # # * ftp://ftp.genome.jp/pub/kegg/genomes/genome # * http://www.genome.jp/dbget-bin/www_bfind?genome # * http://www.genome.jp/kegg/catalog/org_list.html # class GENOME < KEGGDB DELIMITER = RS = "\n///\n" TAGSIZE = 12 include Common::References # REFERENCE -- Returns contents of the REFERENCE records as an Array of # Bio::Reference objects. def references; super; end if false #dummy for RDoc def initialize(entry) super(entry, TAGSIZE) end # (private) Returns a tag name of the field as a String. # Needed to redefine because of the PLASMID field. def tag_get(str) if /\APLASMID\s+/ =~ str.to_s then 'PLASMID' else super(str) end end private :tag_get # (private) Returns a String of the field without a tag name. # Needed to redefine because of the PLASMID field. def tag_cut(str) if /\APLASMID\s+/ =~ str.to_s then $' else super(str) end end private :tag_cut # ENTRY -- Returns contents of the ENTRY record as a String. def entry_id field_fetch('ENTRY')[/\S+/] end # NAME -- Returns contents of the NAME record as a String. 
def name field_fetch('NAME') end # DEFINITION -- Returns contents of the DEFINITION record as a String. def definition field_fetch('DEFINITION') end alias organism definition # TAXONOMY -- Returns contents of the TAXONOMY record as a Hash. def taxonomy unless @data['TAXONOMY'] taxid, lineage = subtag2array(get('TAXONOMY')) taxid = taxid ? truncate(tag_cut(taxid)) : '' lineage = lineage ? truncate(tag_cut(lineage)) : '' @data['TAXONOMY'] = { 'taxid' => taxid, 'lineage' => lineage, } @data['TAXONOMY'].default = '' end @data['TAXONOMY'] end # Returns NCBI taxonomy ID from the TAXONOMY record as a String. def taxid taxonomy['taxid'] end # Returns contents of the TAXONOMY/LINEAGE record as a String. def lineage taxonomy['lineage'] end # DATA_SOURCE -- Returns contents of the DATA_SOURCE record as a String. def data_source field_fetch('DATA_SOURCE') end # ORIGINAL_DB -- Returns contents of the ORIGINAL_DB record as a String. def original_db #field_fetch('ORIGINAL_DB') unless defined?(@original_db) @original_db = fetch('ORIGINAL_DB') end @original_db end # Returns ORIGINAL_DB record as an Array containing String objects. # # --- # *Arguments*: # *Returns*:: Array containing String objects def original_databases lines_fetch('ORIGINAL_DB') end # DISEASE -- Returns contents of the DISEASE record as a String. def disease field_fetch('DISEASE') end # COMMENT -- Returns contents of the COMMENT record as a String. def comment field_fetch('COMMENT') end # CHROMOSOME -- Returns contents of the CHROMOSOME records as an Array # of Hash. def chromosomes unless @data['CHROMOSOME'] @data['CHROMOSOME'] = [] toptag2array(get('CHROMOSOME')).each do |chr| hash = Hash.new('') subtag2array(chr).each do |field| hash[tag_get(field)] = truncate(tag_cut(field)) end @data['CHROMOSOME'].push(hash) end end @data['CHROMOSOME'] end # PLASMID -- Returns contents of the PLASMID records as an Array of Hash.
def plasmids unless @data['PLASMID'] @data['PLASMID'] = [] toptag2array(get('PLASMID')).each do |chr| hash = Hash.new('') subtag2array(chr).each do |field| hash[tag_get(field)] = truncate(tag_cut(field)) end @data['PLASMID'].push(hash) end end @data['PLASMID'] end # STATISTICS -- Returns contents of the STATISTICS record as a Hash. def statistics unless @data['STATISTICS'] hash = Hash.new(0.0) get('STATISTICS').each_line do |line| case line when /nucleotides:\s+(\d+)/ hash['num_nuc'] = $1.to_i when /protein genes:\s+(\d+)/ hash['num_gene'] = $1.to_i when /RNA genes:\s+(\d+)/ hash['num_rna'] = $1.to_i end end @data['STATISTICS'] = hash end @data['STATISTICS'] end # Returns number of nucleotides from the STATISTICS record as a Fixnum. def nalen statistics['num_nuc'] end alias length nalen # Returns number of protein genes from the STATISTICS record as a Fixnum. def num_gene statistics['num_gene'] end # Returns number of rna from the STATISTICS record as a Fixnum. def num_rna statistics['num_rna'] end end # GENOME end # KEGG end # Bio bio-2.0.3/lib/bio/db/kegg/drug.rb0000644000175000017500000000536614141516614015737 0ustar nileshnilesh# # = bio/db/kegg/drug.rb - KEGG DRUG database class # # Copyright:: Copyright (C) 2007 Toshiaki Katayama # License:: The Ruby License # # $Id:$ # require 'bio/db' require 'bio/db/kegg/common' module Bio class KEGG # == Description # # Bio::KEGG::DRUG is a parser class for the KEGG DRUG database entry. # KEGG DRUG is a drug information database. # # == References # # * http://www.genome.jp/kegg/drug/ # class DRUG < KEGGDB DELIMITER = RS = "\n///\n" TAGSIZE = 12 include Common::DblinksAsHash # Returns a Hash of the DB name and an Array of entry IDs in DBLINKS field. def dblinks_as_hash; super; end if false #dummy for RDoc alias dblinks dblinks_as_hash include Common::PathwaysAsHash # Returns a Hash of the pathway ID and name in PATHWAY field. 
def pathways_as_hash; super; end if false #dummy for RDoc alias pathways pathways_as_hash # Creates a new Bio::KEGG::DRUG object. # --- # *Arguments*: # * (required) _entry_: (String) single entry as a string # *Returns*:: Bio::KEGG::DRUG object def initialize(entry) super(entry, TAGSIZE) end # ID of the entry, described in the ENTRY line. # --- # *Returns*:: String def entry_id field_fetch('ENTRY')[/\S+/] end # Names described in the NAME line. # --- # *Returns*:: Array containing String objects def names field_fetch('NAME').split(/\s*;\s*/) end # The first name recorded in the NAME field. # --- # *Returns*:: String def name names.first end # Chemical formula described in the FORMULA line. # --- # *Returns*:: String def formula field_fetch('FORMULA') end # Molecular weight described in the MASS line. # --- # *Returns*:: Float def mass field_fetch('MASS').to_f end # Biological or chemical activity described in the ACTIVITY line. # --- # *Returns*:: String def activity field_fetch('ACTIVITY') end # REMARK lines. # --- # *Returns*:: String def remark field_fetch('REMARK') end # List of KEGG Pathway IDs with short descriptions, # described in the PATHWAY lines. # --- # *Returns*:: Array containing String objects def pathways_as_strings lines_fetch('PATHWAY') end # List of database names and IDs, described in the DBLINKS lines. # --- # *Returns*:: Array containing String objects def dblinks_as_strings lines_fetch('DBLINKS') end # ATOM, BOND lines. # --- # *Returns*:: String def kcf return "#{get('ATOM')}#{get('BOND')}" end # COMMENT lines. # --- # *Returns*:: String def comment field_fetch('COMMENT') end # Product names described in the PRODUCTS lines. 
# --- # *Returns*:: Array containing String objects def products lines_fetch('PRODUCTS') end end # DRUG end # KEGG end # Bio bio-2.0.3/lib/bio/db/kegg/module.rb0000644000175000017500000000706314141516614016257 0ustar nileshnilesh# # = bio/db/kegg/module.rb - KEGG MODULE database class # # Copyright:: Copyright (C) 2010 Kozo Nishida # Copyright:: Copyright (C) 2010 Toshiaki Katayama # License:: The Ruby License # # require 'bio/db' require 'bio/db/kegg/common' module Bio class KEGG # == Description # # Bio::KEGG::MODULE is a parser class for the KEGG MODULE database entry. # # == References # # * http://www.kegg.jp/kegg-bin/get_htext?ko00002.keg # * ftp://ftp.genome.jp/pub/kegg/pathway/module # class MODULE < KEGGDB DELIMITER = RS = "\n///\n" TAGSIZE = 12 #-- # for a private method strings_as_hash. #++ include Common::StringsAsHash # Creates a new Bio::KEGG::MODULE object. # --- # *Arguments*: # * (required) _entry_: (String) single entry as a string # *Returns*:: Bio::KEGG::MODULE object def initialize(entry) super(entry, TAGSIZE) end # Return the ID, described in the ENTRY line. # --- # *Returns*:: String def entry_id field_fetch('ENTRY')[/\S+/] end # Name of the module, described in the NAME line. # --- # *Returns*:: String def name field_fetch('NAME') end # Definition of the module, described in the DEFINITION line. # --- # *Returns*:: String def definition field_fetch('DEFINITION') end # Name of the KEGG class, described in the CLASS line. # --- # *Returns*:: String def keggclass field_fetch('CLASS') end # Pathways described in the PATHWAY lines. # --- # *Returns*:: Array containing String def pathways_as_strings lines_fetch('PATHWAY') end # Pathways described in the PATHWAY lines. # --- # *Returns*:: Hash of pathway ID and its definition def pathways_as_hash unless (defined? 
@pathways_as_hash) && @pathways_as_hash @pathways_as_hash = strings_as_hash(pathways_as_strings) end @pathways_as_hash end alias pathways pathways_as_hash # Orthologs described in the ORTHOLOGY lines. # --- # *Returns*:: Array containing String def orthologs_as_strings lines_fetch('ORTHOLOGY') end # Orthologs described in the ORTHOLOGY lines. # --- # *Returns*:: Hash of orthology ID and its definition def orthologs_as_hash unless (defined? @orthologs_as_hash) && @orthologs_as_hash @orthologs_as_hash = strings_as_hash(orthologs_as_strings) end @orthologs_as_hash end alias orthologs orthologs_as_hash # All KO IDs in the ORTHOLOGY lines. # --- # *Returns*:: Array of orthology IDs def orthologs_as_array orthologs_as_hash.keys.map{|x| x.split(/\+|\-|,/)}.flatten.sort.uniq end # Reactions described in the REACTION lines. # --- # *Returns*:: Array containing String def reactions_as_strings lines_fetch('REACTION') end # Reactions described in the REACTION lines. # --- # *Returns*:: Hash of reaction ID and its definition def reactions_as_hash unless (defined? @reactions_as_hash) && @reactions_as_hash @reactions_as_hash = strings_as_hash(reactions_as_strings) end @reactions_as_hash end alias reactions reactions_as_hash # Compounds described in the COMPOUND lines. # --- # *Returns*:: Array containing String def compounds_as_strings lines_fetch('COMPOUND') end # Compounds described in the COMPOUND lines. # --- # *Returns*:: Hash of compound ID and its definition def compounds_as_hash unless (defined? 
@compounds_as_hash) && @compounds_as_hash @compounds_as_hash = strings_as_hash(compounds_as_strings) end @compounds_as_hash end alias compounds compounds_as_hash end # MODULE end # KEGG end # Bio bio-2.0.3/lib/bio/db/kegg/brite.rb0000644000175000017500000000117714141516614016077 0ustar nileshnilesh# # = bio/db/kegg/brite.rb - KEGG/BRITE database class # # Copyright:: Copyright (C) 2001 Toshiaki Katayama # License:: The Ruby License # # $Id: brite.rb,v 0.8 2007/04/05 23:35:41 trevor Exp $ # require 'bio/db' module Bio class KEGG # == Note # # This class is not completely implemented, and it is obsolete because the original # database BRITE has changed its meaning. # class BRITE < KEGGDB DELIMITER = RS = "\n///\n" TAGSIZE = 12 def initialize(entry) super(entry, TAGSIZE) end # ENTRY # DEFINITION # RELATION # FACTORS # INTERACTION # SOURCE # REFERENCE end # BRITE end # KEGG end # Bio bio-2.0.3/lib/bio/db/go.rb0000644000175000017500000002346414141516614014465 0ustar nileshnilesh# # = bio/db/go.rb - Classes for Gene Ontology # # Copyright:: Copyright (C) 2003 # Mitsuteru C. Nakao # License:: The Ruby License # # # == Gene Ontology # # == Example # # == References # require 'bio/pathway' module Bio # = Bio::GO # Classes for Gene Ontology http://www.geneontology.org class GO # = Bio::GO::Ontology # # Container class for ontologies in the DAG Edit format. # # == Example # # c_data = File.open('component.ontology').read # go_c = Bio::GO::Ontology.new(c_data) # p go_c.bfs_shortest_path('0003673','0005632') class Ontology < Bio::Pathway # Bio::GO::Ontology.parse_goids(line) # # Parses a GOID line in the DAG Edit format # GO:ID[ ; GO:ID...]
def self.parse_goids(line) goids = [] loop { if /^ *[$%<]\S.+?;/ =~ line endpoint = line.index(';') + 1 line = line[endpoint..line.size] elsif /^,* GO:(\d{7}),*/ =~ line goids << $1.clone endpoint = line.index(goids.last) + goids.last.size line = line[endpoint..line.size] else break end } return goids end # Returns a Hash instance of the header lines in ontology flatfile. attr_reader :header_lines # attr_reader :id2term # attr_reader :id2id # Bio::GO::Ontology.new(str) # The DAG Edit format ontology data parser. def initialize(str) @id2term = {} @header_lines = {} @id2id = {} adj_list = dag_edit_format_parser(str) super(adj_list) end # Returns a GO_Term corresponding to the given GO_ID. def goid2term(goid) term = id2term[goid] term = id2term[id2id[goid]] if term == nil return term end private # Constructs an adjacency list for the given ontology. def dag_edit_format_parser(str) stack = [] adj_list = [] str.each_line {|line| if /^!(.+?):\s+(\S.+)$/ =~ line # Parsing head lines tag = $1 value = $2 tag.gsub!(/-/,'_') next if tag == 'type' @header_lines[tag] = value next end case line when /^( *)([$<%])(.+?) ; GO:(\d{7})(\n*)/ # GO Term ; GO:ID depth = $1.length.to_i rel = $2 term = $3 goid1 = $4 en = $5 goids = parse_goids(line) # GO:ID[ ; GO:ID...] parse_synonyms(line) # synonym:Term[ ; synonym:Term...] stack[depth] = goids.first @id2term[goid1] = term next if depth == 0 goids.each {|goid| @id2term[goid] = term @id2id[goid] = goids.first adj_list << Bio::Relation.new(stack[depth - 1], goid, rel) } if en == "" loop { case line when /^\n$/ break when /^ *([<%]) (.+?) ; GO:(\d{7})/ # <%GO Term ; GO:ID rel1 = $1 term1 = $2 goid1 = $3 parse_goids(line) parse_synonyms(line) @id2term[goid1] = term1 goids.each {|goid| adj_list << Bio::Relation.new(goid1, goid, rel1) } else break end } end end } return adj_list end # Returns an Array of GO IDs by parsing an entry line in the DAG Edit # format.
    def parse_goids(line)
      Ontology.parse_goids(line)
    end

    # Bio::GO::Ontology#parse_synonyms(line)
    def parse_synonyms(line)
      synonyms = []
      loop {
        if / ; synonym:(\S.+?) *[;<%\n]/ =~ line
          synonyms << $1.clone
          endpoint = line.index(synonyms.last) + synonyms.last.size
          line = line[endpoint..line.size]
        else
          break
        end
      }
      return synonyms
    end

  end # class Ontology


  # = Bio::GO::GeneAssociation
  # $CVSROOT/go/gene-associations/gene_association.*
  #
  # Data parser for the gene_association go annotation.
  # See also the file format http://www.geneontology.org/doc/GO.annotation.html#file
  #
  # == Example
  #
  #  mgi_data = File.open('gene_association.mgi').read
  #  mgi = Bio::GO::GeneAssociation.parser(mgi_data)
  #
  #  Bio::GO::GeneAssociation.parser(mgi_data) do |entry|
  #    p [entry.entry_id, entry.evidence, entry.goid]
  #  end
  #
  class GeneAssociation # < Bio::DB

    # Delimiter
    DELIMITER = "\n"

    # Delimiter
    RS = DELIMITER

    # Returns an Array of the entries parsed from a gene_association
    # flatfile. When a block is given, each entry is yielded to the block
    # instead.
    def self.parser(str)
      if block_given?
        str.each_line(DELIMITER) {|line|
          next if /^!/ =~ line
          yield GeneAssociation.new(line)
        }
      else
        galist = []
        str.each_line(DELIMITER) {|line|
          next if /^!/ =~ line
          galist << GeneAssociation.new(line)
        }
        return galist
      end
    end

    # Returns DB variable.
    attr_reader :db                # -> aStr

    # Returns Db_Object_Id variable. Alias to entry_id.
    attr_reader :db_object_id      # -> aStr

    # Returns Db_Object_Symbol variable.
    attr_reader :db_object_symbol

    # Returns Qualifier variable.
    attr_reader :qualifier

    # Returns Db_Reference variable.
    attr_reader :db_reference      # -> []

    # Returns Evidence code variable.
    attr_reader :evidence

    # Returns the With (or From) variable: the entries this annotation
    # is associated with.
    attr_reader :with              # -> []

    # Returns Aspect variable.
    attr_reader :aspect

    # Returns Db_Object_Name variable.
    attr_reader :db_object_name

    # Returns Db_Object_Synonym variable.
    attr_reader :db_object_synonym # -> []

    # Returns Db_Object_Type variable.
    attr_reader :db_object_type

    # Returns Taxon variable.
    attr_reader :taxon

    # Returns Date variable.
attr_reader :date # attr_reader :assigned_by alias entry_id db_object_id # Parsing an entry (in a line) in the gene_association flatfile. def initialize(entry) tmp = entry.chomp.split(/\t/) @db = tmp[0] @db_object_id = tmp[1] @db_object_symbol = tmp[2] @qualifier = tmp[3] # @goid = tmp[4] @db_reference = tmp[5].split(/\|/) # @evidence = tmp[6] @with = tmp[7].split(/\|/) # @aspect = tmp[8] @db_object_name = tmp[9] # @db_object_synonym = tmp[10].split(/\|/) # @db_object_type = tmp[11] @taxon = tmp[12] # taxon:4932 @date = tmp[13] # 20010118 @assigned_by = tmp[14] end # Returns GO_ID in /\d{7}/ format. Giving not nil arg, returns # /GO:\d{7}/ style. # # * Bio::GO::GeneAssociation#goid -> "001234" # * Bio::GO::GeneAssociation#goid(true) -> "GO:001234" def goid(org = nil) if org @goid else @goid.sub('GO:','') end end # Bio::GO::GeneAssociation#to_str -> a line of gene_association file. def to_str return [@db, @db_object_id, @db_object_symbol, @qualifier, @goid, @db_reference.join("|"), @evidence, @with.join("|"), @aspect, @db_object_name, @db_object_synonym.join("|"), @db_object_type, @taxon, @date, @assigned_by].join("\t") end end # class GeneAssociation # = Container class for files in geneontology.org/go/external2go/*2go. # # The line syntax is: # # database: > GO: ; GO: # # == Example # # spkw2go = Bio::GO::External2go.new(File.read("spkw2go")) # spkw2go.size # spkw2go.each do |relation| # relation # -> {:db => "", :db_id => "", :go_term => "", :go_id => ""} # end # spkw2go.dbs # # == SAMPLE # !date: 2005/02/08 18:02:54 # !Mapping of SWISS-PROT KEYWORDS to GO terms. # !Evelyn Camon, SWISS-PROT. # ! # SP_KW:ATP synthesis > GO:ATP biosynthesis ; GO:0006754 # ... # class External2go < Array # Returns aHash of the external2go header information attr_reader :header # Constructor from parsing external2go file. def self.parser(str) e2g = self.new str.each_line do |line| line.chomp! 
if line =~ /^\!date: (.+)/ e2g.header[:date] = $1 elsif line =~ /^\!(.*)/ e2g.header[:desc] << $1 elsif ary = line.scan(/^(.+?):(.+) > GO:(.+) ; (GO:\d{7})/).first e2g << {:db_id => ary[1], :db => ary[0], :go_term => ary[2], :go_id => ary[3]} else raise("Invalid Format Line: \n #{line.inspect}\n") end end return e2g end # Constructor. # relation := {:db => aStr, :db_id => aStr, :go_term => aStr, :go_id => aStr} def initialize @header = {:date => '', :desc => []} super end # Bio::GO::External2go#set_date(value) def set_date(value) @header[:date] = value end # Bio::GO::External2go#set_desc(ary) def set_desc(ary) @header[:desc] = ary end # Bio::GO::External2go#to_str # Returns the contents in the external2go format. def to_str ["!date: #{@header[:date]}", @header[:desc].map {|e| "!#{e}" }, self.map { |e| [e[:db], ':', e[:db_id], ' > GO:', e[:go_term], ' ; ', e[:go_id]].join } ].join("\n") end # Returns ary of databases. def dbs self.map {|rel| rel[:db] }.uniq end # Returns ary of database IDs. def db_ids self.map {|rel| rel[:db_id] }.uniq end # Returns ary of GO Terms. def go_terms self.map {|rel| rel[:go_term] }.uniq end # Returns ary of GO IDs. def go_ids self.map {|rel| rel[:go_id] }.uniq end end # class External2go end # class GO end # module Bio bio-2.0.3/lib/bio/db/newick.rb0000644000175000017500000002762214141516614015340 0ustar nileshnilesh# # = bio/db/newick.rb - Newick Standard phylogenetic tree parser / formatter # # Copyright:: Copyright (C) 2004-2006 # Naohisa Goto # Daniel Amelang # License:: The Ruby License # # # == Description # # This file contains parser and formatter of Newick and NHX. # # == References # # * http://evolution.genetics.washington.edu/phylip/newick_doc.html # * http://www.phylosoft.org/forester/NHX.html # require 'strscan' require 'bio/tree' module Bio #--- # newick parser #+++ # Newick standard phylogenetic tree parser class. # # This is alpha version. Incompatible changes may be made frequently. 
  class Newick

    # delimiter of the entry
    DELIMITER = RS = ";"

    # parse error class
    class ParseError < RuntimeError; end

    # same as Bio::Tree::Edge
    Edge = Bio::Tree::Edge

    # same as Bio::Tree::Node
    Node = Bio::Tree::Node

    # Creates a new Newick object.
    # _options_ for parsing can be set.
    #
    # Available options:
    # :bootstrap_style::
    #     :traditional for traditional bootstrap style,
    #     :molphy for molphy style,
    #     :disabled to ignore bootstrap strings.
    #     For details of default actions, please read the notes below.
    # :parser::
    #     :naive for using naive parser, compatible with
    #     BioRuby 1.1.0, which ignores quoted strings and
    #     does not convert underscores to spaces.
    #
    # Notes for bootstrap style:
    # Molphy-style bootstrap values may always be parsed, even if
    # the options[:bootstrap_style] is set to
    # :traditional or :disabled.
    #
    # Note for default or traditional bootstrap style:
    # By default, if all of the internal node's names are numeric
    # and there are no NHX and no molphy-style bootstrap values,
    # the names of internal nodes are regarded as bootstrap values.
    # Set options[:bootstrap_style] to :disabled or :molphy
    # to disable the feature (it is also disabled when at least one
    # NHX tag exists).
    def initialize(str, options = nil)
      str = str.sub(/\;(.*)/m, ';')
      @original_string = str
      @entry_overrun = $1
      @options = (options or {})
    end

    # parser options
    # (in some cases, options can be automatically set by the parser)
    attr_reader :options

    # original string before parsing
    attr_reader :original_string

    # string after this entry
    attr_reader :entry_overrun

    # Gets the tree.
    # Returns a Bio::Tree object.
    def tree
      if !defined?(@tree)
        @tree = __parse_newick(@original_string, @options)
      else
        @tree
      end
    end

    # Re-parses the tree from the original string.
    # Returns self.
    # This method is useful after changing parser options.
    def reparse
      if defined?(@tree)
        remove_instance_variable(:@tree)
      end
      self.tree
      self
    end

    private

    # gets an option
    def __get_option(key, options)
      options[key] or (@options ?
                       @options[key] : nil)
    end

    # Parses newick formatted leaf (or internal node) name.
    def __parse_newick_leaf(leaf_tokens, node, edge, options)
      t = leaf_tokens.shift
      if !t.kind_of?(Symbol) then
        node.name = t
        t = leaf_tokens.shift
      end

      if t == :':' then
        t = leaf_tokens.shift
        if !t.kind_of?(Symbol) then
          edge.distance_string = t if t and !(t.strip.empty?)
          t = leaf_tokens.shift
        end
      end

      if t == :'[' then
        btokens = leaf_tokens
        case __get_option(:original_format, options)
        when :nhx
          # regarded as NHX string which might be broken
          __parse_nhx(btokens, node, edge)
        when :traditional
          # simply ignored
        else
          case btokens[0].to_s.strip
          when ''
            # not automatically determined
          when /\A\&\&NHX/
            # NHX string
            # force to set NHX mode
            @options[:original_format] = :nhx
            __parse_nhx(btokens, node, edge)
          else
            # Molphy-style bootstrap values
            # switch to molphy mode if nothing has been determined yet
            @options[:original_format] ||= :molphy
            bstr = ''
            while t = btokens.shift and t != :']'
              bstr.concat t.to_s
            end
            node.bootstrap_string = bstr
          end #case btokens[0]
        end
      end

      if !btokens and !leaf_tokens.empty? then
        # syntax error?
end node.name ||= '' # compatibility for older BioRuby # returns true true end # Parses NHX (New Hampshire eXtended) string def __parse_nhx(btokens, node, edge) btokens.shift if btokens[0] == '&&NHX' btokens.each do |str| break if str == :']' next if str.kind_of?(Symbol) tag, val = str.split(/\=/, 2) case tag when 'B' node.bootstrap_string = val when 'D' case val when 'Y' node.events.push :gene_duplication when 'N' node.events.push :speciation end when 'E' node.ec_number = val when 'L' edge.log_likelihood = val.to_f when 'S' node.scientific_name = val when 'T' node.taxonomy_id = val when 'W' edge.width = val.to_i when 'XB' edge.nhx_parameters[:XB] = val when 'O', 'SO' node.nhx_parameters[tag.to_sym] = val.to_i else # :Co, :SN, :Sw, :XN, and others node.nhx_parameters[tag.to_sym] = val end end #each true end # splits string to tokens def __parse_newick_tokenize(str, options) str = str.chop if str[-1..-1] == ';' # http://evolution.genetics.washington.edu/phylip/newick_doc.html # quoted_label ==> ' string_of_printing_characters ' # single quote in quoted_label is '' (two single quotes) # if __get_option(:parser, options) == :naive then ary = str.split(/([\(\)\,\:\[\]])/) ary.collect! { |x| x.strip!; x.empty? ? nil : x } ary.compact! ary.collect! do |x| if /\A([\(\)\,\:\[\]])\z/ =~ x then x.intern else x end end return ary end tokens = [] ss = StringScanner.new(str) while !(ss.eos?) if ss.scan(/\s+/) then # do nothing elsif ss.scan(/[\(\)\,\:\[\]]/) then # '(' or ')' or ',' or ':' or '[' or ']' t = ss.matched tokens.push t.intern elsif ss.scan(/\'/) then # quoted_label t = '' while true if ss.scan(/([^\']*)\'/) then t.concat ss[1] if ss.scan(/\'/) then # single quote in quoted_label t.concat ss.matched else break end else # incomplete quoted_label? break end end #while true unless ss.match?(/\s*[\(\)\,\:\[\]]/) or ss.match?(/\s*\z/) then # label continues? 
(illegal, but try to rescue) if ss.scan(/[^\(\)\,\:\[\]]+/) then t.concat ss.matched.lstrip end end tokens.push t elsif ss.scan(/[^\(\)\,\:\[\]]+/) then # unquoted_label t = ss.matched.strip t.gsub!(/[\r\n]/, '') # unquoted underscore should be converted to blank t.gsub!(/\_/, ' ') tokens.push t unless t.empty? else # unquoted_label in end of string t = ss.rest.strip t.gsub!(/[\r\n]/, '') # unquoted underscore should be converted to blank t.gsub!(/\_/, ' ') tokens.push t unless t.empty? ss.terminate end end #while !(ss.eos?) tokens end # get tokens for a leaf def __parse_newick_get_tokens_for_leaf(ary) r = [] while t = ary[0] and t != :',' and t != :')' and t != :'(' r.push ary.shift end r end # Parses newick formatted string. def __parse_newick(str, options = {}) # initializing root = Node.new cur_node = root edges = [] nodes = [ root ] internal_nodes = [] node_stack = [] # preparation of tokens ary = __parse_newick_tokenize(str, options) previous_token = nil # main loop while token = ary.shift #p token case token when :',' if previous_token == :',' or previous_token == :'(' then # there is a leaf whose name is empty. ary.unshift(token) ary.unshift('') token = nil end when :'(' node = Node.new nodes << node internal_nodes << node node_stack.push(cur_node) cur_node = node when :')' if previous_token == :',' or previous_token == :'(' then # there is a leaf whose name is empty. 
ary.unshift(token) ary.unshift('') token = nil else edge = Edge.new leaf_tokens = __parse_newick_get_tokens_for_leaf(ary) token = nil if leaf_tokens.size > 0 then __parse_newick_leaf(leaf_tokens, cur_node, edge, options) end parent = node_stack.pop raise ParseError, 'unmatched parentheses' unless parent edges << Bio::Relation.new(parent, cur_node, edge) cur_node = parent end else leaf = Node.new edge = Edge.new ary.unshift(token) leaf_tokens = __parse_newick_get_tokens_for_leaf(ary) token = nil __parse_newick_leaf(leaf_tokens, leaf, edge, options) nodes << leaf edges << Bio::Relation.new(cur_node, leaf, edge) end #case previous_token = token end #while raise ParseError, 'unmatched parentheses' unless node_stack.empty? bsopt = __get_option(:bootstrap_style, options) ofmt = __get_option(:original_format, options) unless bsopt == :disabled or bsopt == :molphy or ofmt == :nhx or ofmt == :molphy then # If all of the internal node's names are numeric, # the names are regarded as bootstrap values. flag = false internal_nodes.each do |inode| if inode.name and !inode.name.to_s.strip.empty? then if /\A[\+\-]?\d*\.?\d*\z/ =~ inode.name flag = true else flag = false break end end end if flag then @options[:bootstrap_style] = :traditional @options[:original_format] = :traditional internal_nodes.each do |inode| if inode.name then inode.bootstrap_string = inode.name inode.name = nil end end end end # Sets nodes order numbers nodes.each_with_index do |xnode, i| xnode.order_number = i end # If the root implicitly prepared by the program is a leaf and # there are no additional information for the edge from the root to # the first internal node, the root is removed. 
if rel = edges[-1] and rel.node == [ root, internal_nodes[0] ] and rel.relation.instance_eval { !defined?(@distance) and !defined?(@log_likelihood) and !defined?(@width) and !defined?(@nhx_parameters) } and edges.find_all { |x| x.node.include?(root) }.size == 1 nodes.shift edges.pop end # Let the tree into instance variables tree = Bio::Tree.new tree.instance_eval { @pathway.relations.concat(edges) @pathway.to_list } tree.root = nodes[0] tree.options.update(@options) tree end end #class Newick end #module Bio bio-2.0.3/lib/bio/db/fantom.rb0000644000175000017500000003612414141516614015341 0ustar nileshnilesh# # bio/db/fantom.rb - RIKEN FANTOM2 database classes # # Copyright:: Copyright (C) 2003 GOTO Naohisa # License:: The Ruby License # # $Id:$ # require 'rexml/document' require 'cgi' require 'uri' require 'net/http' require 'bio/db' require 'bio/command' #require 'bio/sequence' module Bio module FANTOM def query(idstr, http_proxy = nil) xml = get_by_id(idstr, http_proxy) seqs = MaXML::Sequences.new(xml.to_s) seqs[0] end module_function :query def get_by_id(idstr, http_proxy = nil) addr = 'fantom.gsc.riken.go.jp' port = 80 path = "/db/maxml/maxmlseq.cgi?masterid=#{CGI.escape(idstr.to_s)}&style=xml" xml = '' if http_proxy then proxy = URI.parse(http_proxy.to_s) Net::HTTP.start(addr, port, proxy.host, proxy.port) do |http| response = http.get(path) xml = response.body end else Bio::Command.start_http(addr, port) do |http| response = http.get(path) xml = response.body end end xml end module_function :get_by_id class MaXML < DB # DTD of MaXML(Mouse annotation XML) # http://fantom.gsc.riken.go.jp/maxml/maxml.dtd DELIMITER = RS = "\n--EOF--\n" # This class is for {allseq|repseq|allclust}.sep.xml, # not for {allseq|repseq|allclust}.xml. 
Data_XPath = '' def initialize(x) if x.is_a?(REXML::Element) then @elem = x else if x.is_a?(String) then x = x.sub(/#{Regexp.escape(DELIMITER)}\z/om, "\n") end doc = REXML::Document.new(x) @elem = doc.elements[self.class::Data_XPath] #raise 'element is null' unless @elem @elem = REXML::Document.new('') unless @elem end end attr_reader :elem def to_s @elem.to_s end def gsub_entities(str) # workaround for bug? if str then str.gsub(/\&\#(\d{1,3})\;/) { sprintf("%c", $1.to_i) } else str end end def entry_id unless defined?(@entry_id) @entry_id = @elem.attributes['id'] end @entry_id end def self.define_element_text_method(array) array.each do |tagstr| module_eval(" def #{tagstr} unless defined?(@#{tagstr}) @#{tagstr} = gsub_entities(@elem.text('#{tagstr}')) end @#{tagstr} end ") end end private_class_method :define_element_text_method class Cluster < MaXML # (MaXML cluster) # ftp://fantom2.gsc.riken.go.jp/fantom/2.1/allclust.sep.xml.gz Data_XPath = 'maxml-clusters/cluster' def representative_seqid unless defined?(@representative_seqid) @representative_seqid = gsub_entities(@elem.text('representative-seqid')) end @representative_seqid end def sequences unless defined?(@sequences) @sequences = MaXML::Sequences.new(@elem) end @sequences end def sequence(idstr = nil) idstr ? sequences[idstr] : representative_sequence end def representative_sequence unless defined?(@representative_sequence) rid = representative_seqid @representative_sequence = rid ? sequences[representative_seqid] : nil end @representative_sequence end alias representative_clone representative_sequence def representative_annotations e = representative_sequence e ? e.annotations : nil end def representative_cloneid e = representative_sequence e ? 
e.cloneid : nil end define_element_text_method(%w(fantomid)) end #class MaXML::Cluster class Sequences < MaXML Data_XPath = 'maxml-sequences' include Enumerable def each to_a.each { |x| yield x } end def to_a unless defined?(@sequences) @sequences = @elem.get_elements('sequence') @sequences.collect! { |e| MaXML::Sequence.new(e) } end @sequences end def get(idstr) unless defined?(@hash) @hash = {} end unless @hash.member?(idstr) then @hash[idstr] = self.find do |x| x.altid.values.index(idstr) end end @hash[idstr] end def [](*arg) if arg[0].is_a?(String) and arg.size == 1 then get(arg[0]) else to_a[*arg] end end def cloneids unless defined?(@cloneids) @cloneids = to_a.collect { |x| x.cloneid } end @cloneids end def id_strings unless defined?(@id_strings) @id_strings = to_a.collect { |x| x.id_strings } @id_strings.flatten! @id_strings.sort! @id_strings.uniq! end @id_strings end end #class MaXML::Sequences class Sequence < MaXML # (MaXML sequence) # ftp://fantom2.gsc.riken.go.jp/fantom/2.1/allseq.sep.xml.gz # ftp://fantom2.gsc.riken.go.jp/fantom/2.1/repseq.sep.xml.gz Data_XPath = 'maxml-sequences/sequence' def altid(t = nil) unless defined?(@altid) @altid = {} @elem.each_element('altid') do |e| @altid[e.attributes['type']] = gsub_entities(e.text) end end if t then @altid[t] else @altid end end def id_strings altid.values.sort.uniq end def library_id entry_id[0,2] end def annotations unless defined?(@annotations) @annotations = MaXML::Annotations.new(@elem.elements['annotations']) end @annotations end define_element_text_method(%w(annotator version modified_time comment)) def self.define_id_method(array) array.each do |tagstr| module_eval(" def #{tagstr} unless defined?(@#{tagstr}) @#{tagstr} = gsub_entities(@elem.text('#{tagstr}')) @#{tagstr} = altid('#{tagstr}') unless @#{tagstr} end @#{tagstr} end ") end end private_class_method :define_id_method define_id_method(%w(seqid fantomid cloneid rearrayid accession)) end #class MaXML::Sequence class Annotations < MaXML 
Data_XPath = nil include Enumerable def each to_a.each { |x| yield x } end def to_a unless defined?(@a) @a = @elem.get_elements('annotation') @a.collect! { |e| MaXML::Annotation.new(e) } end @a end def get_all_by_qualifier(qstr) unless defined?(@hash) @hash = {} end unless @hash.member?(qstr) then @hash[qstr] = self.find_all do |x| x.qualifier == qstr end end @hash[qstr] end def get_by_qualifier(qstr) a = get_all_by_qualifier(qstr) a ? a[0] : nil end def [](*arg) if arg[0].is_a?(String) and arg.size == 1 then get_by_qualifier(arg[0]) else to_a[*arg] end end def cds_start unless defined?(@cds_start) e = get_by_qualifier('cds_start') @cds_start = e ? e.anntext.to_i : nil end @cds_start end def cds_stop unless defined?(@cds_stop) e = get_by_qualifier('cds_stop') @cds_stop = e ? e.anntext.to_i : nil end @cds_stop end def gene_name unless defined?(@gene_name) e = get_by_qualifier('gene_name') @gene_name = e ? e.anntext : nil end @gene_name end def data_source unless defined?(@data_source) e = get_by_qualifier('gene_name') @data_source = e ? e.datasrc[0] : nil end @data_source end def evidence unless defined?(@evidence) e = get_by_qualifier('gene_name') @evidence = e ? e.evidence : nil end @evidence end end #class MaXML::Annotations class Annotation < MaXML def entry_id nil end class DataSrc < String def initialize(text, href) super(text) @href = href end attr_reader :href end def datasrc unless defined?(@datasrc) @datasrc = [] @elem.each_element('datasrc') do |e| text = e.text href = e.attributes['href'] @datasrc << DataSrc.new(gsub_entities(text), gsub_entities(href)) end end @datasrc end define_element_text_method(%w(qualifier srckey anntext evidence)) end #class MaXML::Annotation end #class MaXML end #module FANTOM end #module Bio =begin Bio::FANTOM are database classes (and modules) treating RIKEN FANTOM2 data. FANTOM2 is available at (()). = Bio::FANTOM This module contains useful methods to access databases. 
--- Bio::FANTOM.query(idstr, http_proxy=nil)

      Get MaXML sequence data corresponding to given ID through the internet
      from (()).
      Note that this class is not suitable for 'allclust.xml'.

--- Bio::FANTOM::MaXML::Cluster.new(str)

--- Bio::FANTOM::MaXML::Cluster#entry_id

--- Bio::FANTOM::MaXML::Cluster#fantomid

--- Bio::FANTOM::MaXML::Cluster#representative_seqid

--- Bio::FANTOM::MaXML::Cluster#sequences

      Lists sequences in this cluster.
      Returns Bio::FANTOM::MaXML::Sequences object.

--- Bio::FANTOM::MaXML::Cluster#sequence(id_str)

      Shows the sequence information for the given ID.
      Returns Bio::FANTOM::MaXML::Sequence object or nil.

--- Bio::FANTOM::MaXML::Cluster#representative_sequence
--- Bio::FANTOM::MaXML::Cluster#representative_clone

      Shows the sequence of representative_seqid.
      Returns Bio::FANTOM::MaXML::Sequence object (or nil).

--- Bio::FANTOM::MaXML::Cluster#representative_annotations

      Shows annotations of the representative sequence.
      Returns Bio::FANTOM::MaXML::Annotations object (or nil).

--- Bio::FANTOM::MaXML::Cluster#representative_cloneid

      Shows cloneid of the representative sequence.
      Returns String (or nil).

= Bio::FANTOM::MaXML::Sequences

 The instances of this class are automatically created
 by Bio::FANTOM::MaXML::Cluster class.

 This class can also be used for 'allseq.sep.xml' and 'repseq.sep.xml',
 but using the Bio::FANTOM::MaXML::Sequence class is recommended.

 In addition, this class can be used for 'allseq.xml' and 'repseq.xml',
 but this is not recommended because it is very slow.

--- Bio::FANTOM::MaXML::Sequences#to_a

      Returns an Array of Bio::FANTOM::MaXML::Sequence objects.

--- Bio::FANTOM::MaXML::Sequences#each

--- Bio::FANTOM::MaXML::Sequences#[](x)

      Same as to_a[x] when x is an integer.
      Same as get(x) when x is a string.

--- Bio::FANTOM::MaXML::Sequences#get(id_str)

      Shows the sequence information for the given ID.
      Returns Bio::FANTOM::MaXML::Sequence object or nil.

--- Bio::FANTOM::MaXML::Sequences#cloneids

      Shows clone ID list.
      Returns an array of strings.
--- Bio::FANTOM::MaXML::Sequences#id_strings

      Shows ID list.
      Returns an array of strings.

= Bio::FANTOM::MaXML::Sequence

 This class is for 'allseq.sep.xml' and 'repseq.sep.xml' found at
 (()) and
 (()).
 Note that this class is not suitable for 'allseq.xml' and 'repseq.xml'.

 In addition, the instances of this class are automatically created
 by Bio::FANTOM::MaXML::Sequences class.

--- Bio::FANTOM::MaXML::Sequence.new(str)

--- Bio::FANTOM::MaXML::Sequence#entry_id

--- Bio::FANTOM::MaXML::Sequence#altid(type_str = nil)

      Returns a hash of altids if no argument is given.
      Returns the ID as a string if a type of ID (string) is given.

--- Bio::FANTOM::MaXML::Sequence#annotations

      Gets the list of annotation data.
      Returns a Bio::FANTOM::MaXML::Annotations object.

--- Bio::FANTOM::MaXML::Sequence#id_strings

      Gets the list of IDs. (same as altid.values.sort.uniq)
      Returns an array of strings.

--- Bio::FANTOM::MaXML::Sequence#library_id

      Shows library ID. (same as entry_id[0,2])
      Library IDs are listed at:
      (())

=end
bio-2.0.3/lib/bio/db/soft.rb0000644000175000017500000003365314141516614015034 0ustar nileshnilesh
#
# bio/db/soft.rb - Interface for SOFT formatted files
#
# Author::    Trevor Wennblom
# Copyright:: Copyright (c) 2007 Midwinter Laboratories, LLC (http://midwinterlabs.com)
# License::   The Ruby License
#
#  $Id:$
#

module Bio

#
# bio/db/soft.rb - Interface for SOFT formatted files
#
# Author::    Trevor Wennblom
# Copyright:: Copyright (c) 2007 Midwinter Laboratories, LLC (http://midwinterlabs.com)
# License::   The Ruby License
#
#
# = Description
#
# "SOFT (Simple Omnibus in Text Format) is a compact, simple, line-based,
# ASCII text format that incorporates experimental data and metadata."
# -- <em>GEO, National Center for Biotechnology Information</em>
#
# The Bio::SOFT class reads SOFT Series or Platform formatted files that
# contain information
# describing one database, one series, one platform, and many samples (GEO
# accessions). The data from the file can then be viewed with Ruby methods.
# # Bio::SOFT also supports the reading of SOFT DataSet files which contain # one database, one dataset, and many subsets. # # Format specification is located here: # * http://www.ncbi.nlm.nih.gov/projects/geo/info/soft2.html#SOFTformat # # SOFT data files may be directly downloaded here: # * ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SOFT # # NCBI's Gene Expression Omnibus (GEO) is here: # * http://www.ncbi.nlm.nih.gov/geo # # = Usage # # If an attribute has more than one value then the values are stored in an # Array of String objects. Otherwise the attribute is stored as a String. # # The platform and each sample may contain a table of data. A dataset from a # DataSet file may also contain a table. # # Attributes are dynamically created based on the data in the file. # Predefined keys have not been created in advance due to the variability of # SOFT files in-the-wild. # # Keys are generally stored as Symbols. In the case of keys for samples and # table headings may alternatively be accessed with Strings. # The names of samples (geo accessions) are case sensitive. Table headers # are case insensitive. 
# # require 'bio' # # lines = IO.readlines('GSE3457_family.soft') # soft = Bio::SOFT.new(lines) # # soft.platform[:geo_accession] # => "GPL2092" # soft.platform[:organism] # => "Populus" # soft.platform[:contributor] # => ["Jingyi,,Li", "Olga,,Shevchenko", "Steve,H,Strauss", "Amy,M,Brunner"] # soft.platform[:data_row_count] # => "240" # soft.platform.keys.sort {|a,b| a.to_s <=> b.to_s}[0..2] # => [:contact_address, :contact_city, :contact_country] # soft.platform[:"contact_zip/postal_code"] # => "97331" # soft.platform[:table].header # => ["ID", "GB_ACC", "SPOT_ID", "Function/Family", "ORGANISM", "SEQUENCE"] # soft.platform[:table].header_description # => {"ORGANISM"=>"sequence sources", "SEQUENCE"=>"oligo sequence used", "Function/Family"=>"gene functions and family", "ID"=>"", "SPOT_ID"=>"", "GB_ACC"=>"Gene bank accession number"} # soft.platform[:table].rows.size # => 240 # soft.platform[:table].rows[5] # => ["A039P68U", "AI163321", "", "TF, flowering protein CONSTANS", "P. tremula x P. tremuloides", "AGAAAATTCGATATACTGTCCGTAAAGAGGTAGCACTTAGAATGCAACGGAATAAAGGGCAGTTCACCTC"] # soft.platform[:table].rows[5][4] # => "P. tremula x P. tremuloides" # soft.platform[:table].rows[5][:organism] # => "P. tremula x P. tremuloides" # soft.platform[:table].rows[5]['ORGANISM'] # => "P. tremula x P. tremuloides" # # soft.series[:geo_accession] # => "GSE3457" # soft.series[:contributor] # => ["Jingyi,,Li", "Olga,,Shevchenko", "Ove,,Nilsson", "Steve,H,Strauss", "Amy,M,Brunner"] # soft.series[:platform_id] # => "GPL2092" # soft.series[:sample_id].size # => 74 # soft.series[:sample_id][0..4] # => ["GSM77557", "GSM77558", "GSM77559", "GSM77560", "GSM77561"] # # soft.database[:name] # => "Gene Expression Omnibus (GEO)" # soft.database[:ref] # => "Nucleic Acids Res. 
2005 Jan 1;33 Database Issue:D562-6" # soft.database[:institute] # => "NCBI NLM NIH" # # soft.samples.size # => 74 # soft.samples[:GSM77600][:series_id] # => "GSE3457" # soft.samples['GSM77600'][:series_id] # => "GSE3457" # soft.samples[:GSM77600][:platform_id] # => "GPL2092" # soft.samples[:GSM77600][:type] # => "RNA" # soft.samples[:GSM77600][:title] # => "jst2b2" # soft.samples[:GSM77600][:table].header # => ["ID_REF", "VALUE"] # soft.samples[:GSM77600][:table].header_description # => {"ID_REF"=>"", "VALUE"=>"normalized signal intensities"} # soft.samples[:GSM77600][:table].rows.size # => 217 # soft.samples[:GSM77600][:table].rows[5] # => ["A039P68U", "8.19"] # soft.samples[:GSM77600][:table].rows[5][0] # => "A039P68U" # soft.samples[:GSM77600][:table].rows[5][:id_ref] # => "A039P68U" # soft.samples[:GSM77600][:table].rows[5]['ID_REF'] # => "A039P68U" # # # lines = IO.readlines('GDS100.soft') # soft = Bio::SOFT.new(lines) # # soft.database[:name] # => "Gene Expression Omnibus (GEO)" # soft.database[:ref] # => "Nucleic Acids Res. 
2005 Jan 1;33 Database Issue:D562-6" # soft.database[:institute] # => "NCBI NLM NIH" # # soft.subsets.size # => 8 # soft.subsets.keys # => ["GDS100_1", "GDS100_2", "GDS100_3", "GDS100_4", "GDS100_5", "GDS100_6", "GDS100_7", "GDS100_8"] # soft.subsets[:GDS100_7] # => {:dataset_id=>"GDS100", :type=>"time", :sample_id=>"GSM548,GSM543", :description=>"60 minute"} # soft.subsets['GDS100_7'][:sample_id] # => "GSM548,GSM543" # soft.subsets[:GDS100_7][:sample_id] # => "GSM548,GSM543" # soft.subsets[:GDS100_7][:dataset_id] # => "GDS100" # # soft.dataset[:order] # => "none" # soft.dataset[:sample_organism] # => "Escherichia coli" # soft.dataset[:table].header # => ["ID_REF", "IDENTIFIER", "GSM549", "GSM542", "GSM543", "GSM547", "GSM544", "GSM545", "GSM546", "GSM548"] # soft.dataset[:table].rows.size # => 5764 # soft.dataset[:table].rows[5] # => ["6", "EMPTY", "0.097", "0.217", "0.242", "0.067", "0.104", "0.162", "0.104", "0.154"] # soft.dataset[:table].rows[5][4] # => "0.242" # soft.dataset[:table].rows[5][:gsm549] # => "0.097" # soft.dataset[:table].rows[5][:GSM549] # => "0.097" # soft.dataset[:table].rows[5]['GSM549'] # => "0.097" # class SOFT attr_accessor :database attr_accessor :series, :platform, :samples attr_accessor :dataset, :subsets LINE_TYPE_ENTITY_INDICATOR = '^' LINE_TYPE_ENTITY_ATTRIBUTE = '!' 
LINE_TYPE_TABLE_HEADER = '#' # data table row defined by absence of line type character TABLE_COLUMN_DELIMITER = "\t" # Constructor # # --- # *Arguments* # * +lines+: (_required_) contents of SOFT formatted file # *Returns*:: Bio::SOFT def initialize(lines=nil) @database = Database.new @series = Series.new @platform = Platform.new @samples = Samples.new @dataset = Dataset.new @subsets = Subsets.new process(lines) end # Classes for Platform and Series files class Samples < Hash #:nodoc: def [](x) x = x.to_s if x.kind_of?( Symbol ) super(x) end end class Entity < Hash #:nodoc: end class Sample < Entity #:nodoc: end class Platform < Entity #:nodoc: end class Series < Entity #:nodoc: end # Classes for DataSet files class Subsets < Samples #:nodoc: end class Subset < Entity #:nodoc: end class Dataset < Entity #:nodoc: end # Classes important for all types class Database < Entity #:nodoc: end class Table #:nodoc: attr_accessor :header attr_accessor :header_description attr_accessor :rows class Header < Array #:nodoc: # @column_index contains column name => numerical index of column attr_accessor :column_index def initialize @column_index = {} end end class Row < Array #:nodoc: attr_accessor :header_object def initialize( n, header_object=nil ) @header_object = header_object super(n) end def [](x) if x.kind_of?( Integer ) super(x) else begin x = x.to_s.downcase.to_sym z = @header_object.column_index[x] unless z.kind_of?( Integer ) raise IndexError, "#{x.inspect} is not a valid index. Contents of @header_object.column_index: #{@header_object.column_index.inspect}" end self[ z ] rescue NoMethodError unless @header_object $stderr.puts "Table::Row @header_object undefined!" end raise end end end end def initialize() @header_description = {} @header = Header.new @rows = [] end def add_header( line ) raise "Can only define one header" unless @header.empty? 
@header = @header.concat( parse_row( line ) ) # beware of clobbering this into an Array @header.each_with_index do |key, i| @header.column_index[key.downcase.to_sym] = i end end def add_row( line ) @rows << Row.new( parse_row( line ), @header ) end def add_header_or_row( line ) @header.empty? ? add_header( line ) : add_row( line ) end protected def parse_row( line ) line.split( TABLE_COLUMN_DELIMITER ) end end ######### protected ######### def process(lines) current_indicator = nil current_class_accessor = nil in_table = false lines.each_with_index do |line, line_number| line.strip! next if line.nil? or line.empty? case line[0].chr when LINE_TYPE_ENTITY_INDICATOR current_indicator, value = split_label_value_in( line[1..-1] ) case current_indicator when 'DATABASE' current_class_accessor = @database when 'DATASET' current_class_accessor = @dataset when 'PLATFORM' current_class_accessor = @platform when 'SERIES' current_class_accessor = @series when 'SAMPLE' @samples[value] = Sample.new current_class_accessor = @samples[value] when 'SUBSET' @subsets[value] = Subset.new current_class_accessor = @subsets[value] else custom_raise( line_number, error_msg(40, line) ) end when LINE_TYPE_ENTITY_ATTRIBUTE if( current_indicator == nil ) custom_raise( line_number, error_msg(30) ) end # Handle lines such as '!platform_table_begin' and '!platform_table_end' if in_table if line =~ %r{table_begin} next elsif line =~ %r{table_end} in_table = false next end end key, value = split_label_value_in( line, true ) key_s = key.to_sym if current_class_accessor.include?( key_s ) if current_class_accessor[ key_s ].class != Array current_class_accessor[ key_s ] = [ current_class_accessor[ key_s ] ] end current_class_accessor[key.to_sym] << value else current_class_accessor[key.to_sym] = value end when LINE_TYPE_TABLE_HEADER if( (current_indicator != 'SAMPLE') and (current_indicator != 'PLATFORM') and (current_indicator != 'DATASET') ) custom_raise( line_number, error_msg(20, 
current_indicator.inspect) ) end in_table = true # may be redundant, computationally not worth checking # We only expect one table per platform, sample, or dataset current_class_accessor[:table] ||= Table.new key, value = split_label_value_in( line ) # key[1..-1] -- Remove first character which is the LINE_TYPE_TABLE_HEADER current_class_accessor[:table].header_description[ key[1..-1] ] = value else # Type: No line type - should be a row in a table. if( (current_indicator == nil) or (in_table == false) ) custom_raise( line_number, error_msg(10) ) end current_class_accessor[:table].add_header_or_row( line ) end end end def error_msg( i, extra_info=nil ) case i when 10 x = ["Lines without line-type characters are rows in a table, but", "a line containing an entity indicator such as", "\"#{LINE_TYPE_ENTITY_INDICATOR}SAMPLE\",", "\"#{LINE_TYPE_ENTITY_INDICATOR}PLATFORM\",", "or \"#{LINE_TYPE_ENTITY_INDICATOR}DATASET\" has not been", "previously encountered or it does not appear that this line is", "in a table."] when 20 # tables are allowed inside samples, platforms, and datasets x = ["Tables are only allowed inside SAMPLE, PLATFORM, and DATASET.", "Current table information found inside #{extra_info}."] when 30 x = ["Entity attribute line (\"#{LINE_TYPE_ENTITY_ATTRIBUTE}\")", "found before entity indicator line (\"#{LINE_TYPE_ENTITY_INDICATOR}\")"] when 40 x = ["Unknown entity indicator. Must be DATABASE, SAMPLE, PLATFORM,", "SERIES, DATASET, or SUBSET."] else raise IndexError, "Unknown error message requested." 
end x.join(" ") end def custom_raise( line_number_with_0_based_indexing, msg ) raise ["Error processing input line: #{line_number_with_0_based_indexing+1}", msg].join("\t") end def split_label_value_in( line, shift_key=false ) line =~ %r{\s*=\s*} key, value = $`, $' if shift_key key =~ %r{_} key = $' end if( (key == nil) or (value == nil) ) puts line.inspect raise end [key, value] end end # SOFT end # Bio bio-2.0.3/lib/bio/db/transfac.rb0000644000175000017500000001360614141516614015656 0ustar nileshnilesh# # = bio/db/transfac.rb - TRANSFAC database class # # Copyright:: Copyright (C) 2001 # Shuichi Kawashima # License:: The Ruby License # # $Id: transfac.rb,v 1.12 2007/04/05 23:35:40 trevor Exp $ # require "bio/db" require "matrix" module Bio class TRANSFAC < EMBLDB DELIMITER = RS = "\n//\n" TAGSIZE = 4 def initialize(entry) super(entry, TAGSIZE) end # AC Accession number (1 per entry) # # AC T00001 in the case of FACTOR # AC M00001 in the case of MATRIX # AC R00001 in the case of SITE # AC G000001 in the case of GENE # AC C00001 in the case of CLASS # AC 00001 in the case of CELL # def ac unless @data['AC'] @data['AC'] = fetch('AC') end @data['AC'] end alias entry_id ac # DT Date (1 per entry) # # DT DD.MM.YYYY (created); ewi. # DT DD.MM.YYYY (updated); mpr. 
# def dt field_fetch('DT') end alias date dt def cc field_fetch('CC') end alias comment cc def os field_fetch('OS') end alias org_species os def oc field_fetch('OC') end alias org_class oc def rn field_fetch('RN') end alias ref_no rn def ra field_fetch('RA') end alias ref_authors ra def rt field_fetch('RT') end alias ref_title rt def rl field_fetch('RL') end alias ref_data rl class MATRIX < TRANSFAC def initialize(entry) super(entry) end # NA Name of the binding factor def na field_fetch('NA') end # DE Short factor description def de field_fetch('DE') end # BF List of linked factor entries def bf field_fetch('BF') end def ma ma_dat = {} ma_ary = [] key = '' @orig.each do |k, v| if k =~ /^0*(\d+)/ key = $1.to_i ma_dat[key] = fetch(k) unless ma_dat[key] end end ma_dat.keys.sort.each_with_index do |k, i| ma_dat[k].slice!(-1, 1) ma_dat[k].slice!(-1, 1) ma_ary[i] = ma_dat[k].split(/\s+/) ma_ary[i].each_with_index do |x, j| ma_ary[i][j] = x.to_i end end Matrix[*ma_ary] end # BA Statistical basis def ba field_fetch('BA') end end class SITE < TRANSFAC def initialize(entry) super(entry) end def ty field_fetch('TY') end def de field_fetch('DE') end def re field_fetch('RE') end def sq field_fetch('SQ') end def el field_fetch('EL') end def sf field_fetch('SF') end def st field_fetch('ST') end def s1 field_fetch('S1') end def bf field_fetch('BF') end def so field_fetch('SO') end def mm field_fetch('MM') end # DR Cross-references to other databases (>=0 per entry) def dr field_fetch('DR') end end class FACTOR < TRANSFAC def initialize(entry) super(entry) end # FA Factor name def fa field_fetch('FA') end # SY Synonyms def sy field_fetch('SY') end # DR Cross-references to other databases (>=0 per entry) def dr field_fetch('DR') end # HO Homologs (suggested) def ho field_fetch('HO') end # CL Classification (class accession no.; class identifier; decimal # CL classification number.) 
def cl field_fetch('CL') end # SZ Size (length (number of amino acids); calculated molecular mass # SZ in kDa; experimental molecular mass (or range) in kDa # SZ (experimental method) [Ref] def sz field_fetch('SZ') end # SQ Sequence def sq field_fetch('SQ') end # SC Sequence comment, i. e. source of the protein sequence def sc field_fetch('SC') end # FT Feature table (1st position last position feature) def ft field_fetch('FT') end # SF Structural features def sf field_fetch('SF') end # CP Cell specificity (positive) def cp field_fetch('CP') end # CN Cell specificity (negative) def cn field_fetch('CN') end # FF Functional features def ff field_fetch('FF') end # IN Interacting factors (factor accession no.; factor name; # IN biological species.) def in field_fetch('IN') end # MX Matrix (matrix accession no.; matrix identifier) def mx field_fetch('MX') end # BS Bound sites (site accession no.; site ID; quality: N; biological # BS species) def bs field_fetch('BS') end end class CELL < TRANSFAC def initialize(entry) super(entry) end # CD Cell description def cd field_fetch('CD') end end class CLASS < TRANSFAC def initialize(entry) super(entry) end # CL Class def cl field_fetch('CL') end # SD Structure description def sd field_fetch('SD') end # BF Factors belonging to this class def bf field_fetch('BF') end # DR PROSITE accession numbers def dr field_fetch('DR') end end class GENE < TRANSFAC def initialize(entry) super(entry) end # SD Short description/name of the gene def sd field_fetch('SD') end # DE def de field_fetch('DE') end # BC Bucher promoter def bc field_fetch('BC') end # BS TRANSFAC SITE positions and accession numbers def bs field_fetch('BS') end # CO COMPEL accession number def co field_fetch('CO') end # TR TRRD accession number def tr field_fetch('TR') end end end # class TRANSFAC end # module Bio bio-2.0.3/lib/bio/db/fasta/0000755000175000017500000000000014141516614014620 5ustar 
nileshnileshbio-2.0.3/lib/bio/db/fasta/qual_to_biosequence.rb0000644000175000017500000000136614141516614021201 0ustar nileshnilesh# # = bio/db/fasta/qual_to_biosequence.rb - Bio::FastaNumericFormat to Bio::Sequence adapter module # # Copyright:: Copyright (C) 2010 # Naohisa Goto # License:: The Ruby License # require 'bio/sequence' require 'bio/sequence/adapter' require 'bio/db/fasta/fasta_to_biosequence' # Internal use only. Normal users should not use this module. # # Bio::FastaNumericFormat to Bio::Sequence adapter module. # It is internally used in Bio::FastaNumericFormat#to_biosequence. # module Bio::Sequence::Adapter::FastaNumericFormat extend Bio::Sequence::Adapter include Bio::Sequence::Adapter::FastaFormat private def_biosequence_adapter :quality_scores, :data end #module Bio::Sequence::Adapter::FastaNumericFormat bio-2.0.3/lib/bio/db/fasta/defline.rb0000644000175000017500000004062014141516614016555 0ustar nileshnilesh# # = bio/db/fasta/defline.rb - FASTA defline parser class # # Copyright:: Copyright (C) 2001, 2002 # GOTO Naohisa , # Toshiaki Katayama # License:: The Ruby License # # # == Description # # Bio::FastaDefline is a parser class for definition line (defline) # of the FASTA format. 
# # == Examples # # rub = Bio::FastaDefline.new('>gi|671595|emb|CAA85678.1| rubisco large subunit [Perovskia abrotanoides]') # rub.entry_id ==> 'gi|671595' # rub.get('emb') ==> 'CAA85678.1' # rub.emb ==> 'CAA85678.1' # rub.gi ==> '671595' # rub.accession ==> 'CAA85678' # rub.accessions ==> [ 'CAA85678' ] # rub.acc_version ==> 'CAA85678.1' # rub.locus ==> nil # rub.list_ids ==> [["gi", "671595"], # ["emb", "CAA85678.1", nil], # ["Perovskia abrotanoides"]] # # ckr = Bio::FastaDefline.new(">gi|2495000|sp|Q63931|CCKR_CAVPO CHOLECYSTOKININ TYPE A RECEPTOR (CCK-A RECEPTOR) (CCK-AR)\001gi|2147182|pir||I51898 cholecystokinin A receptor - guinea pig\001gi|544724|gb|AAB29504.1| cholecystokinin A receptor; CCK-A receptor [Cavia]") # ckr.entry_id ==> "gi|2495000" # ckr.sp ==> "CCKR_CAVPO" # ckr.pir ==> "I51898" # ckr.gb ==> "AAB29504.1" # ckr.gi ==> "2495000" # ckr.accession ==> "AAB29504" # ckr.accessions ==> ["Q63931", "AAB29504"] # ckr.acc_version ==> "AAB29504.1" # ckr.locus ==> nil # ckr.description ==> # "CHOLECYSTOKININ TYPE A RECEPTOR (CCK-A RECEPTOR) (CCK-AR)" # ckr.descriptions ==> # ["CHOLECYSTOKININ TYPE A RECEPTOR (CCK-A RECEPTOR) (CCK-AR)", # "cholecystokinin A receptor - guinea pig", # "cholecystokinin A receptor; CCK-A receptor [Cavia]"] # ckr.words ==> # ["cavia", "cck-a", "cck-ar", "cholecystokinin", "guinea", "pig", # "receptor", "type"] # ckr.id_strings ==> # ["2495000", "Q63931", "CCKR_CAVPO", "2147182", "I51898", # "544724", "AAB29504.1", "Cavia"] # ckr.list_ids ==> # [["gi", "2495000"], ["sp", "Q63931", "CCKR_CAVPO"], # ["gi", "2147182"], ["pir", nil, "I51898"], ["gi", "544724"], # ["gb", "AAB29504.1", nil], ["Cavia"]] # # == References # # * FASTA format (WikiPedia) # http://en.wikipedia.org/wiki/FASTA_format # # * Fasta format description (NCBI) # http://www.ncbi.nlm.nih.gov/BLAST/fasta.shtml # module Bio #-- # split from fasta.rb revision 1.28 #++ # Parsing FASTA Defline, and extract IDs and other informations. 
# IDs are NSIDs (NCBI standard FASTA sequence identifiers) # or ":"-separated IDs. # # specs are described in: # ftp://ftp.ncbi.nih.gov/blast/documents/README.formatdb # http://blast.wustl.edu/doc/FAQ-Indexing.html#Identifiers # # === Examples # # rub = Bio::FastaDefline.new('>gi|671595|emb|CAA85678.1| rubisco large subunit [Perovskia abrotanoides]') # rub.entry_id ==> 'gi|671595' # rub.get('emb') ==> 'CAA85678.1' # rub.emb ==> 'CAA85678.1' # rub.gi ==> '671595' # rub.accession ==> 'CAA85678' # rub.accessions ==> [ 'CAA85678' ] # rub.acc_version ==> 'CAA85678.1' # rub.locus ==> nil # rub.list_ids ==> [["gi", "671595"], # ["emb", "CAA85678.1", nil], # ["Perovskia abrotanoides"]] # # ckr = Bio::FastaDefline.new(">gi|2495000|sp|Q63931|CCKR_CAVPO CHOLECYSTOKININ TYPE A RECEPTOR (CCK-A RECEPTOR) (CCK-AR)\001gi|2147182|pir||I51898 cholecystokinin A receptor - guinea pig\001gi|544724|gb|AAB29504.1| cholecystokinin A receptor; CCK-A receptor [Cavia]") # ckr.entry_id ==> "gi|2495000" # ckr.sp ==> "CCKR_CAVPO" # ckr.pir ==> "I51898" # ckr.gb ==> "AAB29504.1" # ckr.gi ==> "2495000" # ckr.accession ==> "AAB29504" # ckr.accessions ==> ["Q63931", "AAB29504"] # ckr.acc_version ==> "AAB29504.1" # ckr.locus ==> nil # ckr.description ==> # "CHOLECYSTOKININ TYPE A RECEPTOR (CCK-A RECEPTOR) (CCK-AR)" # ckr.descriptions ==> # ["CHOLECYSTOKININ TYPE A RECEPTOR (CCK-A RECEPTOR) (CCK-AR)", # "cholecystokinin A receptor - guinea pig", # "cholecystokinin A receptor; CCK-A receptor [Cavia]"] # ckr.words ==> # ["cavia", "cck-a", "cck-ar", "cholecystokinin", "guinea", "pig", # "receptor", "type"] # ckr.id_strings ==> # ["2495000", "Q63931", "CCKR_CAVPO", "2147182", "I51898", # "544724", "AAB29504.1", "Cavia"] # ckr.list_ids ==> # [["gi", "2495000"], ["sp", "Q63931", "CCKR_CAVPO"], # ["gi", "2147182"], ["pir", nil, "I51898"], ["gi", "544724"], # ["gb", "AAB29504.1", nil], ["Cavia"]] # # === References # # * Fasta format description (NCBI) # http://www.ncbi.nlm.nih.gov/BLAST/fasta.shtml # # * 
Frequently Asked Questions: Indexing of Sequence Identifiers (by Warren R. Gish.) # (Dead link. Please find in http://web.archive.org/ ). # http://blast.wustl.edu/doc/FAQ-Indexing.html#Identifiers # # * Program Parameters for formatdb and fastacmd (by Tao Tao) # http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/formatdb_fastacmd.html#t1.1 # # * Formatdb README # ftp://ftp.ncbi.nih.gov/blast/documents/formatdb.html # class FastaDefline NSIDs = { # NCBI and WU-BLAST 'gi' => [ 'gi' ], # NCBI GI 'gb' => [ 'acc_version', 'locus' ], # GenBank 'emb' => [ 'acc_version', 'locus' ], # EMBL 'dbj' => [ 'acc_version', 'locus' ], # DDBJ 'sp' => [ 'accession', 'entry_id' ], # SWISS-PROT 'tr' => [ 'accession', 'entry_id' ], # TREMBL 'pdb' => [ 'entry_id', 'chain' ], # PDB 'bbs' => [ 'number' ], # GenInfo Backbone Id 'gnl' => [ 'database' , 'entry_id' ], # General database identifier 'ref' => [ 'acc_version' , 'locus' ], # NCBI Reference Sequence 'lcl' => [ 'entry_id' ], # Local Sequence identifier # WU-BLAST and NCBI 'pir' => [ 'accession', 'entry_id' ], # PIR 'prf' => [ 'accession', 'entry_id' ], # Protein Research Foundation 'pat' => [ 'country', 'number', 'serial' ], # Patents # WU-BLAST only 'bbm' => [ 'number' ], # NCBI GenInfo Backbone database identifier 'gim' => [ 'number' ], # NCBI GenInfo Import identifier 'gp' => [ 'acc_version', 'locus' ], # GenPept 'oth' => [ 'accession', 'name', 'release' ], # Other (user-definable) identifier 'tpd' => [ 'accession', 'name' ], # Third party annotation, DDBJ 'tpe' => [ 'accession', 'name' ], # Third party annotation, EMBL 'tpg' => [ 'accession', 'name' ], # Third party annotation, GenBank # Original 'ri' => [ 'entry_id', 'rearray_id', 'len' ], # RIKEN FANTOM DB } # Shows array that contains IDs (or ID-like strings). # Returns an array of arrays of strings. attr_reader :list_ids # Shows a possibly unique identifier. # Returns a string. attr_reader :entry_id # Parses given string. 
def initialize(str) @deflines = [] @info = {} @list_ids = [] @entry_id = nil lines = str.split("\x01") lines.each do |line| add_defline(line) end end #def initialize # Parses given string and adds parsed data. def add_defline(str) case str when /^\>?\s*((?:[^\|\s]*\|)+[^\s]+)\s*(.*)$/ # NSIDs # examples: # >gi|9910844|sp|Q9UWG2|RL3_METVA 50S ribosomal protein L3P # # note: regexp (:?) means grouping without backreferences i = $1 d = $2 tks = i.split('|') tks << '' if i[-1,1] == '|' a = parse_NSIDs(tks) i = a[0].join('|') a.unshift('|') d = tks.join('|') + ' ' + d unless tks.empty? a << d this_line = a match_EC(d) parse_square_brackets(d).each do |x| if !match_EC(x, false) and x =~ /\A[A-Z]/ then di = [ x ] @list_ids << di @info['organism'] = x unless @info['organism'] end end when /^\>?\s*([a-zA-Z0-9]+\:[^\s]+)\s*(.*)$/ # examples: # >sce:YBR160W CDC28, SRM5; cyclin-dependent protein kinase catalytic subunit [EC:2.7.1.-] [SP:CC28_YEAST] # >emb:CACDC28 [X80034] C.albicans CDC28 gene i = $1 d = $2 a = parse_ColonSepID(i) i = a.join(':') this_line = [ ':', a , d ] match_EC(d) parse_square_brackets(d).each do |x| if !match_EC(x, false) and x =~ /:/ then parse_ColonSepID(x) elsif x =~ /\A\s*([A-Z][A-Z0-9_\.]+)\s*\z/ then @list_ids << [ $1 ] end end when /^\>?\s*(\S+)(?:\s+(.+))?$/ # examples: # >ABC12345 this is test i = $1 d = $2.to_s @list_ids << [ i.chomp('.') ] this_line = [ '', [ i ], d ] match_EC(d) else i = str d = '' match_EC(i) this_line = [ '', [ i ], d ] end @deflines << this_line @entry_id = i unless @entry_id end def match_EC(str, write_flag = true) di = nil str.scan(/EC\:((:?[\-\d]+\.){3}(:?[\-\d]+))/i) do |x| di = [ 'EC', $1 ] if write_flag then @info['ec'] = di[1] if (!@info['ec'] or @info['ec'].to_s =~ /\-/) @list_ids << di end end di end private :match_EC def parse_square_brackets(str) r = [] str.scan(/\[([^\]]*)\]/) do |x| r << x[0] end r end private :parse_square_brackets def parse_ColonSepID(str) di = str.split(':', 2) di << nil if di.size <= 1 
@list_ids << di di end private :parse_ColonSepID def parse_NSIDs(ary) # this method destroys ary data = [] while token = ary.shift if labels = self.class::NSIDs[token] then di = [ token ] labels.each do |x| token = ary.shift break unless token if self.class::NSIDs[token] then ary.unshift(token) break #each end if token.length > 0 then di << token else di << nil end end data << di else if token.length > 0 then # UCID (uncontrolled identifiers) di = [ token ] data << di @info['ucid'] = token unless @info['ucid'] end break #while end end #while @list_ids.concat data data end #def parse_NSIDs private :parse_NSIDs # Shows original string. # Note that the result of this method may be different from # original string which is given in FastaDefline.new method. def to_s @deflines.collect { |a| s = a[0] (a[1..-2].collect { |x| x.join(s) }.join(s) + ' ' + a[-1]).strip }.join("\x01") end # Shows description. def description @deflines[0].to_a[-1] end # Returns descriptions. def descriptions @deflines.collect do |a| a[-1] end end # Shows ID-like strings. # Returns an array of strings. 
def id_strings r = [] @list_ids.each do |a| if a.size >= 2 then r.concat a[1..-1].find_all { |x| x } else if a[0].to_s.size > 0 and a[0] =~ /\A[A-Za-z0-9\.\-\_]+\z/ r << a[0] end end end r.concat( words(true, []).find_all do |x| x =~ /\A[A-Z][A-Za-z0-9\_]*[0-9]+[A-Za-z0-9\_]+\z/ or x =~ /\A[A-Z][A-Z0-9]*\_[A-Z0-9\_]+\z/ end) r end KillWords = [ 'an', 'the', 'this', 'that', 'is', 'are', 'were', 'was', 'be', 'can', 'may', 'might', 'as', 'at', 'by', 'for', 'in', 'of', 'on', 'to', 'with', 'from', 'and', 'or', 'not', 'dna', 'rna', 'mrna', 'cdna', 'orf', 'aa', 'nt', 'pct', 'id', 'ec', 'sp', 'subsp', 'similar', 'involved', 'identical', 'identity', 'cds', 'clone', 'library', 'contig', 'contigs', 'homolog', 'homologue', 'homologs', 'homologous', 'protein', 'proteins', 'gene', 'genes', 'product', 'products', 'sequence', 'sequences', 'strain', 'strains', 'region', 'regions', ] KillWordsHash = {} KillWords.each { |x| KillWordsHash[x] = true } KillRegexpArray = [ /\A\d{1,3}\%?\z/, /\A[A-Z][A-Za-z0-9\_]*[0-9]+[A-Za-z0-9\_]+\z/, /\A[A-Z][A-Z0-9]*\_[A-Z0-9\_]+\z/ ] # Shows words used in the defline. Returns an Array. def words(case_sensitive = nil, kill_regexp = self.class::KillRegexpArray, kwhash = self.class::KillWordsHash) a = descriptions.join(' ').split(/[\.\,\;\:\(\)\[\]\{\}\<\>\"\'\`\~\/\|\?\!\&\@\# \x00-\x1f\x7f]+/) a.collect! do |x| x.sub!(/\A[\$\*\-\+]+/, '') x.sub!(/[\$\*\-\=]+\z/, '') if x.size <= 1 then nil elsif kwhash[x.downcase] then nil else if kill_regexp.find { |expr| expr =~ x } then nil else x end end end a.compact! a.collect! { |x| x.downcase } unless case_sensitive a.sort! a.uniq! a end # Returns identifiers by a database name. 
def get(dbname) db = dbname.to_s r = nil unless r = @info[db] then di = @list_ids.find { |x| x[0] == db.to_s } if di and di.size <= 2 then r = di[-1] elsif di then labels = self.class::NSIDs[db] [ 'acc_version', 'entry_id', 'locus', 'accession', 'number'].each do |x| if i = labels.index(x) then r = di[i+1] break if r end end r = di[1..-1].find { |x| x } unless r end @info[db] = r if r end r end # Returns an identifier by given type. def get_by_type(type_str) @list_ids.each do |x| if labels = self.class::NSIDs[x[0]] then if i = labels.index(type_str) then return x[i+1] end end end nil end # Returns identifiers by given type. def get_all_by_type(*type_strarg) d = [] @list_ids.each do |x| if labels = self.class::NSIDs[x[0]] then type_strarg.each do |y| if i = labels.index(y) then d << x[i+1] if x[i+1] end end end end d end # Shows locus. # If the entry has two or more such IDs, # only the first ID is shown. # Returns a string or nil. def locus unless defined?(@locus) @locus = get_by_type('locus') end @locus end # Shows GI. # If the entry has two or more such IDs, # only the first ID is shown. # Returns a string or nil. def gi unless defined?(@gi) then @gi = get_by_type('gi') end @gi end # Shows accession with version number. # If the entry has two or more such IDs, # only the first ID is shown. # Returns a string or nil. def acc_version unless defined?(@acc_version) then @acc_version = get_by_type('acc_version') end @acc_version end # Shows accession numbers. # Returns an array of strings. def accessions unless defined?(@accessions) then @accessions = get_all_by_type('accession', 'acc_version') @accessions.collect! { |x| x.sub(/\..*\z/, '') } end @accessions end # Shows an accession number. 
def accession unless defined?(@accession) then if acc_version then @accession = acc_version.split('.')[0] else @accession = accessions[0] end end @accession end def method_missing(name, *args) # raise ArgumentError, # "wrong # of arguments(#{args.size} for 1)" if args.size >= 2 r = get(name, *args) if !r and !(self.class::NSIDs[name.to_s]) then raise "NameError: undefined method `#{name.inspect}'" end r end end #class FastaDefline end #module Bio bio-2.0.3/lib/bio/db/fasta/qual.rb0000644000175000017500000000657714141516614016126 0ustar nileshnilesh# # = bio/db/fasta/qual.rb - Qual format, FASTA formatted numeric entry # # Copyright:: Copyright (C) 2001, 2002, 2009 # Naohisa Goto , # Toshiaki Katayama # License:: The Ruby License # # $Id:$ # # == Description # # QUAL format, FASTA formatted numeric entry. # # == Examples # # See documents of Bio::FastaNumericFormat class. # # == References # # * FASTA format (WikiPedia) # http://en.wikipedia.org/wiki/FASTA_format # # * Phred quality score (WikiPedia) # http://en.wikipedia.org/wiki/Phred_quality_score # # * Fasta format description (NCBI) # http://www.ncbi.nlm.nih.gov/BLAST/fasta.shtml # require 'bio/db/fasta' module Bio # Treats a FASTA formatted numerical entry, such as: # # >id and/or some comments <== comment line # 24 15 23 29 20 13 20 21 21 23 22 25 13 <== numerical data # 22 17 15 25 27 32 26 32 29 29 25 # # The precedent '>' can be omitted and the trailing '>' will be removed # automatically. # # --- Bio::FastaNumericFormat.new(entry) # # Stores the comment and the list of the numerical data. # # --- Bio::FastaNumericFormat#definition # # The comment line of the FASTA formatted data. # # * FASTA format (Wikipedia) # http://en.wikipedia.org/wiki/FASTA_format # # * Phred quality score (WikiPedia) # http://en.wikipedia.org/wiki/Phred_quality_score # class FastaNumericFormat < FastaFormat # Returns the list of the numerical data (typically the quality score # of its corresponding sequence) as an Array. 
# --- # *Returns*:: (Array containing Integer) numbers def data unless defined?(@list) @list = @data.strip.split(/\s+/).map {|x| x.to_i} end @list end # Returns the number of elements in the numerical data, # which will be the same as its corresponding sequence length. # --- # *Returns*:: (Integer) the number of elements def length data.length end # Yields each element of the numerical data. # --- # *Yields*:: (Integer) a numerical data element # *Returns*:: (undefined) def each data.each do |x| yield x end end # Returns the n-th element. If out of range, returns nil. # --- # *Arguments*: # * (required) _n_: (Integer) position # *Returns*:: (Integer or nil) the value def [](n) data[n] end # Returns the data as a Bio::Sequence object. # In the returned sequence object, the length of the sequence is zero, # and the numeric data is stored to the Bio::Sequence#quality_scores # attribute. # # Because the meaning of the numeric data is unclear, # Bio::Sequence#quality_score_type is not set by default. # # Note: If you modify the returned Bio::Sequence object, # the sequence or definition in this FastaNumericFormat object # might also be changed (but not always be changed) # because of efficiency. # # --- # *Arguments*: # *Returns*:: (Bio::Sequence) sequence object def to_biosequence s = Bio::Sequence.adapter(self, Bio::Sequence::Adapter::FastaNumericFormat) s.seq = Bio::Sequence::Generic.new('') s end alias to_seq to_biosequence undef query, blast, fasta, seq, naseq, nalen, aaseq, aalen end #class FastaNumericFormat end #module Bio bio-2.0.3/lib/bio/db/fasta/fasta_to_biosequence.rb0000644000175000017500000000301414141516614021325 0ustar nileshnilesh# # = bio/db/fasta/fasta_to_biosequence.rb - Bio::FastaFormat to Bio::Sequence adapter module # # Copyright:: Copyright (C) 2008 # Naohisa Goto , # License:: The Ruby License # # $Id:$ # require 'bio/sequence' require 'bio/sequence/adapter' # Internal use only. Normal users should not use this module. 
# # Bio::FastaFormat to Bio::Sequence adapter module. # It is internally used in Bio::FastaFormat#to_biosequence. # module Bio::Sequence::Adapter::FastaFormat extend Bio::Sequence::Adapter private def_biosequence_adapter :seq # primary accession def_biosequence_adapter :primary_accession do |orig| orig.identifiers.accessions.first or orig.identifiers.entry_id end # secondary accessions def_biosequence_adapter :secondary_accessions do |orig| orig.identifiers.accessions[1..-1] end # entry_id def_biosequence_adapter :entry_id do |orig| orig.identifiers.locus or orig.identifiers.accessions.first or orig.identifiers.entry_id end # NCBI GI is stored on other_seqids def_biosequence_adapter :other_seqids do |orig| other = [] if orig.identifiers.gi then other.push Bio::Sequence::DBLink.new('GI', orig.identifiers.gi) end other.empty? ? nil : other end # definition def_biosequence_adapter :definition do |orig| if orig.identifiers.accessions.empty? and !(orig.identifiers.gi) then orig.definition else orig.identifiers.description end end end #module Bio::Sequence::Adapter::FastaFormat bio-2.0.3/lib/bio/db/fasta/format_qual.rb0000644000175000017500000001413014141516614017456 0ustar nileshnilesh# # = bio/db/fasta/format_qual.rb - Qual format and FastaNumericFormat generator # # Copyright:: Copyright (C) 2009 # Naohisa Goto # License:: The Ruby License # module Bio::Sequence::Format::Formatter # INTERNAL USE ONLY, YOU SHOULD NOT USE THIS CLASS. # Simple FastaNumeric format output class for Bio::Sequence. class Fasta_numeric < Bio::Sequence::Format::FormatterBase # INTERNAL USE ONLY, YOU SHOULD NOT CALL THIS METHOD. # # Creates a new FastaNumericFormat generator object from the sequence. # # It does not care whether the content of the quality score is # consistent with the sequence or not, e.g. it does not check # the length of the quality score. 
# # --- # *Arguments*: # * _sequence_: Bio::Sequence object # * (optional) :header => _header_: (String) (default nil) # * (optional) :width => _width_: (Fixnum) (default 70) def initialize; end if false # dummy for RDoc # INTERNAL USE ONLY, YOU SHOULD NOT CALL THIS METHOD. # # Output the FASTA format string of the sequence. # # Currently, this method is used in Bio::Sequence#output like so, # # s = Bio::Sequence.new('atgc') # s.quality_scores = [ 70, 80, 90, 100 ] # puts s.output(:fasta_numeric) # --- # *Returns*:: String object def output header = @options[:header] width = @options.has_key?(:width) ? @options[:width] : 70 seq = @sequence.seq.to_s entry_id = @sequence.entry_id || "#{@sequence.primary_accession}.#{@sequence.sequence_version}" definition = @sequence.definition header ||= "#{entry_id} #{definition}" sc = fastanumeric_quality_scores(seq) if width then if width <= 0 then main = sc.join("\n") else len = 0 main = sc.collect do |x| str = (len == 0) ? "#{x}" : " #{x}" len += str.size if len > width then len = "#{x}".size str = "\n#{x}" end str end.join('') end else main = sc.join(' ') end ">#{header}\n#{main}\n" end private def fastanumeric_quality_scores(seq) @sequence.quality_scores || [] end end #class Fasta_numeric # INTERNAL USE ONLY, YOU SHOULD NOT USE THIS CLASS. # Simple Qual format (sequence quality) output class for Bio::Sequence. class Qual < Fasta_numeric # INTERNAL USE ONLY, YOU SHOULD NOT CALL THIS METHOD. # # Creates a new Qual format generator object from the sequence. # # The only difference from Fasta_numeric is that Qual outputs # Phred score by default, and data conversion will be performed # if needed. Output score type can be changed by the # ":quality_score_type" option. # # If the sequence has no quality score type information # and no error probabilities, but the score exists, # the score is regarded as :phred (Phred score). 
# # --- # *Arguments*: # * _sequence_: Bio::Sequence object # * (optional) :header => _header_: (String) (default nil) # * (optional) :width => _width_: (Fixnum) (default 70) # * (optional) :quality_score_type => _type_: (Symbol) (default nil) # * (optional) :default_score => _score_: (Integer) default score for bases that have no valid quality scores or error probabilities (default 0) def initialize; end if false # dummy for RDoc private def fastanumeric_quality_scores(seq) qsc = qual_quality_scores(seq) if qsc.size > seq.length then qsc = qsc[0, seq.length] elsif qsc.size < seq.length then padding = @options[:default_score] || 0 psize = seq.length - qsc.size qsc += Array.new(psize, padding) end qsc end def qual_quality_scores(seq) return [] if seq.length <= 0 # get output quality score type fmt = @options[:quality_score_type] qsc = @sequence.quality_scores qsc_type = @sequence.quality_score_type # checks if no need to convert if qsc and qsc_type == fmt and qsc.size >= seq.length then return qsc end # default output quality score type is :phred fmt ||= :phred # If quality score type of the sequence is nil, implicitly # regarded as :phred. qsc_type ||= :phred # checks error_probabilities ep = @sequence.error_probabilities if ep and ep.size >= seq.length then case fmt when :phred return Bio::Sequence::QualityScore::Phred.p2q(ep[0, seq.length]) when :solexa return Bio::Sequence::QualityScore::Solexa.p2q(ep[0, seq.length]) end end # Checks if scores can be converted. if qsc and qsc.size >= seq.length then case [ qsc_type, fmt ] when [ :phred, :solexa ] return Bio::Sequence::QualityScore::Phred.convert_scores_to_solexa(qsc[0, seq.length]) when [ :solexa, :phred ] return Bio::Sequence::QualityScore::Solexa.convert_scores_to_phred(qsc[0, seq.length]) end end # checks quality scores type case qsc_type when :phred, :solexa #does nothing else qsc_type = nil qsc = nil end # collects piece of information qsc_cov = qsc ? qsc.size.quo(seq.length) : 0 ep_cov = ep ? 
ep.size.quo(seq.length) : 0 if qsc_cov > ep_cov then case [ qsc_type, fmt ] when [ :phred, :phred ], [ :solexa, :solexa ] return qsc when [ :phred, :solexa ] return Bio::Sequence::QualityScore::Phred.convert_scores_to_solexa(qsc) when [ :solexa, :phred ] return Bio::Sequence::QualityScore::Solexa.convert_scores_to_phred(qsc) end elsif ep_cov > qsc_cov then case fmt when :phred return Bio::Sequence::QualityScore::Phred.p2q(ep) when :solexa return Bio::Sequence::QualityScore::Solexa.p2q(ep) end end # if no information, returns empty array return [] end end #class Qual end #module Bio::Sequence::Format::Formatter bio-2.0.3/lib/bio/db/fasta/format_fasta.rb0000644000175000017500000000547714141516614017630 0ustar nileshnilesh# # = bio/db/fasta/format_fasta.rb - Fasta format generater # # Copyright:: Copyright (C) 2006-2008 # Toshiaki Katayama , # Naohisa Goto , # Jan Aerts # License:: The Ruby License # module Bio::Sequence::Format::Formatter # INTERNAL USE ONLY, YOU SHOULD NOT USE THIS CLASS. # Simple Fasta format output class for Bio::Sequence. class Fasta < Bio::Sequence::Format::FormatterBase # INTERNAL USE ONLY, YOU SHOULD NOT CALL THIS METHOD. # # Creates a new Fasta format generater object from the sequence. # # --- # *Arguments*: # * _sequence_: Bio::Sequence object # * (optional) :header => _header_: String (default nil) # * (optional) :width => _width_: Fixnum (default 70) def initialize; end if false # dummy for RDoc # INTERNAL USE ONLY, YOU SHOULD NOT CALL THIS METHOD. # # Output the FASTA format string of the sequence. # # Currently, this method is used in Bio::Sequence#output like so, # # s = Bio::Sequence.new('atgc') # puts s.output(:fasta) #=> "> \natgc\n" # --- # *Returns*:: String object def output header = @options[:header] width = @options.has_key?(:width) ? 
@options[:width] : 70 seq = @sequence.seq entry_id = @sequence.entry_id || "#{@sequence.primary_accession}.#{@sequence.sequence_version}" definition = @sequence.definition header ||= "#{entry_id} #{definition}" ">#{header}\n" + if width seq.to_s.gsub(Regexp.new(".{1,#{width}}"), "\\0\n") else seq.to_s + "\n" end end end #class Fasta # INTERNAL USE ONLY, YOU SHOULD NOT USE THIS CLASS. # NCBI-Style Fasta format output class for Bio::Sequence. # (like "ncbi" format in EMBOSS) # # Note that this class is under construction. class Fasta_ncbi < Bio::Sequence::Format::FormatterBase # INTERNAL USE ONLY, YOU SHOULD NOT CALL THIS METHOD. # # Output the FASTA format string of the sequence. # # Currently, this method is used in Bio::Sequence#output like so, # # s = Bio::Sequence.new('atgc') # puts s.output(:ncbi) #=> "> \natgc\n" # --- # *Returns*:: String object def output width = 70 seq = @sequence.seq #gi = @sequence.gi_number dbname = 'lcl' if @sequence.primary_accession.to_s.empty? then idstr = @sequence.entry_id else idstr = "#{@sequence.primary_accession}.#{@sequence.sequence_version}" end definition = @sequence.definition header = "#{dbname}|#{idstr} #{definition}" ">#{header}\n" + seq.to_s.gsub(Regexp.new(".{1,#{width}}"), "\\0\n") end end #class Ncbi end #module Bio::Sequence::Format::Formatter bio-2.0.3/lib/bio/db/medline.rb0000644000175000017500000001670614141516614015476 0ustar nileshnilesh# # = bio/db/medline.rb - NCBI PubMed/MEDLINE database class # # Copyright:: Copyright (C) 2001, 2005 # Toshiaki Katayama # License:: The Ruby License # # $Id: medline.rb,v 1.17 2007/12/21 05:12:41 k Exp $ # require 'bio/db' module Bio # == Description # # NCBI PubMed/MEDLINE database class. 
  #
  # == Examples
  #
  #   medline = Bio::MEDLINE.new(txt)
  #   medline.reference
  #   medline.pmid == medline.entry_id
  #   medline.mesh
  #
  class MEDLINE < NCBIDB

    def initialize(entry)
      @pubmed = Hash.new('')

      tag = ''
      entry.each_line do |line|
        if line =~ /^\w/
          tag = line[0,4].strip
        else
          # continuation from previous lines
          @pubmed[tag] = @pubmed[tag].sub(/(?:\r|\r\n|\n)\z/, ' ')
        end
        value = line[6..-1]
        @pubmed[tag] += value if value
      end
    end
    attr_reader :pubmed

    # returns a Reference object.
    def reference
      hash = Hash.new

      hash['authors']      = authors
      hash['title']        = title
      hash['journal']      = journal
      hash['volume']       = volume
      hash['issue']        = issue
      hash['pages']        = pages
      hash['year']         = year
      hash['pubmed']       = pmid
      hash['medline']      = ui
      hash['abstract']     = abstract
      hash['mesh']         = mesh
      hash['doi']          = doi
      hash['affiliations'] = affiliations

      hash.delete_if { |k, v| v.nil? or v.empty? }

      return Reference.new(hash)
    end

    ### Common MEDLINE tags

    # PMID - PubMed Unique Identifier
    #   Unique number assigned to each PubMed citation.
    def pmid
      @pubmed['PMID'].strip
    end
    alias entry_id pmid

    # UI - MEDLINE Unique Identifier
    #   Unique number assigned to each MEDLINE citation.
    def ui
      @pubmed['UI'].strip
    end

    # TA - Journal Title Abbreviation
    #   Standard journal title abbreviation.
    def ta
      @pubmed['TA'].gsub(/\s+/, ' ').strip
    end
    alias journal ta

    # VI - Volume
    #   Journal volume.
    def vi
      @pubmed['VI'].strip
    end
    alias volume vi

    # IP - Issue
    #   The number of the issue, part, or supplement of the journal in which
    #   the article was published.
    def ip
      @pubmed['IP'].strip
    end
    alias issue ip

    # PG - Page Number
    #   The full pagination of the article.
    def pg
      @pubmed['PG'].strip
    end

    def pages
      pages = pg
      if pages =~ /-/
        from, to = pages.split('-')
        if (len = from.length - to.length) > 0
          to = from[0,len] + to
        end
        pages = "#{from}-#{to}"
      end
      return pages
    end

    # DP - Publication Date
    #   The date the article was published.
    def dp
      @pubmed['DP'].strip
    end
    alias date dp

    def year
      dp[0,4]
    end

    # TI - Title Words
    #   The title of the article.
def ti @pubmed['TI'].gsub(/\s+/, ' ').strip end alias title ti # AB - Abstract # Abstract. def ab @pubmed['AB'].gsub(/\s+/, ' ').strip end alias abstract ab # AU - Author Name # Authors' names. def au @pubmed['AU'].strip end def authors authors = [] au.split(/\n/).each do |author| if author =~ / / name = author.split(/\s+/) suffix = nil if name.length > 2 && name[-2] =~ /^[A-Z]+$/ # second to last are the initials suffix = name.pop end initial = name.pop.split(//).join('. ') author = "#{name.join(' ')}, #{initial}." end if suffix author << " " + suffix end authors.push(author) end return authors end # SO - Source # Composite field containing bibliographic information. def so @pubmed['SO'].strip end alias source so # MH - MeSH Terms # NLM's controlled vocabulary. def mh @pubmed['MH'].strip.split(/\n/) end alias mesh mh # AD - Affiliation # Institutional affiliation and address of the first author, and grant # numbers. def ad @pubmed['AD'].strip.split(/\n/) end alias affiliations ad # AID - Article Identifier # Article ID values may include the pii (controlled publisher identifier) # or doi (Digital Object Identifier). def doi @pubmed['AID'][/(\S+) \[doi\]/, 1] end def pii @pubmed['AID'][/(\S+) \[pii\]/, 1] end ### Other MEDLINE tags # CI - Copyright Information # Copyright statement. # CIN - Comment In # Reference containing a comment about the article. # CN - Collective Name # Corporate author or group names with authorship responsibility. # CON - Comment On # Reference upon which the article comments. # CY - Country # The place of publication of the journal. # DA - Date Created # Used for internal processing at NLM. # DCOM - Date Completed # Used for internal processing at NLM. # DEP - Date of Electronic Publication # Electronic publication date. # EDAT - Entrez Date # The date the citation was added to PubMed. # EIN - Erratum In # Reference containing a published erratum to the article. # GS - Gene Symbol # Abbreviated gene names (used 1991 through 1996). 
# ID - Identification Number # Research grant numbers, contract numbers, or both that designate # financial support by any agency of the US PHS (Public Health Service). # IS - ISSN # International Standard Serial Number of the journal. # JC - Journal Title Code # MEDLINE unique three-character code for the journal. # JID - NLM Unique ID # Unique journal ID in NLM's catalog of books, journals, and audiovisuals. # LA - Language # The language in which the article was published. # LR - Last Revision Date # The date a change was made to the record during a maintenance procedure. # MHDA - MeSH Date # The date MeSH terms were added to the citation. The MeSH date is the # same as the Entrez date until MeSH are added. # PHST - Publication History Status Date # History status date. # PS - Personal Name as Subject # Individual is the subject of the article. # PST - Publication Status # Publication status. # PT - Publication Type # The type of material the article represents. def pt @pubmed['PT'].strip.split(/\n/) end alias publication_type pt # RF - Number of References # Number of bibliographic references for Review articles. # RIN - Retraction In # Retraction of the article # RN - EC/RN Number # Number assigned by the Enzyme Commission to designate a particular # enzyme or by the Chemical Abstracts Service for Registry Numbers. # ROF - Retraction Of # Article being retracted. # RPF - Republished From # Original article. # SB - Journal Subset # Code for a specific set of journals. # SI - Secondary Source Identifier # Identifies a secondary source that supplies information, e.g., other # data sources, databanks and accession numbers of molecular sequences # discussed in articles. # TT - Transliterated / Vernacular Title # Non-Roman alphabet language titles are transliterated. # UIN - Update In # Update to the article. # UOF - Update Of # The article being updated. # URLF - URL Full-Text # Link to the full-text of article at provider's website. Links are # incomplete. 
Use PmLink for the complete set of available links. # [PmLink] http://www.ncbi.nlm.nih.gov/entrez/utils/pmlink_help.html # URLS - URL Summary # Link to the article summary at provider's website. Links are # incomplete. Use PmLink for the complete set of available links. # [PmLink] http://www.ncbi.nlm.nih.gov/entrez/utils/pmlink_help.html end # MEDLINE end # Bio bio-2.0.3/lib/bio/db/pdb/0000755000175000017500000000000014141516614014267 5ustar nileshnileshbio-2.0.3/lib/bio/db/pdb/model.rb0000644000175000017500000000707114141516614015721 0ustar nileshnilesh# # = bio/db/pdb/model.rb - model class for PDB # # Copyright:: Copyright (C) 2004, 2006 # Alex Gutteridge # Naohisa Goto # License:: The Ruby License # # # = Bio::PDB::Model # # Please refer Bio::PDB::Model. # module Bio require 'bio/db/pdb' unless const_defined?(:PDB) class PDB # Bio::PDB::Model is a class to store a model. # # The object would contain some chains (Bio::PDB::Chain objects). class Model include Utils include AtomFinder include ResidueFinder include ChainFinder include HetatmFinder include HeterogenFinder include Enumerable include Comparable # Creates a new Model object def initialize(serial = nil, structure = nil) @serial = serial @structure = structure @chains = [] @chains_hash = {} @solvents = Chain.new('', self) end # chains in this model attr_reader :chains # (OBSOLETE) solvents (water, HOH) in this model attr_reader :solvents # serial number of this model. (Integer or nil) attr_accessor :serial # for backward compatibility alias model_serial serial # (reserved for future extension) attr_reader :structure # Adds a chain to this model def addChain(chain) raise "Expecting a Bio::PDB::Chain" unless chain.is_a? 
Bio::PDB::Chain @chains.push(chain) if @chains_hash[chain.chain_id] then $stderr.puts "Warning: chain_id #{chain.chain_id.inspect} is already used" if $VERBOSE else @chains_hash[chain.chain_id] = chain end self end # rehash chains hash def rehash begin chains_bak = @chains chains_hash_bak = @chains_hash @chains = [] @chains_hash = {} chains_bak.each do |chain| self.addChain(chain) end rescue RuntimeError @chains = chains_bak @chains_hash = chains_hash_bak raise end self end # (OBSOLETE) Adds a solvent molecule to this model def addSolvent(solvent) raise "Expecting a Bio::PDB::Residue" unless solvent.is_a? Bio::PDB::Residue @solvents.addResidue(solvent) end # (OBSOLETE) not recommended to use this method def removeSolvent @solvents = nil end # Iterates over each chain def each(&x) #:yields: chain @chains.each(&x) end # Alias to override ChainFinder#each_chain alias each_chain each # Operator aimed to sort models based on serial number def <=>(other) return @serial <=> other.model_serial end # Keyed access to chains def [](key) #chain = @chains.find{ |chain| key == chain.id } @chains_hash[key] end # stringifies to chains def to_s string = "" if model_serial string = "MODEL #{model_serial}\n" #Should use proper formatting end @chains.each{ |chain| string << chain.to_s } #if solvent # string << @solvent.to_s #end if model_serial string << "ENDMDL\n" end return string end # returns a string containing human-readable representation # of this object. def inspect "#<#{self.class.to_s} serial=#{serial.inspect} chains.size=#{chains.size}>" end end #class Model end #class PDB end #module Bio bio-2.0.3/lib/bio/db/pdb/chemicalcomponent.rb0000644000175000017500000001521314141516614020306 0ustar nileshnilesh# # = bio/db/pdb/chemicalcomponent.rb - PDB Chemical Component Dictionary parser # # Copyright:: Copyright (C) 2006 # GOTO Naohisa # License:: The Ruby License # # # = About Bio::PDB::ChemicalComponent # # Please refer Bio::PDB::ChemicalComponent. 
# # = References # # * (()) # * http://deposit.pdb.org/het_dictionary.txt # module Bio require 'bio/db/pdb' unless const_defined?(:PDB) class PDB # Bio::PDB::ChemicalComponet is a parser for a entry of # the PDB Chemical Component Dictionary. # # The PDB Chemical Component Dictionary is available in # http://deposit.pdb.org/het_dictionary.txt class ChemicalComponent # delimiter for reading via Bio::FlatFile DELIMITER = RS = "\n\n" # Single field (normally single line) of a entry class Record < Bio::PDB::Record # fetches record name def fetch_record_name(str) str[0..6].strip end private :fetch_record_name # fetches record name def self.fetch_record_name(str) str[0..6].strip end private_class_method :fetch_record_name # RESIDUE field. # It would be wrong because the definition described in documents # seems ambiguous. RESIDUE = def_rec([ 11, 13, Pdb_LString[3], :hetID ], [ 16, 20, Pdb_Integer, :numHetAtoms ] ) # CONECT field # It would be wrong because the definition described in documents # seems ambiguous. CONECT = def_rec([ 12, 15, Pdb_Atom, :name ], [ 19, 20, Pdb_Integer, :num ], [ 21, 24, Pdb_Atom, :other_atoms ], [ 26, 29, Pdb_Atom, :other_atoms ], [ 31, 34, Pdb_Atom, :other_atoms ], [ 36, 39, Pdb_Atom, :other_atoms ], [ 41, 44, Pdb_Atom, :other_atoms ], [ 46, 49, Pdb_Atom, :other_atoms ], [ 51, 54, Pdb_Atom, :other_atoms ], [ 56, 59, Pdb_Atom, :other_atoms ], [ 61, 64, Pdb_Atom, :other_atoms ], [ 66, 69, Pdb_Atom, :other_atoms ], [ 71, 74, Pdb_Atom, :other_atoms ], [ 76, 79, Pdb_Atom, :other_atoms ] ) # HET field. # It is the same as Bio::PDB::Record::HET. HET = Bio::PDB::Record::HET #-- #HETSYN = Bio::PDB::Record::HETSYN #++ # HETSYN field. # It is very similar to Bio::PDB::Record::HETSYN. HETSYN = def_rec([ 9, 10, Pdb_Continuation, nil ], [ 12, 14, Pdb_LString(3), :hetID ], [ 16, 70, Pdb_String, :hetSynonyms ] ) # HETNAM field. # It is the same as Bio::PDB::Record::HETNAM. HETNAM = Bio::PDB::Record::HETNAM # FORMUL field. 
        # It is the same as Bio::PDB::Record::FORMUL.
        FORMUL = Bio::PDB::Record::FORMUL

        # default definition for unknown fields.
        Default = Bio::PDB::Record::Default

        # Hash to store allowed definitions.
        Definition = create_definition_hash

        # END record class.
        #
        # Because END is a reserved word of Ruby, it is separately
        # added to the hash
        End = Bio::PDB::Record::End
        Definition['END'] = End

        # Look up the class in Definition hash
        def self.get_record_class(str)
          t = fetch_record_name(str)
          return Definition[t]
        end
      end #class Record

      # Creates a new object.
      def initialize(str)
        @data = str.split(/[\r\n]+/)
        @hash = {}

        #Flag to say whether the current line is part of a continuation
        cont = false

        #Goes through each line and replaces that line with a PDB::Record
        @data.collect! do |line|
          #Go to next if the previous record can take a continuation, and
          #add_continuation returns true. Line is added by add_continuation
          next if cont and cont = cont.add_continuation(line)

          #Make the new record
          f = Record.get_record_class(line).new.initialize_from_string(line)
          #p f
          #Set cont
          cont = f if f.continue?
          #Set the hash to point to this record either by adding to an
          #array, or on its own
          key = f.record_name
          if a = @hash[key] then
            a << f
          else
            @hash[key] = [ f ]
          end
          f
        end #each
        #Remove the nil entries left in place of lines merged as continuations
        @data.compact!
      end

      # all records in this entry as an array.
      attr_reader :data

      # all records in this entry as a hash accessed by record names.
      attr_reader :hash

      # Identifier written in the first line "RESIDUE" record. (e.g. CMP)
      def entry_id
        @data[0].hetID
      end

      # Synonyms for the chemical component. Returns an array of strings.
      def hetsyn
        unless defined? @hetsyn
          if r = @hash["HETSYN"]
            @hetsyn = r[0].hetSynonyms.to_s.split(/\;\s*/)
          else
            return []
          end
        end
        @hetsyn
      end

      # The name of the chemical component.
      # Returns a string (or nil, if something is wrong with the entry).
      def hetnam
        @hash["HETNAM"][0].text
      end

      # The chemical formula of the chemical component.
# Returns a string (or nil, if the entry is something wrong). def formul @hash["FORMUL"][0].text end # Returns an hash of bindings of atoms. # Note that each white spaces are stripped for atom symbols. def conect unless defined? @conect c = {} @hash["CONECT"].each do |e| key = e.name.to_s.strip unless key.empty? val = e.other_atoms.collect { |x| x.strip } #warn "Warning: #{key}: atom name conflict?" if c[key] c[key] = val end end @conect = c end @conect end # Gets all records whose record type is _name_. # Returns an array of Bio::PDB::Record::* objects. # # if _name_ is nil, returns hash storing all record data. # # Example: # p pdb.record('CONECT') # p pdb.record['CONECT'] # def record(name = nil) name ? @hash[name] : @hash end end #class ChemicalComponent end #class PDB end #module Bio bio-2.0.3/lib/bio/db/pdb/utils.rb0000644000175000017500000002416114141516614015760 0ustar nileshnilesh# # = bio/db/pdb/utils.rb - Utility modules for PDB # # Copyright:: Copyright (C) 2004, 2006 # Alex Gutteridge # Naohisa Goto # License:: The Ruby License # # # = Bio::PDB::Utils # # Bio::PDB::Utils # # = Bio::PDB::ModelFinder # # Bio::PDB::ModelFinder # # = Bio::PDB::ChainFinder # # Bio::PDB::ChainFinder # # = Bio::PDB::ResidueFinder # # Bio::PDB::ResidueFinder # # = Bio::PDB::AtomFinder # # Bio::PDB::AtomFinder # # = Bio::PDB::HeterogenFinder # # Bio::PDB::HeterogenFinder # # = Bio::PDB::HetatmFinder # # Bio::PDB::HetatmFinder # require 'matrix' module Bio require 'bio/db/pdb' unless const_defined?(:PDB) class PDB # Utility methods for PDB data. # The methods in this mixin should be applicalbe to all PDB objects. # # Bio::PDB::Utils is included by Bio::PDB, Bio::PDB::Model, # Bio::PDB::Chain, Bio::PDB::Residue, and Bio::PDB::Heterogen classes. module Utils # Returns the coordinates of the geometric centre (average co-ord) # of any AtomFinder (or .atoms) implementing object # # If you want to get the geometric centre of hetatms, # call geometricCentre(:each_hetatm). 
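      #
      # Example (a sketch only; +chain+ is assumed to be a Bio::PDB::Chain
      # obtained elsewhere, and the variable names are illustrative):
      #
      #   centre = chain.geometricCentre                # averages ATOM records
      #   het    = chain.geometricCentre(:each_hetatm)  # averages HETATM records
      #   centre.x                                      # result is a Coordinate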
      def geometricCentre(method = :each_atom)
        x = y = z = count = 0

        self.__send__(method) do |atom|
          x += atom.x
          y += atom.y
          z += atom.z
          count += 1
        end

        x = (x / count)
        y = (y / count)
        z = (z / count)

        Coordinate[x,y,z]
      end

      #Returns the coords of the centre of gravity for any
      #AtomFinder implementing object
      #Blleurgh! - working out what element it is from the atom name is
      #tricky - this'll work in most cases but not metals etc...
      #a proper element field is included in some PDB files but not all.
      ElementMass = {
        'H' => 1,
        'C' => 12,
        'N' => 14,
        'O' => 16,
        'S' => 32,
        'P' => 31
      }

      # calculates centre of gravity
      def centreOfGravity()
        x = y = z = total = 0

        self.each_atom{ |atom|
          element = atom.element[0,1]
          mass    = ElementMass[element]
          total += mass
          x += atom.x * mass
          y += atom.y * mass
          z += atom.z * mass
        }

        x = x / total
        y = y / total
        z = z / total

        Coordinate[x,y,z]
      end

      #--
      #Perhaps distance and dihedral would be better off as class methods
      #(rather than instance methods)?
      #++

      # Calculates distance between _coord1_ and _coord2_.
      def distance(coord1, coord2)
        coord1 = convert_to_xyz(coord1)
        coord2 = convert_to_xyz(coord2)
        (coord1 - coord2).r
      end
      module_function :distance

      # Calculates dihedral angle.
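      #
      # The angle is computed with acos (via Math.atan2), so the result
      # is in radians; rad2deg converts it to degrees.
      #
      # Example (a sketch; the four coordinates are made-up values chosen
      # only so that the two planes are well defined):
      #
      #   c1 = Bio::PDB::Coordinate[1.0, 0.0, 0.0]
      #   c2 = Bio::PDB::Coordinate[0.0, 0.0, 0.0]
      #   c3 = Bio::PDB::Coordinate[0.0, 1.0, 0.0]
      #   c4 = Bio::PDB::Coordinate[0.0, 1.0, 1.0]
      #   rad = Bio::PDB::Utils.dihedral_angle(c1, c2, c3, c4)
      #   deg = Bio::PDB::Utils.rad2deg(rad)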
def dihedral_angle(coord1, coord2, coord3, coord4) (a1,b1,c1,d) = calculatePlane(coord1,coord2,coord3) (a2,b2,c2) = calculatePlane(coord2,coord3,coord4) torsion = acos((a1*a2 + b1*b2 + c1*c2)/(Math.sqrt(a1**2 + b1**2 + c1**2) * Math.sqrt(a2**2 + b2**2 + c2**2))) if ((a1*coord4.x + b1*coord4.y + c1*coord4.z + d) < 0) -torsion else torsion end end module_function :dihedral_angle # Implicit conversion into Vector or Bio::PDB::Coordinate def convert_to_xyz(obj) unless obj.is_a?(Vector) begin obj = obj.xyz rescue NameError obj = Vector.elements(obj.to_a) end end obj end module_function :convert_to_xyz # (Deprecated) alias of convert_to_xyz(obj) def self.to_xyz(obj) convert_to_xyz(obj) end #-- #Methods required for the dihedral angle calculations #perhaps these should go in some separate Math module #++ # radian to degree def rad2deg(r) (r/Math::PI)*180 end module_function :rad2deg # acos def acos(x) Math.atan2(Math.sqrt(1 - x**2),x) end module_function :acos # calculates plane def calculatePlane(coord1, coord2, coord3) a = coord1.y * (coord2.z - coord3.z) + coord2.y * (coord3.z - coord1.z) + coord3.y * (coord1.z - coord2.z) b = coord1.z * (coord2.x - coord3.x) + coord2.z * (coord3.x - coord1.x) + coord3.z * (coord1.x - coord2.x) c = coord1.x * (coord2.y - coord3.y) + coord2.x * (coord3.y - coord1.y) + coord3.x * (coord1.y - coord2.y) d = -1 * ( (coord1.x * (coord2.y * coord3.z - coord3.y * coord2.z)) + (coord2.x * (coord3.y * coord1.z - coord1.y * coord3.z)) + (coord3.x * (coord1.y * coord2.z - coord2.y * coord1.z)) ) return [a,b,c,d] end module_function :calculatePlane # Every class in the heirarchy implements finder, this takes # a class which determines which type of object to find, the associated # block is then run in classic .find style. # # The method might be deprecated. # You'd better using find_XXX directly. 
      def finder(findtype, &block) #:yields: obj
        if findtype == Bio::PDB::Atom
          return self.find_atom(&block)
        elsif findtype == Bio::PDB::Residue
          return self.find_residue(&block)
        elsif findtype == Bio::PDB::Chain
          return self.find_chain(&block)
        elsif findtype == Bio::PDB::Model
          return self.find_model(&block)
        else
          raise TypeError, "You can't find a #{findtype}"
        end
      end
    end #module Utils

    #--
    #The *Finder modules implement a find_* method which returns
    #an array of anything for which the block evals true
    #(compare the Enumerable#find_all method).
    #The each_* style methods act as classic iterators.
    #++

    # methods to access models
    #
    # XXX#each_model must be defined.
    #
    # Bio::PDB::ModelFinder is included by Bio::PDB.
    #
    module ModelFinder
      # returns an array containing all models for which given block
      # is not +false+ (similar to Enumerable#find_all).
      def find_model
        array = []
        self.each_model do |model|
          array.push(model) if yield(model)
        end
        return array
      end
    end #module ModelFinder

    #--
    #The hierarchical nature of the objects allows us to re-use the
    #methods from the previous level - e.g. a PDB object can use the
    #each_model method required by ModelFinder to iterate through the
    #models to find the chains
    #++

    # methods to access chains
    #
    # XXX#each_model must be defined.
    #
    # Bio::PDB::ChainFinder is included by Bio::PDB and Bio::PDB::Model.
    #
    module ChainFinder

      # returns an array containing all chains for which given block
      # is not +false+ (similar to Enumerable#find_all).
      def find_chain
        array = []
        self.each_chain do |chain|
          array.push(chain) if yield(chain)
        end
        return array
      end

      # iterates over each chain
      def each_chain(&x) #:yields: chain
        self.each_model { |model| model.each(&x) }
      end

      # returns all chains
      def chains
        array = []
        self.each_model { |model| array.concat(model.chains) }
        return array
      end
    end #module ChainFinder

    # methods to access residues
    #
    # XXX#each_chain must be defined.
    #
    # Bio::PDB::ResidueFinder is included by Bio::PDB, Bio::PDB::Model,
    # and Bio::PDB::Chain.
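    #
    # Example (a sketch; +chain+ is assumed to be a Bio::PDB::Chain
    # obtained elsewhere, and 'ALA' is an arbitrary residue name):
    #
    #   alanines = chain.find_residue { |residue| residue.resName == 'ALA' }
    #   chain.each_residue { |residue| puts residue.resName }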
# module ResidueFinder # returns an array containing all residues for which given block # is not +false+ (similar to Enumerable#find_all). def find_residue array = [] self.each_residue do |residue| array.push(residue) if yield(residue) end return array end # iterates over each residue def each_residue(&x) #:yields: residue self.each_chain { |chain| chain.each(&x) } end # returns all residues def residues array = [] self.each_chain { |chain| array.concat(chain.residues) } return array end end #module ResidueFinder # methods to access atoms # # XXX#each_residue must be defined. module AtomFinder # returns an array containing all atoms for which given block # is not +false+ (similar to Enumerable#find_all). def find_atom array = [] self.each_atom do |atom| array.push(atom) if yield(atom) end return array end # iterates over each atom def each_atom(&x) #:yields: atom self.each_residue { |residue| residue.each(&x) } end # returns all atoms def atoms array = [] self.each_residue { |residue| array.concat(residue.atoms) } return array end end #module AtomFinder # methods to access HETATMs # # XXX#each_heterogen must be defined. # # Bio::PDB::HetatmFinder is included by Bio::PDB::PDB, Bio::PDB::Model, # Bio::PDB::Chain, and Bio::PDB::Heterogen. # module HetatmFinder # returns an array containing all HETATMs for which given block # is not +false+ (similar to Enumerable#find_all). def find_hetatm array = [] self.each_hetatm do |hetatm| array.push(hetatm) if yield(hetatm) end return array end # iterates over each HETATM def each_hetatm(&x) #:yields: hetatm self.each_heterogen { |heterogen| heterogen.each(&x) } end # returns all HETATMs def hetatms array = [] self.each_heterogen { |heterogen| array.concat(heterogen.hetatms) } return array end end #module HetatmFinder # methods to access heterogens (compounds or ligands) # # XXX#each_chain must be defined. # # Bio::PDB::HeterogenFinder is included by Bio::PDB::PDB, Bio::PDB::Model, # and Bio::PDB::Chain. 
# module HeterogenFinder # returns an array containing all heterogens for which given block # is not +false+ (similar to Enumerable#find_all). def find_heterogen array = [] self.each_heterogen do |heterogen| array.push(heterogen) if yield(heterogen) end return array end # iterates over each heterogens def each_heterogen(&x) #:yields: heterogen self.each_chain { |chain| chain.each_heterogen(&x) } end # returns all heterogens def heterogens array = [] self.each_chain { |chain| array.concat(chain.heterogens) } return array end end #module HeterogenFinder end #class PDB end #module Bio bio-2.0.3/lib/bio/db/pdb/chain.rb0000644000175000017500000001306714141516614015705 0ustar nileshnilesh# # = bio/db/pdb/chain.rb - chain class for PDB # # Copyright:: Copyright (C) 2004, 2006 # Alex Gutteridge # Naohisa Goto # License:: The Ruby License # # # = Bio::PDB::Chain # # Please refer Bio::PDB::Chain. # module Bio require 'bio/db/pdb' unless const_defined?(:PDB) class PDB # Bio::PDB::Chain is a class to store a chain. # # The object would contain some residues (Bio::PDB::Residue objects) # and some heterogens (Bio::PDB::Heterogen objects). # class Chain include Utils include AtomFinder include ResidueFinder include HetatmFinder include HeterogenFinder include Enumerable include Comparable # Creates a new chain object. def initialize(id = nil, model = nil) @chain_id = id @model = model @residues = [] @residues_hash = {} @heterogens = [] @heterogens_hash = {} end # Identifier of this chain attr_accessor :chain_id # alias alias id chain_id # the model to which this chain belongs. attr_reader :model # residues in this chain attr_reader :residues # heterogens in this chain attr_reader :heterogens # get the residue by id def get_residue_by_id(key) #@residues.find { |r| r.residue_id == key } @residues_hash[key] end # get the residue by id. # # Compatibility Note: Now, you cannot find HETATMS in this method. # To add "LIGAND" to the id is no longer available. 
# To get heterogens, you must use get_heterogen_by_id. def [](key) get_residue_by_id(key) end # get the heterogen (ligand) by id def get_heterogen_by_id(key) #@heterogens.find { |r| r.residue_id == key } @heterogens_hash[key] end #Add a residue to this chain def addResidue(residue) raise "Expecting a Bio::PDB::Residue" unless residue.is_a? Bio::PDB::Residue @residues.push(residue) if @residues_hash[residue.residue_id] then $stderr.puts "Warning: residue_id #{residue.residue_id.inspect} is already used" if $VERBOSE else @residues_hash[residue.residue_id] = residue end self end #Add a heterogen (ligand) to this chain def addLigand(ligand) raise "Expecting a Bio::PDB::Residue" unless ligand.is_a? Bio::PDB::Residue @heterogens.push(ligand) if @heterogens_hash[ligand.residue_id] then $stderr.puts "Warning: heterogen_id (residue_id) #{ligand.residue_id.inspect} is already used" if $VERBOSE else @heterogens_hash[ligand.residue_id] = ligand end self end # rehash residues hash def rehash_residues begin residues_bak = @residues residues_hash_bak = @residues_hash @residues = [] @residues_hash = {} residues_bak.each do |residue| self.addResidue(residue) end rescue RuntimeError @residues = residues_bak @residues_hash = residues_hash_bak raise end self end # rehash heterogens hash def rehash_heterogens begin heterogens_bak = @heterogens heterogens_hash_bak = @heterogens_hash @heterogens = [] @heterogens_hash = {} heterogens_bak.each do |heterogen| self.addLigand(heterogen) end rescue RuntimeError @heterogens = heterogens_bak @heterogens_hash = heterogens_hash_bak raise end self end # rehash residues hash and heterogens hash def rehash rehash_residues rehash_heterogens end # Iterates over each residue def each(&x) #:yields: residue @residues.each(&x) end #Alias to override ResidueFinder#each_residue alias each_residue each # Iterates over each hetero-compound def each_heterogen(&x) #:yields: heterogen @heterogens.each(&x) end # Operator aimed to sort based on chain id def 
<=>(other) return @chain_id <=> other.chain_id end # Stringifies each residue def to_s @residues.join('') + "TER\n" + @heterogens.join('') end # returns a string containing human-readable representation # of this object. def inspect "#<#{self.class.to_s} id=#{chain_id.inspect} model.serial=#{(model ? model.serial : nil).inspect} residues.size=#{residues.size} heterogens.size=#{heterogens.size} aaseq=#{aaseq.inspect}>" end # gets an amino acid sequence of this chain from ATOM records def aaseq unless defined? @aaseq string = "" last_residue_num = nil @residues.each do |residue| if last_residue_num and (x = (residue.resSeq.to_i - last_residue_num).abs) > 1 then x.times { string << 'X' } end tlc = residue.resName.capitalize olc = (begin Bio::AminoAcid.three2one(tlc) rescue ArgumentError nil end || 'X') string << olc end @aaseq = Bio::Sequence::AA.new(string) end @aaseq end # for backward compatibility alias atom_seq aaseq end #class Chain end #class PDB end #module Bio bio-2.0.3/lib/bio/db/pdb/atom.rb0000644000175000017500000000327214141516614015560 0ustar nileshnilesh# # = bio/db/pdb/atom.rb - Coordinate class for PDB # # Copyright:: Copyright (C) 2004, 2006 # Alex Gutteridge # Naohisa Goto # License:: The Ruby License # # # = Bio::PDB::Coordinate # # Coordinate class for PDB. # # = Compatibility Note # # From bioruby 0.7.0, the Bio::PDB::Atom class is no longer available. # Please use Bio::PDB::Record::ATOM and Bio::PDB::Record::HETATM instead. # require 'matrix' module Bio require 'bio/db/pdb' unless const_defined?(:PDB) class PDB # Bio::PDB::Coordinate is a class to store a 3D coordinate. # It inherits Vector (in bundled library in Ruby). 
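    #
    # Example (a minimal sketch; the coordinate values are made up):
    #
    #   c1 = Bio::PDB::Coordinate[1.0, 2.0, 3.0]
    #   c2 = Bio::PDB::Coordinate[4.0, 6.0, 3.0]
    #   c1.x              # x component of c1
    #   c1.distance(c2)   # Euclidean distance between c1 and c2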
    #
    class Coordinate < Vector
      # same as Vector.[x,y,z]
      def self.[](x,y,z)
        super
      end

      # same as Vector.elements
      def self.elements(array, *a)
        raise 'Size of given array must be 3' if array.size != 3
        super
      end

      # x
      def x; self[0]; end
      # y
      def y; self[1]; end
      # z
      def z; self[2]; end
      # x=(n)
      def x=(n); self[0]=n; end
      # y=(n)
      def y=(n); self[1]=n; end
      # z=(n)
      def z=(n); self[2]=n; end

      # Implicit conversion to an array.
      #
      # Note that this method would be deprecated in the future.
      #
      #--
      # Definition of 'to_ary' means objects of the class are
      # implicitly regarded as arrays.
      #++
      def to_ary; self.to_a; end

      # returns self.
      def xyz; self; end

      # distance between this coordinate and object2.
      # object2 is first converted to a Vector, so any object accepted
      # by Utils::convert_to_xyz can be given.
      def distance(object2)
        object2 = Utils::convert_to_xyz(object2)
        (self - object2).r
      end
    end #class Coordinate

  end #class PDB
end #module Bio
bio-2.0.3/lib/bio/db/pdb/residue.rb
#
# = bio/db/pdb/residue.rb - residue class for PDB
#
# Copyright:: Copyright (C) 2004, 2006
# Alex Gutteridge
# Naohisa Goto
# License:: The Ruby License
#
#
# = Bio::PDB::Residue
#
# = Bio::PDB::Heterogen
#

module Bio

  require 'bio/db/pdb' unless const_defined?(:PDB)

  class PDB

    # Bio::PDB::Residue is a class to store a residue.
    # The object would contain some atoms (Bio::PDB::Record::ATOM objects).
    #
    class Residue

      include Utils
      include AtomFinder

      include Enumerable
      include Comparable

      # Creates residue id from an ATOM (or HETATM) object.
      def self.get_residue_id_from_atom(atom)
        "#{atom.resSeq}#{atom.iCode.strip}".strip
      end

      # Creates a new Residue object.
      def initialize(resName = nil, resSeq = nil, iCode = nil,
                     chain = nil)
        @resName = resName
        @resSeq  = resSeq
        @iCode   = iCode

        @chain   = chain
        @atoms   = []

        update_residue_id
      end

      # atoms in this residue. (Array)
      attr_reader :atoms

      # the chain to which this residue belongs
      attr_accessor :chain

      # resName (residue name)
      attr_accessor :resName

      # residue id (String or nil).
      # The id is a composite of resSeq and iCode.
attr_reader :residue_id # Now, Residue#id is an alias of residue_id. alias id residue_id #Keyed access to atoms based on atom name e.g. ["CA"] def [](key) @atoms.find{ |atom| key == atom.name } end # Updates residue id. This is a private method. # Need to call this method to make sure id is correctly updated. def update_residue_id if !@resSeq and !@iCode @residue_id = nil else @residue_id = "#{@resSeq}#{@iCode.to_s.strip}".strip end end private :update_residue_id # resSeq attr_reader :resSeq # resSeq=() def resSeq=(resSeq) @resSeq = resSeq.to_i update_residue_id @resSeq end # iCode attr_reader :iCode # iCode=() def iCode=(iCode) @iCode = iCode update_residue_id @iCode end # Adds an atom to this residue def addAtom(atom) raise "Expecting ATOM or HETATM" unless atom.is_a? Bio::PDB::Record::ATOM @atoms.push(atom) self end # Iterator over the atoms def each @atoms.each{ |atom| yield atom } end # Alias to override AtomFinder#each_atom alias each_atom each # Sorts based on resSeq and iCode if need be def <=>(other) if @resSeq != other.resSeq return @resSeq <=> other.resSeq else return @iCode <=> other.iCode end end # Stringifies each atom def to_s @atoms.join('') end # returns a string containing human-readable representation # of this object. def inspect "#<#{self.class.to_s} resName=#{resName.inspect} id=#{residue_id.inspect} chain.id=#{(chain ? chain.id : nil).inspect} resSeq=#{resSeq.inspect} iCode=#{iCode.inspect} atoms.size=#{atoms.size}>" end # Always returns false. # # If the residue is HETATM, returns true. # Otherwise, returns false. def hetatm false end end #class Residue # Bio::PDB::Heterogen is a class to store a heterogen. # It inherits Bio::PDB::Residue and most of the methods are the same. # # The object would contain some HETATMs # (Bio::PDB::Record::HETATM objects). class Heterogen < Residue include HetatmFinder # Always returns true. # # If the residue is HETATM, returns true. # Otherwise, returns false. 
      def hetatm
        true
      end

      # Alias to override HetatmFinder#each_hetatm
      alias each_hetatm each

      # Alias needed for HeterogenFinder.
      alias hetatms atoms

      # Alias to avoid confusion
      alias heterogen_id residue_id
    end #class Heterogen

  end #class PDB

end #module Bio
bio-2.0.3/lib/bio/db/pdb/pdb.rb0000644000175000017500000017513514141516614015375 0ustar nileshnilesh
#
# = bio/db/pdb/pdb.rb - PDB database class for PDB file format
#
# Copyright:: Copyright (C) 2003-2006
#             GOTO Naohisa
#             Alex Gutteridge
# License::   The Ruby License
#
# $Id:$
#
# = About Bio::PDB
#
# Please refer to the documentation of the Bio::PDB class.
#
# = References
#
# * (())
# * PDB File Format Contents Guide Version 2.2 (20 December 1996)
#   (())
#
# = *** CAUTION ***
# This is a beta version. Specs may change frequently.
#

require 'bio/data/aa'

module Bio

  require 'bio/db/pdb' unless const_defined?(:PDB)

  # This is the main PDB class which takes care of parsing, annotations
  # and is the entry way to the co-ordinate data held in models.
  #
  # There are many related classes.
  #
  # Bio::PDB::Model
  # Bio::PDB::Chain
  # Bio::PDB::Residue
  # Bio::PDB::Heterogen
  # Bio::PDB::Record::ATOM
  # Bio::PDB::Record::HETATM
  # Bio::PDB::Record::*
  # Bio::PDB::Coordinate
  #
  class PDB

    include Utils
    include AtomFinder
    include ResidueFinder
    include ChainFinder
    include ModelFinder

    include HetatmFinder
    include HeterogenFinder

    include Enumerable

    # delimiter for reading via Bio::FlatFile
    DELIMITER = RS = nil # 1 file 1 entry

    # Modules required by the field definitions
    module DataType

      Pdb_Continuation = nil

      module Pdb_Integer
        def self.new(str)
          str.to_i
        end
      end

      module Pdb_SList
        def self.new(str)
          str.to_s.strip.split(/\;\s*/)
        end
      end

      module Pdb_List
        def self.new(str)
          str.to_s.strip.split(/\,\s*/)
        end
      end

      module Pdb_Specification_list
        def self.new(str)
          a = str.to_s.strip.split(/\;\s*/)
          a.collect! { |x| x.split(/\:\s*/, 2) }
          a
        end
      end

      module Pdb_String
        def self.new(str)
          str.to_s.gsub(/\s+\z/, '')
        end

        #Creates a new module with a string left justified to the
        #length given in nn
        def self.[](nn)
          m = Module.new
          m.module_eval %Q{
            @@nn = nn
            def self.new(str)
              str.to_s.gsub(/\s+\z/, '').ljust(@@nn)[0, @@nn]
            end
          }
          m
        end
      end #module Pdb_String

      module Pdb_LString
        def self.[](nn)
          m = Module.new
          m.module_eval %Q{
            @@nn = nn
            def self.new(str)
              str.to_s.ljust(@@nn)[0, @@nn]
            end
          }
          m
        end
        def self.new(str)
          String.new(str.to_s)
        end
      end

      module Pdb_Real
        def self.[](fmt)
          m = Module.new
          m.module_eval %Q{
            @@format = fmt
            def self.new(str)
              str.to_f
            end
          }
          m
        end
        def self.new(str)
          str.to_f
        end
      end

      module Pdb_StringRJ
        def self.new(str)
          str.to_s.gsub(/\A\s+/, '')
        end
      end

      Pdb_Date         = Pdb_String
      Pdb_IDcode       = Pdb_String
      Pdb_Residue_name = Pdb_String
      Pdb_SymOP        = Pdb_String
      Pdb_Atom         = Pdb_String
      Pdb_AChar        = Pdb_String
      Pdb_Character    = Pdb_LString

      module ConstLikeMethod
        def Pdb_LString(nn)
          Pdb_LString[nn]
        end

        def Pdb_String(nn)
          Pdb_String[nn]
        end

        def Pdb_Real(fmt)
          Pdb_Real[fmt]
        end
      end #module ConstLikeMethod
    end #module DataType

    # The ancestor of every single PDB record class.
    # It inherits Struct class.
    # Basically, each line of a PDB file corresponds to
    # an instance of each corresponding child class.
    # If continuation exists, multiple lines may correspond to
    # a single instance.
    #
    class Record < Struct
      include DataType
      extend DataType::ConstLikeMethod

      # Internal use only.
      #
      # Parses field definitions.
      def self.parse_field_definitions(ary)
        symbolhash = {}
        symbolary = []
        cont = false

        # For each field definition (range(start, end), type, symbol)
        ary.each do |x|
          range = (x[0] - 1)..(x[1] - 1)
          # If type is nil (Pdb_Continuation) then set 'cont' to the range
          # (otherwise it is false to indicate no continuation)
          unless x[2] then
            cont = range
          else
            klass = x[2]
            sym = x[3]
            # If the symbol is a proper symbol then...
            if sym.is_a?(Symbol) then
              # ..if we have the symbol already in the symbol hash
              # then add the range onto the range array
              if symbolhash.has_key?(sym) then
                symbolhash[sym][1] << range
              else
                # Otherwise put a new symbol in with its type and range.
                # The range is given its own array, so a symbol can have
                # a number of ranges.
                symbolhash[sym] = [ klass, [ range ] ]
                symbolary << sym
              end
            end
          end
        end #each
        [ symbolhash, symbolary, cont ]
      end
      private_class_method :parse_field_definitions

      # Creates new class by given field definition
      # The difference from new_direct() is the class
      # created by the method does lazy evaluation.
      #
      # Internal use only.
      def self.def_rec(*ary)
        symbolhash, symbolary, cont = parse_field_definitions(ary)

        klass = Class.new(self.new(*symbolary))
        klass.module_eval {
          @definition = ary
          @symbols = symbolhash
          @cont = cont
        }
        klass.module_eval {
          symbolary.each do |x|
            define_method(x) { do_parse; super() }
          end
        }
        klass
      end #def self.def_rec

      # creates new class which inherits given class.
      def self.new_inherit(klass)
        newklass = Class.new(klass)
        newklass.module_eval {
          @definition = klass.module_eval { @definition }
          @symbols = klass.module_eval { @symbols }
          @cont = klass.module_eval { @cont }
        }
        newklass
      end

      # Creates new class by given field definition.
      #
      # Internal use only.
      def self.new_direct(*ary)
        symbolhash, symbolary, cont = parse_field_definitions(ary)
        if cont
          raise 'continuation not allowed. please use def_rec instead'
        end

        klass = Class.new(self.new(*symbolary))
        klass.module_eval {
          @definition = ary
          @symbols = symbolhash
          @cont = cont
        }
        klass.module_eval {
          define_method(:initialize_from_string) { |str|
            r = super(str)
            do_parse
            r
          }
        }
        klass
      end #def self.new_direct

      # symbols
      def self.symbols
        #p self
        @symbols
      end

      # Returns true if this record has a field type which allows
      # continuations.
      def self.continue?
        @cont
      end

      # Returns true if this record has a field type which allows
      # continuations.
      def continue?
        self.class.continue?
      end

      # yields the symbol(k), type(x[0]) and array of ranges
      # of each symbol.
      def each_symbol
        self.class.symbols.each do |k, x|
          yield k, x[0], x[1]
        end
      end

      # Return original string (except that "\n" are truncated)
      # for this record (usually just @str, but
      # sometimes add on the continuation data from other lines).
      # Returns an array of strings.
      #
      def original_data
        if defined?(@cont_data) then
          [ @str, *@cont_data ]
        else
          [ @str ]
        end
      end

      # initialize this record from the given string.
      # str must be a line (in PDB format).
      #
      # You can add continuation lines later using
      # add_continuation method.
      def initialize_from_string(str)
        @str = str
        @record_name = fetch_record_name(str)
        @parsed = false
        self
      end

      #--
      # Called when we need to access the data, takes the string
      # and the array of FieldDefs and parses it out.
      #++
      # In order to speed up processing of the PDB file format,
      # fields are not parsed until this method is called.
      #
      # Normally, it is called automatically and you don't need to
      # call it explicitly.
      #
      def do_parse
        return self if @parsed or !@str
        str0 = @str
        each_symbol do |key, klass, ranges|
          #If we only have one range then pull that out
          #and store it in the hash
          if ranges.size <= 1 then
            self[key] = klass.new(str0[ranges.first])
          else
            #Go through each range and add the string to an array
            #set the hash key to point to that array
            ary = []
            ranges.each do |r|
              ary << klass.new(str0[r]) unless str0[r].to_s.strip.empty?
            end
            self[key] = ary
          end
        end #each_symbol
        #If we have continuations then for each line of extra data...
        if defined?(@cont_data) then
          @cont_data.each do |str|
            #Get the symbol, type and range array
            each_symbol do |key, klass, ranges|
              #If there's one range then grab that range
              if ranges.size <= 1 then
                r1 = ranges.first
                unless str[r1].to_s.strip.empty?
                  #and concatenate the new data onto the old
                  v = klass.new(str[r1])
                  self[key].concat(v) if self[key] != v
                end
              else
                #If there's more than one range then add to the array
                ary = self[key]
                ranges.each do |r|
                  ary << klass.new(str[r]) unless str[r].to_s.strip.empty?
                end
              end
            end
          end
        end
        @parsed = true
        self
      end

      # fetches record name
      def fetch_record_name(str)
        str[0..5].strip
      end
      private :fetch_record_name

      # fetches record name
      def self.fetch_record_name(str)
        str[0..5].strip
      end
      private_class_method :fetch_record_name

      # If given str can be the continuation of the current record,
      # then return the order number of the continuation associated with
      # the Pdb_Continuation field definition.
      # Otherwise, returns -1.
      def fetch_cont(str)
        (c = continue?) ? str[c].to_i : -1
      end
      private :fetch_cont

      # Record name of this record, e.g. "HEADER", "ATOM".
      def record_name
        @record_name or self.class.to_s.split(/\:\:/)[-1].to_s.upcase
      end
      # keeping compatibility with old version
      alias record_type record_name

      # Internal use only.
      #
      # Adds continuation data to the record from str if str is
      # really the continuation of current record.
      # Returns self (= not nil) if str is the continuation.
      # Otherwise, returns false.
      #
      def add_continuation(str)
        #Check that this record can continue
        #and that str has the same type and definition
        return false unless self.continue?
        return false unless fetch_record_name(str) == @record_name
        return false unless self.class.get_record_class(str) == self.class
        return false unless fetch_cont(str) >= 2
        #If all this is OK then add onto @cont_data
        unless defined?(@cont_data)
          @cont_data = []
        end
        @cont_data << str
        # Returns self (= not nil) if succeeded.
self end # creates definition hash from current classes constants def self.create_definition_hash hash = {} constants.each do |x| x = x.intern # keep compatibility both Ruby 1.8 and 1.9 hash[x] = const_get(x) if /\A[A-Z][A-Z0-9]+\z/ =~ x.to_s end if x = const_get(:Default) then hash.default = x end hash end # same as Struct#inspect. # # Note that do_parse is automatically called # before inspect. # # (Warning: The do_parse might sweep hidden bugs in PDB classes.) def inspect do_parse super end #-- # # definitions # contains all the rules for parsing each field # based on format V 2.2, 16-DEC-1996 # # http://www.rcsb.org/pdb/docs/format/pdbguide2.2/guide2.2_frame.html # http://www.rcsb.org/pdb/docs/format/pdbguide2.2/Contents_Guide_21.html # # Details of following data are taken from these documents. # [ 1..6, :Record_name, nil ], # XXXXXX = # new([ start, end, type of data, symbol to access ], ...) # #++ # HEADER record class HEADER = def_rec([ 11, 50, Pdb_String, :classification ], #Pdb_String(40) [ 51, 59, Pdb_Date, :depDate ], [ 63, 66, Pdb_IDcode, :idCode ] ) # OBSLTE record class OBSLTE = def_rec([ 9, 10, Pdb_Continuation, nil ], [ 12, 20, Pdb_Date, :repDate ], [ 22, 25, Pdb_IDcode, :idCode ], [ 32, 35, Pdb_IDcode, :rIdCode ], [ 37, 40, Pdb_IDcode, :rIdCode ], [ 42, 45, Pdb_IDcode, :rIdCode ], [ 47, 50, Pdb_IDcode, :rIdCode ], [ 52, 55, Pdb_IDcode, :rIdCode ], [ 57, 60, Pdb_IDcode, :rIdCode ], [ 62, 65, Pdb_IDcode, :rIdCode ], [ 67, 70, Pdb_IDcode, :rIdCode ] ) # TITLE record class TITLE = def_rec([ 9, 10, Pdb_Continuation, nil ], [ 11, 70, Pdb_String, :title ] ) # CAVEAT record class CAVEAT = def_rec([ 9, 10, Pdb_Continuation, nil ], [ 12, 15, Pdb_IDcode, :idcode ], [ 20, 70, Pdb_String, :comment ] ) # COMPND record class COMPND = def_rec([ 9, 10, Pdb_Continuation, nil ], [ 11, 70, Pdb_Specification_list, :compound ] ) # SOURCE record class SOURCE = def_rec([ 9, 10, Pdb_Continuation, nil ], [ 11, 70, Pdb_Specification_list, :srcName ] ) # KEYWDS record class 
KEYWDS = def_rec([ 9, 10, Pdb_Continuation, nil ], [ 11, 70, Pdb_List, :keywds ] ) # EXPDTA record class EXPDTA = def_rec([ 9, 10, Pdb_Continuation, nil ], [ 11, 70, Pdb_SList, :technique ] ) # AUTHOR record class AUTHOR = def_rec([ 9, 10, Pdb_Continuation, nil ], [ 11, 70, Pdb_List, :authorList ] ) # REVDAT record class REVDAT = def_rec([ 8, 10, Pdb_Integer, :modNum ], [ 11, 12, Pdb_Continuation, nil ], [ 14, 22, Pdb_Date, :modDate ], [ 24, 28, Pdb_String, :modId ], # Pdb_String(5) [ 32, 32, Pdb_Integer, :modType ], [ 40, 45, Pdb_LString(6), :record ], [ 47, 52, Pdb_LString(6), :record ], [ 54, 59, Pdb_LString(6), :record ], [ 61, 66, Pdb_LString(6), :record ] ) # SPRSDE record class SPRSDE = def_rec([ 9, 10, Pdb_Continuation, nil ], [ 12, 20, Pdb_Date, :sprsdeDate ], [ 22, 25, Pdb_IDcode, :idCode ], [ 32, 35, Pdb_IDcode, :sIdCode ], [ 37, 40, Pdb_IDcode, :sIdCode ], [ 42, 45, Pdb_IDcode, :sIdCode ], [ 47, 50, Pdb_IDcode, :sIdCode ], [ 52, 55, Pdb_IDcode, :sIdCode ], [ 57, 60, Pdb_IDcode, :sIdCode ], [ 62, 65, Pdb_IDcode, :sIdCode ], [ 67, 70, Pdb_IDcode, :sIdCode ] ) # 'JRNL' is defined below JRNL = nil # 'REMARK' is defined below REMARK = nil # DBREF record class DBREF = def_rec([ 8, 11, Pdb_IDcode, :idCode ], [ 13, 13, Pdb_Character, :chainID ], [ 15, 18, Pdb_Integer, :seqBegin ], [ 19, 19, Pdb_AChar, :insertBegin ], [ 21, 24, Pdb_Integer, :seqEnd ], [ 25, 25, Pdb_AChar, :insertEnd ], [ 27, 32, Pdb_String, :database ], #Pdb_LString [ 34, 41, Pdb_String, :dbAccession ], #Pdb_LString [ 43, 54, Pdb_String, :dbIdCode ], #Pdb_LString [ 56, 60, Pdb_Integer, :dbseqBegin ], [ 61, 61, Pdb_AChar, :idbnsBeg ], [ 63, 67, Pdb_Integer, :dbseqEnd ], [ 68, 68, Pdb_AChar, :dbinsEnd ] ) # SEQADV record class SEQADV = def_rec([ 8, 11, Pdb_IDcode, :idCode ], [ 13, 15, Pdb_Residue_name, :resName ], [ 17, 17, Pdb_Character, :chainID ], [ 19, 22, Pdb_Integer, :seqNum ], [ 23, 23, Pdb_AChar, :iCode ], [ 25, 28, Pdb_String, :database ], #Pdb_LString [ 30, 38, Pdb_String, :dbIdCode ], 
#Pdb_LString [ 40, 42, Pdb_Residue_name, :dbRes ], [ 44, 48, Pdb_Integer, :dbSeq ], [ 50, 70, Pdb_LString, :conflict ] ) # SEQRES record class SEQRES = def_rec(#[ 8, 10, Pdb_Integer, :serNum ], [ 8, 10, Pdb_Continuation, nil ], # PDB v3.2 (2008) [ 12, 12, Pdb_Character, :chainID ], [ 14, 17, Pdb_Integer, :numRes ], [ 20, 22, Pdb_Residue_name, :resName ], [ 24, 26, Pdb_Residue_name, :resName ], [ 28, 30, Pdb_Residue_name, :resName ], [ 32, 34, Pdb_Residue_name, :resName ], [ 36, 38, Pdb_Residue_name, :resName ], [ 40, 42, Pdb_Residue_name, :resName ], [ 44, 46, Pdb_Residue_name, :resName ], [ 48, 50, Pdb_Residue_name, :resName ], [ 52, 54, Pdb_Residue_name, :resName ], [ 56, 58, Pdb_Residue_name, :resName ], [ 60, 62, Pdb_Residue_name, :resName ], [ 64, 66, Pdb_Residue_name, :resName ], [ 68, 70, Pdb_Residue_name, :resName ] ) # MODRS record class MODRES = def_rec([ 8, 11, Pdb_IDcode, :idCode ], [ 13, 15, Pdb_Residue_name, :resName ], [ 17, 17, Pdb_Character, :chainID ], [ 19, 22, Pdb_Integer, :seqNum ], [ 23, 23, Pdb_AChar, :iCode ], [ 25, 27, Pdb_Residue_name, :stdRes ], [ 30, 70, Pdb_String, :comment ] ) # HET record class HET = def_rec([ 8, 10, Pdb_LString(3), :hetID ], [ 13, 13, Pdb_Character, :ChainID ], [ 14, 17, Pdb_Integer, :seqNum ], [ 18, 18, Pdb_AChar, :iCode ], [ 21, 25, Pdb_Integer, :numHetAtoms ], [ 31, 70, Pdb_String, :text ] ) # HETNAM record class HETNAM = def_rec([ 9, 10, Pdb_Continuation, nil ], [ 12, 14, Pdb_LString(3), :hetID ], [ 16, 70, Pdb_String, :text ] ) # HETSYN record class HETSYN = def_rec([ 9, 10, Pdb_Continuation, nil ], [ 12, 14, Pdb_LString(3), :hetID ], [ 16, 70, Pdb_SList, :hetSynonyms ] ) # FORMUL record class FORMUL = def_rec([ 9, 10, Pdb_Integer, :compNum ], [ 13, 15, Pdb_LString(3), :hetID ], [ 17, 18, Pdb_Integer, :continuation ], [ 19, 19, Pdb_Character, :asterisk ], [ 20, 70, Pdb_String, :text ] ) # HELIX record class HELIX = def_rec([ 8, 10, Pdb_Integer, :serNum ], #[ 12, 14, Pdb_LString(3), :helixID ], [ 12, 14, 
Pdb_StringRJ, :helixID ], [ 16, 18, Pdb_Residue_name, :initResName ], [ 20, 20, Pdb_Character, :initChainID ], [ 22, 25, Pdb_Integer, :initSeqNum ], [ 26, 26, Pdb_AChar, :initICode ], [ 28, 30, Pdb_Residue_name, :endResName ], [ 32, 32, Pdb_Character, :endChainID ], [ 34, 37, Pdb_Integer, :endSeqNum ], [ 38, 38, Pdb_AChar, :endICode ], [ 39, 40, Pdb_Integer, :helixClass ], [ 41, 70, Pdb_String, :comment ], [ 72, 76, Pdb_Integer, :length ] ) # SHEET record class SHEET = def_rec([ 8, 10, Pdb_Integer, :strand ], #[ 12, 14, Pdb_LString(3), :sheetID ], [ 12, 14, Pdb_StringRJ, :sheetID ], [ 15, 16, Pdb_Integer, :numStrands ], [ 18, 20, Pdb_Residue_name, :initResName ], [ 22, 22, Pdb_Character, :initChainID ], [ 23, 26, Pdb_Integer, :initSeqNum ], [ 27, 27, Pdb_AChar, :initICode ], [ 29, 31, Pdb_Residue_name, :endResName ], [ 33, 33, Pdb_Character, :endChainID ], [ 34, 37, Pdb_Integer, :endSeqNum ], [ 38, 38, Pdb_AChar, :endICode ], [ 39, 40, Pdb_Integer, :sense ], [ 42, 45, Pdb_Atom, :curAtom ], [ 46, 48, Pdb_Residue_name, :curResName ], [ 50, 50, Pdb_Character, :curChainId ], [ 51, 54, Pdb_Integer, :curResSeq ], [ 55, 55, Pdb_AChar, :curICode ], [ 57, 60, Pdb_Atom, :prevAtom ], [ 61, 63, Pdb_Residue_name, :prevResName ], [ 65, 65, Pdb_Character, :prevChainId ], [ 66, 69, Pdb_Integer, :prevResSeq ], [ 70, 70, Pdb_AChar, :prevICode ] ) # TURN record class TURN = def_rec([ 8, 10, Pdb_Integer, :seq ], #[ 12, 14, Pdb_LString(3), :turnId ], [ 12, 14, Pdb_StringRJ, :turnId ], [ 16, 18, Pdb_Residue_name, :initResName ], [ 20, 20, Pdb_Character, :initChainId ], [ 21, 24, Pdb_Integer, :initSeqNum ], [ 25, 25, Pdb_AChar, :initICode ], [ 27, 29, Pdb_Residue_name, :endResName ], [ 31, 31, Pdb_Character, :endChainId ], [ 32, 35, Pdb_Integer, :endSeqNum ], [ 36, 36, Pdb_AChar, :endICode ], [ 41, 70, Pdb_String, :comment ] ) # SSBOND record class SSBOND = def_rec([ 8, 10, Pdb_Integer, :serNum ], [ 12, 14, Pdb_LString(3), :pep1 ], # "CYS" [ 16, 16, Pdb_Character, :chainID1 ], [ 18, 21, 
Pdb_Integer, :seqNum1 ], [ 22, 22, Pdb_AChar, :icode1 ], [ 26, 28, Pdb_LString(3), :pep2 ], # "CYS" [ 30, 30, Pdb_Character, :chainID2 ], [ 32, 35, Pdb_Integer, :seqNum2 ], [ 36, 36, Pdb_AChar, :icode2 ], [ 60, 65, Pdb_SymOP, :sym1 ], [ 67, 72, Pdb_SymOP, :sym2 ] ) # LINK record class LINK = def_rec([ 13, 16, Pdb_Atom, :name1 ], [ 17, 17, Pdb_Character, :altLoc1 ], [ 18, 20, Pdb_Residue_name, :resName1 ], [ 22, 22, Pdb_Character, :chainID1 ], [ 23, 26, Pdb_Integer, :resSeq1 ], [ 27, 27, Pdb_AChar, :iCode1 ], [ 43, 46, Pdb_Atom, :name2 ], [ 47, 47, Pdb_Character, :altLoc2 ], [ 48, 50, Pdb_Residue_name, :resName2 ], [ 52, 52, Pdb_Character, :chainID2 ], [ 53, 56, Pdb_Integer, :resSeq2 ], [ 57, 57, Pdb_AChar, :iCode2 ], [ 60, 65, Pdb_SymOP, :sym1 ], [ 67, 72, Pdb_SymOP, :sym2 ] ) # HYDBND record class HYDBND = def_rec([ 13, 16, Pdb_Atom, :name1 ], [ 17, 17, Pdb_Character, :altLoc1 ], [ 18, 20, Pdb_Residue_name, :resName1 ], [ 22, 22, Pdb_Character, :Chain1 ], [ 23, 27, Pdb_Integer, :resSeq1 ], [ 28, 28, Pdb_AChar, :ICode1 ], [ 30, 33, Pdb_Atom, :nameH ], [ 34, 34, Pdb_Character, :altLocH ], [ 36, 36, Pdb_Character, :ChainH ], [ 37, 41, Pdb_Integer, :resSeqH ], [ 42, 42, Pdb_AChar, :iCodeH ], [ 44, 47, Pdb_Atom, :name2 ], [ 48, 48, Pdb_Character, :altLoc2 ], [ 49, 51, Pdb_Residue_name, :resName2 ], [ 53, 53, Pdb_Character, :chainID2 ], [ 54, 58, Pdb_Integer, :resSeq2 ], [ 59, 59, Pdb_AChar, :iCode2 ], [ 60, 65, Pdb_SymOP, :sym1 ], [ 67, 72, Pdb_SymOP, :sym2 ] ) # SLTBRG record class SLTBRG = def_rec([ 13, 16, Pdb_Atom, :atom1 ], [ 17, 17, Pdb_Character, :altLoc1 ], [ 18, 20, Pdb_Residue_name, :resName1 ], [ 22, 22, Pdb_Character, :chainID1 ], [ 23, 26, Pdb_Integer, :resSeq1 ], [ 27, 27, Pdb_AChar, :iCode1 ], [ 43, 46, Pdb_Atom, :atom2 ], [ 47, 47, Pdb_Character, :altLoc2 ], [ 48, 50, Pdb_Residue_name, :resName2 ], [ 52, 52, Pdb_Character, :chainID2 ], [ 53, 56, Pdb_Integer, :resSeq2 ], [ 57, 57, Pdb_AChar, :iCode2 ], [ 60, 65, Pdb_SymOP, :sym1 ], [ 67, 72, Pdb_SymOP, 
:sym2 ] ) # CISPEP record class CISPEP = def_rec([ 8, 10, Pdb_Integer, :serNum ], [ 12, 14, Pdb_LString(3), :pep1 ], [ 16, 16, Pdb_Character, :chainID1 ], [ 18, 21, Pdb_Integer, :seqNum1 ], [ 22, 22, Pdb_AChar, :icode1 ], [ 26, 28, Pdb_LString(3), :pep2 ], [ 30, 30, Pdb_Character, :chainID2 ], [ 32, 35, Pdb_Integer, :seqNum2 ], [ 36, 36, Pdb_AChar, :icode2 ], [ 44, 46, Pdb_Integer, :modNum ], [ 54, 59, Pdb_Real('6.2'), :measure ] ) # SITE record class SITE = def_rec([ 8, 10, Pdb_Integer, :seqNum ], [ 12, 14, Pdb_LString(3), :siteID ], [ 16, 17, Pdb_Integer, :numRes ], [ 19, 21, Pdb_Residue_name, :resName1 ], [ 23, 23, Pdb_Character, :chainID1 ], [ 24, 27, Pdb_Integer, :seq1 ], [ 28, 28, Pdb_AChar, :iCode1 ], [ 30, 32, Pdb_Residue_name, :resName2 ], [ 34, 34, Pdb_Character, :chainID2 ], [ 35, 38, Pdb_Integer, :seq2 ], [ 39, 39, Pdb_AChar, :iCode2 ], [ 41, 43, Pdb_Residue_name, :resName3 ], [ 45, 45, Pdb_Character, :chainID3 ], [ 46, 49, Pdb_Integer, :seq3 ], [ 50, 50, Pdb_AChar, :iCode3 ], [ 52, 54, Pdb_Residue_name, :resName4 ], [ 56, 56, Pdb_Character, :chainID4 ], [ 57, 60, Pdb_Integer, :seq4 ], [ 61, 61, Pdb_AChar, :iCode4 ] ) # CRYST1 record class CRYST1 = def_rec([ 7, 15, Pdb_Real('9.3'), :a ], [ 16, 24, Pdb_Real('9.3'), :b ], [ 25, 33, Pdb_Real('9.3'), :c ], [ 34, 40, Pdb_Real('7.2'), :alpha ], [ 41, 47, Pdb_Real('7.2'), :beta ], [ 48, 54, Pdb_Real('7.2'), :gamma ], [ 56, 66, Pdb_LString, :sGroup ], [ 67, 70, Pdb_Integer, :z ] ) # ORIGX1 record class # # ORIGXn n=1, 2, or 3 ORIGX1 = def_rec([ 11, 20, Pdb_Real('10.6'), :On1 ], [ 21, 30, Pdb_Real('10.6'), :On2 ], [ 31, 40, Pdb_Real('10.6'), :On3 ], [ 46, 55, Pdb_Real('10.5'), :Tn ] ) # ORIGX2 record class ORIGX2 = new_inherit(ORIGX1) # ORIGX3 record class ORIGX3 = new_inherit(ORIGX1) # SCALE1 record class # # SCALEn n=1, 2, or 3 SCALE1 = def_rec([ 11, 20, Pdb_Real('10.6'), :Sn1 ], [ 21, 30, Pdb_Real('10.6'), :Sn2 ], [ 31, 40, Pdb_Real('10.6'), :Sn3 ], [ 46, 55, Pdb_Real('10.5'), :Un ] ) # SCALE2 record class 
SCALE2 = new_inherit(SCALE1) # SCALE3 record class SCALE3 = new_inherit(SCALE1) # MTRIX1 record class # # MTRIXn n=1,2, or 3 MTRIX1 = def_rec([ 8, 10, Pdb_Integer, :serial ], [ 11, 20, Pdb_Real('10.6'), :Mn1 ], [ 21, 30, Pdb_Real('10.6'), :Mn2 ], [ 31, 40, Pdb_Real('10.6'), :Mn3 ], [ 46, 55, Pdb_Real('10.5'), :Vn ], [ 60, 60, Pdb_Integer, :iGiven ] ) # MTRIX2 record class MTRIX2 = new_inherit(MTRIX1) # MTRIX3 record class MTRIX3 = new_inherit(MTRIX1) # TVECT record class TVECT = def_rec([ 8, 10, Pdb_Integer, :serial ], [ 11, 20, Pdb_Real('10.5'), :t1 ], [ 21, 30, Pdb_Real('10.5'), :t2 ], [ 31, 40, Pdb_Real('10.5'), :t3 ], [ 41, 70, Pdb_String, :text ] ) # MODEL record class MODEL = def_rec([ 11, 14, Pdb_Integer, :serial ] ) # ChangeLog: model_serial are changed to serial # ATOM record class ATOM = new_direct([ 7, 11, Pdb_Integer, :serial ], [ 13, 16, Pdb_Atom, :name ], [ 17, 17, Pdb_Character, :altLoc ], [ 18, 20, Pdb_Residue_name, :resName ], [ 22, 22, Pdb_Character, :chainID ], [ 23, 26, Pdb_Integer, :resSeq ], [ 27, 27, Pdb_AChar, :iCode ], [ 31, 38, Pdb_Real('8.3'), :x ], [ 39, 46, Pdb_Real('8.3'), :y ], [ 47, 54, Pdb_Real('8.3'), :z ], [ 55, 60, Pdb_Real('6.2'), :occupancy ], [ 61, 66, Pdb_Real('6.2'), :tempFactor ], [ 73, 76, Pdb_LString(4), :segID ], [ 77, 78, Pdb_LString(2), :element ], [ 79, 80, Pdb_LString(2), :charge ] ) # ATOM record class class ATOM include Utils include Comparable # for backward compatibility alias occ occupancy # for backward compatibility alias bfac tempFactor # residue the atom belongs to. 
attr_accessor :residue # SIGATM record attr_accessor :sigatm # ANISOU record attr_accessor :anisou # TER record attr_accessor :ter #Returns a Coordinate class instance of the xyz positions def xyz Coordinate[ x, y, z ] end #Returns an array of the xyz positions def to_a [ x, y, z ] end #Sorts based on serial numbers def <=>(other) return serial <=> other.serial end def do_parse return self if @parsed or !@str self.serial = @str[6..10].to_i self.name = @str[12..15].strip self.altLoc = @str[16..16] self.resName = @str[17..19].strip self.chainID = @str[21..21] self.resSeq = @str[22..25].to_i self.iCode = @str[26..26].strip self.x = @str[30..37].to_f self.y = @str[38..45].to_f self.z = @str[46..53].to_f self.occupancy = @str[54..59].to_f self.tempFactor = @str[60..65].to_f self.segID = @str[72..75].to_s.rstrip self.element = @str[76..77].to_s.lstrip self.charge = @str[78..79].to_s.strip @parsed = true self end def justify_atomname atomname = self.name.to_s return atomname[0, 4] if atomname.length >= 4 case atomname.length when 0 return ' ' when 1 return ' ' + atomname + ' ' when 2 if /\A[0-9]/ =~ atomname then return sprintf('%-4s', atomname) elsif /[0-9]\z/ =~ atomname then return sprintf(' %-3s', atomname) end when 3 if /\A[0-9]/ =~ atomname then return sprintf('%-4s', atomname) end end # ambiguous case for two- or three-letter name elem = self.element.to_s.strip if elem.size > 0 and i = atomname.index(elem) then if i == 0 and elem.size == 1 then return sprintf(' %-3s', atomname) else return sprintf('%-4s', atomname) end end if self.kind_of?(HETATM) then if /\A(B[^AEHIKR]|C[^ADEFLMORSU]|F[^EMR]|H[^EFGOS]|I[^NR]|K[^R]|N[^ABDEIOP]|O[^S]|P[^ABDMORTU]|S[^BCEGIMNR]|V|W|Y[^B])/ =~ atomname then return sprintf(' %-3s', atomname) else return sprintf('%-4s', atomname) end else # ATOM if /\A[CHONSP]/ =~ atomname then return sprintf(' %-3s', atomname) else return sprintf('%-4s', atomname) end end # could not be reached here raise 'bug!' 
end private :justify_atomname def to_s atomname = justify_atomname sprintf("%-6s%5d %-4s%-1s%3s %-1s%4d%-1s %8.3f%8.3f%8.3f%6.2f%6.2f %-4s%2s%-2s\n", self.record_name, self.serial, atomname, self.altLoc, self.resName, self.chainID, self.resSeq, self.iCode, self.x, self.y, self.z, self.occupancy, self.tempFactor, self.segID, self.element, self.charge) end end #class ATOM # SIGATM record class SIGATM = def_rec([ 7, 11, Pdb_Integer, :serial ], [ 13, 16, Pdb_Atom, :name ], [ 17, 17, Pdb_Character, :altLoc ], [ 18, 20, Pdb_Residue_name, :resName ], [ 22, 22, Pdb_Character, :chainID ], [ 23, 26, Pdb_Integer, :resSeq ], [ 27, 27, Pdb_AChar, :iCode ], [ 31, 38, Pdb_Real('8.3'), :sigX ], [ 39, 46, Pdb_Real('8.3'), :sigY ], [ 47, 54, Pdb_Real('8.3'), :sigZ ], [ 55, 60, Pdb_Real('6.2'), :sigOcc ], [ 61, 66, Pdb_Real('6.2'), :sigTemp ], [ 73, 76, Pdb_LString(4), :segID ], [ 77, 78, Pdb_LString(2), :element ], [ 79, 80, Pdb_LString(2), :charge ] ) # ANISOU record class ANISOU = def_rec([ 7, 11, Pdb_Integer, :serial ], [ 13, 16, Pdb_Atom, :name ], [ 17, 17, Pdb_Character, :altLoc ], [ 18, 20, Pdb_Residue_name, :resName ], [ 22, 22, Pdb_Character, :chainID ], [ 23, 26, Pdb_Integer, :resSeq ], [ 27, 27, Pdb_AChar, :iCode ], [ 29, 35, Pdb_Integer, :U11 ], [ 36, 42, Pdb_Integer, :U22 ], [ 43, 49, Pdb_Integer, :U33 ], [ 50, 56, Pdb_Integer, :U12 ], [ 57, 63, Pdb_Integer, :U13 ], [ 64, 70, Pdb_Integer, :U23 ], [ 73, 76, Pdb_LString(4), :segID ], [ 77, 78, Pdb_LString(2), :element ], [ 79, 80, Pdb_LString(2), :charge ] ) # ANISOU record class class ANISOU # SIGUIJ record attr_accessor :siguij end #class ANISOU # SIGUIJ record class SIGUIJ = def_rec([ 7, 11, Pdb_Integer, :serial ], [ 13, 16, Pdb_Atom, :name ], [ 17, 17, Pdb_Character, :altLoc ], [ 18, 20, Pdb_Residue_name, :resName ], [ 22, 22, Pdb_Character, :chainID ], [ 23, 26, Pdb_Integer, :resSeq ], [ 27, 27, Pdb_AChar, :iCode ], [ 29, 35, Pdb_Integer, :SigmaU11 ], [ 36, 42, Pdb_Integer, :SigmaU22 ], [ 43, 49, Pdb_Integer, 
:SigmaU33 ], [ 50, 56, Pdb_Integer, :SigmaU12 ], [ 57, 63, Pdb_Integer, :SigmaU13 ], [ 64, 70, Pdb_Integer, :SigmaU23 ], [ 73, 76, Pdb_LString(4), :segID ], [ 77, 78, Pdb_LString(2), :element ], [ 79, 80, Pdb_LString(2), :charge ] ) # TER record class TER = def_rec([ 7, 11, Pdb_Integer, :serial ], [ 18, 20, Pdb_Residue_name, :resName ], [ 22, 22, Pdb_Character, :chainID ], [ 23, 26, Pdb_Integer, :resSeq ], [ 27, 27, Pdb_AChar, :iCode ] ) #HETATM = # new_direct([ 7, 11, Pdb_Integer, :serial ], # [ 13, 16, Pdb_Atom, :name ], # [ 17, 17, Pdb_Character, :altLoc ], # [ 18, 20, Pdb_Residue_name, :resName ], # [ 22, 22, Pdb_Character, :chainID ], # [ 23, 26, Pdb_Integer, :resSeq ], # [ 27, 27, Pdb_AChar, :iCode ], # [ 31, 38, Pdb_Real('8.3'), :x ], # [ 39, 46, Pdb_Real('8.3'), :y ], # [ 47, 54, Pdb_Real('8.3'), :z ], # [ 55, 60, Pdb_Real('6.2'), :occupancy ], # [ 61, 66, Pdb_Real('6.2'), :tempFactor ], # [ 73, 76, Pdb_LString(4), :segID ], # [ 77, 78, Pdb_LString(2), :element ], # [ 79, 80, Pdb_LString(2), :charge ] # ) # HETATM record class HETATM = new_inherit(ATOM) # HETATM record class. # It inherits ATOM class. 
class HETATM; end # ENDMDL record class ENDMDL = def_rec([ 2, 1, Pdb_Integer, :serial ] # dummy field (always 0) ) # CONECT record class CONECT = def_rec([ 7, 11, Pdb_Integer, :serial ], [ 12, 16, Pdb_Integer, :serial ], [ 17, 21, Pdb_Integer, :serial ], [ 22, 26, Pdb_Integer, :serial ], [ 27, 31, Pdb_Integer, :serial ], [ 32, 36, Pdb_Integer, :serial ], [ 37, 41, Pdb_Integer, :serial ], [ 42, 46, Pdb_Integer, :serial ], [ 47, 51, Pdb_Integer, :serial ], [ 52, 56, Pdb_Integer, :serial ], [ 57, 61, Pdb_Integer, :serial ] ) # MASTER record class MASTER = def_rec([ 11, 15, Pdb_Integer, :numRemark ], [ 16, 20, Pdb_Integer, "0" ], [ 21, 25, Pdb_Integer, :numHet ], [ 26, 30, Pdb_Integer, :numHelix ], [ 31, 35, Pdb_Integer, :numSheet ], [ 36, 40, Pdb_Integer, :numTurn ], [ 41, 45, Pdb_Integer, :numSite ], [ 46, 50, Pdb_Integer, :numXform ], [ 51, 55, Pdb_Integer, :numCoord ], [ 56, 60, Pdb_Integer, :numTer ], [ 61, 65, Pdb_Integer, :numConect ], [ 66, 70, Pdb_Integer, :numSeq ] ) # JRNL record classes class Jrnl < self # subrecord of JRNL # 13, 16 # JRNL AUTH record class AUTH = def_rec([ 13, 16, Pdb_String, :sub_record ], # "AUTH" [ 17, 18, Pdb_Continuation, nil ], [ 20, 70, Pdb_List, :authorList ] ) # JRNL TITL record class TITL = def_rec([ 13, 16, Pdb_String, :sub_record ], # "TITL" [ 17, 18, Pdb_Continuation, nil ], [ 20, 70, Pdb_LString, :title ] ) # JRNL EDIT record class EDIT = def_rec([ 13, 16, Pdb_String, :sub_record ], # "EDIT" [ 17, 18, Pdb_Continuation, nil ], [ 20, 70, Pdb_List, :editorList ] ) # JRNL REF record class REF = def_rec([ 13, 16, Pdb_String, :sub_record ], # "REF" [ 17, 18, Pdb_Continuation, nil ], [ 20, 47, Pdb_LString, :pubName ], [ 50, 51, Pdb_LString(2), "V." 
], [ 52, 55, Pdb_String, :volume ], [ 57, 61, Pdb_String, :page ], [ 63, 66, Pdb_Integer, :year ] ) # JRNL PUBL record class PUBL = def_rec([ 13, 16, Pdb_String, :sub_record ], # "PUBL" [ 17, 18, Pdb_Continuation, nil ], [ 20, 70, Pdb_LString, :pub ] ) # JRNL REFN record class REFN = def_rec([ 13, 16, Pdb_String, :sub_record ], # "REFN" [ 20, 23, Pdb_LString(4), "ASTM" ], [ 25, 30, Pdb_LString(6), :astm ], [ 33, 34, Pdb_LString(2), :country ], [ 36, 39, Pdb_LString(4), :BorS ], # "ISBN" or "ISSN" [ 41, 65, Pdb_LString, :isbn ], [ 67, 70, Pdb_LString(4), :coden ] # "0353" for unpublished ) # default or unknown record # Default = def_rec([ 13, 16, Pdb_String, :sub_record ]) # "" # definitions (hash) Definition = create_definition_hash end #class JRNL # REMARK record classes for REMARK 1 class Remark1 < self # 13, 16 # REMARK 1 REFERENCE record class EFER = def_rec([ 8, 10, Pdb_Integer, :remarkNum ], # "1" [ 12, 20, Pdb_String, :sub_record ], # "REFERENCE" [ 22, 70, Pdb_Integer, :refNum ] ) # REMARK 1 AUTH record class AUTH = def_rec([ 8, 10, Pdb_Integer, :remarkNum ], # "1" [ 13, 16, Pdb_String, :sub_record ], # "AUTH" [ 17, 18, Pdb_Continuation, nil ], [ 20, 70, Pdb_List, :authorList ] ) # REMARK 1 TITL record class TITL = def_rec([ 8, 10, Pdb_Integer, :remarkNum ], # "1" [ 13, 16, Pdb_String, :sub_record ], # "TITL" [ 17, 18, Pdb_Continuation, nil ], [ 20, 70, Pdb_LString, :title ] ) # REMARK 1 EDIT record class EDIT = def_rec([ 8, 10, Pdb_Integer, :remarkNum ], # "1" [ 13, 16, Pdb_String, :sub_record ], # "EDIT" [ 17, 18, Pdb_Continuation, nil ], [ 20, 70, Pdb_LString, :editorList ] ) # REMARK 1 REF record class REF = def_rec([ 8, 10, Pdb_Integer, :remarkNum ], # "1" [ 13, 16, Pdb_LString(3), :sub_record ], # "REF" [ 17, 18, Pdb_Continuation, nil ], [ 20, 47, Pdb_LString, :pubName ], [ 50, 51, Pdb_LString(2), "V." 
], [ 52, 55, Pdb_String, :volume ], [ 57, 61, Pdb_String, :page ], [ 63, 66, Pdb_Integer, :year ] ) # REMARK 1 PUBL record class PUBL = def_rec([ 8, 10, Pdb_Integer, :remarkNum ], # "1" [ 13, 16, Pdb_String, :sub_record ], # "PUBL" [ 17, 18, Pdb_Continuation, nil ], [ 20, 70, Pdb_LString, :pub ] ) # REMARK 1 REFN record class REFN = def_rec([ 8, 10, Pdb_Integer, :remarkNum ], # "1" [ 13, 16, Pdb_String, :sub_record ], # "REFN" [ 20, 23, Pdb_LString(4), "ASTM" ], [ 25, 30, Pdb_LString, :astm ], [ 33, 34, Pdb_LString, :country ], [ 36, 39, Pdb_LString(4), :BorS ], [ 41, 65, Pdb_LString, :isbn ], [ 68, 70, Pdb_LString(4), :coden ] ) # default (or unknown) record class for REMARK 1 Default = def_rec([ 8, 10, Pdb_Integer, :remarkNum ], # "1" [ 13, 16, Pdb_String, :sub_record ] # "" ) # definitions (hash) Definition = create_definition_hash end #class Remark1 # REMARK record classes for REMARK 2 class Remark2 < self # 29, 38 == 'ANGSTROMS.' ANGSTROMS = def_rec([ 8, 10, Pdb_Integer, :remarkNum ], # "2" [ 12, 22, Pdb_LString(11), :sub_record ], # "RESOLUTION." [ 23, 27, Pdb_Real('5.2'), :resolution ], [ 29, 38, Pdb_LString(10), "ANGSTROMS." ] ) # 23, 38 == ' NOT APPLICABLE.' NOT_APPLICABLE = def_rec([ 8, 10, Pdb_Integer, :remarkNum ], # "2" [ 12, 22, Pdb_LString(11), :sub_record ], # "RESOLUTION." [ 23, 38, Pdb_LString(16), :resolution ], # " NOT APPLICABLE." [ 41, 70, Pdb_String, :comment ] ) # others Default = def_rec([ 8, 10, Pdb_Integer, :remarkNum ], # "2" [ 12, 22, Pdb_LString(11), :sub_record ], # "RESOLUTION." [ 24, 70, Pdb_String, :comment ] ) end #class Remark2 # REMARK record class for REMARK n (n>=3) RemarkN = def_rec([ 8, 10, Pdb_Integer, :remarkNum ], [ 12, 70, Pdb_LString, :text ] ) # default (or unknown) record class Default = def_rec([ 8, 70, Pdb_LString, :text ]) # definitions (hash) Definition = create_definition_hash # END record class. 
# # Because END is a reserved word of Ruby, it is separately # added to the hash End = def_rec([ 2, 1, Pdb_Integer, :serial ]) # dummy field (always 0) Definition['END'.intern] = End # Basically just look up the class in Definition hash # do some munging for JRNL and REMARK def self.get_record_class(str) t = fetch_record_name(str) t = t.intern unless t.empty? if d = Definition[t] then return d end case t when :JRNL ts = str[12..15].to_s.strip ts = ts.intern unless ts.empty? d = Jrnl::Definition[ts] when :REMARK case str[7..9].to_i when 1 ts = str[12..15].to_s.strip ts = ts.intern unless ts.empty? d = Remark1::Definition[ts] when 2 if str[28..37] == 'ANGSTROMS.' then d = Remark2::ANGSTROMS elsif str[22..37] == ' NOT APPLICABLE.' then d = Remark2::NOT_APPLICABLE else d = Remark2::Default end else d = RemarkN end else # unknown field d = Default end return d end end #class Record Coordinate_fileds = { 'MODEL' => true, :MODEL => true, 'ENDMDL' => true, :ENDMDL => true, 'ATOM' => true, :ATOM => true, 'HETATM' => true, :HETATM => true, 'SIGATM' => true, :SIGATM => true, 'SIGUIJ' => true, :SIGUIJ => true, 'ANISOU' => true, :ANISOU => true, 'TER' => true, :TER => true, } # Creates a new Bio::PDB object from given str. def initialize(str) #Aha! Our entry into the world of PDB parsing, we initialise a PDB #object with the whole PDB file as a string #each PDB has an array of the lines of the original file #a bit memory-tastic! A hash of records and an array of models #also has an id @data = str.split(/[\r\n]+/) @hash = {} @models = [] @id = nil #Flag to say whether the current line is part of a continuation cont = false #Empty current model cModel = Model.new cChain = nil #Chain.new cResidue = nil #Residue.new cLigand = nil #Heterogen.new c_atom = nil #Goes through each line and replaces that line with a PDB::Record @data.collect! do |line| #Go to next if the previous line was continuable, and #add_continuation returns true.
Line is added by add_continuation next if cont and cont = cont.add_continuation(line) #Make the new record f = Record.get_record_class(line).new.initialize_from_string(line) #p f #Set cont cont = f if f.continue? #Set the hash to point to this record either by adding to an #array, or on its own key = f.record_name if a = @hash[key] then a << f else @hash[key] = [ f ] end # Do something for ATOM and HETATM if key == 'ATOM' or key == 'HETATM' then if cChain and f.chainID == cChain.id chain = cChain else if chain = cModel[f.chainID] cChain = chain unless cChain else # If we don't have chain, add a new chain newChain = Chain.new(f.chainID, cModel) cModel.addChain(newChain) cChain = newChain chain = newChain end # chain might be changed, clearing cResidue and cLigand cResidue = nil cLigand = nil end end case key when 'ATOM' c_atom = f residueID = Residue.get_residue_id_from_atom(f) if cResidue and residueID == cResidue.id residue = cResidue else if residue = chain.get_residue_by_id(residueID) cResidue = residue unless cResidue else # add a new residue newResidue = Residue.new(f.resName, f.resSeq, f.iCode, chain) chain.addResidue(newResidue) cResidue = newResidue residue = newResidue end end f.residue = residue residue.addAtom(f) when 'HETATM' c_atom = f residueID = Heterogen.get_residue_id_from_atom(f) if cLigand and residueID == cLigand.id ligand = cLigand else if ligand = chain.get_heterogen_by_id(residueID) cLigand = ligand unless cLigand else # add a new heterogen newLigand = Heterogen.new(f.resName, f.resSeq, f.iCode, chain) chain.addLigand(newLigand) cLigand = newLigand ligand = newLigand #Each model has a special solvent chain.
(for compatibility) if f.resName == 'HOH' cModel.addSolvent(newLigand) end end end f.residue = ligand ligand.addAtom(f) when 'MODEL' c_atom = nil cChain = nil cResidue = nil cLigand = nil if cModel.model_serial or cModel.chains.size > 0 then self.addModel(cModel) end cModel = Model.new(f.serial) when 'TER' if c_atom c_atom.ter = f else #$stderr.puts "Warning: stray TER?" end when 'SIGATM' if c_atom #$stderr.puts "Warning: duplicated SIGATM?" if c_atom.sigatm c_atom.sigatm = f else #$stderr.puts "Warning: stray SIGATM?" end when 'ANISOU' if c_atom #$stderr.puts "Warning: duplicated ANISOU?" if c_atom.anisou c_atom.anisou = f else #$stderr.puts "Warning: stray ANISOU?" end when 'SIGUIJ' if c_atom and c_atom.anisou #$stderr.puts "Warning: duplicated SIGUIJ?" if c_atom.anisou.siguij c_atom.anisou.siguij = f else #$stderr.puts "Warning: stray SIGUIJ?" end else c_atom = nil end f end #each #At the end we need to add the final model self.addModel(cModel) @data.compact! end #def initialize # all records in this entry as an array. attr_reader :data # all records in this entry as a hash accessed by record names. attr_reader :hash # models in this entry (array). attr_reader :models # Adds a Bio::Model object to the current structure. # Adds a model to the current structure. # Returns self. def addModel(model) raise "Expecting a Bio::PDB::Model" if not model.is_a? Bio::PDB::Model @models.push(model) self end # Iterates over each model. # Iterates over each of the models in the structure. # Returns self. def each @models.each{ |model| yield model } self end # Alias needed for Bio::PDB::ModelFinder alias each_model each # Provides keyed access to the models based on serial number # returns nil if it's not there def [](key) @models.find{ |model| key == model.model_serial } end #-- # (should it raise an exception?) #++ #-- #Stringifies to a list of atom records - we could add the annotation #as well if needed #++ # Returns a string of Bio::PDB::Models.
This propagates down the hierarchy # until it reaches the Bio::PDB::Record::ATOM records, which are output in PDB format def to_s string = "" @models.each{ |model| string << model.to_s } string << "END\n" return string end #Makes a hash out of an array of PDB::Records and some kind of symbol #.__send__ invokes the method specified by the symbol. #Essentially it ends up with a hash with keys given in the sub_record #Not sure I fully understand this def make_hash(ary, meth) h = {} ary.each do |f| k = f.__send__(meth) h[k] = [] unless h.has_key?(k) h[k] << f end h end private :make_hash #Takes an array and returns another array of PDB::Records def make_grouping(ary, meth) a = [] k_prev = nil ary.each do |f| k = f.__send__(meth) if k_prev and k_prev == k then a.last << f else a << [] a.last << f end k_prev = k end a end private :make_grouping # Gets all records whose record type is _name_. # Returns an array of Bio::PDB::Record::* objects. # # if _name_ is nil, returns hash storing all record data. # # Example: # p pdb.record('HETATM') # p pdb.record['HETATM'] # def record(name = nil) name ? (@hash[name] || []) : @hash end #-- # PDB original methods #Returns a hash of the REMARK records based on the remarkNum #++ # Gets REMARK records. # If no arguments, it returns all REMARK records as a hash. # If remark number is specified, returns only corresponding REMARK records. # If number == 1 or 2 ("REMARK 1" or "REMARK 2"), returns an array # of Bio::PDB::Record instances. Otherwise, returns an array of strings. # def remark(nn = nil) unless defined?(@remark) h = make_hash(self.record('REMARK'), :remarkNum) h.each do |i, a| a.shift # remove first record (= space only) if i != 1 and i != 2 then a.collect! { |f| f.text.gsub(/\s+\z/, '') } end end @remark = h end nn ? @remark[nn] : @remark end # Gets JRNL records. # If no arguments, it returns all JRNL records as a hash. # If sub record name is specified, it returns only corresponding records # as an array of Bio::PDB::Record instances.
# def jrnl(sub_record = nil) unless defined?(@jrnl) @jrnl = make_hash(self.record('JRNL'), :sub_record) end sub_record ? @jrnl[sub_record] : @jrnl end #-- #Finding methods - just grabs the record with the appropriate id #or returns an array of all of them #++ # Gets HELIX records. # If no arguments are given, it returns all HELIX records. # (Returns an array of Bio::PDB::Record::HELIX instances.) # If helixID is given, it only returns the record # corresponding to given helixID. # (Returns a Bio::PDB::Record::HELIX instance.) # def helix(helixID = nil) if helixID then self.record('HELIX').find { |f| f.helixID == helixID } else self.record('HELIX') end end # Gets TURN records. # If no arguments are given, it returns all TURN records. # (Returns an array of Bio::PDB::Record::TURN instances.) # If turnId is given, it only returns a record # corresponding to given turnId. # (Returns a Bio::PDB::Record::TURN instance.) # def turn(turnId = nil) if turnId then self.record('TURN').find { |f| f.turnId == turnId } else self.record('TURN') end end # Gets SHEET records. # If no arguments are given, it returns all SHEET records # as an array of arrays of Bio::PDB::Record::SHEET instances. # If sheetID is given, it returns an array of # Bio::PDB::Record::SHEET instances. def sheet(sheetID = nil) unless defined?(@sheet) @sheet = make_grouping(self.record('SHEET'), :sheetID) end if sheetID then @sheet.find_all { |f| f.first.sheetID == sheetID } else @sheet end end # Gets SSBOND records. def ssbond self.record('SSBOND') end #-- # Get seqres - we get this to return a nice Bio::Seq object #++ # Amino acid or nucleic acid sequence of backbone residues in "SEQRES". # If chainID is given, it returns the corresponding sequence # as a Bio::Sequence::AA or Bio::Sequence::NA object. # Otherwise, returns a hash which contains all sequences. # def seqres(chainID = nil) unless defined?(@seqres) h = make_hash(self.record('SEQRES'), :chainID) newHash = {} h.each do |k, a| a.collect! { |f| f.resName } a.flatten!
# determine nuc or aa? tmp = Hash.new(0) a[0,13].each { |x| tmp[x.to_s.strip.size] += 1 } if tmp[3] >= tmp[1] then # amino acid sequence a.collect! do |aa| #aa is three letter code: i.e. ALA #need to look up with Ala aa = aa.capitalize (begin Bio::AminoAcid.three2one(aa) rescue ArgumentError nil end || 'X') end seq = Bio::Sequence::AA.new(a.join('')) else # nucleic acid sequence a.collect! do |na| na = na.delete('^a-zA-Z') na.size == 1 ? na : 'n' end seq = Bio::Sequence::NA.new(a.join('')) end newHash[k] = seq end @seqres = newHash end if chainID then @seqres[chainID] else @seqres end end # Gets DBREF records. # Returns an array of Bio::PDB::Record::DBREF objects. # # If chainID is given, it returns corresponding DBREF records. def dbref(chainID = nil) if chainID then self.record('DBREF').find_all { |f| f.chainID == chainID } else self.record('DBREF') end end # Keywords in "KEYWDS". # Returns an array of string. def keywords self.record('KEYWDS').collect { |f| f.keywds }.flatten end # Classification in "HEADER". def classification f = self.record('HEADER').first f ? f.classification : nil end # Get authors in "AUTHOR". def authors self.record('AUTHOR').collect { |f| f.authorList }.flatten end #-- # Bio::DB methods #++ # PDB identifier written in "HEADER". (e.g. 1A00) def entry_id unless @id f = self.record('HEADER').first @id = f ? f.idCode : nil end @id end # Same as Bio::PDB#entry_id. def accession self.entry_id end # Title of this entry in "TITLE". def definition f = self.record('TITLE').first f ? f.title : nil end # Current modification number in "REVDAT". def version f = self.record('REVDAT').first f ? f.modNum : nil end # returns a string containing human-readable representation # of this object. 
def inspect "#<#{self.class.to_s} entry_id=#{entry_id.inspect}>" end end #class PDB end #module Bio bio-2.0.3/lib/bio/db/fastq.rb0000644000175000017500000004415714141516614015200 0ustar nileshnilesh# # = bio/db/fastq.rb - FASTQ format parser class # # Copyright:: Copyright (C) 2009 # Naohisa Goto # License:: The Ruby License # # == Description # # FASTQ format parser class. # # Be careful that it is for the fastQ format, not for the fastA format. # # == Examples # # See documents of Bio::Fastq class. # # == References # # * FASTQ format specification # http://maq.sourceforge.net/fastq.shtml # require "strscan" require "singleton" require 'bio/sequence' require 'bio/io/flatfile' module Bio # Bio::Fastq is a parser for FASTQ format. # class Fastq # Bio::Fastq::FormatData is a data class to store Fastq format parameters # and quality calculation methods. # Bio::Fastq internal use only. class FormatData # Format name. Should be redefined in subclass. NAME = nil # Offset. Should be redefined in subclass. OFFSET = nil # Range of score. Should be redefined in subclass. # The range must not exclude end value, i.e. it must be X..Y, # and must not be X...Y. SCORE_RANGE = nil def initialize @name = self.class::NAME @symbol = @name.gsub(/\-/, '_').to_sym @offset = self.class::OFFSET @score_range = self.class::SCORE_RANGE end # Format name attr_reader :name # Format name symbol. # Note that "-" in the format name is substituted to "_" because # "-" in a symbol is relatively difficult to handle. attr_reader :symbol # Offset when converting a score to a character attr_reader :offset # Allowed range of a score value attr_reader :score_range # Type of quality scores. Maybe one of :phred or :solexa. attr_reader :quality_score_type if false # for RDoc # Converts quality string to scores. # No overflow/underflow checks will be performed. 
# --- # *Arguments*: # * (required) _str_: (String) quality string # *Returns*:: (Array containing Integer) score values def str2scores(str) a = str.unpack('C*') a.collect! { |i| i - @offset } a end # Converts scores to a string. # Overflow/underflow checks will be performed. # If a block is given, when overflow/underflow is detected, # the score value is passed to the block, and the returned value # is used as the score. If no block is given, out-of-range scores # are silently truncated to the limits of the score range. # # --- # *Arguments*: # * (required) _a_: (Array containing Integer) score values # *Returns*:: (String) quality string def scores2str(a) if block_given? then tmp = a.collect do |i| i = yield(i) unless @score_range.include?(i) i + @offset end else min = @score_range.begin max = @score_range.end tmp = a.collect do |i| if i < min then i = min elsif i > max then i = max end i + @offset end end tmp.pack('C*') end # Format information for "fastq-sanger". # Bio::Fastq internal use only. class FASTQ_SANGER < FormatData include Singleton include Bio::Sequence::QualityScore::Phred # format name NAME = 'fastq-sanger'.freeze # offset OFFSET = 33 # score range SCORE_RANGE = 0..93 end #class FASTQ_SANGER # Format information for "fastq-solexa" # Bio::Fastq internal use only. class FASTQ_SOLEXA < FormatData include Singleton include Bio::Sequence::QualityScore::Solexa # format name NAME = 'fastq-solexa'.freeze # offset OFFSET = 64 # score range SCORE_RANGE = (-5)..62 end #class FASTQ_SOLEXA # Format information for "fastq-illumina" # Bio::Fastq internal use only. class FASTQ_ILLUMINA < FormatData include Singleton include Bio::Sequence::QualityScore::Phred # format name NAME = 'fastq-illumina'.freeze # offset OFFSET = 64 # score range SCORE_RANGE = 0..62 end #class FASTQ_ILLUMINA end #class FormatData # Available format names. FormatNames = { "fastq-sanger" => FormatData::FASTQ_SANGER, "fastq-solexa" => FormatData::FASTQ_SOLEXA, "fastq-illumina" => FormatData::FASTQ_ILLUMINA }.freeze # Available format name symbols.
Formats = { :fastq_sanger => FormatData::FASTQ_SANGER, :fastq_solexa => FormatData::FASTQ_SOLEXA, :fastq_illumina => FormatData::FASTQ_ILLUMINA }.freeze # Default format name DefaultFormatName = 'fastq-sanger'.freeze # Splitter for Bio::FlatFile FLATFILE_SPLITTER = Bio::FlatFile::Splitter::LineOriented # Basic exception class of all Bio::Fastq::Error::XXXX. # Bio::Fastq internal use only. class Error < RuntimeError private # default error message for this exception def default_message(i) "FASTQ error #{i}" end # Creates a new object. # If error message is not given, default error message is stored. # If error message is an Integer value, it is treated as the # position inside the sequence or the quality, and default # error message including the position is stored. # --- # *Arguments*: # * (optional) error_message: error message (see above) def initialize(error_message = nil) if !error_message or error_message.kind_of?(Integer) then error_message = default_message(error_message) end super(error_message) end # Error::No_atmark -- the first identifier does not begin with "@" class No_atmark < Error private # default error message for this exception def default_message(i) 'the first identifier does not begin with "@"' end end # Error::No_ids -- sequence identifier not found class No_ids < Error private # default error message for this exception def default_message(i) 'sequence identifier not found' end end # Error::Diff_ids -- the identifier in the two lines are different class Diff_ids < Error private # default error message for this exception def default_message(i) 'the identifier in the two lines are different' end end # Error::Long_qual -- length of quality is longer than the sequence class Long_qual < Error private # default error message for this exception def default_message(i) 'length of quality is longer than the sequence' end end # Error::Short_qual -- length of quality is shorter than the sequence class Short_qual < Error private # default error message for
this exception def default_message(i) 'length of quality is shorter than the sequence' end end # Error::No_qual -- no quality characters found class No_qual < Error private # default error message for this exception def default_message(i) 'no quality characters found' end end # Error::No_seq -- no sequence found class No_seq < Error private # default error message for this exception def default_message(i) 'no sequence found' end end # Error::Qual_char -- invalid character in the quality class Qual_char < Error private # default error message for this exception def default_message(i) pos = i ? " at [#{i}]" : '' "invalid character in the quality#{pos}" end end # Error::Seq_char -- invalid character in the sequence class Seq_char < Error private # default error message for this exception def default_message(i) pos = i ? " at [#{i}]" : '' "invalid character in the sequence#{pos}" end end # Error::Qual_range -- quality score value out of range class Qual_range < Error private # default error message for this exception def default_message(i) pos = i ? " at [#{i}]" : '' "quality score value out of range#{pos}" end end # Error::Skipped_unformatted_lines -- the parser skipped unformatted # lines that could not be recognized as FASTQ format class Skipped_unformatted_lines < Error private # default error message for this exception def default_message(i) "the parser skipped unformatted lines that could not be recognized as FASTQ format" end end end #class Error # Adds a header line if the header data is not yet given and # the given line is suitable for header. # Returns self if adding header line is succeeded. # Otherwise, returns false (the line is not added). def add_header_line(line) @header ||= "" if line[0,1] == "@" then false else @header.concat line self end end # misc lines before the entry (String or nil) attr_reader :header # Adds a line to the entry if the given line is regarded as # a part of the current entry. def add_line(line) line = line.chomp if !defined? 
@definition then if line[0, 1] == "@" then @definition = line[1..-1] else @definition = line @parse_errors ||= [] @parse_errors.push Error::No_atmark.new end return self end if defined? @definition2 then @quality_string ||= '' if line[0, 1] == "@" and @quality_string.size >= @sequence_string.size then return false else @quality_string.concat line return self end else @sequence_string ||= '' if line[0, 1] == '+' then @definition2 = line[1..-1] else @sequence_string.concat line end return self end raise "Bug: should not reach here!" end # entry_overrun attr_reader :entry_overrun # Creates a new Fastq object from formatted text string. # # The format of quality scores should be specified later # by using format= method. # # --- # *Arguments*: # * _str_: Formatted string (String) def initialize(str = nil) return unless str sc = StringScanner.new(str) while !sc.eos? and line = sc.scan(/.*(?:\n|\r|\r\n)?/) unless add_header_line(line) then sc.unscan break end end while !sc.eos? and line = sc.scan(/.*(?:\n|\r|\r\n)?/) unless add_line(line) then sc.unscan break end end @entry_overrun = sc.rest end # definition; ID line (begins with @) attr_reader :definition # quality as a string attr_reader :quality_string # raw sequence data as a String object attr_reader :sequence_string # Returns Fastq formatted string constructed from instance variables. # The string will always consist of four lines, without wrapping of # the sequence and quality strings, and the third line always # contains only "+". This may differ from the initial entry. # # Note that use of this method may be inefficient and may lose performance # because a new string object is created every time it is called. # For showing an entry as-is, consider using Bio::FlatFile#entry_raw. # For output with various options, use Bio::Sequence#output(:fastq). # def to_s "@#{@definition}\n#{@sequence_string}\n+\n#{@quality_string}\n" end # returns Bio::Sequence::NA def naseq unless defined?
@naseq then @naseq = Bio::Sequence::NA.new(@sequence_string) end @naseq end # length of naseq def nalen naseq.length end # returns Bio::Sequence::Generic def seq unless defined? @seq then @seq = Bio::Sequence::Generic.new(@sequence_string) end @seq end # Identifier of the entry. Normally, the first word of the ID line. def entry_id unless defined? @entry_id then eid = @definition.strip.split(/\s+/)[0] || @definition @entry_id = eid end @entry_id end # (private) reset internal state def reset_state if defined? @quality_scores then remove_instance_variable(:@quality_scores) end if defined? @error_probabilities then remove_instance_variable(:@error_probabilities) end end private :reset_state # Specify the format. If the format is not found, raises RuntimeError. # # Available formats are: # "fastq-sanger" or :fastq_sanger # "fastq-solexa" or :fastq_solexa # "fastq-illumina" or :fastq_illumina # # --- # *Arguments*: # * (required) _name_: format name (String or Symbol). # *Returns*:: (String) format name def format=(name) if name then f = FormatNames[name] || Formats[name] if f then reset_state @format = f.instance self.format else raise "unknown format" end else reset_state nil end end # Format name. # One of "fastq-sanger", "fastq-solexa", "fastq-illumina", # or nil (when not specified). # --- # *Returns*:: (String or nil) format name def format ((defined? @format) && @format) ? @format.name : nil end # The meaning of the quality scores. # It may be one of :phred, :solexa, or nil. def quality_score_type self.format ||= self.class::DefaultFormatName @format.quality_score_type end # Quality score for each base. # For "fastq-sanger" or "fastq-illumina", it is PHRED score. # For "fastq-solexa", it is Solexa score. # # --- # *Returns*:: (Array containing Integer) quality score values def quality_scores unless defined? 
@quality_scores then self.format ||= self.class::DefaultFormatName s = @format.str2scores(@quality_string) @quality_scores = s end @quality_scores end alias qualities quality_scores # Estimated probability of error for each base. # --- # *Returns*:: (Array containing Float) error probability values def error_probabilities unless defined? @error_probabilities then self.format ||= self.class::DefaultFormatName a = @format.q2p(self.quality_scores) @error_probabilities = a end @error_probabilities end # Format validation. # # If an array is given as the argument, when errors are found, # error objects are pushed to the array. # Currently, following errors may be added to the array. # (All errors are under the Bio::Fastq namespace, for example, # Bio::Fastq::Error::Diff_ids). # # Error::Diff_ids -- the identifier in the two lines are different # Error::Long_qual -- length of quality is longer than the sequence # Error::Short_qual -- length of quality is shorter than the sequence # Error::No_qual -- no quality characters found # Error::No_seq -- no sequence found # Error::Qual_char -- invalid character in the quality # Error::Seq_char -- invalid character in the sequence # Error::Qual_range -- quality score value out of range # Error::No_ids -- sequence identifier not found # Error::No_atmark -- the first identifier does not begin with "@" # Error::Skipped_unformatted_lines -- the parser skipped unformatted lines that could not be recognized as FASTQ format # # --- # *Arguments*: # * (optional) _errors_: (Array or nil) an array for pushing error messages. The array should be empty. # *Returns*:: true:no error, false: containing error. def validate_format(errors = nil) err = [] # if header exists, the format might be broken. if defined? @header and @header and !@header.strip.empty? then err.push Error::Skipped_unformatted_lines.new end # if parse errors exist, adding them if defined? 
@parse_errors and @parse_errors then err.concat @parse_errors end # check if identifier exists, and identifier matches if !defined?(@definition) or !@definition then err.push Error::No_ids.new elsif defined?(@definition2) and !@definition2.to_s.empty? and @definition != @definition2 then err.push Error::Diff_ids.new end # check if sequence exists has_seq = true if !defined?(@sequence_string) or !@sequence_string then err.push Error::No_seq.new has_seq = false end # check if quality exists has_qual = true if !defined?(@quality_string) or !@quality_string then err.push Error::No_qual.new has_qual = false end # sequence and quality length check if has_seq and has_qual then slen = @sequence_string.length qlen = @quality_string.length if slen > qlen then err.push Error::Short_qual.new elsif qlen > slen then err.push Error::Long_qual.new end end # sequence character check if has_seq then sc = StringScanner.new(@sequence_string) while sc.scan_until(/[ \x00-\x1f\x7f-\xff]/n) err.push Error::Seq_char.new(sc.pos - sc.matched_size) end end # quality character check if has_qual then fmt = if defined?(@format) and @format then @format.name else nil end re = case fmt when 'fastq-sanger' /[^\x21-\x7e]/n when 'fastq-solexa' /[^\x3b-\x7e]/n when 'fastq-illumina' /[^\x40-\x7e]/n else /[ \x00-\x1f\x7f-\xff]/n end sc = StringScanner.new(@quality_string) while sc.scan_until(re) err.push Error::Qual_char.new(sc.pos - sc.matched_size) end end # if "errors" is given, set errors errors.concat err if errors # returns true if no error; otherwise, returns false err.empty? ? true : false end # Returns sequence as a Bio::Sequence object. # # Note: If you modify the returned Bio::Sequence object, # the sequence or definition in this Fastq object # might also be changed (but not always be changed) # because of efficiency. # def to_biosequence Bio::Sequence.adapter(self, Bio::Sequence::Adapter::Fastq) end # Masks low quality sequence regions.
# For each sequence position, if the quality score is smaller than # the threshold, the sequence in the position is replaced with # mask_char. # # Note: This method does not care quality_score_type. # --- # *Arguments*: # * (required) threshold : (Numeric) threshold # * (optional) mask_char : (String) character used for masking # *Returns*:: Bio::Sequence object def mask(threshold, mask_char = 'n') to_biosequence.mask_with_quality_score(threshold, mask_char) end end #class Fastq end #module Bio bio-2.0.3/lib/bio/db/nexus.rb0000644000175000017500000015557314141516614015231 0ustar nileshnilesh# # = bio/db/nexus.rb - Nexus Standard phylogenetic tree parser / formatter # # Copyright:: Copyright (C) 2006 Christian M Zmasek # # License:: The Ruby License # # $Id: nexus.rb,v 1.3 2007/04/05 23:35:40 trevor Exp $ # # == Description # # This file contains classes that implement a parser for NEXUS formatted # data as well as objects to store, access, and write the parsed data. # # The following five blocks: # taxa, characters, distances, trees, data # are recognizable and parsable. # # The parser can deal with (nested) comments (indicated by square brackets), # unless the comments are inside a command or data item (e.g. # "Dim[comment]ensions" or inside a matrix). 
# # Single or double quoted TaxLabels are processed as follows (by way # of example): "mus musculus" -> mus_musculus # # # == USAGE # # require 'bio/db/nexus' # # # Create a new parser: # nexus = Bio::Nexus.new( nexus_data_as_string ) # # # Get first taxa block: # taxa_block = nexus.get_taxa_blocks[ 0 ] # # Get number of taxa: # number_of_taxa = taxa_block.get_number_of_taxa.to_i # # Get name of first taxon: # first_taxon = taxa_block.get_taxa[ 0 ] # # # Get first data block: # data_block = nexus.get_data_blocks[ 0 ] # # Get first characters name: # seq_name = data_block.get_row_name( 0 ) # # Get first characters row named "taxon_2" as Bio::Sequence sequence: # seq_tax_2 = data_block.get_sequences_by_name( "taxon_2" )[ 0 ] # # Get third characters row as Bio::Sequence sequence: # seq_2 = data_block.get_sequence( 2 ) # # Get first characters row named "taxon_3" as String: # string_tax_3 = data_block.get_characters_strings_by_name( "taxon_3" ) # # Get name of first taxon: # taxon_0 = data_block.get_taxa[ 0 ] # # Get characters matrix as Bio::Nexus::NexusMatrix (names are in column 0) # characters_matrix = data_block.get_matrix # # # Get first characters block (same methods as Nexus::DataBlock except # # it lacks get_taxa method): # characters_block = nexus.get_characters_blocks[ 0 ] # # # Get trees block(s): # trees_block = nexus.get_trees_blocks[ 0 ] # # Get first tree named "best" as String: # string_fish = trees_block.get_tree_strings_by_name( "best" )[ 0 ] # # Get first tree named "best" as Bio::Db::Newick object: # tree_fish = trees_block.get_trees_by_name( "best" )[ 0 ] # # Get first tree as Bio::Db::Newick object: # tree_first = trees_block.get_tree( 0 ) # # # Get distances block(s): # distances_blocks = nexus.get_distances_blocks # # Get matrix as Bio::Nexus::NexusMatrix object: # matrix = distances_blocks[ 0 ].get_matrix # # Get value (column 0 are names): # val = matrix.get_value( 1, 5 ) # # # Get blocks for which no class exists (private blocks): # 
private_blocks = nexus.get_blocks_by_name( "my_block" ) # # Get first block names "my_block": # my_block_0 = private_blocks[ 0 ] # # Get first token in first block names "my_block": # first_token = my_block_0.get_tokens[ 0 ] # # # == References # # * Maddison DR, Swofford DL, Maddison WP (1997). NEXUS: an extensible file # format for systematic information. # Syst Biol. 1997 46(4):590-621. # require 'bio/sequence' require 'bio/tree' require 'bio/db/newick' module Bio # == DESCRIPTION # Bio::Nexus is a parser for nexus formatted data. # It contains classes and constants enabling the representation and # processing of nexus data. # # == USAGE # # # Parsing a nexus formatted string str: # nexus = Bio::Nexus.new( nexus_str ) # # # Obtaining of the nexus blocks as array of GenericBlock or # # any of its subclasses (such as DistancesBlock): # blocks = nexus.get_blocks # # # Getting a block by name: # my_blocks = nexus.get_blocks_by_name( "my_block" ) # # # Getting distance blocks: # distances_blocks = nexus.get_distances_blocks # # # Getting trees blocks: # trees_blocks = nexus.get_trees_blocks # # # Getting data blocks: # data_blocks = nexus.get_data_blocks # # # Getting characters blocks: # character_blocks = nexus.get_characters_blocks # # # Getting taxa blocks: # taxa_blocks = nexus.get_taxa_blocks # class Nexus END_OF_LINE = "\n" INDENTENTION = " " DOUBLE_QUOTE = "\"" SINGLE_QUOTE = "'" BEGIN_NEXUS = "#NEXUS" DELIMITER = ";" BEGIN_BLOCK = "Begin" END_BLOCK = "End" + DELIMITER BEGIN_COMMENT = "[" END_COMMENT = "]" TAXA = "Taxa" CHARACTERS = "Characters" DATA = "Data" DISTANCES = "Distances" TREES = "Trees" TAXA_BLOCK = TAXA + DELIMITER CHARACTERS_BLOCK = CHARACTERS + DELIMITER DATA_BLOCK = DATA + DELIMITER DISTANCES_BLOCK = DISTANCES + DELIMITER TREES_BLOCK = TREES + DELIMITER DIMENSIONS = "Dimensions" FORMAT = "Format" NTAX = "NTax" NCHAR = "NChar" DATATYPE = "DataType" TAXLABELS = "TaxLabels" MATRIX = "Matrix" # End of constants. 
    # Nexus parse error class,
    # indicates an error during parsing of nexus formatted data.
    class NexusParseError < RuntimeError; end

    # Creates a new nexus parser for 'nexus_str'.
    #
    # ---
    # *Arguments*:
    # * (required) _nexus_str_: String - nexus formatted data
    def initialize( nexus_str )
      @blocks = Array.new
      @current_cmd = nil
      @current_subcmd = nil
      @current_block_name = nil
      @current_block = nil
      parse( nexus_str )
    end

    # Returns an Array of all blocks found in the String 'nexus_str'
    # set via Bio::Nexus.new( nexus_str ).
    #
    # ---
    # *Returns*:: Array of GenericBlock objects (or objects of its subclasses)
    def get_blocks
      @blocks
    end

    # A convenience method which returns an array of
    # all nexus blocks whose name equals 'name', found
    # in the String 'nexus_str' set via Bio::Nexus.new( nexus_str ).
    #
    # ---
    # *Arguments*:
    # * (required) _name_: String
    # *Returns*:: Array of GenericBlock objects (or objects of its subclasses)
    def get_blocks_by_name( name )
      found_blocks = Array.new
      @blocks.each do | block |
        if ( name == block.get_name )
          found_blocks.push( block )
        end
      end
      found_blocks
    end

    # A convenience method which returns an array of
    # all data blocks.
    #
    # ---
    # *Returns*:: Array of DataBlocks
    def get_data_blocks
      get_blocks_by_name( DATA_BLOCK.chomp( ";").downcase )
    end

    # A convenience method which returns an array of
    # all characters blocks.
    #
    # ---
    # *Returns*:: Array of CharactersBlocks
    def get_characters_blocks
      get_blocks_by_name( CHARACTERS_BLOCK.chomp( ";").downcase )
    end

    # A convenience method which returns an array of
    # all trees blocks.
    #
    # ---
    # *Returns*:: Array of TreesBlocks
    def get_trees_blocks
      get_blocks_by_name( TREES_BLOCK.chomp( ";").downcase )
    end

    # A convenience method which returns an array of
    # all distances blocks.
    #
    # ---
    # *Returns*:: Array of DistancesBlocks
    def get_distances_blocks
      get_blocks_by_name( DISTANCES_BLOCK.chomp( ";").downcase )
    end

    # A convenience method which returns an array of
    # all taxa blocks.
# # --- # *Returns*:: Array of TaxaBlocks def get_taxa_blocks get_blocks_by_name( TAXA_BLOCK.chomp( ";").downcase ) end # Returns a String listing how many of each blocks it parsed. # # --- # *Returns*:: String def to_s str = String.new if get_blocks.length < 1 str << "empty" else str << "number of blocks: " << get_blocks.length.to_s if get_characters_blocks.length > 0 str << " [characters blocks: " << get_characters_blocks.length.to_s << "] " end if get_data_blocks.length > 0 str << " [data blocks: " << get_data_blocks.length.to_s << "] " end if get_distances_blocks.length > 0 str << " [distances blocks: " << get_distances_blocks.length.to_s << "] " end if get_taxa_blocks.length > 0 str << " [taxa blocks: " << get_taxa_blocks.length.to_s << "] " end if get_trees_blocks.length > 0 str << " [trees blocks: " << get_trees_blocks.length.to_s << "] " end end str end alias to_str to_s private # The master method for parsing. # Stores the resulting block in array @blocks. # # --- # *Arguments*: # * (required) _str_: String - the String to be parsed def parse( str ) str = str.chop if str[-1..-1] == ';' ary = str.split(/[\s+=]/) ary.collect! { |x| x.strip!; x.empty? ? nil : x } ary.compact! #in_comment = false comment_level = 0 # Main loop while token = ary.shift # Quotes: if ( token.index( SINGLE_QUOTE ) == 0 || token.index( DOUBLE_QUOTE ) == 0 ) token << "_" << ary.shift token = token.chop if token[-1..-1] == ';' token = token.slice( 1, token.length - 2 ) end # Comments: open = token.count( BEGIN_COMMENT ) close = token.count( END_COMMENT ) comment = comment_level > 0 comment_level = comment_level + open - close if ( open > 0 && open == close ) next elsif comment_level > 0 || comment next elsif equal?( token, END_BLOCK ) end_block() elsif equal?( token, BEGIN_BLOCK ) begin_block() @current_block_name = token = ary.shift @current_block_name.downcase! 
          @current_block = create_block()
          @blocks.push( @current_block )
        elsif ( @current_block_name != nil )
          process_token( token.chomp( DELIMITER ), ary )
        end
      end # main loop
      @blocks.compact!
    end # parse

    # Operations required when the beginning of a block is encountered.
    #
    # ---
    def begin_block()
      if @current_block_name != nil
        raise NexusParseError, "Cannot have nested nexus blocks (\"end;\" might be missing)"
      end
      reset_command_state()
    end

    # Operations required when the end of a block is encountered.
    #
    # ---
    def end_block()
      if @current_block_name == nil
        raise NexusParseError, "Cannot have two or more \"end;\" tokens in sequence"
      end
      @current_block_name = nil
    end

    # This calls one of the process_token_for_*_block methods,
    # depending on the state of @current_block_name.
    #
    # ---
    # *Arguments*:
    # * (required) _token_: String
    # * (required) _ary_: Array
    def process_token( token, ary )
      case @current_block_name
        when TAXA_BLOCK.downcase
          process_token_for_taxa_block( token )
        when CHARACTERS_BLOCK.downcase
          process_token_for_character_block( token, ary )
        when DATA_BLOCK.downcase
          process_token_for_data_block( token, ary )
        when DISTANCES_BLOCK.downcase
          process_token_for_distances_block( token, ary )
        when TREES_BLOCK.downcase
          process_token_for_trees_block( token, ary )
        else
          process_token_for_generic_block( token )
      end
    end

    # Resets @current_cmd and @current_subcmd to nil.
    #
    # ---
    def reset_command_state()
      @current_cmd = nil
      @current_subcmd = nil
    end

    # Creates GenericBlock (or any of its subclasses) the type of
    # which is determined by the state of @current_block_name.
# # --- # *Returns*:: GenericBlock (or any of its subclasses) object def create_block() case @current_block_name when TAXA_BLOCK.downcase return Bio::Nexus::TaxaBlock.new( @current_block_name ) when CHARACTERS_BLOCK.downcase return Bio::Nexus::CharactersBlock.new( @current_block_name ) when DATA_BLOCK.downcase return Bio::Nexus::DataBlock.new( @current_block_name ) when DISTANCES_BLOCK.downcase return Bio::Nexus::DistancesBlock.new( @current_block_name ) when TREES_BLOCK.downcase return Bio::Nexus::TreesBlock.new( @current_block_name ) else return Bio::Nexus::GenericBlock.new( @current_block_name ) end end # This processes the tokens (between Begin Taxa; and End;) for a taxa block # Example of a currently parseable taxa block: # Begin Taxa; # Dimensions NTax=4; # TaxLabels fish [comment] 'african frog' "rat snake" 'red mouse'; # End; # # --- # *Arguments*: # * (required) _token_: String def process_token_for_taxa_block( token ) if ( equal?( token, DIMENSIONS ) ) @current_cmd = DIMENSIONS @current_subcmd = nil elsif ( equal?( token, TAXLABELS ) ) @current_cmd = TAXLABELS @current_subcmd = nil elsif ( @current_cmd == DIMENSIONS && equal?( token, NTAX ) ) @current_subcmd = NTAX elsif ( cmds_equal_to?( DIMENSIONS, NTAX ) ) @current_block.set_number_of_taxa( token ) elsif ( cmds_equal_to?( TAXLABELS, nil ) ) @current_block.add_taxon( token ) end end # This processes the tokens (between Begin Taxa; and End;) for a character # block # Example of a currently parseable character block: # Begin Characters; # Dimensions NChar=20 # NTax=4; # Format DataType=DNA # Missing=x # Gap=- MatchChar=.; # Matrix # fish ACATA GAGGG TACCT CTAAG # frog ACTTA GAGGC TACCT CTAGC # snake ACTCA CTGGG TACCT TTGCG # mouse ACTCA GACGG TACCT TTGCG; # End; # # --- # *Arguments*: # * (required) _token_: String # * (required) _ary_: Array def process_token_for_character_block( token, ary ) if ( equal?( token, DIMENSIONS ) ) @current_cmd = DIMENSIONS @current_subcmd = nil elsif ( equal?( token, FORMAT 
) ) @current_cmd = FORMAT @current_subcmd = nil elsif ( equal?( token, MATRIX ) ) @current_cmd = MATRIX @current_subcmd = nil elsif ( @current_cmd == DIMENSIONS && equal?( token, NTAX ) ) @current_subcmd = NTAX elsif ( @current_cmd == DIMENSIONS && equal?( token, NCHAR ) ) @current_subcmd = NCHAR elsif ( @current_cmd == FORMAT && equal?( token, DATATYPE ) ) @current_subcmd = DATATYPE elsif ( @current_cmd == FORMAT && equal?( token, CharactersBlock::MISSING ) ) @current_subcmd = CharactersBlock::MISSING elsif ( @current_cmd == FORMAT && equal?( token, CharactersBlock::GAP ) ) @current_subcmd = CharactersBlock::GAP elsif ( @current_cmd == FORMAT && equal?( token, CharactersBlock::MATCHCHAR ) ) @current_subcmd = CharactersBlock::MATCHCHAR elsif ( cmds_equal_to?( DIMENSIONS, NTAX ) ) @current_block.set_number_of_taxa( token ) elsif ( cmds_equal_to?( DIMENSIONS, NCHAR ) ) @current_block.set_number_of_characters( token ) elsif ( cmds_equal_to?( FORMAT, DATATYPE ) ) @current_block.set_datatype( token ) elsif ( cmds_equal_to?( FORMAT, CharactersBlock::MISSING ) ) @current_block.set_missing( token ) elsif ( cmds_equal_to?( FORMAT, CharactersBlock::GAP ) ) @current_block.set_gap_character( token ) elsif ( cmds_equal_to?( FORMAT, CharactersBlock::MATCHCHAR ) ) @current_block.set_match_character( token ) elsif ( cmds_equal_to?( MATRIX, nil ) ) @current_block.set_matrix( make_matrix( token, ary, @current_block.get_number_of_characters, true ) ) end end # This processes the tokens (between Begin Trees; and End;) for a trees block # Example of a currently parseable taxa block: # Begin Trees; # Tree best=(fish,(frog,(snake, mouse))); # Tree other=(snake,(frog,( fish, mouse))); # End; # # --- # *Arguments*: # * (required) _token_: String # * (required) _ary_: Array def process_token_for_trees_block( token, ary ) if ( equal?( token, TreesBlock::TREE ) ) @current_cmd = TreesBlock::TREE @current_subcmd = nil elsif ( cmds_equal_to?( TreesBlock::TREE, nil ) ) 
        @current_block.add_tree_name( token )
        tree_string = ary.shift
        while ( tree_string.index( ";" ) == nil )
          tree_string << ary.shift
        end
        @current_block.add_tree( tree_string )
        @current_cmd = nil
      end
    end

    # This processes the tokens (between Begin Distances; and End;) for a
    # distances block.
    # Example of a currently parseable distances block:
    #  Begin Distances;
    #   Dimensions nchar=20 ntax=5;
    #   Format Triangle=Upper;
    #   Matrix
    #   taxon_1 0.0 1.0 2.0 4.0 7.0
    #   taxon_2 1.0 0.0 3.0 5.0 8.0
    #   taxon_3 3.0 4.0 0.0 6.0 9.0
    #   taxon_4 7.0 3.0 1.0 0.0 9.5
    #   taxon_5 1.2 1.3 1.4 1.5 0.0;
    #  End;
    #
    # ---
    # *Arguments*:
    # * (required) _token_: String
    # * (required) _ary_: Array
    def process_token_for_distances_block( token, ary )
      if ( equal?( token, DIMENSIONS ) )
        @current_cmd = DIMENSIONS
        @current_subcmd = nil
      elsif ( equal?( token, FORMAT ) )
        @current_cmd = FORMAT
        @current_subcmd = nil
      elsif ( equal?( token, MATRIX ) )
        @current_cmd = MATRIX
        @current_subcmd = nil
      elsif ( @current_cmd == DIMENSIONS && equal?( token, NTAX ) )
        @current_subcmd = NTAX
      elsif ( @current_cmd == DIMENSIONS && equal?( token, NCHAR ) )
        @current_subcmd = NCHAR
      elsif ( @current_cmd == FORMAT && equal?( token, DATATYPE ) )
        @current_subcmd = DATATYPE
      elsif ( @current_cmd == FORMAT && equal?( token, DistancesBlock::TRIANGLE ) )
        @current_subcmd = DistancesBlock::TRIANGLE
      elsif ( cmds_equal_to?( DIMENSIONS, NTAX ) )
        @current_block.set_number_of_taxa( token )
      elsif ( cmds_equal_to?( DIMENSIONS, NCHAR ) )
        @current_block.set_number_of_characters( token )
      elsif ( cmds_equal_to?( FORMAT, DistancesBlock::TRIANGLE ) )
        @current_block.set_triangle( token )
      elsif ( cmds_equal_to?( MATRIX, nil ) )
        @current_block.set_matrix( make_matrix( token, ary,
                                   @current_block.get_number_of_taxa, false ) )
      end
    end

    # This processes the tokens (between Begin Data; and End;) for a data
    # block.
# Example of a currently parseable data block: # Begin Data; # Dimensions ntax=5 nchar=14; # Format Datatype=RNA gap=# MISSING=x MatchChar=^; # TaxLabels ciona cow [comment] ape 'purple urchin' "green lizard"; # Matrix # taxon_1 A- CCGTCGA-GTTA # taxon_2 T- CCG-CGA-GATA # taxon_3 A- C-GTCGA-GATA # taxon_4 A- CCTCGA--GTTA # taxon_5 T- CGGTCGT-CTTA; # End; # # --- # *Arguments*: # * (required) _token_: String # * (required) _ary_: Array def process_token_for_data_block( token, ary ) if ( equal?( token, DIMENSIONS ) ) @current_cmd = DIMENSIONS @current_subcmd = nil elsif ( equal?( token, FORMAT ) ) @current_cmd = FORMAT @current_subcmd = nil elsif ( equal?( token, TAXLABELS ) ) @current_cmd = TAXLABELS @current_subcmd = nil elsif ( equal?( token, MATRIX ) ) @current_cmd = MATRIX @current_subcmd = nil elsif ( @current_cmd == DIMENSIONS && equal?( token, NTAX ) ) @current_subcmd = NTAX elsif ( @current_cmd == DIMENSIONS && equal?( token, NCHAR ) ) @current_subcmd = NCHAR elsif ( @current_cmd == FORMAT && equal?( token, DATATYPE ) ) @current_subcmd = DATATYPE elsif ( @current_cmd == FORMAT && equal?( token, CharactersBlock::MISSING ) ) @current_subcmd = CharactersBlock::MISSING elsif ( @current_cmd == FORMAT && equal?( token, CharactersBlock::GAP ) ) @current_subcmd = CharactersBlock::GAP elsif ( @current_cmd == FORMAT && equal?( token, CharactersBlock::MATCHCHAR ) ) @current_subcmd = CharactersBlock::MATCHCHAR elsif ( cmds_equal_to?( DIMENSIONS, NTAX ) ) @current_block.set_number_of_taxa( token ) elsif ( cmds_equal_to?( DIMENSIONS, NCHAR ) ) @current_block.set_number_of_characters( token ) elsif ( cmds_equal_to?( FORMAT, DATATYPE ) ) @current_block.set_datatype( token ) elsif ( cmds_equal_to?( FORMAT, CharactersBlock::MISSING ) ) @current_block.set_missing( token ) elsif ( cmds_equal_to?( FORMAT, CharactersBlock::GAP ) ) @current_block.set_gap_character( token ) elsif ( cmds_equal_to?( FORMAT, CharactersBlock::MATCHCHAR ) ) @current_block.set_match_character( token ) 
elsif ( cmds_equal_to?( TAXLABELS, nil ) ) @current_block.add_taxon( token ) elsif ( cmds_equal_to?( MATRIX, nil ) ) @current_block.set_matrix( make_matrix( token, ary, @current_block.get_number_of_characters, true ) ) end end # Makes a NexusMatrix out of token from token Array ary # Used by process_token_for_X_block methods which contain # data in a matrix form. Column 0 contains names. # This will shift tokens from ary. # --- # *Arguments*: # * (required) _token_: String # * (required) _ary_: Array # * (required) _size_: Integer # * (optional) _scan_token_: true or false # *Returns*:: NexusMatrix def make_matrix( token, ary, size, scan_token = false ) matrix = NexusMatrix.new col = -1 row = 0 done = false while ( !done ) if ( col == -1 ) # name col = 0 matrix.set_value( row, col, token ) # name is in col 0 else # values col = add_token_to_matrix( token, scan_token, matrix, row, col ) if ( col == size.to_i ) col = -1 row += 1 end end token = ary.shift if ( token.index( DELIMITER ) != nil ) col = add_token_to_matrix( token.chomp( ";" ), scan_token, matrix, row, col ) done = true end end # while matrix end # Helper method for make_matrix. # # --- # *Arguments*: # * (required) _token_: String # * (required) _scan_token_: true or false - add whole token # or # scan into chars # * (required) _matrix_: NexusMatrix - the matrix to which to add token # * (required) _row_: Integer - the row for matrix # * (required) _col_: Integer - the starting row # *Returns*:: Integer - ending row def add_token_to_matrix( token, scan_token, matrix, row, col ) if ( scan_token ) token.scan(/./) { |w| col += 1 matrix.set_value( row, col, w ) } else col += 1 matrix.set_value( row, col, token ) end col end # This processes the tokens (between Begin Taxa; and End;) for a block # for which a specific parser is not available. # Example of a currently parseable generic block: # Begin Taxa; # token1 token2 token3 ... 
# End; # # --- # *Arguments*: # * (required) _token_: String def process_token_for_generic_block( token ) @current_block.add_token( token ) end # Returns true if Strings str1 and str2 are # equal - ignoring case. # # --- # *Arguments*: # * (required) _str1_: String # * (required) _str2_: String # *Returns*:: true or false def equal?( str1, str2 ) if ( str1 == nil || str2 == nil ) return false else return ( str1.downcase == str2.downcase ) end end # Returns true if @current_cmd == command # and @current_subcmd == subcommand, false otherwise # --- # *Arguments*: # * (required) _command_: String # * (required) _subcommand_: String # *Returns*:: true or false def cmds_equal_to?( command, subcommand ) return ( @current_cmd == command && @current_subcmd == subcommand ) end # Classes to represent nexus data follow. # == DESCRIPTION # Bio::Nexus::GenericBlock represents a generic nexus block. # It is mainly intended to be extended into more specific classes, # although it is used for blocks not represented by more specific # block classes. # It has a name and a array for the tokenized content of a # nexus block. # # == USAGE # # require 'bio/db/nexus' # # # Create a new parser: # nexus = Bio::Nexus.new( nexus_data_as_string ) # # # Get blocks for which no class exists (private blocks) # as Nexus::GenericBlock: # private_blocks = nexus.get_blocks_by_name( "my_block" ) # # Get first block names "my_block": # my_block_0 = private_blocks[ 0 ] # # Get first token in first block names "my_block": # first_token = my_block_0.get_tokens[ 0 ] # # Get name of block (would return "my_block" in this case): # name = my_block_0.get_name # # Return data of block as nexus formatted String: # name = my_block_0.to_nexus # class GenericBlock # Creates a new GenericBlock object named 'name'. # --- # *Arguments*: # * (required) _name_: String def initialize( name ) @name = name.chomp(";") @tokens = Array.new end # Gets the name of this block. 
# # --- # *Returns*:: String def get_name @name end # Returns contents as Array of Strings. # # --- # *Returns*:: Array def get_tokens @tokens end # Same as to_nexus. # # --- # *Returns*:: String def to_s to_nexus end alias to_str to_s # Should return a String describing this block as nexus formatted data. # --- # *Returns*:: String def to_nexus str = "generic block \"" + get_name + "\" [do not know how to write in nexus format]" str end # Adds a token to this. # # --- # *Arguments*: # * (required) _token_: String def add_token( token ) @tokens.push( token ) end end # class GenericBlock # == DESCRIPTION # Bio::Nexus::TaxaBlock represents a taxa nexus block. # # = Example of Taxa block: # Begin Taxa; # Dimensions NTax=4; # TaxLabels fish [comment] 'african frog' "rat snake" 'red mouse'; # End; # # == USAGE # # require 'bio/db/nexus' # # # Create a new parser: # nexus = Bio::Nexus.new( nexus_data_as_string ) # # # Get first taxa block: # taxa_block = nexus.get_taxa_blocks[ 0 ] # # Get number of taxa: # number_of_taxa = taxa_block.get_number_of_taxa.to_i # # Get name of first taxon: # first_taxon = taxa_block.get_taxa[ 0 ] # class TaxaBlock < GenericBlock # Creates a new TaxaBlock object named 'name'. # --- # *Arguments*: # * (required) _name_: String def initialize( name ) super( name ) @number_of_taxa = 0 @taxa = Array.new end # Returns a String describing this block as nexus formatted data. # --- # *Returns*:: String def to_nexus line_1 = String.new line_1 << DIMENSIONS if ( Nexus::Util::larger_than_zero( get_number_of_taxa ) ) line_1 << " " << NTAX << "=" << get_number_of_taxa end line_1 << DELIMITER line_2 = String.new line_2 << TAXLABELS << " " << Nexus::Util::array_to_string( get_taxa ) << DELIMITER Nexus::Util::to_nexus_helper( TAXA_BLOCK, [ line_1, line_2 ] ) end # Gets the "number of taxa" property. # # --- # *Returns*:: Integer def get_number_of_taxa @number_of_taxa end # Gets the taxa of this block. 
# # --- # *Returns*:: Array def get_taxa @taxa end # Sets the "number of taxa" property. # # --- # *Arguments*: # * (required) _number_of_taxa_: Integer def set_number_of_taxa( number_of_taxa ) @number_of_taxa = number_of_taxa end # Adds a taxon name to this block. # # --- # *Arguments*: # * (required) _taxon_: String def add_taxon( taxon ) @taxa.push( taxon ) end end # class TaxaBlock # == DESCRIPTION # Bio::Nexus::CharactersBlock represents a characters nexus block. # # = Example of Characters block: # Begin Characters; # Dimensions NChar=20 # NTax=4; # Format DataType=DNA # Missing=x # Gap=- MatchChar=.; # Matrix # fish ACATA GAGGG TACCT CTAAG # frog ACTTA GAGGC TACCT CTAGC # snake ACTCA CTGGG TACCT TTGCG # mouse ACTCA GACGG TACCT TTGCG; # End; # # # == USAGE # # require 'bio/db/nexus' # # # Create a new parser: # nexus = Bio::Nexus.new( nexus_data_as_string ) # # # # Get first characters block (same methods as Nexus::DataBlock except # # it lacks get_taxa method): # characters_block = nexus.get_characters_blocks[ 0 ] # class CharactersBlock < GenericBlock MISSING = "Missing" GAP = "Gap" MATCHCHAR = "MatchChar" # Creates a new CharactersBlock object named 'name'. # --- # *Arguments*: # * (required) _name_: String def initialize( name ) super( name ) @number_of_taxa = 0 @number_of_characters = 0 @data_type = String.new @gap_character = String.new @missing = String.new @match_character = String.new @matrix = NexusMatrix.new end # Returns a String describing this block as nexus formatted data. 
# # --- # *Returns*:: String def to_nexus line_1 = String.new line_1 << DIMENSIONS if ( Nexus::Util::larger_than_zero( get_number_of_taxa ) ) line_1 << " " << NTAX << "=" << get_number_of_taxa end if ( Nexus::Util::larger_than_zero( get_number_of_characters ) ) line_1 << " " << NCHAR << "=" << get_number_of_characters end line_1 << DELIMITER line_2 = String.new line_2 << FORMAT if ( Nexus::Util::longer_than_zero( get_datatype ) ) line_2 << " " << DATATYPE << "=" << get_datatype end if ( Nexus::Util::longer_than_zero( get_missing ) ) line_2 << " " << MISSING << "=" << get_missing end if ( Nexus::Util::longer_than_zero( get_gap_character ) ) line_2 << " " << GAP << "=" << get_gap_character end if ( Nexus::Util::longer_than_zero( get_match_character ) ) line_2 << " " << MATCHCHAR << "=" << get_match_character end line_2 << DELIMITER line_3 = String.new line_3 << MATRIX Nexus::Util::to_nexus_helper( CHARACTERS_BLOCK, [ line_1, line_2, line_3 ] + get_matrix.to_nexus_row_array ) end # Gets the "number of taxa" property. # # --- # *Returns*:: Integer def get_number_of_taxa @number_of_taxa end # Gets the "number of characters" property. # # --- # *Returns*:: Integer def get_number_of_characters @number_of_characters end # Gets the "datatype" property. # --- # *Returns*:: String def get_datatype @data_type end # Gets the "gap character" property. # --- # *Returns*:: String def get_gap_character @gap_character end # Gets the "missing" property. # --- # *Returns*:: String def get_missing @missing end # Gets the "match character" property. # --- # *Returns*:: String def get_match_character @match_character end # Gets the matrix. # --- # *Returns*:: Bio::Nexus::NexusMatrix def get_matrix @matrix end # Returns character data as Bio::Sequence object Array # for matrix rows named 'name'. 
# --- # *Arguments*: # * (required) _name_: String # *Returns*:: Bio::Sequence def get_sequences_by_name( name ) seq_strs = get_characters_strings_by_name( name ) seqs = Array.new seq_strs.each do | seq_str | seqs.push( create_sequence( seq_str, name ) ) end seqs end # Returns the characters in the matrix at row 'row' as # Bio::Sequence object. Column 0 of the matrix is set as # the definition of the Bio::Sequence object. # --- # *Arguments*: # * (required) _row_: Integer # *Returns*:: Bio::Sequence def get_sequence( row ) create_sequence( get_characters_string( row ), get_row_name( row ) ) end # Returns the String in the matrix at row 'row' and column 0, # which usually is interpreted as a sequence name (if the matrix # contains molecular sequence characters). # # --- # *Arguments*: # * (required) _row_: Integer # *Returns*:: String def get_row_name( row ) get_matrix.get_name( row ) end # Returns character data as String Array # for matrix rows named 'name'. # # --- # *Arguments*: # * (required) _name_: String # *Returns*:: Array of Strings def get_characters_strings_by_name( name ) get_matrix.get_row_strings_by_name( name, "" ) end # Returns character data as String # for matrix row 'row'. # # --- # *Arguments*: # * (required) _row_: Integer # *Returns*:: String def get_characters_string( row ) get_matrix.get_row_string( row, "" ) end # Sets the "number of taxa" property. # --- # *Arguments*: # * (required) _number_of_taxa_: Integer def set_number_of_taxa( number_of_taxa ) @number_of_taxa = number_of_taxa end # Sets the "number of characters" property. # --- # *Arguments*: # * (required) _number_of_characters_: Integer def set_number_of_characters( number_of_characters ) @number_of_characters = number_of_characters end # Sets the "data type" property. # --- # *Arguments*: # * (required) _data_type_: String def set_datatype( data_type ) @data_type = data_type end # Sets the "gap character" property. 
      # ---
      # *Arguments*:
      # * (required) _gap_character_: String
      def set_gap_character( gap_character )
        @gap_character = gap_character
      end

      # Sets the "missing" property.
      # ---
      # *Arguments*:
      # * (required) _missing_: String
      def set_missing( missing )
        @missing = missing
      end

      # Sets the "match character" property.
      # ---
      # *Arguments*:
      # * (required) _match_character_: String
      def set_match_character( match_character )
        @match_character = match_character
      end

      # Sets the matrix.
      # ---
      # *Arguments*:
      # * (required) _matrix_: Bio::Nexus::NexusMatrix
      def set_matrix( matrix )
        @matrix = matrix
      end

      private

      # Creates a Bio::Sequence object with sequence 'seq_str'
      # and definition 'definition'.
      # ---
      # *Arguments*:
      # * (required) _seq_str_: String
      # * (optional) _definition_: String
      # *Returns*:: Bio::Sequence
      def create_sequence( seq_str, definition = "" )
        seq = Bio::Sequence.auto( seq_str )
        seq.definition = definition
        seq
      end

    end # class CharactersBlock

    # == DESCRIPTION
    # Bio::Nexus::DataBlock represents a data nexus block.
    # A data block is a Bio::Nexus::CharactersBlock with the added
    # capability to store taxa names.
# # = Example of Data block: # Begin Data; # Dimensions ntax=5 nchar=14; # Format Datatype=RNA gap=# MISSING=x MatchChar=^; # TaxLabels ciona cow [comment] ape 'purple urchin' "green lizard"; # Matrix # taxon_1 A- CCGTCGA-GTTA # taxon_2 T- CCG-CGA-GATA # taxon_3 A- C-GTCGA-GATA # taxon_4 A- CCTCGA--GTTA # taxon_5 T- CGGTCGT-CTTA; # End; # # # == USAGE # # require 'bio/db/nexus' # # # Create a new parser: # nexus = Bio::Nexus.new( nexus_data_as_string ) # # # # Get first data block: # data_block = nexus.get_data_blocks[ 0 ] # # Get first characters name: # seq_name = data_block.get_row_name( 0 ) # # Get first characters row named "taxon_2" as Bio::Sequence sequence: # seq_tax_2 = data_block.get_sequences_by_name( "taxon_2" )[ 0 ] # # Get third characters row as Bio::Sequence sequence: # seq_2 = data_block.get_sequence( 2 ) # # Get first characters row named "taxon_3" as String: # string_tax_3 = data_block.get_characters_strings_by_name( "taxon_3" ) # # Get name of first taxon: # taxon_0 = data_block.get_taxa[ 0 ] # # Get characters matrix as Bio::Nexus::NexusMatrix (names are in column 0) # characters_matrix = data_block.get_matrix # class DataBlock < CharactersBlock # Creates a new DataBlock object named 'name'. # --- # *Arguments*: # * (required) _name_: String def initialize( name ) super( name ) @taxa = Array.new end # Returns a String describing this block as nexus formatted data. 
# --- # *Returns*:: String def to_nexus line_1 = String.new line_1 << DIMENSIONS if ( Nexus::Util::larger_than_zero( get_number_of_taxa ) ) line_1 << " " << NTAX << "=" << get_number_of_taxa end if ( Nexus::Util::larger_than_zero( get_number_of_characters ) ) line_1 << " " << NCHAR << "=" << get_number_of_characters end line_1 << DELIMITER line_2 = String.new line_2 << FORMAT if ( Nexus::Util::longer_than_zero( get_datatype ) ) line_2 << " " << DATATYPE << "=" << get_datatype end if ( Nexus::Util::longer_than_zero( get_missing ) ) line_2 << " " << MISSING << "=" << get_missing end if ( Nexus::Util::longer_than_zero( get_gap_character ) ) line_2 << " " << GAP << "=" << get_gap_character end if ( Nexus::Util::longer_than_zero( get_match_character ) ) line_2 << " " << MATCHCHAR << "=" << get_match_character end line_2 << DELIMITER line_3 = String.new line_3 << TAXLABELS << " " << Nexus::Util::array_to_string( get_taxa ) line_3 << DELIMITER line_4 = String.new line_4 << MATRIX Nexus::Util::to_nexus_helper( DATA_BLOCK, [ line_1, line_2, line_3, line_4 ] + get_matrix.to_nexus_row_array ) end # Gets the taxa of this block. # --- # *Returns*:: Array def get_taxa @taxa end # Adds a taxon name to this block. # --- # *Arguments*: # * (required) _taxon_: String def add_taxon( taxon ) @taxa.push( taxon ) end end # class DataBlock # == DESCRIPTION # Bio::Nexus::DistancesBlock represents a distances nexus block. 
# # = Example of Distances block: # Begin Distances; # Dimensions nchar=20 ntax=5; # Format Triangle=Upper; # Matrix # taxon_1 0.0 1.0 2.0 4.0 7.0 # taxon_2 1.0 0.0 3.0 5.0 8.0 # taxon_3 3.0 4.0 0.0 6.0 9.0 # taxon_4 7.0 3.0 1.0 0.0 9.5 # taxon_5 1.2 1.3 1.4 1.5 0.0; # End; # # # == USAGE # # require 'bio/db/nexus' # # # Create a new parser: # nexus = Bio::Nexus.new( nexus_data_as_string ) # # # Get distances block(s): # distances_blocks = nexus.get_distances_blocks # # Get matrix as Bio::Nexus::NexusMatrix object: # matrix = distances_blocks[ 0 ].get_matrix # # Get value (column 0 are names): # val = matrix.get_value( 1, 5 ) # class DistancesBlock < GenericBlock TRIANGLE = "Triangle" # Creates a new DistancesBlock object named 'name'. # --- # *Arguments*: # * (required) _name_: String def initialize( name ) super( name ) @number_of_taxa = 0 @number_of_characters = 0 @triangle = String.new @matrix = NexusMatrix.new end # Returns a String describing this block as nexus formatted data. # --- # *Returns*:: String def to_nexus line_1 = String.new line_1 << DIMENSIONS if ( Nexus::Util::larger_than_zero( get_number_of_taxa ) ) line_1 << " " << NTAX << "=" << get_number_of_taxa end if ( Nexus::Util::larger_than_zero( get_number_of_characters ) ) line_1 << " " << NCHAR << "=" << get_number_of_characters end line_1 << DELIMITER line_2 = String.new line_2 << FORMAT if ( Nexus::Util::longer_than_zero( get_triangle ) ) line_2 << " " << TRIANGLE << "=" << get_triangle end line_2 << DELIMITER line_3 = String.new line_3 << MATRIX Nexus::Util::to_nexus_helper( DISTANCES_BLOCK, [ line_1, line_2, line_3 ] + get_matrix.to_nexus_row_array( " " ) ) end # Gets the "number of taxa" property. # --- # *Returns*:: Integer def get_number_of_taxa @number_of_taxa end # Gets the "number of characters" property. # --- # *Returns*:: Integer def get_number_of_characters @number_of_characters end # Gets the "triangle" property. 
# --- # *Returns*:: String def get_triangle @triangle end # Gets the matrix. # --- # *Returns*:: Bio::Nexus::NexusMatrix def get_matrix @matrix end # Sets the "number of taxa" property. # --- # *Arguments*: # * (required) _number_of_taxa_: Integer def set_number_of_taxa( number_of_taxa ) @number_of_taxa = number_of_taxa end # Sets the "number of characters" property. # --- # *Arguments*: # * (required) _number_of_characters_: Integer def set_number_of_characters( number_of_characters ) @number_of_characters = number_of_characters end # Sets the "triangle" property. # --- # *Arguments*: # * (required) _triangle_: String def set_triangle( triangle ) @triangle = triangle end # Sets the matrix. # --- # *Arguments*: # * (required) _matrix_: Bio::Nexus::NexusMatrix def set_matrix( matrix ) @matrix = matrix end end # class DistancesBlock # == DESCRIPTION # Bio::Nexus::TreesBlock represents a trees nexus block. # # = Example of Trees block: # Begin Trees; # Tree best=(fish,(frog,(snake, mouse))); # Tree other=(snake,(frog,( fish, mouse))); # End; # # # == USAGE # # require 'bio/db/nexus' # # # Create a new parser: # nexus = Bio::Nexus.new( nexus_data_as_string ) # # Get trees block(s): # trees_block = nexus.get_trees_blocks[ 0 ] # # Get first tree named "best" as String: # string_fish = trees_block.get_tree_strings_by_name( "best" )[ 0 ] # # Get first tree named "best" as Bio::Db::Newick object: # tree_fish = trees_block.get_trees_by_name( "best" )[ 0 ] # # Get first tree as Bio::Db::Newick object: # tree_first = trees_block.get_tree( 0 ) # class TreesBlock < GenericBlock TREE = "Tree" def initialize( name ) super( name ) @trees = Array.new @tree_names = Array.new end # Returns a String describing this block as nexus formatted data. # --- # *Returns*:: String def to_nexus trees_ary = Array.new for i in 0 .. 
            @trees.length - 1
          trees_ary.push( TREE + " " + @tree_names[ i ] + "=" + @trees[ i ] )
        end
        Nexus::Util::to_nexus_helper( TREES_BLOCK, trees_ary )
      end

      # Returns an array of strings describing trees.
      # ---
      # *Returns*:: Array
      def get_tree_strings
        @trees
      end

      # Returns an array of tree names.
      # ---
      # *Returns*:: Array
      def get_tree_names
        @tree_names
      end

      # Returns an array of strings describing trees
      # for which name matches the tree name.
      # ---
      # *Arguments*:
      # * (required) _name_: String
      # *Returns*:: Array
      def get_tree_strings_by_name( name )
        found_trees = Array.new
        i = 0
        @tree_names.each do | n |
          if ( n == name )
            found_trees.push( @trees[ i ] )
          end
          i += 1
        end
        found_trees
      end

      # Returns tree i (same order as in nexus data) as
      # a tree object parsed from its newick string.
      # ---
      # *Arguments*:
      # * (required) _i_: Integer
      # *Returns*:: Bio::Tree (the result of Bio::Newick#tree)
      def get_tree( i )
        newick = Bio::Newick.new( @trees[ i ] )
        tree = newick.tree
        tree
      end

      # Returns an array of tree objects parsed from newick strings
      # for which name matches the tree name.
      # ---
      # *Arguments*:
      # * (required) _name_: String
      # *Returns*:: Array of Bio::Tree
      def get_trees_by_name( name )
        found_trees = Array.new
        i = 0
        @tree_names.each do | n |
          if ( n == name )
            found_trees.push( get_tree( i ) )
          end
          i += 1
        end
        found_trees
      end

      # Adds a tree name to this block.
      # ---
      # *Arguments*:
      # * (required) _tree_name_: String
      def add_tree_name( tree_name )
        @tree_names.push( tree_name )
      end

      # Adds a tree to this block.
      # ---
      # *Arguments*:
      # * (required) _tree_as_string_: String
      def add_tree( tree_as_string )
        @trees.push( tree_as_string )
      end

    end # class TreesBlock

    # == DESCRIPTION
    # Bio::Nexus::NexusMatrix represents a characters or distance matrix,
    # where the names are stored in column zero.
# # # == USAGE # # require 'bio/db/nexus' # # # Create a new parser: # nexus = Bio::Nexus.new( nexus_data_as_string ) # # Get distances block(s): # distances_block = nexus.get_distances_blocks[ 0 ] # # Get matrix as Bio::Nexus::NexusMatrix object: # matrix = distances_block.get_matrix # # Get value (column 0 holds the names): # val = matrix.get_value( 1, 5 ) # # Return first row as String (all columns except column 0), # # values are separated by "_": # row_str_0 = matrix.get_row_string( 0, "_" ) # # Return all rows named "ciona" as String (all columns except column 0), # # values are separated by "+": # ciona_rows = matrix.get_row_strings_by_name( "ciona", "+" ) class NexusMatrix # Nexus matrix error class. class NexusMatrixError < RuntimeError; end # Creates new NexusMatrix. def initialize() @rows = Hash.new @max_row = -1 @max_col = -1 end # Sets the value at row 'row' and column 'col' to 'value'. # --- # *Arguments*: # * (required) _row_: Integer # * (required) _col_: Integer # * (required) _value_: Object def set_value( row, col, value ) if ( ( row < 0 ) || ( col < 0 ) ) raise( NexusMatrixError, "attempt to use negative values for row or column" ) end if ( row > get_max_row() ) set_max_row( row ) end if ( col > get_max_col() ) set_max_col( col ) end row_map = nil if ( @rows.has_key?( row ) ) row_map = @rows[ row ] else row_map = Hash.new @rows[ row ] = row_map end row_map[ col ] = value end # Returns the value at row 'row' and column 'col'.
# --- # *Arguments*: # * (required) _row_: Integer # * (required) _col_: Integer # *Returns*:: Object def get_value( row, col ) if ( ( row > get_max_row() ) || ( row < 0 ) ) raise( NexusMatrixError, "value for row (" + row.to_s + ") is out of range [max row: " + get_max_row().to_s + "]" ) elsif ( ( col > get_max_col() ) || ( col < 0 ) ) raise( NexusMatrixError, "value for column (" + col.to_s + ") is out of range [max column: " + get_max_col().to_s + "]" ) end r = @rows[ row ] if ( ( r == nil ) || ( r.length < 1 ) ) return nil end r[ col ] end # Returns the maximal column number. # --- # *Returns*:: Integer def get_max_col return @max_col end # Returns the maximal row number. # --- # *Returns*:: Integer def get_max_row return @max_row end # Returns true if the matrix is empty. # # --- # *Returns*:: true or false def is_empty? return get_max_col < 0 || get_max_row < 0 end # Convenience method which returns the value of # column 0 and row 'row', which is usually the name. # # --- # *Arguments*: # * (required) _row_: Integer # *Returns*:: String def get_name( row ) get_value( row, 0 ).to_s end # Returns the values of columns 1 to maximal column length # in row 'row' concatenated as string. Individual values can be # separated by 'spacer'. # # --- # *Arguments*: # * (required) _row_: Integer # * (optional) _spacer_: String # *Returns*:: String def get_row_string( row, spacer = "" ) row_str = String.new if is_empty? return row_str end for col in 1 .. get_max_col row_str << get_value( row, col ) << spacer end row_str end # Returns all rows as Array of Strings separated by 'spacer' # for which column 0 is 'name'. # --- # *Arguments*: # * (required) _name_: String # * (optional) _spacer_: String # *Returns*:: Array def get_row_strings_by_name( name, spacer = "" ) row_strs = Array.new if is_empty? return row_strs end for row in 0 ..
get_max_row if ( get_value( row, 0 ) == name ) row_strs.push( get_row_string( row, spacer ) ) end end row_strs end # Returns matrix as String, returns "empty" if empty. # --- # *Returns*:: String def to_s if is_empty? return "empty" end str = String.new row_array = to_nexus_row_array( " ", false ) row_array.each do | row | str << row << END_OF_LINE end str end alias to_str to_s # Helper method to produce nexus formatted data. # --- # *Arguments*: # * (optional) _spacer_: String # * (optional) _append_delimiter_: true or false # *Returns*:: Array def to_nexus_row_array( spacer = "", append_delimiter = true ) ary = Array.new if is_empty? return ary end max_length = 10 for row in 0 .. get_max_row l = get_value( row, 0 ).length if ( l > max_length ) max_length = l end end for row in 0 .. get_max_row row_str = String.new ary.push( row_str ) name = get_value( row, 0 ) name = name.ljust( max_length + 1 ) row_str << name << " " << get_row_string( row, spacer ) if ( spacer != nil && spacer.length > 0 ) row_str.chomp!( spacer ) end if ( append_delimiter && row == get_max_row ) row_str << DELIMITER end end ary end private # Returns row data as Array. # --- # *Arguments*: # * (required) _row_: Integer # *Returns*:: Array def get_row( row ) return @rows[ row ] end # Sets maximal column number. # --- # *Arguments*: # * (required) _max_col_: Integer def set_max_col( max_col ) @max_col = max_col end # Sets maximal row number. # --- # *Arguments*: # * (required) _max_row_: Integer def set_max_row( max_row ) @max_row = max_row end end # NexusMatrix # End of classes to represent nexus data. # = DESCRIPTION # Bio::Nexus::Util is a class containing static helper methods # class Util # Helper method to produce nexus formatted data. 
# --- # *Arguments*: # * (required) _block_: Nexus:GenericBlock or its subclasses # * (required) _lines_: Array # *Returns*:: String def Util::to_nexus_helper( block, lines ) str = String.new str << BEGIN_BLOCK << " " << block << END_OF_LINE lines.each do | line | if ( line != nil ) str << INDENTENTION << line << END_OF_LINE end end # do str << END_BLOCK << END_OF_LINE str end # Returns the array as a String, with elements separated by " ". # --- # *Arguments*: # * (required) _ary_: Array # *Returns*:: String def Util::array_to_string( ary ) str = String.new ary.each do | e | str << e << " " end str.chomp!( " " ) str end # Returns true if Integer i is not nil and larger than 0. # --- # *Arguments*: # * (required) _i_: Integer # *Returns*:: true or false def Util::larger_than_zero( i ) return ( i != nil && i.to_i > 0 ) end # Returns true if String str is not nil and longer than 0. # --- # *Arguments*: # * (required) _str_: String # *Returns*:: true or false def Util::longer_than_zero( str ) return ( str != nil && str.length > 0 ) end end # class Util end # class Nexus end #module Bio bio-2.0.3/lib/bio/db/nbrf.rb0000644000175000017500000001232014141516614014774 0ustar nileshnilesh# # = bio/db/nbrf.rb - NBRF/PIR format sequence data class # # Copyright:: Copyright (C) 2001-2003,2006 Naohisa Goto # Copyright (C) 2001-2002 Toshiaki Katayama # License:: The Ruby License # # $Id: nbrf.rb,v 1.10 2007/04/05 23:35:40 trevor Exp $ # # Sequence data class for NBRF/PIR flatfile format. # # = References # # * http://pir.georgetown.edu/pirwww/otherinfo/doc/techbulletin.html # * http://www.sander.embl-ebi.ac.uk/Services/webin/help/webin-align/align_format_help.html#pir # * http://www.cmbi.kun.nl/bioinf/tools/crab_pir.html # require 'bio/db' require 'bio/sequence' module Bio # Sequence data class for NBRF/PIR flatfile format. class NBRF < DB #-- # based on Bio::FastaFormat class #++ # Delimiter of each entry. Bio::FlatFile uses it.
DELIMITER = RS = "\n>" # (Integer) excess read size included in DELIMITER. DELIMITER_OVERRUN = 1 # '>' #-- # Note: DELIMITER is changed due to the change of Bio::FlatFile. # DELIMITER = RS = "*\n" #++ # Creates a new NBRF object. It stores the comment and sequence # information from one entry of the NBRF/PIR format string. # If the argument contains more than one # entry, only the first entry is used. def initialize(str) str = str.sub(/\A[\r\n]+/, '') # remove leading blank lines line1, line2, rest = str.split(/^/, 3) rest = rest.to_s rest.sub!(/^>.*/m, '') # remove trailing entries for sure @entry_overrun = $& rest.sub!(/\*\s*\z/, '') # remove last '*' and "\n" @data = rest @definition = line2.to_s.chomp if /^>?([A-Za-z0-9]{2})\;(.*)/ =~ line1.to_s then @seq_type = $1 @entry_id = $2 end end # Returns sequence type described in the entry. # P1 (protein), F1 (protein fragment) # DL (DNA linear), DC (DNA circular) # RL (RNA linear), RC (RNA circular) # N3 (tRNA), N1 (other functional RNA) attr_accessor :seq_type # Returns ID described in the entry. attr_accessor :entry_id alias accession entry_id # Returns the description line of the NBRF/PIR formatted data. attr_accessor :definition # Raw sequence data of the entry (String). attr_accessor :data # piece of next entry. Bio::FlatFile uses it. attr_reader :entry_overrun # Returns the stored entry in NBRF/PIR format. (same as to_s) def entry @entry = ">#{@seq_type or 'XX'};#{@entry_id}\n#{definition}\n#{@data}*\n" end alias to_s entry # Returns Bio::Sequence::AA, Bio::Sequence::NA, or Bio::Sequence, # depending on sequence type. def seq_class case @seq_type when /[PF]1/ # protein Sequence::AA when /[DR][LC]/, /N[13]/ # nucleic Sequence::NA else Sequence end end # Returns sequence data. # Returns Bio::Sequence::NA, Bio::Sequence::AA or Bio::Sequence, # according to the sequence type. def seq unless defined?(@seq) @seq = seq_class.new(@data.tr(" \t\r\n0-9", '')) # lazy clean up end @seq end # Returns sequence length.
def length seq.length end # Returns the nucleic acid sequence. # If you call naseq for a protein sequence, a RuntimeError will be raised. # Use the method if you know whether the sequence is NA or AA. def naseq if seq.is_a?(Bio::Sequence::AA) then raise 'not nucleic but protein sequence' elsif seq.is_a?(Bio::Sequence::NA) then seq else Bio::Sequence::NA.new(seq) end end # Returns the length of the nucleic acid sequence. # If you call nalen for a protein sequence, a RuntimeError will be raised. # Use the method if you know whether the sequence is NA or AA. def nalen naseq.length end # Returns the protein (amino acids) sequence. # If you call aaseq for a nucleic acid sequence, # a RuntimeError will be raised. # Use the method if you know whether the sequence is NA or AA. def aaseq if seq.is_a?(Bio::Sequence::NA) then raise 'not protein but nucleic sequence' elsif seq.is_a?(Bio::Sequence::AA) then seq else Bio::Sequence::AA.new(seq) end end # Returns the length of the protein (amino acids) sequence. # If you call aalen for a nucleic acid sequence, # a RuntimeError will be raised. # Use the method if you know whether the sequence is NA or AA. def aalen aaseq.length end #-- #class method #++ # Creates a NBRF/PIR formatted text. # Parameters can be omitted. def self.to_nbrf(hash) seq_type = hash[:seq_type] seq = hash[:seq] unless seq_type if seq.is_a?(Bio::Sequence::AA) then seq_type = 'P1' elsif seq.is_a?(Bio::Sequence::NA) then seq_type = /u/i =~ seq ? 'RL' : 'DL' else seq_type = 'XX' end end width = hash.has_key?(:width) ?
hash[:width] : 70 if width then seq = seq.to_s + "*" seq.gsub!(Regexp.new(".{1,#{width}}"), "\\0\n") else seq = seq.to_s + "*\n" end ">#{seq_type};#{hash[:entry_id]}\n#{hash[:definition]}\n#{seq}" end end #class NBRF end #module Bio bio-2.0.3/lib/bio/db/genbank/0000755000175000017500000000000014141516614015127 5ustar nileshnileshbio-2.0.3/lib/bio/db/genbank/genpept.rb0000644000175000017500000000233714141516614017123 0ustar nileshnilesh# # = bio/db/genbank/genpept.rb - GenPept database class # # Copyright:: Copyright (C) 2002-2004 Toshiaki Katayama # License:: The Ruby License # # $Id: genpept.rb,v 1.12 2007/04/05 23:35:40 trevor Exp $ # require 'bio/db/genbank/common' require 'bio/db/genbank/genbank' module Bio class GenPept < NCBIDB include Bio::NCBIDB::Common # LOCUS class Locus def initialize(locus_line) @entry_id = locus_line[12..27].strip @length = locus_line[29..39].to_i @circular = locus_line[55..62].strip # always linear @division = locus_line[63..66].strip @date = locus_line[68..78].strip end attr_accessor :entry_id, :length, :circular, :division, :date end def locus @data['LOCUS'] ||= Locus.new(get('LOCUS')) end def entry_id; locus.entry_id; end def length; locus.length; end def circular; locus.circular; end def division; locus.division; end def date; locus.date; end # ORIGIN def seq unless @data['SEQUENCE'] origin end Bio::Sequence::AA.new(@data['SEQUENCE']) end alias aaseq seq alias aalen length def seq_len seq.length end # DBSOURCE def dbsource get('DBSOURCE') end end # GenPept end # Bio bio-2.0.3/lib/bio/db/genbank/common.rb0000644000175000017500000001677214141516614016761 0ustar nileshnilesh# # = bio/db/genbank/common.rb - Common methods for GenBank style database classes # # Copyright:: Copyright (C) 2004 Toshiaki Katayama # License:: The Ruby License # # $Id: common.rb,v 1.11.2.5 2008/06/17 15:53:21 ngoto Exp $ # require 'bio/db' module Bio class NCBIDB # == Description # # This module defines a common framework among GenBank, GenPept, RefSeq, and 
# DDBJ. For more details, see the documentation in each genbank/*.rb file. # # == References # # * ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt # * http://www.ncbi.nlm.nih.gov/collab/FT/index.html # module Common DELIMITER = RS = "\n//\n" TAGSIZE = 12 def initialize(entry) super(entry, TAGSIZE) end # LOCUS -- Locus class must be defined in child classes. def locus # must be overridden in each subclass end # DEFINITION -- Returns contents of the DEFINITION record as a String. def definition field_fetch('DEFINITION') end # ACCESSION -- Returns contents of the ACCESSION record as an Array. def accessions field_fetch('ACCESSION').strip.split(/\s+/) end # VERSION -- Returns contents of the VERSION record as an Array of Strings. def versions @data['VERSION'] ||= fetch('VERSION').split(/\s+/) end # Returns the first part of the VERSION record as "ACCESSION.VERSION" String. def acc_version versions.first.to_s end # Returns the ACCESSION part of the acc_version. def accession acc_version.split(/\./).first.to_s end # Returns the VERSION part of the acc_version as an Integer. def version acc_version.split(/\./).last.to_i end # Returns the second part of the VERSION record as a "GI:#######" String. def gi versions.last end # NID -- Returns contents of the NID record as a String. def nid field_fetch('NID') end # KEYWORDS -- Returns contents of the KEYWORDS record as an Array of Strings. def keywords @data['KEYWORDS'] ||= fetch('KEYWORDS').chomp('.').split(/; /) end # SEGMENT -- Returns contents of the SEGMENT record as a "m/n" form String. def segment @data['SEGMENT'] ||= fetch('SEGMENT').scan(/\d+/).join("/") end # SOURCE -- Returns contents of the SOURCE record as a Hash.
def source unless @data['SOURCE'] name, org = get('SOURCE').split('ORGANISM') org ||= "" if org[/\S+;/] organism = $` taxonomy = $& + $' elsif org[/\S+\./] # rs:NC_001741 organism = $` taxonomy = $& + $' else organism = org taxonomy = '' end @data['SOURCE'] = { 'common_name' => truncate(tag_cut(name)), 'organism' => truncate(organism), 'taxonomy' => truncate(taxonomy), } @data['SOURCE'].default = '' end @data['SOURCE'] end def common_name source['common_name'] end alias vernacular_name common_name def organism source['organism'] end def taxonomy source['taxonomy'] end # REFERENCE -- Returns contents of the REFERENCE records as an Array of # Bio::Reference objects. def references unless @data['REFERENCE'] ary = [] toptag2array(get('REFERENCE')).each do |ref| hash = Hash.new subtag2array(ref).each do |field| case tag_get(field) when /REFERENCE/ if /(\d+)(\s*\((.+)\))?/m =~ tag_cut(field) then hash['embl_gb_record_number'] = $1.to_i if $3 and $3 != 'sites' then seqpos = $3 seqpos.sub!(/\A\s*bases\s+/, '') seqpos.gsub!(/(\d+)\s+to\s+(\d+)/, "\\1-\\2") seqpos.gsub!(/\s*\;\s*/, ', ') hash['sequence_position'] = seqpos end end when /AUTHORS/ authors = truncate(tag_cut(field)) authors = authors.split(/, /) authors[-1] = authors[-1].split(/\s+and\s+/) if authors[-1] authors = authors.flatten.map { |a| a.sub(/,/, ', ') } hash['authors'] = authors when /TITLE/ hash['title'] = truncate(tag_cut(field)) # CHECK Actually GenBank is not demanding for dot at the end of TITLE #+ '.' 
when /JOURNAL/ journal = truncate(tag_cut(field)) if journal =~ /(.*) (\d+) \((\d+)\), (\d+-\d+) \((\d+)\)$/ hash['journal'] = $1 hash['volume'] = $2 hash['issue'] = $3 hash['pages'] = $4 hash['year'] = $5 else hash['journal'] = journal end when /MEDLINE/ hash['medline'] = truncate(tag_cut(field)) when /PUBMED/ hash['pubmed'] = truncate(tag_cut(field)) when /REMARK/ hash['comments'] ||= [] hash['comments'].push truncate(tag_cut(field)) end end ary.push(Reference.new(hash)) end @data['REFERENCE'] = ary.extend(Bio::References::BackwardCompatibility) end if block_given? @data['REFERENCE'].each do |r| yield r end else @data['REFERENCE'] end end # COMMENT -- Returns contents of the COMMENT record as a String. def comment str = get('COMMENT').to_s.sub(/\ACOMMENT /, '') str.gsub!(/^ {12}/, '') str.chomp! str end # FEATURES -- Returns contents of the FEATURES record as an array of # Bio::Feature objects. def features unless @data['FEATURES'] ary = [] in_quote = false get('FEATURES').each_line do |line| next if line =~ /^FEATURES/ # feature type (source, CDS, ...) head = line[0,20].to_s.strip # feature value (position or /qualifier=) body = line[20,60].to_s.chomp # sub-array [ feature type, position, /q="data", ... ] if line =~ /^ {5}\S/ ary.push([ head, body ]) # feature qualifier start (/q="data..., /q="data...", /q=data, /q) elsif body =~ /^ \// and not in_quote # gb:IRO125195 ary.last.push(body) # flag for open quote (/q="data...) if body =~ /="/ and body !~ /"$/ in_quote = true end # feature qualifier continued (...data..., ...data...") else ary.last.last << body # flag for closing quote (/q="data... lines ...") if body =~ /"$/ in_quote = false end end end ary.collect! do |subary| parse_qualifiers(subary) end @data['FEATURES'] = ary.extend(Bio::Features::BackwardCompatibility) end if block_given? @data['FEATURES'].each do |f| yield f end else @data['FEATURES'] end end # ORIGIN -- Returns contents of the ORIGIN record as a String. 
def origin unless @data['ORIGIN'] ori, seqstr = get('ORIGIN').split("\n", 2) seqstr ||= "" @data['ORIGIN'] = truncate(tag_cut(ori)) @data['SEQUENCE'] = seqstr.tr("0-9 \t\n\r\/", '') end @data['ORIGIN'] end ### private methods private def parse_qualifiers(ary) feature = Feature.new feature.feature = ary.shift feature.position = ary.shift.gsub(/\s/, '') ary.each do |f| if f =~ %r{/([^=]+)=?"?([^"]*)"?} qualifier, value = $1, $2 case qualifier when 'translation' value = Sequence::AA.new(value) when 'codon_start' value = value.to_i else value = true if value.empty? end feature.append(Feature::Qualifier.new(qualifier, value)) end end return feature end end # Common end # NCBIDB end # Bio bio-2.0.3/lib/bio/db/genbank/format_genbank.rb0000644000175000017500000001242114141516614020431 0ustar nileshnilesh# # = bio/db/genbank/format_genbank.rb - GenBank format generator # # Copyright:: Copyright (C) 2008 Naohisa Goto # License:: The Ruby License # module Bio::Sequence::Format::NucFormatter # INTERNAL USE ONLY, YOU SHOULD NOT USE THIS CLASS. # GenBank format output class for Bio::Sequence. class Genbank < Bio::Sequence::Format::FormatterBase # helper methods include Bio::Sequence::Format::INSDFeatureHelper private # string wrapper for GenBank format def genbank_wrap(str) wrap(str.to_s, 67).gsub(/\n/, "\n" + " " * 12) end # wraps the string, adding a dot at the end if it is missing def genbank_wrap_dot(str) str = str.to_s str = str + '.' unless /\.\z/ =~ str genbank_wrap(str) end # The given words (an Array of String) are wrapped in EMBL style. # A word is never split across lines.
def genbank_wrap_words(array) width = 67 result = [] str = nil array.each do |x| if str then if str.length + 1 + x.length > width then str = nil else str.concat ' ' str.concat x end end unless str then str = "#{x}" result.push str end end result.join("\n" + " " * 12) end # formats references def reference_format_genbank(ref, num) pos = ref.sequence_position.to_s.gsub(/\s/, '') pos.gsub!(/(\d+)\-(\d+)/, "\\1 to \\2") pos.gsub!(/\s*\,\s*/, '; ') if pos.empty? pos = '' else pos = " (bases #{pos})" end volissue = "#{ref.volume.to_s}" volissue += " (#{ref.issue})" unless ref.issue.to_s.empty? journal = "#{ref.journal.to_s}" journal += " #{volissue}" unless volissue.empty? journal += ", #{ref.pages}" unless ref.pages.to_s.empty? journal += " (#{ref.year})" unless ref.year.to_s.empty? alist = ref.authors.collect do |x| y = x.to_s.strip.split(/\, *([^\,]+)\z/) y[1].gsub!(/\. +/, '.') if y[1] y.join(',') end lastauthor = alist.pop last2author = alist.pop alist.each { |x| x.concat ',' } alist.push last2author if last2author alist.push "and" unless alist.empty? alist.push lastauthor.to_s result = <<__END_OF_REFERENCE__ REFERENCE #{ genbank_wrap(sprintf('%-2d%s', num, pos))} AUTHORS #{ genbank_wrap_words(alist) } TITLE #{ genbank_wrap(ref.title.to_s) } JOURNAL #{ genbank_wrap(journal) } __END_OF_REFERENCE__ unless ref.pubmed.to_s.empty? then result.concat " PUBMED #{ genbank_wrap(ref.pubmed) }\n" end if ref.comments and !(ref.comments.empty?) then ref.comments.each do |c| result.concat " REMARK #{ genbank_wrap(c) }\n" end end result end # formats comments lines as GenBank def comments_format_genbank(cmnts) return '' if !cmnts or cmnts.empty? 
cmnts = [ cmnts ] unless cmnts.kind_of?(Array) a = [] cmnts.each do |str| a.push "COMMENT #{ genbank_wrap(str) }\n" end a.join('') end # formats sequence lines as GenBank def seq_format_genbank(str) i = 1 result = str.gsub(/.{1,60}/) do |s| s = s.gsub(/.{1,10}/, ' \0') y = sprintf("%9d%s\n", i, s) i += 60 y end result end # formats date def date_format_genbank format_date(date_modified || date_created || null_date) end # molecule type def mol_type_genbank if /(DNA|(t|r|m|u|sn|sno)?RNA)/i =~ molecule_type.to_s then $1.sub(/[DR]NA/) { |x| x.upcase } else 'NA' end end # NCBI GI number def ncbi_gi_number ids = other_seqids if ids and r = ids.find { |x| x.database == 'GI' } then r.id else nil end end # strandedness def strandedness_genbank return nil unless strandedness case strandedness when 'single'; 'ss-'; when 'double'; 'ds-'; when 'mixed'; 'ms-'; else; nil end end # Erb template of GenBank format for Bio::Sequence erb_template <<'__END_OF_TEMPLATE__' LOCUS <%= sprintf("%-16s", entry_id) %> <%= sprintf("%11d", length) %> bp <%= sprintf("%3s", strandedness_genbank) %><%= sprintf("%-6s", mol_type_genbank) %> <%= sprintf("%-8s", topology) %><%= sprintf("%4s", division) %> <%= date_format_genbank %> DEFINITION <%= genbank_wrap_dot(definition.to_s) %> ACCESSION <%= genbank_wrap(([ primary_accession ] + (secondary_accessions or [])).join(" ")) %> VERSION <%= primary_accession %>.<%= sequence_version %><% if gi = ncbi_gi_number then %> GI:<%= gi %><% end %> KEYWORDS <%= genbank_wrap_dot((keywords or []).join('; ')) %> SOURCE <%= genbank_wrap(species) %> ORGANISM <%= genbank_wrap(species) %> <%= genbank_wrap_dot((classification or []).join('; ')) %> <% n = 0 (references or []).each do |ref| n += 1 %><%= reference_format_genbank(ref, n) %><% end %><%= comments_format_genbank(comments) %>FEATURES Location/Qualifiers <%= format_features_genbank(features || []) %>ORIGIN <%= seq_format_genbank(seq) %>// __END_OF_TEMPLATE__ end #class Genbank end #module
Bio::Sequence::Format::NucFormatter bio-2.0.3/lib/bio/db/genbank/ddbj.rb0000644000175000017500000000101414141516614016353 0ustar nileshnilesh# # = bio/db/genbank/ddbj.rb - DDBJ database class # # Copyright:: Copyright (C) 2000-2004 Toshiaki Katayama # License:: The Ruby License # warn "Bio::DDBJ is deprecated. Use Bio::GenBank." module Bio require 'bio/db/genbank/genbank' unless const_defined?(:GenBank) # Bio::DDBJ is deprecated. Use Bio::GenBank. class DDBJ < GenBank # Bio::DDBJ is deprecated. Use Bio::GenBank. def initialize(str) warn "Bio::DDBJ is deprecated. Use Bio::GenBank." super(str) end end # DDBJ end # Bio bio-2.0.3/lib/bio/db/genbank/genbank_to_biosequence.rb0000644000175000017500000000340514141516614022147 0ustar nileshnilesh# # = bio/db/genbank/genbank_to_biosequence.rb - Bio::GenBank to Bio::Sequence adapter module # # Copyright:: Copyright (C) 2008 # Naohisa Goto , # License:: The Ruby License # # $Id:$ # require 'bio/sequence' require 'bio/sequence/adapter' # Internal use only. Normal users should not use this module. # # Bio::GenBank to Bio::Sequence adapter module. # It is internally used in Bio::GenBank#to_biosequence. # module Bio::Sequence::Adapter::GenBank extend Bio::Sequence::Adapter private def_biosequence_adapter :seq def_biosequence_adapter :id_namespace do |orig| if /\_/ =~ orig.accession.to_s then 'RefSeq' else 'GenBank' end end def_biosequence_adapter :entry_id def_biosequence_adapter :primary_accession, :accession def_biosequence_adapter :secondary_accessions do |orig| orig.accessions - [ orig.accession ] end def_biosequence_adapter :other_seqids do |orig| if /GI\:(.+)/ =~ orig.gi.to_s then [ Bio::Sequence::DBLink.new('GI', $1) ] else nil end end def_biosequence_adapter :molecule_type, :natype def_biosequence_adapter :division def_biosequence_adapter :topology, :circular def_biosequence_adapter :strandedness def_biosequence_adapter :sequence_version, :version #-- #sequence.date_created = nil #???? 
#++ def_biosequence_adapter :date_modified def_biosequence_adapter :definition def_biosequence_adapter :keywords def_biosequence_adapter :species, :organism def_biosequence_adapter :classification #-- #sequence.organelle = nil # yet unsupported #++ def_biosequence_adapter :comments, :comment def_biosequence_adapter :references def_biosequence_adapter :features end #module Bio::Sequence::Adapter::GenBank bio-2.0.3/lib/bio/db/genbank/refseq.rb0000644000175000017500000000102514141516614016737 0ustar nileshnilesh# # = bio/db/genbank/refseq.rb - RefSeq database class # # Copyright:: Copyright (C) 2000-2004 Toshiaki Katayama # License:: The Ruby License # # warn "Bio::RefSeq is deprecated. Use Bio::GenBank." module Bio require 'bio/db/genbank/genbank' unless const_defined?(:GenBank) # Bio::RefSeq is deprecated. Use Bio::GenBank. class RefSeq < GenBank # Bio::RefSeq is deprecated. Use Bio::GenBank. def initialize(str) warn "Bio::RefSeq is deprecated. Use Bio::GenBank." super(str) end end end # Bio bio-2.0.3/lib/bio/db/genbank/genbank.rb0000644000175000017500000001013214141516614017056 0ustar nileshnilesh# # = bio/db/genbank/genbank.rb - GenBank database class # # Copyright:: Copyright (C) 2000-2005 Toshiaki Katayama # License:: The Ruby License # # $Id:$ # require 'date' require 'bio/db' require 'bio/db/genbank/common' require 'bio/sequence' require 'bio/sequence/dblink' module Bio # == Description # # Parses a GenBank formatted database entry # # == Example # # # entry is a string containing only one entry contents # gb = Bio::GenBank.new(entry) # class GenBank < NCBIDB include Bio::NCBIDB::Common # Parses the LOCUS line and returns contents of the LOCUS record # as a Bio::GenBank::Locus object. Locus object is created automatically # when Bio::GenBank#locus, entry_id etc. methods are called. class Locus def initialize(locus_line) if locus_line.empty? 
# do nothing (just for empty or incomplete entry string) elsif locus_line.length > 75 # after Rel 126.0 @entry_id = locus_line[12..27].strip @length = locus_line[29..39].to_i @strand = locus_line[44..46].strip @natype = locus_line[47..52].strip @circular = locus_line[55..62].strip @division = locus_line[63..66].strip @date = locus_line[68..78].strip else @entry_id = locus_line[12..21].strip @length = locus_line[22..29].to_i @strand = locus_line[33..35].strip @natype = locus_line[36..39].strip @circular = locus_line[42..51].strip @division = locus_line[52..54].strip @date = locus_line[62..72].strip end end attr_accessor :entry_id, :length, :strand, :natype, :circular, :division, :date end # Accessor methods for the contents of the LOCUS record. def locus @data['LOCUS'] ||= Locus.new(get('LOCUS')) end def entry_id; locus.entry_id; end def length; locus.length; end def circular; locus.circular; end def division; locus.division; end def date; locus.date; end def strand; locus.strand; end def natype; locus.natype; end # FEATURES -- Iterate only for the 'CDS' portion of the Bio::Features. def each_cds features.each do |feature| if feature.feature == 'CDS' yield(feature) end end end # FEATURES -- Iterate only for the 'gene' portion of the Bio::Features. def each_gene features.each do |feature| if feature.feature == 'gene' yield(feature) end end end # BASE COUNT (this field is obsoleted after GenBank release 138.0) -- # Returns the BASE COUNT as a Hash. When the base is specified, returns # count of the base as a Fixnum. The base can be one of 'a', 't', 'g', # 'c', and 'o' (others). def basecount(base = nil) unless @data['BASE COUNT'] hash = Hash.new(0) get('BASE COUNT').scan(/(\d+) (\w)/).each do |c, b| hash[b] = c.to_i end @data['BASE COUNT'] = hash end if base base.downcase! @data['BASE COUNT'][base] else @data['BASE COUNT'] end end # ORIGIN -- Returns DNA sequence in the ORIGIN record as a # Bio::Sequence::NA object. 
def seq unless @data['SEQUENCE'] origin end Bio::Sequence::NA.new(@data['SEQUENCE']) end alias naseq seq alias nalen length # (obsolete???) length of the sequence def seq_len seq.length end # modified date. Returns Date object, String or nil. def date_modified begin Date.parse(self.date) rescue ArgumentError, TypeError, NoMethodError, NameError self.date end end # Taxonomy classification. Returns an array of strings. def classification self.taxonomy.to_s.sub(/\.\z/, '').split(/\s*\;\s*/) end # Strandedness. Returns one of 'single', 'double', 'mixed', or nil. def strandedness case self.strand.to_s.downcase when 'ss-'; 'single' when 'ds-'; 'double' when 'ms-'; 'mixed' else nil; end end # converts Bio::GenBank to Bio::Sequence # --- # *Arguments*: # *Returns*:: Bio::Sequence object def to_biosequence Bio::Sequence.adapter(self, Bio::Sequence::Adapter::GenBank) end end # GenBank end # Bio bio-2.0.3/lib/bio/db/gff.rb0000644000175000017500000017035514141516614014624 0ustar nileshnilesh# coding: US-ASCII # # = bio/db/gff.rb - GFF format class # # Copyright:: Copyright (C) 2003, 2005 # Toshiaki Katayama # 2006 Jan Aerts # 2008 Naohisa Goto # License:: The Ruby License # # require 'uri' require 'strscan' require 'enumerator' require 'bio/db/fasta' module Bio # == DESCRIPTION # The Bio::GFF and Bio::GFF::Record classes describe data contained in a # GFF-formatted file. For information on the GFF format, see # http://www.sanger.ac.uk/Software/formats/GFF/. Data are represented in tab- # delimited format, including # * seqname # * source # * feature # * start # * end # * score # * strand # * frame # * attributes (optional) # # For example: # SEQ1 EMBL atg 103 105 . + 0 # SEQ1 EMBL exon 103 172 . + 0 # SEQ1 EMBL splice5 172 173 . + . # SEQ1 netgene splice5 172 173 0.94 + . # SEQ1 genie sp5-20 163 182 2.3 + . # SEQ1 genie sp5-10 168 177 2.1 + .
# SEQ1 grail ATG 17 19 2.1 - 0 # # The Bio::GFF object is a container for Bio::GFF::Record objects, each # representing a single line in the GFF file. class GFF # Creates a Bio::GFF object by building a collection of Bio::GFF::Record # objects. # # Create a Bio::GFF object the hard way # this_gff = "SEQ1\tEMBL\tatg\t103\t105\t.\t+\t0\n" # this_gff << "SEQ1\tEMBL\texon\t103\t172\t.\t+\t0\n" # this_gff << "SEQ1\tEMBL\tsplice5\t172\t173\t.\t+\t.\n" # this_gff << "SEQ1\tnetgene\tsplice5\t172\t173\t0.94\t+\t.\n" # this_gff << "SEQ1\tgenie\tsp5-20\t163\t182\t2.3\t+\t.\n" # this_gff << "SEQ1\tgenie\tsp5-10\t168\t177\t2.1\t+\t.\n" # this_gff << "SEQ1\tgrail\tATG\t17\t19\t2.1\t-\t0\n" # p Bio::GFF.new(this_gff) # # or create one based on a GFF-formatted file: # p Bio::GFF.new(File.open('my_data.gff')) # --- # *Arguments*: # * _str_: string in GFF format # *Returns*:: Bio::GFF object def initialize(str = '') @records = Array.new str.each_line do |line| @records << Record.new(line) end end # An array of Bio::GFF::Record objects. attr_accessor :records # Represents a single line of a GFF-formatted file. See Bio::GFF for more # information. class Record # Name of the reference sequence attr_accessor :seqname # Name of the source of the feature (e.g. program that did prediction) attr_accessor :source # Name of the feature attr_accessor :feature # Start position of feature on reference sequence attr_accessor :start # End position of feature on reference sequence attr_accessor :end # Score of annotation (e.g. e-value for BLAST search) attr_accessor :score # Strand that feature is located on attr_accessor :strand # For features of type 'exon': indicates where feature begins in the reading frame attr_accessor :frame # List of tag=value pairs (e.g. to store name of the feature: ID=my_id) attr_accessor :attributes # Comments for the GFF record attr_accessor :comment # "comments" is deprecated. Instead, use "comment". def comments #warn "#{self.class.to_s}#comments is deprecated.
Instead, use \"comment\"." if $VERBOSE self.comment end # "comments=" is deprecated. Instead, use "comment=". def comments=(str) #warn "#{self.class.to_s}#comments= is deprecated. Instead, use \"comment=\"." if $VERBOSE self.comment = str end # Creates a Bio::GFF::Record object. Is typically not called directly, but # is called automatically when creating a Bio::GFF object. # --- # *Arguments*: # * _str_: a tab-delimited line in GFF format def initialize(str) @comment = str.chomp[/#.*/] return if /^#/.match(str) @seqname, @source, @feature, @start, @end, @score, @strand, @frame, attributes, = str.chomp.split("\t") @attributes = parse_attributes(attributes) if attributes end private def parse_attributes(attributes) hash = Hash.new sc = StringScanner.new(attributes) attrs = [] token = '' while !sc.eos? if sc.scan(/[^\\\;\"]+/) then token.concat sc.matched elsif sc.scan(/\;/) then attrs.push token unless token.empty? token = '' elsif sc.scan(/\"/) then origtext = sc.matched while !sc.eos? if sc.scan(/[^\\\"]+/) then origtext.concat sc.matched elsif sc.scan(/\"/) then origtext.concat sc.matched break elsif sc.scan(/\\([\"\\])/) then origtext.concat sc.matched elsif sc.scan(/\\/) then origtext.concat sc.matched else raise 'Bug: should not reach here' end end token.concat origtext elsif sc.scan(/\\\;/) then token.concat sc.matched elsif sc.scan(/\\/) then token.concat sc.matched else raise 'Bug: should not reach here' end #if end #while attrs.push token unless token.empty? attrs.each do |x| key, value = x.split(' ', 2) key.strip! value.strip! if value hash[key] = value end hash end end #Class Record # = DESCRIPTION # Represents version 2 of GFF specification. # Its behavior is somehow different from Bio::GFF, # especially for attributes. # class GFF2 < GFF VERSION = 2 # string representation of the whole entry. 
      def to_s
        ver = @gff_version || VERSION.to_s
        ver = ver.gsub(/[\r\n]+/, ' ')
        ([ "##gff-version #{ver}\n" ] +
         @metadata.collect { |m| m.to_s } +
         @records.collect{ |r| r.to_s }).join('')
      end

      # Private methods for GFF2 escaping characters.
      # Internal only. Users should not use this module directly.
      module Escape
        # unsafe characters to be escaped
        UNSAFE_GFF2 = /[^-_.!~*'()a-zA-Z\d\/?:@+$\[\] \x80-\xfd><;=,%^&\|`]/n

        # GFF2 standard identifier
        IDENTIFIER_GFF2 = /\A[A-Za-z][A-Za-z0-9_]*\z/n

        # GFF2 numeric value
        NUMERIC_GFF2 = /\A[-+]?([0-9]+|[0-9]*\.[0-9]*)([eE][+-]?[0-9]+)?\z/n

        # List of 1-letter special backslash codes.
        # Letters other than those listed here stand for themselves
        # (the same as without the backslash), except for "x" and digits.
        # (Note that \u (unicode) is not supported.)
        BACKSLASH = {
          't' => "\t",
          'n' => "\n",
          'r' => "\r",
          'f' => "\f",
          'b' => "\b",
          'a' => "\a",
          'e' => "\e",
          'v' => "\v",
          # 's' => " ",
        }.freeze

        # inverted hash of BACKSLASH
        CHAR2BACKSLASH = BACKSLASH.invert.freeze

        # inverted hash of BACKSLASH, including double quote and backslash
        CHAR2BACKSLASH_EXTENDED =
          CHAR2BACKSLASH.merge({ '"' => '"', "\\" => "\\" }).freeze

        # prohibited characters in GFF2 columns
        PROHIBITED_GFF2_COLUMNS = /[\t\r\n\x00-\x08\x0b\x0c\x0e-\x1f\x7f\xfe\xff]/

        # prohibited characters in GFF2 attribute tags
        PROHIBITED_GFF2_TAGS = /[\s\"\;\x00-\x08\x0e-\x1f\x7f\xfe\xff]/

        private

        # (private) escapes GFF2 free text string
        def escape_gff2_freetext(str)
          '"' + str.gsub(UNSAFE_GFF2) do |x|
            "\\" + (CHAR2BACKSLASH_EXTENDED[x] || char2octal(x))
          end + '"'
        end

        # (private) "x" => "\\oXXX"
        # "x" must be a letter.
        # If "x" consists of two bytes or more, the octal codes of
        # the bytes are joined with "\\".
        def char2octal(x)
          x.enum_for(:each_byte).collect { |y|
            sprintf("%03o", y) }.join("\\")
        end

        # (private) escapes GFF2 attribute value string
        def escape_gff2_attribute_value(str)
          freetext?(str) ? escape_gff2_freetext(str) : str
        end

        # (private) check if the given string is a free text to be quoted
        # by double-quote.
def freetext?(str) if IDENTIFIER_GFF2 =~ str or NUMERIC_GFF2 =~ str then false else true end end # (private) escapes normal columns in GFF2 def gff2_column_to_s(str) str = str.to_s str = str.empty? ? '.' : str str = str.gsub(PROHIBITED_GFF2_COLUMNS) do |x| "\\" + (CHAR2BACKSLASH[x] || char2octal(x)) end if str[0, 1] == '#' then str[0, 1] = "\\043" end str end # (private) escapes GFF2 attribute tag string def escape_gff2_attribute_tag(str) str = str.to_s str = str.empty? ? '.' : str str = str.gsub(PROHIBITED_GFF2_TAGS) do |x| "\\" + (CHAR2BACKSLASH[x] || char2octal(x)) end if str[0, 1] == '#' then str[0, 1] = "\\043" end str end # (private) dummy method, will be redefined in GFF3. def unescape(str) str end end #module Escape # Stores GFF2 record. class Record < GFF::Record include Escape # Stores GFF2 attribute's value. class Value include Escape # Creates a new Value object. # Note that the given array _values_ is directly stored in # the object. # # --- # *Arguments*: # * (optional) _values_: Array containing String objects. # *Returns*:: Value object. def initialize(values = []) @values = values end # Returns string representation of this Value object. # --- # *Returns*:: String def to_s @values.collect do |str| escape_gff2_attribute_value(str) end.join(' ') end # Returns all values in this object. # # Note that modification of the returned array would affect # original Value object. # --- # *Returns*:: Array def values @values end alias to_a values # Returns true if other == self. # Otherwise, returns false. def ==(other) return false unless other.kind_of?(self.class) or self.kind_of?(other.class) self.values == other.values rescue super(other) end end #class Value # Parses a GFF2-formatted line and returns a new # Bio::GFF::GFF2::Record object. def self.parse(str) ret = self.new ret.parse(str) ret end # Creates a Bio::GFF::GFF2::Record object. # Is typically not called directly, but # is called automatically when creating a Bio::GFF::GFF2 object. 
# # --- # *Arguments*: # * _str_: a tab-delimited line in GFF2 format # *Arguments*: # * _seqname_: seqname (String or nil) # * _source_: source (String or nil) # * _feature_: feature type (String) # * _start_position_: start (Integer) # * _end_position_: end (Integer) # * _score_: score (Float or nil) # * _strand_: strand (String or nil) # * _frame_: frame (Integer or nil) # * _attributes_: attributes (Array or nil) def initialize(*arg) if arg.size == 1 then parse(arg[0]) else @seqname, @source, @feature, start, endp, @score, @strand, frame, @attributes = arg @start = start ? start.to_i : nil @end = endp ? endp.to_i : nil @score = score ? score.to_f : nil @frame = frame ? frame.to_i : nil end @attributes ||= [] end # Comment for the GFF record attr_accessor :comment # "comments" is deprecated. Instead, use "comment". def comments warn "#{self.class.to_s}#comments is deprecated. Instead, use \"comment\"." self.comment end # "comments=" is deprecated. Instead, use "comment=". def comments=(str) warn "#{self.class.to_s}#comments= is deprecated. Instead, use \"comment=\"." self.comment = str end # Parses a GFF2-formatted line and stores data from the string. # Note that all existing data is wiped out. def parse(string) if /^\s*\#/ =~ string then @comment = string[/\#(.*)/, 1].chomp columns = [] else columns = string.chomp.split("\t", 10) @comment = columns[9][/\#(.*)/, 1].chomp if columns[9] end @seqname, @source, @feature, start, endp, score, @strand, frame = columns[0, 8].collect { |x| str = unescape(x) str == '.' ? nil : str } @start = start ? start.to_i : nil @end = endp ? endp.to_i : nil @score = score ? score.to_f : nil @frame = frame ? frame.to_i : nil @attributes = parse_attributes(columns[8]) end # Returns true if the entry is empty except for comment. # Otherwise, returns false. def comment_only? if !@seqname and !@source and !@feature and !@start and !@end and !@score and !@strand and !@frame and @attributes.empty? 
then true else false end end # Return the record as a GFF2 compatible string def to_s cmnt = if defined?(@comment) and @comment and !@comment.to_s.strip.empty? then @comment.gsub(/[\r\n]+/, ' ') else false end return "\##{cmnt}\n" if self.comment_only? and cmnt [ gff2_column_to_s(@seqname), gff2_column_to_s(@source), gff2_column_to_s(@feature), gff2_column_to_s(@start), gff2_column_to_s(@end), gff2_column_to_s(@score), gff2_column_to_s(@strand), gff2_column_to_s(@frame), attributes_to_s(@attributes) ].join("\t") + (cmnt ? "\t\##{cmnt}\n" : "\n") end # Returns true if self == other. Otherwise, returns false. def ==(other) super || ((self.class == other.class and self.seqname == other.seqname and self.source == other.source and self.feature == other.feature and self.start == other.start and self.end == other.end and self.score == other.score and self.strand == other.strand and self.frame == other.frame and self.attributes == other.attributes) ? true : false) end # Gets the attribute value for the given tag. # # Note that if two or more tag-value pairs with the same name found, # only the first value is returned. # --- # *Arguments*: # * (required) _tag_: String # *Returns*:: String, Bio::GFF::GFF2::Record::Value object, or nil. def get_attribute(tag) ary = @attributes.assoc(tag) ary ? ary[1] : nil end alias attribute get_attribute # Gets the attribute values for the given tag. # This method always returns an array. # --- # *Arguments*: # * (required) _tag_: String # *Returns*:: Array containing String or \ # Bio::GFF::GFF2::Record::Value objects. def get_attributes(tag) ary = @attributes.find_all do |x| x[0] == tag end ary.collect! { |x| x[1] } ary end # Sets value for the given tag. # If the tag exists, the value of the tag is replaced with _value_. # Note that if two or more tag-value pairs with the same name found, # only the first tag-value pair is replaced. # # If the tag does not exist, the tag-value pair is newly added. 
# --- # *Arguments*: # * (required) _tag_: String # * (required) _value_: String or Bio::GFF::GFF2::Record::Value object. # *Returns*:: _value_ def set_attribute(tag, value) ary = @attributes.find do |x| x[0] == tag end if ary then ary[1] = value else ary = [ String.new(tag), value ] @attributes.push ary end value end # Replaces values for the given tags with new values. # Existing values for the tag are completely wiped out and # replaced by new tag-value pairs. # If the tag does not exist, the tag-value pairs are newly added. # # --- # *Arguments*: # * (required) _tag_: String # * (required) _values_: String or Bio::GFF::GFF2::Record::Value objects. # *Returns*:: _self_ def replace_attributes(tag, *values) i = 0 @attributes.reject! do |x| if x[0] == tag then if i >= values.size then true else x[1] = values[i] i += 1 false end else false end end (i...(values.size)).each do |j| @attributes.push [ String.new(tag), values[j] ] end self end # Adds a new tag-value pair. # --- # *Arguments*: # * (required) _tag_: String # * (required) _value_: String or Bio::GFF::GFF2::Record::Value object. # *Returns*:: _value_ def add_attribute(tag, value) @attributes.push([ String.new(tag), value ]) end # Removes a specific tag-value pair. # # Note that if two or more tag-value pairs found, # only the first tag-value pair is removed. # # --- # *Arguments*: # * (required) _tag_: String # * (required) _value_: String or Bio::GFF::GFF2::Record::Value object. # *Returns*:: if removed, _value_. Otherwise, nil. def delete_attribute(tag, value) removed = nil if i = @attributes.index([ tag, value ]) then ary = @attributes.delete_at(i) removed = ary[1] end removed end # Removes all attributes with the specified tag. # # --- # *Arguments*: # * (required) _tag_: String # *Returns*:: if removed, self. Otherwise, nil. def delete_attributes(tag) @attributes.reject! do |x| x[0] == tag end ? self : nil end # Sorts attributes order by given tag name's order. 
# If a block is given, the argument _tags_ is ignored, and # yields two tag names like Array#sort!. # # --- # *Arguments*: # * (required or optional) _tags_: Array containing String objects # *Returns*:: _self_ def sort_attributes_by_tag!(tags = nil) h = {} s = @attributes.size @attributes.each_with_index { |x, i| h[x] = i } if block_given? then @attributes.sort! do |x, y| r = yield x[0], y[0] if r == 0 then r = (h[x] || s) <=> (h[y] || s) end r end else unless tags then raise ArgumentError, 'wrong number of arguments (0 for 1) or wrong argument value' end @attributes.sort! do |x, y| r = (tags.index(x[0]) || tags.size) <=> (tags.index(y[0]) || tags.size) if r == 0 then r = (h[x] || s) <=> (h[y] || s) end r end end self end # Returns hash representation of attributes. # # Note: If two or more tag-value pairs with same tag names exist, # only the first tag-value pair is used for each tag. # # --- # *Returns*:: Hash object def attributes_to_hash h = {} @attributes.each do |x| key, val = x h[key] = val unless h[key] end h end private # (private) Parses attributes. # Returns arrays def parse_attributes(str) return [] if !str or str == '.' attr_pairs = parse_attributes_string(str) attr_pairs.collect! do |x| key = x.shift val = (x.size == 1) ? x[0] : Value.new(x) [ key, val ] end attr_pairs end # (private) Parses attributes string. # Returns arrays def parse_attributes_string(str) sc = StringScanner.new(str) attr_pairs = [] tokens = [] cur_token = '' while !sc.eos? if sc.scan(/[^\\\;\"\s]+/) then cur_token.concat sc.matched elsif sc.scan(/\s+/) then tokens.push cur_token unless cur_token.empty? cur_token = '' elsif sc.scan(/\;/) then tokens.push cur_token unless cur_token.empty? cur_token = '' attr_pairs.push tokens tokens = [] elsif sc.scan(/\"/) then tokens.push cur_token unless cur_token.empty? cur_token = '' freetext = '' while !sc.eos? 
if sc.scan(/[^\\\"]+/) then freetext.concat sc.matched elsif sc.scan(/\"/) then break elsif sc.scan(/\\([\"\\])/) then freetext.concat sc[1] elsif sc.scan(/\\x([0-9a-fA-F][0-9a-fA-F])/n) then chr = sc[1].to_i(16).chr freetext.concat chr elsif sc.scan(/\\([0-7][0-7][0-7])/n) then chr = sc[1].to_i(8).chr freetext.concat chr elsif sc.scan(/\\([^x0-9])/n) then chr = Escape::BACKSLASH[sc[1]] || sc.matched freetext.concat chr elsif sc.scan(/\\/) then freetext.concat sc.matched else raise 'Bug: should not reach here' end end tokens.push freetext #p freetext # # disabled support for \; out of freetext #elsif sc.scan(/\\\;/) then # cur_token.concat sc.matched elsif sc.scan(/\\/) then cur_token.concat sc.matched else raise 'Bug: should not reach here' end #if end #while tokens.push cur_token unless cur_token.empty? attr_pairs.push tokens unless tokens.empty? return attr_pairs end # (private) string representation of attributes def attributes_to_s(attr) attr.collect do |a| tag, val = a if Escape::IDENTIFIER_GFF2 !~ tag then warn "Illegal GFF2 attribute tag: #{tag.inspect}" if $VERBOSE end tagstr = gff2_column_to_s(tag) valstr = if val.kind_of?(Value) then val.to_s else escape_gff2_attribute_value(val) end "#{tagstr} #{valstr}" end.join(' ; ') end end #class Record # Stores GFF2 meta-data. class MetaData # Creates a new MetaData object def initialize(directive, data = nil) @directive = directive @data = data end # Directive. Usually, one of "feature-ontology", "attribute-ontology", # or "source-ontology". attr_accessor :directive # data of this entry attr_accessor :data # parses a line def self.parse(line) directive, data = line.chomp.split(/\s+/, 2) directive = directive.sub(/\A\#\#/, '') if directive self.new(directive, data) end # string representation of this meta-data def to_s d = @directive.to_s.gsub(/[\r\n]+/, ' ') v = ' ' + @data.to_s.gsub(/[\r\n]+/, ' ') unless @data.to_s.empty? "\#\##{d}#{v}\n" end # Returns true if self == other. Otherwise, returns false. 
        def ==(other)
          if self.class == other.class and
              self.directive == other.directive and
              self.data == other.data then
            true
          else
            false
          end
        end
      end #class MetaData

      # (private) parses metadata
      def parse_metadata(directive, line)
        case directive
        when 'gff-version'
          @gff_version ||= line.split(/\s+/)[1]
        else
          @metadata.push MetaData.parse(line)
        end
        true
      end
      private :parse_metadata

      # Creates a Bio::GFF::GFF2 object by building a collection of
      # Bio::GFF::GFF2::Record (and metadata) objects.
      #
      # ---
      # *Arguments*:
      # * _str_: string in GFF format
      # *Returns*:: Bio::GFF::GFF2 object
      def initialize(str = nil)
        @gff_version = nil
        @records = []
        @metadata = []
        parse(str) if str
      end

      # GFF2 version string (String or nil). nil means "2".
      attr_reader :gff_version

      # Metadata (except "##gff-version").
      # Must be an array of Bio::GFF::GFF2::MetaData objects.
      attr_accessor :metadata

      # Parses GFF2 entries and appends the parsed data to
      # the existing records and metadata.
      #
      # ---
      # *Arguments*:
      # * _str_: string in GFF format
      # *Returns*:: self
      def parse(str)
        # parses GFF lines
        str.each_line do |line|
          if /^\#\#([^\s]+)/ =~ line then
            parse_metadata($1, line)
          else
            @records << GFF2::Record.new(line)
          end
        end
        self
      end

    end #class GFF2

    # = DESCRIPTION
    # Represents version 3 of GFF specification.
    # For more information on version GFF3, see
    # http://song.sourceforge.net/gff3.shtml
    #--
    # obsolete URL:
    # http://flybase.bio.indiana.edu/annot/gff3.html
    #++
    class GFF3 < GFF
      VERSION = 3

      # Creates a Bio::GFF::GFF3 object by building a collection of
      # Bio::GFF::GFF3::Record (and metadata) objects.
      #
      # ---
      # *Arguments*:
      # * _str_: string in GFF format
      # *Returns*:: Bio::GFF::GFF3 object
      def initialize(str = nil)
        @gff_version = nil
        @records = []
        @sequence_regions = []
        @metadata = []
        @sequences = []
        @in_fasta = false
        parse(str) if str
      end

      # GFF3 version string (String or nil). nil means "3".
      attr_reader :gff_version

      # Metadata of "##sequence-region".
      # Must be an array of Bio::GFF::GFF3::SequenceRegion objects.
      attr_accessor :sequence_regions

      # Metadata (except "##sequence-region", "##gff-version", "###").
      # Must be an array of Bio::GFF::GFF3::MetaData objects.
      attr_accessor :metadata

      # Sequences bundled within GFF3.
      # Must be an array of Bio::Sequence objects.
      attr_accessor :sequences

      # Parses GFF3 entries and appends the parsed data to
      # the existing records, metadata, and sequences.
      #
      # Note that after a "##FASTA" line is given,
      # only fasta-formatted text is accepted.
      #
      # ---
      # *Arguments*:
      # * _str_: string in GFF format
      # *Returns*:: self
      def parse(str)
        # if already after the ##FASTA line, parses fasta format and return
        if @in_fasta then
          parse_fasta(str)
          return self
        end

        if str.respond_to?(:gets) then
          # str is an IO-like object
          fst = nil
        else
          # str is a String
          gff, sep, fst = str.split(/^(\>|##FASTA.*)/n, 2)
          fst = sep + fst if sep == '>' and fst
          str = gff
        end

        # parses GFF lines
        str.each_line do |line|
          if /^\#\#([^\s]+)/ =~ line then
            parse_metadata($1, line)
            parse_fasta(str) if @in_fasta
          elsif /^\>/ =~ line then
            @in_fasta = true
            parse_fasta(str, line)
          else
            @records << GFF3::Record.new(line)
          end
        end

        # parses fasta format when str is a String and fasta data exists
        if fst then
          @in_fasta = true
          parse_fasta(fst)
        end

        self
      end

      # parses fasta formatted data
      def parse_fasta(str, line = nil)
        str.each_line("\n>") do |seqstr|
          if line then seqstr = line + seqstr; line = nil; end
          x = seqstr.strip
          next if x.empty? or x == '>'
          fst = Bio::FastaFormat.new(seqstr)
          seq = fst.to_seq
          seq.entry_id =
            unescape(fst.definition.strip.split(/\s/, 2)[0].to_s)
          @sequences.push seq
        end
      end
      private :parse_fasta

      # string representation of whole entry.
      def to_s
        ver = @gff_version || VERSION.to_s
        if @sequences.size > 0 then
          seqs = "##FASTA\n" +
            @sequences.collect { |s| s.to_fasta(s.entry_id, 70) }.join('')
        else
          seqs = ''
        end

        ([ "##gff-version #{escape(ver)}\n" ] +
         @metadata.collect { |m| m.to_s } +
         @sequence_regions.collect { |m| m.to_s } +
         @records.collect{ |r| r.to_s }).join('') + seqs
      end

      # Private methods for escaping characters.
# Internal only. Users should not use this module directly. module Escape # unsafe characters to be escaped for normal columns UNSAFE = /[^-_.!~*'()a-zA-Z\d\/?:@+$\[\] "\x80-\xfd><;=,]/n # unsafe characters to be escaped for seqid columns # and target_id of the "Target" attribute UNSAFE_SEQID = /[^-a-zA-Z0-9.:^*$@!+_?|]/n # unsafe characters to be escaped for attribute columns UNSAFE_ATTRIBUTE = /[^-_.!~*'()a-zA-Z\d\/?:@+$\[\] "\x80-\xfd><]/n private # If str is empty, returns '.'. Otherwise, returns str. def column_to_s(str) str = str.to_s str.empty? ? '.' : str end if URI.const_defined?(:Parser) then # (private) URI::Parser object for escape/unescape GFF3 columns URI_PARSER = URI::Parser.new # (private) the same as URI::Parser#escape(str, unsafe) def _escape(str, unsafe) URI_PARSER.escape(str, unsafe) end # (private) the same as URI::Parser#unescape(str) def _unescape(str) URI_PARSER.unescape(str) end else # (private) the same as URI.escape(str, unsafe) def _escape(str, unsafe) URI.escape(str, unsafe) end # (private) the same as URI.unescape(str) def _unescape(str) URI.unescape(str) end end # Return the string corresponding to these characters unescaped def unescape(string) _unescape(string) end # Escape a column according to the specification at # http://song.sourceforge.net/gff3.shtml. def escape(string) _escape(string, UNSAFE) end # Escape seqid column according to the specification at # http://song.sourceforge.net/gff3.shtml. def escape_seqid(string) _escape(string, UNSAFE_SEQID) end # Escape attribute according to the specification at # http://song.sourceforge.net/gff3.shtml. # In addition to the normal escape rule, the following characters # are escaped: ",=;". # Returns the string corresponding to these characters escaped. def escape_attribute(string) _escape(string, UNSAFE_ATTRIBUTE) end end #module Escape include Escape # Stores meta-data "##sequence-region seqid start end". 
class SequenceRegion include Escape extend Escape # creates a new SequenceRegion class def initialize(seqid, start, endpos) @seqid = seqid @start = start ? start.to_i : nil @end = endpos ? endpos.to_i : nil end # parses given string and returns SequenceRegion class def self.parse(str) _, seqid, start, endpos = str.chomp.split(/\s+/, 4).collect { |x| unescape(x) } self.new(seqid, start, endpos) end # sequence ID attr_accessor :seqid # start position attr_accessor :start # end position attr_accessor :end # string representation def to_s i = escape_seqid(column_to_s(@seqid)) s = escape_seqid(column_to_s(@start)) e = escape_seqid(column_to_s(@end)) "##sequence-region #{i} #{s} #{e}\n" end # Returns true if self == other. Otherwise, returns false. def ==(other) if other.class == self.class and other.seqid == self.seqid and other.start == self.start and other.end == self.end then true else false end end end #class SequenceRegion # Represents a single line of a GFF3-formatted file. # See Bio::GFF::GFF3 for more information. class Record < GFF2::Record include GFF3::Escape # shortcut to the ID attribute def id get_attribute('ID') end # set ID attribute def id=(str) set_attribute('ID', str) end # aliases for Column 1 (formerly "seqname") alias seqid seqname alias seqid= seqname= # aliases for Column 3 (formerly "feature"). # In the GFF3 document http://song.sourceforge.net/gff3.shtml, # column3 is called "type", but we used "feature_type" # because "type" is already used by Ruby itself. alias feature_type feature alias feature_type= feature= # aliases for Column 8 alias phase frame alias phase= frame= # Parses a GFF3-formatted line and returns a new # Bio::GFF::GFF3::Record object. def self.parse(str) self.new.parse(str) end # Creates a Bio::GFF::GFF3::Record object. # Is typically not called directly, but # is called automatically when creating a Bio::GFF::GFF3 object. 
        #
        # ---
        # *Arguments*:
        # * _str_: a tab-delimited line in GFF3 format
        # *Arguments*:
        # * _seqid_: sequence ID (String or nil)
        # * _source_: source (String or nil)
        # * _feature_type_: type of feature (String)
        # * _start_position_: start (Integer)
        # * _end_position_: end (Integer)
        # * _score_: score (Float or nil)
        # * _strand_: strand (String or nil)
        # * _phase_: phase (Integer or nil)
        # * _attributes_: attributes (Array or nil)
        def initialize(*arg)
          super(*arg)
        end

        # Parses a GFF3-formatted line and stores data from the string.
        # Note that all existing data is wiped out.
        def parse(string)
          super
        end

        # Return the record as a GFF3 compatible string
        def to_s
          cmnt = if defined?(@comment) and @comment and
                     !@comment.to_s.strip.empty? then
                   @comment.gsub(/[\r\n]+/, ' ')
                 else
                   false
                 end
          return "\##{cmnt}\n" if self.comment_only? and cmnt
          [
            escape_seqid(column_to_s(@seqname)),
            escape(column_to_s(@source)),
            escape(column_to_s(@feature)),
            escape(column_to_s(@start)),
            escape(column_to_s(@end)),
            escape(column_to_s(@score)),
            escape(column_to_s(@strand)),
            escape(column_to_s(@frame)),
            attributes_to_s(@attributes)
          ].join("\t") +
            (cmnt ? "\t\##{cmnt}\n" : "\n")
        end

        # Bio::GFF::GFF3::Record::Target is a class to store
        # data of "Target" attribute.
        class Target
          include GFF3::Escape
          extend GFF3::Escape

          # Creates a new Target object.
          def initialize(target_id, start, endpos, strand = nil)
            @target_id = target_id
            @start = start ? start.to_i : nil
            @end = endpos ? endpos.to_i : nil
            @strand = strand
          end

          # target ID
          attr_accessor :target_id

          # start position
          attr_accessor :start

          # end position
          attr_accessor :end

          # strand (optional). Normally, "+" or "-", or nil.
          attr_accessor :strand

          # parses "target_id start end [strand]"-style string
          # (for example, "ABC789 123 456 +")
          # and creates a new Target object.
          #
          def self.parse(str)
            target_id, start, endpos, strand =
              str.split(/ +/, 4).collect { |x| unescape(x) }
            self.new(target_id, start, endpos, strand)
          end

          # returns a string
          def to_s
            i = escape_seqid(column_to_s(@target_id))
            s = escape_attribute(column_to_s(@start))
            e = escape_attribute(column_to_s(@end))
            strnd = escape_attribute(@strand.to_s)
            strnd = " " + strnd unless strnd.empty?
            "#{i} #{s} #{e}#{strnd}"
          end

          # Returns true if self == other. Otherwise, returns false.
          def ==(other)
            if other.class == self.class and
                other.target_id == self.target_id and
                other.start == self.start and
                other.end == self.end and
                other.strand == self.strand then
              true
            else
              false
            end
          end
        end #class Target

        # Bio::GFF::GFF3::Record::Gap is a class to store
        # data of "Gap" attribute.
        class Gap

          # Code is a class to store length of single-letter code.
          Code = Struct.new(:code, :length)

          # Code is a class to store length of single-letter code.
          class Code
            # 1-letter code (Symbol). One of :M, :I, :D, :F, or :R is expected.
            attr_reader :code if false #dummy for RDoc

            # length (Integer)
            attr_reader :length if false #dummy for RDoc

            def to_s
              "#{code}#{length}"
            end
          end #class code

          # Creates a new Gap object.
          #
          # ---
          # *Arguments*:
          # * _str_: a formatted string, or nil.
          def initialize(str = nil)
            if str then
              @data = str.split(/ +/).collect do |x|
                if /\A([A-Z])([0-9]+)\z/ =~ x.strip then
                  Code.new($1.intern, $2.to_i)
                else
                  # x.inspect is interpolated so the offending token is shown quoted
                  warn "ignored unknown token: #{x.inspect}" if $VERBOSE
                  nil
                end
              end
              @data.compact!
            else
              @data = []
            end
          end

          # Same as new(str).
def self.parse(str) self.new(str) end # (private method) # Scans gaps and returns an array of Code objects def __scan_gap(str, gap_regexp = /[^a-zA-Z]/, code_i = :I, code_m = :M) sc = StringScanner.new(str) data = [] while len = sc.skip_until(gap_regexp) mlen = len - sc.matched_size data.push Code.new(code_m, mlen) if mlen > 0 g = Code.new(code_i, sc.matched_size) while glen = sc.skip(gap_regexp) g.length += glen end data.push g end if sc.rest_size > 0 then m = Code.new(code_m, sc.rest_size) data.push m end data end private :__scan_gap # (private method) # Parses given reference-target sequence alignment and # initializes self. Existing data will be erased. def __initialize_from_sequences_na(reference, target, gap_regexp = /[^a-zA-Z]/) data_ref = __scan_gap(reference, gap_regexp, :I, :M) data_tgt = __scan_gap(target, gap_regexp, :D, :M) data = [] while !data_ref.empty? and !data_tgt.empty? ref = data_ref.shift tgt = data_tgt.shift if ref.length > tgt.length then x = Code.new(ref.code, ref.length - tgt.length) data_ref.unshift x ref.length = tgt.length elsif ref.length < tgt.length then x = Code.new(tgt.code, tgt.length - ref.length) data_tgt.unshift x tgt.length = ref.length end case ref.code when :M if tgt.code == :M then data.push ref elsif tgt.code == :D then data.push tgt else raise 'Bug: should not reach here.' end when :I if tgt.code == :M then data.push ref elsif tgt.code == :D then # This site is ignored, # because both reference and target are gap else raise 'Bug: should not reach here.' end end end #while # rest of data_ref len = 0 data_ref.each do |r| len += r.length if r.code == :M end data.push Code.new(:D, len) if len > 0 # rest of data_tgt len = 0 data_tgt.each do |t| len += t.length if t.code == :M end data.push Code.new(:I, len) if len > 0 @data = data true end private :__initialize_from_sequences_na # Creates a new Gap object from given sequence alignment. # # Note that sites of which both reference and target are gaps # are silently removed. 
# # --- # *Arguments*: # * _reference_: reference sequence (nucleotide sequence) # * _target_: target sequence (nucleotide sequence) # * gap_regexp: regexp to identify gap def self.new_from_sequences_na(reference, target, gap_regexp = /[^a-zA-Z]/) gap = self.new gap.instance_eval { __initialize_from_sequences_na(reference, target, gap_regexp) } gap end # (private method) # scans a codon or gap in reference sequence def __scan_codon(sc_ref, gap_regexp, space_regexp, forward_frameshift_regexp, reverse_frameshift_regexp) chars = [] gap_count = 0 fs_count = 0 while chars.size < 3 + fs_count and char = sc_ref.scan(/./mn) case char when space_regexp # ignored when forward_frameshift_regexp # next char is forward frameshift fs_count += 1 when reverse_frameshift_regexp # next char is reverse frameshift fs_count -= 1 when gap_regexp chars.push char gap_count += 1 else chars.push char end end #while if chars.size < (3 + fs_count) then gap_count += (3 + fs_count) - chars.size end return gap_count, fs_count end private :__scan_codon # (private method) # internal use only def __push_code_to_data(cur, data, code, len) if cur and cur.code == code then cur.length += len else cur = Code.new(code, len) data.push cur end return cur end private :__push_code_to_data # (private method) # Parses given reference(nuc)-target(amino) sequence alignment and # initializes self. Existing data will be erased. 
def __initialize_from_sequences_na_aa(reference, target, gap_regexp = /[^a-zA-Z]/, space_regexp = /\s/, forward_frameshift_regexp = /\>/, reverse_frameshift_regexp = /\ 0 then cur = __push_code_to_data(cur, data, :F, ref_fs) end end #len.times elsif len = sc_tgt.skip(re_one) then # always 1-letter ref_gaps, ref_fs = __scan_codon(sc_ref, gap_regexp, space_regexp, forward_frameshift_regexp, reverse_frameshift_regexp) case ref_gaps when 3 cur = __push_code_to_data(cur, data, :I, 1) when 2, 1, 0 # reverse frameshift inserted when gaps exist ref_fs -= ref_gaps # normal site cur = __push_code_to_data(cur, data, :M, 1) else raise 'Bug: should not reach here' end if ref_fs < 0 then cur = __push_code_to_data(cur, data, :R, -ref_fs) elsif ref_fs > 0 then cur = __push_code_to_data(cur, data, :F, ref_fs) end else raise 'Bug: should not reach here' end end #while if sc_ref.rest_size > 0 then rest = sc_ref.scan(/.*/mn) rest.gsub!(space_regexp, '') rest.gsub!(forward_frameshift_regexp, '') rest.gsub!(reverse_frameshift_regexp, '') rest.gsub!(gap_regexp, '') len = rest.length.div(3) cur = __push_code_to_data(cur, data, :D, len) if len > 0 len = rest.length % 3 cur = __push_code_to_data(cur, data, :F, len) if len > 0 end @data = data self end private :__initialize_from_sequences_na_aa # Creates a new Gap object from given sequence alignment. # # Note that sites of which both reference and target are gaps # are silently removed. # # For incorrect alignments that break 3:1 rule, # gap positions will be moved inside codons, # unwanted gaps will be removed, and # some forward or reverse frameshift will be inserted. # # For example, # atgg-taagac-att # M V K - I # is treated as: # atggt>I # # Incorrect combination of frameshift with frameshift or gap # may cause undefined behavior. # # Forward frameshifts are recomended to be indicated in the # target sequence. # Reverse frameshifts can be indicated in the reference sequence # or the target sequence. 
# # Priority of regular expressions: # space > forward/reverse frameshift > gap # # --- # *Arguments*: # * _reference_: reference sequence (nucleotide sequence) # * _target_: target sequence (amino acid sequence) # * gap_regexp: regexp to identify gap # * space_regexp: regexp to identify space character which is completely ignored # * forward_frameshift_regexp: regexp to identify forward frameshift # * reverse_frameshift_regexp: regexp to identify reverse frameshift def self.new_from_sequences_na_aa(reference, target, gap_regexp = /[^a-zA-Z]/, space_regexp = /\s/, forward_frameshift_regexp = /\>/, reverse_frameshift_regexp = /\gap_char: gap character def process_sequences_na(reference, target, gap_char = '-') s_ref, s_tgt = dup_seqs(reference, target) s_ref, s_tgt = __process_sequences(s_ref, s_tgt, gap_char, gap_char, 1, 1, gap_char, gap_char) if $VERBOSE and s_ref.length != s_tgt.length then warn "returned sequences not equal length" end return s_ref, s_tgt end # Processes sequences and # returns gapped sequences as an array of sequences. # reference must be a nucleotide sequence, and # target must be an amino acid sequence. # # Note for reverse frameshift: # Reverse_frameshift characers are inserted in the # reference sequence. 
# For example, alignment of "Gap=M3 R1 M2" is: # atgaagatgap_char: gap character # * space_char: space character inserted to amino sequence for matching na-aa alignment # * forward_frameshift: forward frameshift character # * reverse_frameshift: reverse frameshift character def process_sequences_na_aa(reference, target, gap_char = '-', space_char = ' ', forward_frameshift = '>', reverse_frameshift = '<') s_ref, s_tgt = dup_seqs(reference, target) s_tgt = s_tgt.gsub(/./, "\\0#{space_char}#{space_char}") ref_increment = 3 tgt_increment = 1 + space_char.length * 2 ref_gap = gap_char * 3 tgt_gap = "#{gap_char}#{space_char}#{space_char}" return __process_sequences(s_ref, s_tgt, ref_gap, tgt_gap, ref_increment, tgt_increment, forward_frameshift, reverse_frameshift) end end #class Gap private def parse_attributes(string) return [] if !string or string == '.' attr_pairs = [] string.split(';').each do |pair| key, value = pair.split('=', 2) key = unescape(key) values = value.to_s.split(',') case key when 'Target' values.collect! { |v| Target.parse(v) } when 'Gap' values.collect! { |v| Gap.parse(v) } else values.collect! { |v| unescape(v) } end attr_pairs.concat values.collect { |v| [ key, v ] } end return attr_pairs end # method parse_attributes # Return the attributes as a string as it appears at the end of # a GFF3 line def attributes_to_s(attr) return '.' if !attr or attr.empty? keys = [] hash = {} attr.each do |pair| key = pair[0] val = pair[1] keys.push key unless hash[key] hash[key] ||= [] hash[key].push val end keys.collect do |key| values = hash[key] val = values.collect do |v| if v.kind_of?(Target) then v.to_s else escape_attribute(v.to_s) end end.join(',') "#{escape_attribute(key)}=#{val}" end.join(';') end end # class GFF3::Record # This is a dummy record corresponding to the "###" metadata. 
class RecordBoundary < GFF3::Record def initialize(*arg) super(*arg) self.freeze end def to_s "###\n" end end #class RecordBoundary # stores GFF3 MetaData MetaData = GFF2::MetaData # parses metadata def parse_metadata(directive, line) case directive when 'gff-version' @gff_version ||= line.split(/\s+/)[1] when 'FASTA' @in_fasta = true when 'sequence-region' @sequence_regions.push SequenceRegion.parse(line) when '#' # "###" directive @records.push RecordBoundary.new else @metadata.push MetaData.parse(line) end true end private :parse_metadata end #class GFF3 end # class GFF end # module Bio bio-2.0.3/lib/bio/db/aaindex.rb0000644000175000017500000002153414141516614015465 0ustar nileshnilesh# # = bio/db/aaindex.rb - AAindex database class # # Copyright:: Copyright (C) 2001 # KAWASHIMA Shuichi # Copyright:: Copyright (C) 2006 # Mitsuteru C. Nakao # License:: The Ruby License # # # == Description # # Classes for Amino Acid Index Database (AAindex and AAindex2). # * AAindex Help: https://www.genome.jp/aaindex/aaindex_help.html # # == Examples # # aax1 = Bio::AAindex.auto("PRAM900102.aaindex1") # aax2 = Bio::AAindex.auto("DAYM780301.aaindex2") # # aax1 = Bio::AAindex1.new("PRAM900102.aaindex1") # aax1.entry_id # aax1.index # # aax2 = Bio::AAindex2.new("DAYM780301.aaindex2") # aax2.entry_id # aax2.matrix # aax2.matrix[2,2] # aax2.matrix('R', 'A') # aax2['R', 'A'] # # == References # # * http://www.genome.jp/aaindex/ # require "bio/db" require "matrix" module Bio # == Description # # Bio::AAindex is the super class of Bio::AAindex1 and Bio::AAindex2, # parser classes for AAindex Amino Acid Index Database. # * AAindex Help: https://www.genome.jp/aaindex/aaindex_help.html # # Except for Bio::AAindex.auto, do not use this class directly. # Methods of this super class are used by the AAindex1 and AAindex2 classes.
# # == Examples # # # auto-detection of data format # aax1 = Bio::AAindex.auto("PRAM900102.aaindex1") # aax2 = Bio::AAindex.auto("DAYM780301.aaindex2") # # # Example of Bio::AAindex1 class # aax1 = Bio::AAindex1.new("PRAM900102.aaindex1") # aax1.entry_id # aax1.index # # # Example of Bio::AAindex2 class # aax2 = Bio::AAindex2.new("DAYM780301.aaindex2") # aax2.entry_id # aax2.matrix # aax2.matrix[2,2] # aax2.matrix('R', 'A') # aax2['R', 'A'] # # == References # # * http://www.genome.jp/aaindex/ # class AAindex < KEGGDB # Delimiter DELIMITER ="\n//\n" # Delimiter RS = DELIMITER # Bio::DB API TAGSIZE = 2 # Auto-detector for the two AAindex formats. # Returns a Bio::AAindex1 object or a Bio::AAindex2 object. def self.auto(str) case str when /^I /m Bio::AAindex1.new(str) when /^M /m Bio::AAindex2.new(str) else raise end end # def initialize(entry) super(entry, TAGSIZE) end # Returns entry_id in the H line. def entry_id if @data['entry_id'] @data['entry_id'] else @data['entry_id'] = field_fetch('H') end end # Returns definition in the D line. def definition if @data['definition'] @data['definition'] else @data['definition'] = field_fetch('D') end end # Returns database links in the R line. # cf.) ['LIT:123456', 'PMID:12345678'] def dblinks if @data['ref'] @data['ref'] else @data['ref'] = field_fetch('R').split(' ') end end # Returns authors in the A line. def author if @data['author'] @data['author'] else @data['author'] = field_fetch('A') end end # Returns title in the T line. def title if @data['title'] @data['title'] else @data['title'] = field_fetch('T') end end # Returns journal name in the J line. def journal if @data['journal'] @data['journal'] else @data['journal'] = field_fetch('J') end end # Returns comment (if any). def comment if @data['comment'] @data['comment'] else @data['comment'] = field_fetch('*') end end end # == Description # # Parser class for AAindex1, Amino Acid Index Database.
# * AAindex Help: https://www.genome.jp/aaindex/aaindex_help.html # # == Examples # # # auto-detection of data format by using Bio::AAindex class # aax1 = Bio::AAindex.auto("PRAM900102.aaindex1") # # # parse a file and get contents # aax1 = Bio::AAindex1.new("PRAM900102.aaindex1") # aax1.entry_id # aax1.index # # == References # # * http://www.genome.jp/aaindex/ # class AAindex1 < AAindex def initialize(entry) super(entry) end # Returns correlation_coefficient (Hash) in the C line. # # cf.) {'ABCD12010203' => 0.999, 'CDEF123456' => 0.543, ...} def correlation_coefficient if @data['correlation_coefficient'] @data['correlation_coefficient'] else hash = {} ary = field_fetch('C').split(' ') ary.each do |x| next unless x =~ /^[A-Z]/ hash[x] = ary[ary.index(x) + 1].to_f end @data['correlation_coefficient'] = hash end end # Returns the index (Array) in the I line. # # an argument: :string, :float, :zscore or :integer def index(type = :float) aa = %w( A R N D C Q E G H I L K M F P S T W Y V ) values = field_fetch('I', 1).split(' ') if values.size != 20 raise "Invalid format in #{entry_id} : #{values.inspect}" end if type == :zscore and values.size > 0 sum = 0.0 values.each do |a| sum += a.to_f end mean = sum / values.size # / 20 var = 0.0 values.each do |a| var += (a.to_f - mean) ** 2 end sd = Math.sqrt(var) end if type == :integer figure = 0 values.each do |a| figure = [ figure, a[/\..*/].length - 1 ].max end end hash = {} aa.each_with_index do |a, i| case type when :string hash[a] = values[i] when :float hash[a] = values[i].to_f when :zscore hash[a] = (values[i].to_f - mean) / sd when :integer hash[a] = (values[i].to_f * 10 ** figure).to_i end end return hash end end # == Description # # Parser class for AAindex2, Amino Acid Index Database. 
# * AAindex Help: https://www.genome.jp/aaindex/aaindex_help.html # # == Examples # # # auto-detection of data format by using Bio::AAindex class # aax2 = Bio::AAindex.auto("DAYM780301.aaindex2") # # # parse a file and get contents # aax2 = Bio::AAindex2.new("DAYM780301.aaindex2") # aax2.entry_id # aax2.matrix # aax2.matrix[2,2] # aax2.matrix('R', 'A') # aax2['R', 'A'] # # == References # # * http://www.genome.jp/aaindex/ # class AAindex2 < AAindex def initialize(entry) super(entry) end # Returns row labels. def rows if @data['rows'] @data['rows'] else label_data @rows end end # Returns col labels. def cols if @data['cols'] @data['cols'] else label_data @cols end end # Returns the value of amino acids substitution (aa1 -> aa2). def [](aa1 = nil, aa2 = nil) matrix[cols.index(aa1), rows.index(aa2)] end # Returns amino acids matrix in Matrix. def matrix(aa1 = nil, aa2 = nil) return self[aa1, aa2] if aa1 and aa2 if @data['matrix'] @data['matrix'] else ma = [] label_data.each_line do |line| ma << line.strip.split(/\s+/).map {|x| x.to_f } end ma_len = ma.size ma.each do |row| row_size = row.size if row_size < ma_len (row_size..ma_len-1).each do |i| row[i] = ma[i][row_size-1] end end end mat = Matrix[*ma] @data['matrix'] = mat end end # Returns amino acids matrix in Matrix for the old format (<= ver 5.0). def old_matrix # for AAindex <= ver 5.0 return @data['matrix'] if @data['matrix'] @aa = {} # used to determine row/column of the aa attr_reader :aa alias_method :aa, :rows alias_method :aa, :cols field = field_fetch('I') case field when / (ARNDCQEGHILKMFPSTWYV)\s+(.*)/ # 20x19/2 matrix aalist = $1 values = $2.split(/\s+/) 0.upto(aalist.length - 1) do |i| @aa[aalist[i].chr] = i end ma = Array.new 20.times do ma.push(Array.new(20)) # 2D array of 20x(20) end for i in 0 .. 19 do for j in i .. 
19 do ma[i][j] = values[i + j*(j+1)/2].to_f ma[j][i] = ma[i][j] end end @data['matrix'] = Matrix[*ma] when / -ARNDCQEGHILKMFPSTWYV / # 21x20/2 matrix (with gap) raise NotImplementedError when / ACDEFGHIKLMNPQRSTVWYJ- / # 21x21 matrix (with gap) raise NotImplementedError end end private def label_data if @data['data'] @data['data'] else label, data = get('M').split("\n", 2) if /M rows = (\S+), cols = (\S+)/.match(label) rows, cols = $1, $2 @rows = rows.split('') @cols = cols.split('') end @data['data'] = data end end end # class AAindex2 end # module Bio bio-2.0.3/lib/bio/db/pdb.rb0000644000175000017500000000100514141516614014610 0ustar nileshnilesh# # = bio/db/pdb.rb - PDB database classes # # Copyright:: Copyright (C) 2004 # GOTO Naohisa # License:: The Ruby License # # # definition of the PDB class module Bio class PDB autoload :ChemicalComponent, 'bio/db/pdb/chemicalcomponent' end #class PDB end #module Bio # require other files under pdb directory require 'bio/db/pdb/utils' require 'bio/db/pdb/atom' require 'bio/db/pdb/residue' require 'bio/db/pdb/chain' require 'bio/db/pdb/model' require 'bio/db/pdb/pdb' bio-2.0.3/lib/bio/compat/0000755000175000017500000000000014141516614014420 5ustar nileshnileshbio-2.0.3/lib/bio/compat/references.rb0000644000175000017500000000713414141516614017073 0ustar nileshnilesh# # = bio/compat/references.rb - Obsoleted References class # # Copyright:: Copyright (C) 2008 # Toshiaki Katayama , # Ryan Raaum , # Jan Aerts , # Naohisa Goto # License:: The Ruby License # # $Id: references.rb,v 1.1.2.1 2008/03/04 10:07:49 ngoto Exp $ # # == Description # # The Bio::References class was obsoleted after BioRuby 1.2.1. # To keep compatibility, some wrapper methods are provided in this file. # As the compatibility methods (and Bio::References) will soon be removed, # Please change your code not to use Bio::References. # # Note that Bio::Reference is different from Bio::References. 
# Bio::Reference still exists for storing reference information # in sequence entries. module Bio # = DESCRIPTION # # This class is OBSOLETED, and will soon be removed. # Instead of this class, an array is to be used. # # # A container class for Bio::Reference objects. # # = USAGE # # This class should NOT be used. # # refs = Bio::References.new # refs.append(Bio::Reference.new(hash)) # refs.each do |reference| # ... # end # class References # module to keep backward compatibility with obsoleted Bio::References module BackwardCompatibility #:nodoc: # Backward compatibility with Bio::References#references. # Now, references are stored in an array, and # you should change your code not to use this method. def references warn 'Bio::References is obsoleted. Now, references are stored in an array.' self end # Backward compatibility with Bio::References#append. # Now, references are stored in an array, and # you should change your code not to use this method. def append(reference) warn 'Bio::References is obsoleted. Now, references are stored in an array.' self.push(reference) if reference.is_a? Reference self end end #module BackwardCompatibility # This method should not be used. # Only for backward compatibility of existing code. # # Since Bio::References is obsoleted, # Bio::References.new does not return a Bio::References object, # but modifies given _ary_ and returns the _ary_. # # *Arguments*: # * (optional) __: Array of Bio::Reference objects # *Returns*:: the given array def self.new(ary = []) warn 'Bio::References is obsoleted. Some methods are added to given array to keep backward compatibility.' ary.extend(BackwardCompatibility) ary end # Array of Bio::Reference objects attr_accessor :references # Normally, users can not call this method.
# # Create a new Bio::References object # # refs = Bio::References.new # --- # *Arguments*: # * (optional) __: Array of Bio::Reference objects # *Returns*:: Bio::References object def initialize(ary = []) @references = ary end # Add a Bio::Reference object to the container. # # refs.append(reference) # --- # *Arguments*: # * (required) _reference_: Bio::Reference object # *Returns*:: current Bio::References object def append(reference) @references.push(reference) if reference.is_a? Reference return self end # Iterate through Bio::Reference objects. # # refs.each do |reference| # ... # end # --- # *Block*:: yields each Bio::Reference object def each @references.each do |reference| yield reference end end end #class References end #module Bio bio-2.0.3/lib/bio/compat/features.rb0000644000175000017500000001077314141516614016573 0ustar nileshnilesh# # = bio/compat/features.rb - Obsoleted Features class # # Copyright:: Copyright (c) 2002, 2005 Toshiaki Katayama # 2006 Jan Aerts # 2008 Naohisa Goto # License:: The Ruby License # # $Id: features.rb,v 1.1.2.2 2008/03/10 13:42:26 ngoto Exp $ # # == Description # # The Bio::Features class was obsoleted after BioRuby 1.2.1. # To keep compatibility, some wrapper methods are provided in this file. # As the compatibility methods (and Bio::Features) will soon be removed, # Please change your code not to use Bio::Features. # # Note that Bio::Feature is different from the Bio::Features. # Bio::Feature still exists to store DDBJ/GenBank/EMBL feature information. require 'bio/location' module Bio # = DESCRIPTION # # This class is OBSOLETED, and will soon be removed. # Instead of this class, an array is to be used. # # # Container for a list of Feature objects. 
# # = USAGE # # First, create some Bio::Feature objects # feature1 = Bio::Feature.new('intron','3627..4059') # feature2 = Bio::Feature.new('exon','4060..4236') # feature3 = Bio::Feature.new('intron','4237..4426') # feature4 = Bio::Feature.new('CDS','join(2538..3626,4060..4236)', # [ Bio::Feature::Qualifier.new('gene', 'CYP2D6'), # Bio::Feature::Qualifier.new('translation','MGXXTVMHLL...') # ]) # # # And create a container for them # feature_container = Bio::Features.new([ feature1, feature2, feature3, feature4 ]) # # # Iterate over all features and print # feature_container.each do |feature| # puts feature.feature + "\t" + feature.position # feature.each do |qualifier| # puts "- " + qualifier.qualifier + ": " + qualifier.value # end # end # # # Iterate only over CDS features and extract translated amino acid sequences # feature_container.each("CDS") do |feature| # hash = feature.to_hash # name = hash["gene"] || hash["product"] || hash["note"] # aaseq = hash["translation"] # pos = feature.position # if name and aaseq # puts ">#{name} #{pos}" # puts aaseq # end # end class Features # module to keep backward compatibility with obsoleted Bio::Features module BackwardCompatibility #:nodoc: # Backward compatibility with Bio::Features#features. # Now, features are stored in an array, and # you should change your code not to use this method. def features warn 'Bio::Features is obsoleted. Now, features are stored in an array.' self end # Backward compatibility with Bio::Features#append. # Now, features are stored in an array, and # you should change your code not to use this method. def append(feature) warn 'Bio::Features is obsoleted. Now, features are stored in an array.' self.push(feature) if feature.is_a? Feature self end end #module BackwardCompatibility # This method should not be used. # Only for backward compatibility of existing code.
# # Since Bio::Features is obsoleted, # Bio::Features.new does not return a Bio::Features object, # but modifies given _ary_ and returns the _ary_. # # *Arguments*: # * (optional) __: Array of Bio::Feature objects # *Returns*:: the given array def self.new(ary = []) warn 'Bio::Features is obsoleted. Some methods are added to given array to keep backward compatibility.' ary.extend(BackwardCompatibility) ary end # Normally, users can not call this method. # # Create a new Bio::Features object. # # *Arguments*: # * (optional) _list of features_: list of Bio::Feature objects # *Returns*:: Bio::Features object def initialize(ary = []) @features = ary end # Returns an Array of Feature objects. attr_accessor :features # Appends a Feature object to Features. # # *Arguments*: # * (required) _feature_: Bio::Feature object # *Returns*:: Bio::Features object def append(a) @features.push(a) if a.is_a? Feature return self end # Iterates on each feature object. # # *Arguments*: # * (optional) _key_: if specified, only iterates over features with this key def each(arg = nil) @features.each do |x| next if arg and x.feature != arg yield x end end # Short cut for the Features#features[n] def [](*arg) @features[*arg] end # Short cut for the Features#features.first def first @features.first end # Short cut for the Features#features.last def last @features.last end end # Features end # Bio bio-2.0.3/lib/bio/feature.rb0000644000175000017500000000765514141516614015132 0ustar nileshnilesh# # = bio/feature.rb - Features/Feature class (GenBank Feature table) # # Copyright:: Copyright (c) 2002, 2005 Toshiaki Katayama # 2006 Jan Aerts # License:: The Ruby License # module Bio autoload :Locations, 'bio/location' unless const_defined?(:Locations) # = DESCRIPTION # Container for the sequence annotation. # # = USAGE # # Create a Bio::Feature object.
# # For example: the GenBank-formatted entry in genbank for accession M33388 # # contains the following feature: # # exon 1532..1799 # # /gene="CYP2D6" # # /note="cytochrome P450 IID6; GOO-132-127" # # /number="1" # feature = Bio::Feature.new('exon','1532..1799') # feature.append(Bio::Feature::Qualifier.new('gene', 'CYP2D6')) # feature.append(Bio::Feature::Qualifier.new('note', 'cytochrome P450 IID6')) # feature.append(Bio::Feature::Qualifier.new('number', '1')) # # # or all in one go: # feature2 = Bio::Feature.new('exon','1532..1799', # [ Bio::Feature::Qualifier.new('gene', 'CYP2D6'), # Bio::Feature::Qualifier.new('note', 'cytochrome P450 IID6; GOO-132-127'), # Bio::Feature::Qualifier.new('number', '1') # ]) # # # Print the feature # puts feature.feature + "\t" + feature.position # feature.each do |qualifier| # puts "- " + qualifier.qualifier + ": " + qualifier.value # end # # = REFERENCES # INSD feature table definition:: http://www.ddbj.nig.ac.jp/FT/full_index.html class Feature # Create a new Bio::Feature object. # *Arguments*: # * (required) _feature_: type of feature (e.g. "exon") # * (required) _position_: position of feature (e.g. "complement(1532..1799)") # * (opt) _qualifiers_: list of Bio::Feature::Qualifier objects (default: []) # *Returns*:: Bio::Feature object def initialize(feature = '', position = '', qualifiers = []) @feature, @position, @qualifiers = feature, position, qualifiers end # Returns type of feature in String (e.g 'CDS', 'gene') attr_accessor :feature # Returns position of the feature in String (e.g. 'complement(123..146)') attr_accessor :position # Returns an Array of Qualifier objects. attr_accessor :qualifiers # Returns a Bio::Locations object translated from the position string. def locations Locations.new(@position) end # Appends a Qualifier object to the Feature. # # *Arguments*: # * (required) _qualifier_: Bio::Feature::Qualifier object # *Returns*:: Bio::Feature object def append(a) @qualifiers.push(a) if a.is_a? 
Qualifier return self end # Iterates on each qualifier object. # # *Arguments*: # * (optional) _key_: if specified, only iterates over qualifiers with this key def each(arg = nil) @qualifiers.each do |x| next if arg and x.qualifier != arg yield x end end # Returns a Hash constructed from qualifier objects. def assoc STDERR.puts "Bio::Feature#assoc is deprecated, use Bio::Feature#to_hash instead" if $DEBUG hash = Hash.new @qualifiers.each do |x| hash[x.qualifier] = x.value end return hash end # Returns a Hash constructed from qualifier objects. def to_hash hash = Hash.new @qualifiers.each do |x| hash[x.qualifier] ||= [] hash[x.qualifier] << x.value end return hash end # Short cut for the Bio::Feature#to_hash[key] def [](key) self.to_hash[key] end # Container for qualifier-value pairs for sequence features. class Qualifier # Creates a new Bio::Feature::Qualifier object # # *Arguments*: # * (required) _key_: key of the qualifier (e.g. "gene") # * (required) _value_: value of the qualifier (e.g. 
"CYP2D6") # *Returns*:: Bio::Feature::Qualifier object def initialize(key, value) @qualifier, @value = key, value end # Qualifier name in String attr_reader :qualifier # Qualifier value in String attr_reader :value end #Qualifier end #Feature end # Bio bio-2.0.3/lib/bio/io/0000755000175000017500000000000014141516614013544 5ustar nileshnileshbio-2.0.3/lib/bio/io/das.rb0000644000175000017500000002735614141516614014655 0ustar nileshnilesh# # = bio/io/das.rb - BioDAS access module # # Copyright:: Copyright (C) 2003, 2004, 2007 # Shuichi Kawashima , # Toshiaki Katayama # License:: The Ruby License # # #-- # == TODO # # link, stylesheet # #++ # begin require 'rexml/document' rescue LoadError end require 'bio/command' require 'bio/sequence' module Bio class DAS # Specify DAS server to connect def initialize(url = 'http://www.wormbase.org:80/db/') @server = url.chomp('/') end def dna(dsn, entry_point, start, stop) seg = Bio::DAS::SEGMENT.region(entry_point, start, stop) self.get_dna(dsn, seg).first.sequence end def features(dsn, entry_point, start, stop) seg = Bio::DAS::SEGMENT.region(entry_point, start, stop) self.get_features(dsn, seg) end # Returns an Array of Bio::DAS::DSN def get_dsn ary = [] result = Bio::Command.post_form("#{@server}/das/dsn") doc = REXML::Document.new(result.body) doc.elements.each('/descendant::DSN') do |ee| dsn = DSN.new ee.elements.each do |e| case e.name when 'SOURCE' dsn.source = e.text dsn.source_id = e.attributes['id'] dsn.source_version = e.attributes['version'] when 'MAPMASTER' dsn.mapmaster = e.text when 'DESCRIPTION' dsn.description = e.text dsn.description_href = e.attributes['href'] end end ary << dsn end ary end # Returns Bio::DAS::ENTRY_POINT. # The 'dsn' can be a String or a Bio::DAS::DSN object. 
def get_entry_points(dsn) entry_point = ENTRY_POINT.new if dsn.instance_of?(Bio::DAS::DSN) src = dsn.source else src = dsn end result = Bio::Command.post_form("#{@server}/das/#{src}/entry_points") doc = REXML::Document.new(result.body) doc.elements.each('/descendant::ENTRY_POINTS') do |ee| entry_point.href = ee.attributes['href'] entry_point.version = ee.attributes['version'] ee.elements.each do |e| segment = SEGMENT.new segment.entry_id = e.attributes['id'] segment.start = e.attributes['start'] segment.stop = e.attributes['stop'] || e.attributes['size'] segment.orientation = e.attributes['orientation'] segment.subparts = e.attributes['subparts'] segment.description = e.text entry_point.segments << segment end end entry_point end # Returns an Array of Bio::DAS::DNA. # The 'dsn' can be a String or a Bio::DAS::DSN object. # The 'segments' can be a Bio::DAS::SEGMENT object or an Array of # Bio::DAS::SEGMENT def get_dna(dsn, segments) ary = [] dsn = dsn.source if dsn.instance_of?(DSN) segments = [segments] if segments.instance_of?(SEGMENT) opts = [] segments.each do |s| opts << "segment=#{s.entry_id}:#{s.start},#{s.stop}" end result = Bio::Command.post_form("#{@server}/das/#{dsn}/dna", opts) doc = REXML::Document.new(result.body) doc.elements.each('/descendant::SEQUENCE') do |e| sequence = DNA.new sequence.entry_id = e.attributes['id'] sequence.start = e.attributes['start'] sequence.stop = e.attributes['stop'] sequence.version = e.attributes['version'] e.elements.each do |el| sequence.sequence = Bio::Sequence::NA.new(el.text) sequence.length = el.attributes['length'].to_i end ary << sequence end ary end # Returns an Array of Bio::DAS::SEQUENCE. # The 'dsn' can be a String or a Bio::DAS::DSN object. 
# The 'segments' can be a Bio::DAS::SEGMENT object or an Array of # Bio::DAS::SEGMENT def get_sequence(dsn, segments) ary = [] dsn = dsn.source if dsn.instance_of?(DSN) segments = [segments] if segments.instance_of?(SEGMENT) opts = [] segments.each do |s| opts << "segment=#{s.entry_id}:#{s.start},#{s.stop}" end result = Bio::Command.post_form("#{@server}/das/#{dsn}/sequence", opts) doc = REXML::Document.new(result.body) doc.elements.each('/descendant::SEQUENCE') do |e| sequence = SEQUENCE.new sequence.entry_id = e.attributes['id'] sequence.start = e.attributes['start'] sequence.stop = e.attributes['stop'] sequence.moltype = e.attributes['moltype'] sequence.version = e.attributes['version'] case sequence.moltype when /dna|rna/i # 'DNA', 'ssRNA', 'dsRNA' sequence.sequence = Bio::Sequence::NA.new(e.text) when /protein/i # 'Protein sequence.sequence = Bio::Sequence::AA.new(e.text) else sequence.sequence = e.text end ary << sequence end ary end # Returns a Bio::DAS::TYPES object. # The 'dsn' can be a String or a Bio::DAS::DSN object. 
# The 'segments' is optional and can be a Bio::DAS::SEGMENT object or # an Array of Bio::DAS::SEGMENT def get_types(dsn, segments = []) # argument 'type' is deprecated types = TYPES.new dsn = dsn.source if dsn.instance_of?(DSN) segments = [segments] if segments.instance_of?(SEGMENT) opts = [] segments.each do |s| opts << "segment=#{s.entry_id}:#{s.start},#{s.stop}" end result = Bio::Command.post_form("#{@server}/das/#{dsn}/types", opts) doc = REXML::Document.new(result.body) doc.elements.each('/descendant::GFF') do |ee| types.version = ee.attributes['version'] types.href = ee.attributes['href'] ee.elements.each do |e| segment = SEGMENT.new segment.entry_id = e.attributes['id'] segment.start = e.attributes['start'] segment.stop = e.attributes['stop'] segment.version = e.attributes['version'] segment.label = e.attributes['label'] e.elements.each do |el| t = TYPE.new t.entry_id = el.attributes['id'] t.method = el.attributes['method'] t.category = el.attributes['category'] t.count = el.text.to_i segment.types << t end types.segments << segment end end types end # Returns a Bio::DAS::GFF object. # The 'dsn' can be a String or a Bio::DAS::DSN object. 
# The 'segments' is optional and can be a Bio::DAS::SEGMENT object or # an Array of Bio::DAS::SEGMENT def get_features(dsn, segments = [], categorize = false, feature_ids = [], group_ids = []) # arguments 'type' and 'category' are deprecated gff = GFF.new dsn = dsn.source if dsn.instance_of?(DSN) segments = [segments] if segments.instance_of?(SEGMENT) opts = [] segments.each do |s| opts << "segment=#{s.entry_id}:#{s.start},#{s.stop}" end if categorize opts << "categorize=yes" # default is 'no' end feature_ids.each do |fid| opts << "feature_id=#{fid}" end group_ids.each do |gid| opts << "group_id=#{gid}" end result = Bio::Command.post_form("#{@server}/das/#{dsn}/features", opts) doc = REXML::Document.new(result.body) doc.elements.each('/descendant::GFF') do |elem| gff.version = elem.attributes['version'] gff.href = elem.attributes['href'] elem.elements.each('SEGMENT') do |ele| segment = SEGMENT.new segment.entry_id = ele.attributes['id'] segment.start = ele.attributes['start'] segment.stop = ele.attributes['stop'] segment.version = ele.attributes['version'] segment.label = ele.attributes['label'] ele.elements.each do |el| feature = FEATURE.new feature.entry_id = el.attributes['id'] feature.label = el.attributes['label'] el.elements.each do |e| case e.name when 'TYPE' type = TYPE.new type.entry_id = e.attributes['id'] type.category = e.attributes['category'] type.reference = e.attributes['reference'] type.label = e.text feature.types << type when 'METHOD' feature.method_id = e.attributes['id'] feature.method = e.text when 'START' feature.start = e.text when 'STOP', 'END' feature.stop = e.text when 'SCORE' feature.score = e.text when 'ORIENTATION' feature.orientation = e.text when 'PHASE' feature.phase = e.text when 'NOTE' feature.notes << e.text when 'LINK' link = LINK.new link.href = e.attributes['href'] link.text = e.text feature.links << link when 'TARGET' target = TARGET.new target.entry_id = e.attributes['id'] target.start = e.attributes['start'] target.stop = 
e.attributes['stop'] target.name = e.text feature.targets << target when 'GROUP' group = GROUP.new group.entry_id = e.attributes['id'] group.label = e.attributes['label'] group.type = e.attributes['type'] e.elements.each do |ee| case ee.name when 'NOTE' # in GROUP group.notes << ee.text when 'LINK' # in GROUP link = LINK.new link.href = ee.attributes['href'] link.text = ee.text group.links << link when 'TARGET' # in GROUP target = TARGET.new target.entry_id = ee.attributes['id'] target.start = ee.attributes['start'] target.stop = ee.attributes['stop'] target.name = ee.text group.targets << target end end feature.groups << group end end segment.features << feature end gff.segments << segment end end gff end class DSN attr_accessor :source, :source_id, :source_version, :mapmaster, :description, :description_href end class ENTRY_POINT def initialize @segments = Array.new end attr_reader :segments attr_accessor :href, :version def each @segments.each do |x| yield x end end end class SEGMENT def self.region(entry_id, start, stop) segment = self.new segment.entry_id = entry_id segment.start = start segment.stop = stop return segment end def initialize @features = Array.new # for FEATURE @types = Array.new # for TYPE end attr_accessor :entry_id, :start, :stop, :orientation, :description, :subparts, # optional :features, :version, :label, # for FEATURE :types # for TYPE end class DNA attr_accessor :entry_id, :start, :stop, :version, :sequence, :length end class SEQUENCE attr_accessor :entry_id, :start, :stop, :moltype, :version, :sequence end class TYPES < ENTRY_POINT; end class TYPE attr_accessor :entry_id, :method, :category, :count, :reference, :label # for FEATURE end class GFF def initialize @segments = Array.new end attr_reader :segments attr_accessor :version, :href end class FEATURE def initialize @notes = Array.new @links = Array.new @types = Array.new @targets = Array.new @groups = Array.new end attr_accessor :entry_id, :label, :method_id, :method, :start, :stop, 
:score, :orientation, :phase attr_reader :notes, :links, :types, :targets, :groups end class LINK attr_accessor :href, :text end class TARGET attr_accessor :entry_id, :start, :stop, :name end class GROUP def initialize @notes = Array.new @links = Array.new @targets = Array.new end attr_accessor :entry_id, :label, :type attr_reader :notes, :links, :targets end end end # module Bio bio-2.0.3/lib/bio/io/hinv.rb0000644000175000017500000002676114141516614015051 0ustar nileshnilesh# # = bio/io/hinv.rb - H-invDB web service (REST) client module # # Copyright:: Copyright (C) 2008 Toshiaki Katayama # License:: The Ruby License # # require 'bio/command' require 'rexml/document' module Bio # = Bio::Hinv # # Accessing the H-invDB web services. # # * http://www.h-invitational.jp/ # * http://h-invitational.jp/hinv/hws/doc/index.html # class Hinv BASE_URI = "http://h-invitational.jp/hinv/hws/" module Common def query(options = nil) response = Bio::Command.post_form(@url, options) @result = response.body @xml = REXML::Document.new(@result) end end # Bio::Hinv.acc2hit("BC053657") # => "HIT000053961" def self.acc2hit(acc) serv = Acc2hit.new serv.query("acc" => acc) serv.result end # Bio::Hinv.hit2acc("HIT000022181") # => "AK097327" def self.hit2acc(hit) serv = Hit2acc.new serv.query("hit" => hit) serv.result end # Bio::Hinv.hit_cnt # => 187156 def self.hit_cnt serv = HitCnt.new serv.query serv.result end # Bio::Hinv.hit_definition("HIT000000001") # => "Rho guanine ..." def self.hit_definition(hit) serv = HitDefinition.new serv.query("hit" => hit) serv.result end # Bio::Hinv.hit_pubmedid("HIT000053961") # => [7624364, 11279095, ... ] def self.hit_pubmedid(hit) serv = HitPubmedId.new serv.query("hit" => hit) serv.result end # Bio::Hinv.hit_xml("HIT000000001") # => " hit) puts serv.result end # Bio::Hinv.hix2hit("HIX0000004") # => ["HIT000012846", ... 
] def self.hix2hit(hix) serv = Bio::Hinv::Hix2hit.new serv.query("hix" => hix) serv.result end # Bio::Hinv.hix_cnt # => 36073 def self.hix_cnt serv = HixCnt.new serv.query serv.result end # Bio::Hinv.hix_represent("HIX0000001") # => "HIT000022181" def self.hix_represent(hix) serv = HixRepresent.new serv.query("hix" => hix) serv.result end # Bio::Hinv.id_search("HIT00002218*") # => ["HIT000022181", ... ] def self.id_search(query) serv = IdSearch.new serv.query("query" => query) serv.result end # Bio::Hinv.keyword_search("HIT00002218*") # => ["HIT000022181", ... ] def self.keyword_search(query) serv = KeywordSearch.new serv.query("query" => query) serv.result end # serv = Bio::Hinv::Acc2hit.new # serv.query("acc" => "BC053657") # puts serv.result class Acc2hit include Common def initialize @url = BASE_URI + "acc2hit.php" end # # # HIT000053961 # def result @xml.elements['//H-INVITATIONAL-ID'].text end end # serv = Bio::Hinv::Hit2acc.new # serv.query("hit" => "HIT000022181") # puts serv.result class Hit2acc include Common def initialize @url = BASE_URI + "hit2acc.php" end # # # AK097327 # def result @xml.elements['//ACCESSION-NO'].text end end # serv = Bio::Hinv::HitCnt.new # serv.query # puts serv.result class HitCnt include Common def initialize @url = BASE_URI + "hit_cnt.php" end # # # 187156 # def result @xml.elements['//TRANSCRIPT_CNT'].text.to_i end end # serv = Bio::Hinv::HitDefinition.new # serv.query("hit" => "HIT000000001") # puts serv.result # puts serv.data_source_definition # puts serv.cdna_rep_h_invitational # puts serv.cdna_splicing_isoform_curation # puts serv.data_source_db_reference_protein_motif_id # puts serv.data_source_identity # puts serv.data_source_coverage # puts serv.data_source_homologous_species # puts serv.data_source_similarity_category class HitDefinition include Common def initialize @url = BASE_URI + "hit_definition.php" end # # # # HIT000000001 # Rho guanine nucleotide exchange factor 10. 
# Representative transcript # # NP_055444 # 100.0 # 100.0 # Homo sapiens # Identical to known human protein(Category I). # # def result @xml.elements['//DATA-SOURCE_DEFINITION'].text end alias :data_source_definition :result def cdna_rep_h_invitational @xml.elements['//CDNA_REP-H-INVITATIONAL'].text end def cdna_splicing_isoform_curation @xml.elements['//CDNA_SPLICING-ISOFORM_CURATION'].text end def data_source_db_reference_protein_motif_id @xml.elements['//DATA-SOURCE_DB-REFERENCE_PROTEIN-MOTIF-ID'].text end def data_source_identity @xml.elements['//DATA-SOURCE_IDENTITY'].text.to_f end def data_source_coverage @xml.elements['//DATA-SOURCE_COVERAGE'].text.to_f end def data_source_homologous_species @xml.elements['//DATA-SOURCE_HOMOLOGOUS_SPECIES'].text end def data_source_similarity_category @xml.elements['//DATA-SOURCE_SIMILARITY-CATEGORY'].text end end # serv = Bio::Hinv::HitPubmedId.new # serv.query("hit" => "HIT000053961") # puts serv.result class HitPubmedId include Common def initialize @url = BASE_URI + "hit_pubmedid.php" end # # # 7624364 # 11279095 # 15489334 # def result list = [] @xml.elements.each('//CDNA_DB-REFERENCE_PUBMED') do |e| list << e.text.to_i end return list end end # serv = Bio::Hinv::HitXML.new # serv.query("hit" => "HIT000000001") # puts serv.result class HitXML include Common def initialize @url = BASE_URI + "hit_xml.php" end # # # # HIX0021591 # HIX0021591.11 # HIT000000001 # : # # # # def result @result end end # serv = Bio::Hinv::Hix2hit.new # serv.query("hix" => "HIX0000004") # puts serv.result class Hix2hit include Common def initialize @url = BASE_URI + "hix2hit.php" end # # # HIT000012846 # HIT000022124 # HIT000007722 # : # HIT000262478 # def result list = [] @xml.elements.each('//H-INVITATIONAL-ID') do |e| list << e.text end return list end end # serv = Bio::Hinv::HixCnt.new # serv.query # puts serv.result class HixCnt include Common def initialize @url = BASE_URI + "hix_cnt.php" end # # # 36073 # def result 
@xml.elements['//LOCUS_CNT'].text.to_i end end # serv = Bio::Hinv::HixRepresent.new # serv.query("hix" => "HIX0000001") # puts serv.result # puts serv.rep_h_invitational_id # puts serv.rep_accession_no class HixRepresent include Common def initialize @url = BASE_URI + "hix_represent.php" end # # # # HIX0000001 # HIT000022181 # AK097327 # # def result @xml.elements['//REP-H-INVITATIONAL-ID'].text end alias :rep_h_invitational_id :result def rep_accession_no @xml.elements['//REP-ACCESSION-NO'].text end end # example at "http://www.jbirc.aist.go.jp/hinv/hws/doc/index_jp.html" # is for hit_xml.php (not for hix_xml.php) class HixXML end # serv = Bio::Hinv::KeywordSearch.new # serv.query("query" => "HIT00002218*", "start" => 1, "end" => 100) # puts serv.result # puts serv.size # puts serv.start # puts serv.end class KeywordSearch include Common def initialize @url = BASE_URI + "keyword_search.php" end def query(hash = {}) default = { "start" => 1, "end" => 100 } options = default.update(hash) super(options) end # # # HIT00002218* # 8 # 1 # 8 # HIT000022180 # HIT000022181 # HIT000022183 # HIT000022184 # HIT000022185 # HIT000022186 # HIT000022188 # HIT000022189 # def result list = [] @xml.elements.each('//H-INVITATIONAL-ID') do |e| list << e.text end return list end def size @xml.elements['//SIZE'].text.to_i end def start @xml.elements['//START'].text.to_i end def end @xml.elements['//END'].text.to_i end end # serv = Bio::Hinv::IdSearch.new # serv.query("query" => "HIT00002218*", "id_type" => "H-INVITATIONAL-ID", "start" => 1, "end" => 100) # puts serv.result # puts serv.size # puts serv.start # puts serv.end class IdSearch < KeywordSearch def initialize @url = BASE_URI + "id_search.php" end def query(hash = {}) default = { "id_type" => "H-INVITATIONAL-ID", "start" => 1, "end" => 100 } options = default.update(hash) super(options) end end end end bio-2.0.3/lib/bio/io/ncbirest.rb0000644000175000017500000007167214141516614015717 0ustar nileshnilesh# # = bio/io/ncbirest.rb - 
NCBI Entrez client module # # Copyright:: Copyright (C) 2008 Toshiaki Katayama # License:: The Ruby License # # $Id:$ # require 'thread' require 'bio/command' require 'bio/version' module Bio class NCBI # (Hash) Default parameters for Entrez (eUtils). # They may also be used for other NCBI services. ENTREZ_DEFAULT_PARAMETERS = { # Cited from # https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.Release_Notes # tool: # Name of application making the E-utility call. # Value must be a string with no internal spaces. 'tool' => "bioruby", # Cited from # https://www.ncbi.nlm.nih.gov/books/NBK25497/ # The value of email should be a complete and valid e-mail address # of the software developer and not that of a third-party end user. 'email' => 'staff@bioruby.org', } # Resets Entrez (eUtils) default parameters. # --- # *Returns*:: (Hash) default parameters def self.reset_entrez_default_parameters h = { 'tool' => "bioruby", 'email' => 'staff@bioruby.org', } ENTREZ_DEFAULT_PARAMETERS.clear ENTREZ_DEFAULT_PARAMETERS.update(h) end # Gets default email address for Entrez (eUtils). # --- # *Returns*:: String or nil def self.default_email ENTREZ_DEFAULT_PARAMETERS['email'] end # Sets default email address used for Entrez (eUtils). # It may also be used for other NCBI services. # # In https://www.ncbi.nlm.nih.gov/books/NBK25497/ # NCBI says: # "The value of email should be a complete and valid e-mail address of # the software developer and not that of a third-party end user." # # By default, email address of BioRuby staffs is set. # # From the above NCBI documentation, the tool and email value is used # only for unblocking IP addresses blocked by NCBI due to excess requests. # For the purpose, NCBI says: # "Please be aware that merely providing values for tool and email # in requests is not sufficient to comply with this policy; # these values must be registered with NCBI." # # Please use your own email address and tool name when registering # tool and email values to NCBI. 
# # --- # *Arguments*: # * (required) _str_: (String) email address # *Returns*:: same as given argument def self.default_email=(str) ENTREZ_DEFAULT_PARAMETERS['email'] = str end # Gets default tool name for Entrez (eUtils). # --- # *Returns*:: String or nil def self.default_tool ENTREZ_DEFAULT_PARAMETERS['tool'] end # Sets default tool name for Entrez (eUtils). # It may also be used for other NCBI services. # # In https://www.ncbi.nlm.nih.gov/books/NBK25497/ # NCBI says: # "The value of tool should be a string with no internal spaces that # uniquely identifies the software producing the request." # # "bioruby" is set by default. # Please use your own tool name when registering to NCBI. # # See the document of default_email= for more information. # # --- # *Arguments*: # * (required) _str_: (String) tool name # *Returns*:: same as given argument def self.default_tool=(str) ENTREZ_DEFAULT_PARAMETERS['tool'] = str end # == Description # # The Bio::NCBI::REST class provides REST client for the NCBI E-Utilities # # Entrez Programming Utilities Help: # # * https://www.ncbi.nlm.nih.gov/books/NBK25501/ # * ( redirected from http://www.ncbi.nlm.nih.gov/entrez/utils/ ) # class REST # Run retrieval scripts on weekends or between 9 pm and 5 am Eastern Time # weekdays for any series of more than 100 requests. # -> Not implemented yet in BioRuby # # Wait for 1/3 seconds. # NCBI's restriction is: "Make no more than 3 requests every 1 second.". NCBI_INTERVAL = 1.0 / 3.0 @@last_access = nil @@last_access_mutex = nil private # (Private) Sleeps until allowed to access. 
  # ---
  # *Arguments*:
  # * (required) _wait_: wait unit time
  # *Returns*:: (undefined)
  def ncbi_access_wait(wait = NCBI_INTERVAL)
    @@last_access_mutex ||= Mutex.new
    @@last_access_mutex.synchronize {
      if @@last_access
        duration = Time.now - @@last_access
        if wait > duration
          sleep wait - duration
        end
      end
      @@last_access = Time.now
    }
    nil
  end

  # (Private) default parameters
  # ---
  # *Returns*:: Hash
  def default_parameters
    Bio::NCBI::ENTREZ_DEFAULT_PARAMETERS
  end

  # (Private) Sends query to NCBI.
  # ---
  # *Arguments*:
  # * (required) _serv_: (String) server URI string
  # * (required) _opts_: (Hash) parameters
  # *Returns*:: the response object returned by Bio::Command.post_form
  def ncbi_post_form(serv, opts)
    ncbi_check_parameters(opts)
    ncbi_access_wait
    #$stderr.puts opts.inspect
    response = Bio::Command.post_form(serv, opts)
    response
  end

  # (Private) Checks parameters as NCBI requires.
  # Raises an error if the email or tool parameter is missing.
  #
  # NCBI announces that "Effective on
  # June 1, 2010, all E-utility requests, either using standard URLs or
  # SOAP, must contain non-null values for both the &tool and &email
  # parameters. Any E-utility request made after June 1, 2010 that does
  # not contain values for both parameters will return an error explaining
  # that these parameters must be included in E-utility requests."
  # ---
  # *Arguments*:
  # * (required) _opts_: Hash containing parameters
  # *Returns*:: (undefined)
  def ncbi_check_parameters(opts)
    #return if Time.now < Time.gm(2010,5,31)
    if opts['email'].to_s.empty? then
      raise 'Set email parameter for the query, or set Bio::NCBI.default_email = "(email address of the author of this software)"'
    end
    if opts['tool'].to_s.empty?
    then
      raise 'Set tool parameter for the query, or set Bio::NCBI.default_tool = "(your tool name)"'
    end
    nil
  end

  public

  # Lists the NCBI database names using the E-Utils (einfo) service.
  #
  # * https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi
  #
  #   pubmed protein nucleotide nuccore nucgss nucest structure genome
  #   books cancerchromosomes cdd gap domains gene genomeprj gensat geo
  #   gds homologene journals mesh ncbisearch nlmcatalog omia omim pmc
  #   popset probe proteinclusters pcassay pccompound pcsubstance snp
  #   taxonomy toolkit unigene unists
  #
  # == Usage
  #
  #   ncbi = Bio::NCBI::REST.new
  #   ncbi.einfo
  #
  #   Bio::NCBI::REST.einfo
  #
  # ---
  # *Returns*:: array of string (database names)
  def einfo
    serv = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi"
    opts = default_parameters.merge({})
    response = ncbi_post_form(serv, opts)
    result = response.body
    list = result.scan(/<DbName>(.*?)<\/DbName>/m).flatten
    return list
  end

  # Searches the NCBI database by given keywords using the E-Utils (esearch)
  # service and returns an array of entry IDs.
# # For information on the possible arguments, see # # * https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.ESearch # * ( redirected from http://eutils.ncbi.nlm.nih.gov/books/n/helpeutils/chapter4/#chapter4.ESearch ) # * ( redirected from http://eutils.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html ) # # == Usage # # ncbi = Bio::NCBI::REST.new # ncbi.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"count"}) # ncbi.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"gb"}) # ncbi.esearch("yeast kinase", {"db"=>"nuccore", "rettype"=>"gb", "retmax"=>5}) # # Bio::NCBI::REST.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"count"}) # Bio::NCBI::REST.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"gb"}) # Bio::NCBI::REST.esearch("yeast kinase", {"db"=>"nuccore", "rettype"=>"gb", "retmax"=>5}) # # --- # *Arguments*: # * _str_: query string (required) # * _hash_: hash of E-Utils option {"db" => "nuccore", "rettype" => "gb"} # * _db_: "sequences", "nucleotide", "protein", "pubmed", "taxonomy", ... # * _retmode_: "text", "xml", "html", ... # * _rettype_: "gb", "medline", "count", ... 
  #   * _retmax_: integer (default 100)
  #   * _retstart_: integer
  #   * _field_:
  #     * "titl": Title [TI]
  #     * "tiab": Title/Abstract [TIAB]
  #     * "word": Text words [TW]
  #     * "auth": Author [AU]
  #     * "affl": Affiliation [AD]
  #     * "jour": Journal [TA]
  #     * "vol": Volume [VI]
  #     * "iss": Issue [IP]
  #     * "page": First page [PG]
  #     * "pdat": Publication date [DP]
  #     * "ptyp": Publication type [PT]
  #     * "lang": Language [LA]
  #     * "mesh": MeSH term [MH]
  #     * "majr": MeSH major topic [MAJR]
  #     * "subh": MeSH subheadings [SH]
  #     * "mhda": MeSH date [MHDA]
  #     * "ecno": EC/RN Number [rn]
  #     * "si": Secondary source ID [SI]
  #     * "uid": PubMed ID (PMID) [UI]
  #     * "fltr": Filter [FILTER] [SB]
  #     * "subs": Subset [SB]
  #   * _reldate_: 365
  #   * _mindate_: 2001
  #   * _maxdate_: 2002/01/01
  #   * _datetype_: "edat"
  # * _limit_: maximum number of entries to be returned (0 for unlimited; nil for the "retmax" value in the hash or the internal default value (=100))
  # * _step_: maximum number of entries retrieved at a time
  # *Returns*:: array of entry IDs or a number of results
  def esearch(str, hash = {}, limit = nil, step = 10000)
    serv = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    opts = default_parameters.merge({ "term" => str })
    opts.update(hash)

    case opts["rettype"]
    when "count"
      count = esearch_count(str, opts)
      return count
    else
      retstart = 0
      retstart = hash["retstart"].to_i if hash["retstart"]

      limit ||= hash["retmax"].to_i if hash["retmax"]
      limit ||= 100 # default limit is 100
      limit = esearch_count(str, opts) if limit == 0 # unlimited

      list = []
      0.step(limit, step) do |i|
        retmax = [step, limit - i].min
        opts.update("retmax" => retmax, "retstart" => i + retstart)
        response = ncbi_post_form(serv, opts)
        result = response.body
        list += result.scan(/<Id>(.*?)<\/Id>/m).flatten
      end
      return list
    end
  end

  # *Arguments*:: same as esearch method
  # *Returns*:: number of results (Integer)
  def esearch_count(str, hash = {})
    serv = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    opts =
      default_parameters.merge({ "term" => str })
    opts.update(hash)
    opts.update("rettype" => "count")
    response = ncbi_post_form(serv, opts)
    result = response.body
    count = result.scan(/<Count>(.*?)<\/Count>/m).flatten.first.to_i
    return count
  end

  # Retrieves database entries by given IDs using the E-Utils (efetch) service.
  #
  # For information on the possible arguments, see
  #
  # * https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch
  #
  # == Usage
  #
  #   ncbi = Bio::NCBI::REST.new
  #   ncbi.efetch("185041", {"db"=>"nucleotide", "rettype"=>"gb", "retmode" => "xml"})
  #   ncbi.efetch("J00231", {"db"=>"nuccore", "rettype"=>"gb", "retmode"=>"xml"})
  #   ncbi.efetch("AAA52805", {"db"=>"protein", "rettype"=>"gb"})
  #
  #   Bio::NCBI::REST.efetch("185041", {"db"=>"nucleotide", "rettype"=>"gb", "retmode" => "xml"})
  #   Bio::NCBI::REST.efetch("J00231", {"db"=>"nuccore", "rettype"=>"gb"})
  #   Bio::NCBI::REST.efetch("AAA52805", {"db"=>"protein", "rettype"=>"gb"})
  #
  # ---
  # *Arguments*:
  # * _ids_: list of NCBI entry IDs (required)
  # * _hash_: hash of E-Utils option {"db" => "nuccore", "rettype" => "gb"}
  #   * _db_: "sequences", "nucleotide", "protein", "pubmed", "omim", ...
  #   * _retmode_: "text", "xml", "html", ...
  #   * _rettype_: "gb", "gbc", "medline", "count",...
  # * _step_: maximum number of entries retrieved at a time
  # *Returns*:: String
  def efetch(ids, hash = {}, step = 100)
    serv = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
    opts = default_parameters.merge({ "retmode" => "text" })
    opts.update(hash)

    case ids
    when Array
      list = ids
    else
      list = ids.to_s.split(/\s*,\s*/)
    end

    result = ""
    0.step(list.size, step) do |i|
      opts["id"] = list[i, step].join(',')
      unless opts["id"].empty?
response = ncbi_post_form(serv, opts) result += response.body end end return result.strip #return result.strip.split(/\n\n+/) end def self.einfo self.new.einfo end def self.esearch(*args) self.new.esearch(*args) end def self.esearch_count(*args) self.new.esearch_count(*args) end def self.efetch(*args) self.new.efetch(*args) end # Shortcut methods for the ESearch service class ESearch # Search database entries by given keywords using E-Utils (esearch). # # * https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.ESearch # # sequences = gene + genome + nucleotide + protein + popset + snp # nucleotide = nuccore + nucest + nucgss # # * https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi # # pubmed protein nucleotide nuccore nucgss nucest structure genome # books cancerchromosomes cdd gap domains gene genomeprj gensat geo # gds homologene journals mesh ncbisearch nlmcatalog omia omim pmc # popset probe proteinclusters pcassay pccompound pcsubstance snp # taxonomy toolkit unigene unists # # == Usage # # Bio::NCBI::REST::ESearch.search("nucleotide", "tardigrada") # Bio::NCBI::REST::ESearch.count("nucleotide", "tardigrada") # # Bio::NCBI::REST::ESearch.nucleotide("tardigrada") # Bio::NCBI::REST::ESearch.popset("aldh2") # Bio::NCBI::REST::ESearch.taxonomy("tardigrada") # Bio::NCBI::REST::ESearch.pubmed("tardigrada", "reldate" => 365) # Bio::NCBI::REST::ESearch.pubmed("mammoth mitochondrial genome") # Bio::NCBI::REST::ESearch.pmc("Indonesian coelacanth genome Latimeria menadoensis") # Bio::NCBI::REST::ESearch.journal("bmc bioinformatics") # # ncbi = Bio::NCBI::REST::ESearch.new # ncbi.search("nucleotide", "tardigrada") # ncbi.count("nucleotide", "tardigrada") # # ncbi.nucleotide("tardigrada") # ncbi.popset("aldh2") # ncbi.taxonomy("tardigrada") # ncbi.pubmed("tardigrada", "reldate" => 365) # ncbi.pubmed("mammoth mitochondrial genome") # ncbi.pmc("Indonesian coelacanth genome Latimeria menadoensis") # ncbi.journal("bmc bioinformatics") # # --- # # *Arguments*: # * _term_: 
search keywords (required) # * _limit_: maximum number of entries to be returned (0 for unlimited) # * _hash_: hash of E-Utils option # *Returns*:: array of entry IDs or a number of results module Methods # search("nucleotide", "tardigrada") # search("nucleotide", "tardigrada", 0) # unlimited # search("pubmed", "tardigrada") # search("pubmed", "tardigrada", 5) # first five # search("pubmed", "tardigrada", "reldate" => 365) # within a year # search("pubmed", "tardigrada", 5, "reldate" => 365) # combination # search("pubmed", "tardigrada", {"reldate" => 365}, 5) # combination 2 # search("journals", "bmc", 10) def search(db, term, *args) limit = 100 hash = {} args.each do |arg| case arg when Hash hash.update(arg) else limit = arg.to_i end end opts = { "db" => db } opts.update(hash) Bio::NCBI::REST.esearch(term, opts, limit) end # count("nucleotide", "tardigrada") # count("pubmed", "tardigrada") # count("journals", "bmc") def count(db, term, hash = {}) opts = { "db" => db } opts.update(hash) Bio::NCBI::REST.esearch_count(term, opts) end # nucleotide("tardigrada") # nucleotide("tardigrada", 0) # pubmed("tardigrada") # pubmed("tardigrada", 5) # pubmed("tardigrada", "reldate" => 365) # pubmed("tardigrada", 5, "reldate" => 365) # pubmed("tardigrada", {"reldate" => 365}, 5) def method_missing(*args) self.search(*args) end # alias for journals def journal(*args) self.search("journals", *args) end # alias for "nucest" def est(*args) self.search("nucest", *args) end # alias for "nucgss" def gss(*args) self.search("nucgss", *args) end end # Methods include Methods extend Methods end # ESearch # Shortcut methods for the EFetch service class EFetch module Methods # Retrieve sequence entries by given IDs using E-Utils (efetch). 
# # * https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch # # sequences = gene + genome + nucleotide + protein + popset + snp # nucleotide = nuccore + nucest + nucgss # # format (rettype): # * native all but Gene ASN Default format for viewing sequences # * fasta all sequence FASTA view of a sequence # * gb NA sequence GenBank view for sequences # * gbc NA sequence INSDSeq structured flat file # * gbwithparts NA sequence GenBank CON division with sequences # * est dbEST sequence EST Report # * gss dbGSS sequence GSS Report # * gp AA sequence GenPept view # * gpc AA sequence INSDSeq structured flat file # * seqid all sequence Convert GIs into seqids # * acc all sequence Convert GIs into accessions # * chr dbSNP only SNP Chromosome Report # * flt dbSNP only SNP Flat File report # * rsr dbSNP only SNP RS Cluster report # * brief dbSNP only SNP ID list # * docset dbSNP only SNP RS summary # # == Usage # # Bio::NCBI::REST::EFetch.sequence("123,U12345,U12345.1,gb|U12345|") # # list = [123, "U12345.1", "gb|U12345|"] # Bio::NCBI::REST::EFetch.sequence(list) # Bio::NCBI::REST::EFetch.sequence(list, "fasta") # Bio::NCBI::REST::EFetch.sequence(list, "acc") # Bio::NCBI::REST::EFetch.sequence(list, "xml") # # Bio::NCBI::REST::EFetch.sequence("AE009950") # Bio::NCBI::REST::EFetch.sequence("AE009950", "gbwithparts") # # ncbi = Bio::NCBI::REST::EFetch.new # ncbi.sequence("123,U12345,U12345.1,gb|U12345|") # ncbi.sequence(list) # ncbi.sequence(list, "fasta") # ncbi.sequence(list, "acc") # ncbi.sequence(list, "xml") # ncbi.sequence("AE009950") # ncbi.sequence("AE009950", "gbwithparts") # # --- # # *Arguments*: # * _ids_: list of NCBI entry IDs (required) # * _format_: "gb", "gbc", "fasta", "acc", "xml" etc. 
# *Returns*:: String def sequence(ids, format = "gb", hash = {}) case format when "xml" format = "gbc" end opts = { "db" => "sequences", "rettype" => format } opts.update(hash) Bio::NCBI::REST.efetch(ids, opts) end # Retrieve nucleotide sequence entries by given IDs using E-Utils # (efetch). # # * https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch # nucleotide = nuccore + nucest + nucgss # # format (rettype): # * native all but Gene ASN Default format for viewing sequences # * fasta all sequence FASTA view of a sequence # * gb NA sequence GenBank view for sequences # * gbc NA sequence INSDSeq structured flat file # * gbwithparts NA sequence GenBank CON division with sequences # * est dbEST sequence EST Report # * gss dbGSS sequence GSS Report # * gp AA sequence GenPept view # * gpc AA sequence INSDSeq structured flat file # * seqid all sequence Convert GIs into seqids # * acc all sequence Convert GIs into accessions # * chr dbSNP only SNP Chromosome Report # * flt dbSNP only SNP Flat File report # * rsr dbSNP only SNP RS Cluster report # * brief dbSNP only SNP ID list # * docset dbSNP only SNP RS summary # # == Usage # # Bio::NCBI::REST::EFetch.nucleotide("123,U12345,U12345.1,gb|U12345|") # # list = [123, "U12345.1", "gb|U12345|"] # Bio::NCBI::REST::EFetch.nucleotide(list) # Bio::NCBI::REST::EFetch.nucleotide(list, "fasta") # Bio::NCBI::REST::EFetch.nucleotide(list, "acc") # Bio::NCBI::REST::EFetch.nucleotide(list, "xml") # # Bio::NCBI::REST::EFetch.nucleotide("AE009950") # Bio::NCBI::REST::EFetch.nucleotide("AE009950", "gbwithparts") # # ncbi = Bio::NCBI::REST::EFetch.new # ncbi.nucleotide("123,U12345,U12345.1,gb|U12345|") # ncbi.nucleotide(list) # ncbi.nucleotide(list, "fasta") # ncbi.nucleotide(list, "acc") # ncbi.nucleotide(list, "xml") # ncbi.nucleotide("AE009950") # ncbi.nucleotide("AE009950", "gbwithparts") # # --- # # *Arguments*: # * _ids_: list of NCBI entry IDs (required) # * _format_: "gb", "gbc", "fasta", "acc", "xml" etc. 
# *Returns*:: String def nucleotide(ids, format = "gb", hash = {}) case format when "xml" format = "gbc" end opts = { "db" => "nucleotide", "rettype" => format } opts.update(hash) Bio::NCBI::REST.efetch(ids, opts) end # Retrieve protein sequence entries by given IDs using E-Utils # (efetch). # # * https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch # protein # # format (rettype): # * native all but Gene ASN Default format for viewing sequences # * fasta all sequence FASTA view of a sequence # * gb NA sequence GenBank view for sequences # * gbc NA sequence INSDSeq structured flat file # * gbwithparts NA sequence GenBank CON division with sequences # * est dbEST sequence EST Report # * gss dbGSS sequence GSS Report # * gp AA sequence GenPept view # * gpc AA sequence INSDSeq structured flat file # * seqid all sequence Convert GIs into seqids # * acc all sequence Convert GIs into accessions # * chr dbSNP only SNP Chromosome Report # * flt dbSNP only SNP Flat File report # * rsr dbSNP only SNP RS Cluster report # * brief dbSNP only SNP ID list # * docset dbSNP only SNP RS summary # # == Usage # # Bio::NCBI::REST::EFetch.protein("7527480,AAF63163.1,AAF63163") # # list = [ 7527480, "AAF63163.1", "AAF63163"] # Bio::NCBI::REST::EFetch.protein(list) # Bio::NCBI::REST::EFetch.protein(list, "fasta") # Bio::NCBI::REST::EFetch.protein(list, "acc") # Bio::NCBI::REST::EFetch.protein(list, "xml") # # ncbi = Bio::NCBI::REST::EFetch.new # ncbi.protein("7527480,AAF63163.1,AAF63163") # ncbi.protein(list) # ncbi.protein(list, "fasta") # ncbi.protein(list, "acc") # ncbi.protein(list, "xml") # # --- # # *Arguments*: # * _ids_: list of NCBI entry IDs (required) # * _format_: "gp", "gpc", "fasta", "acc", "xml" etc. 
# *Returns*:: String def protein(ids, format = "gp", hash = {}) case format when "xml" format = "gpc" end opts = { "db" => "protein", "rettype" => format } opts.update(hash) Bio::NCBI::REST.efetch(ids, opts) end # Retrieve PubMed entries by given IDs using E-Utils (efetch). # # * https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch # # == Usage # # Bio::NCBI::REST::EFetch.pubmed(15496913) # Bio::NCBI::REST::EFetch.pubmed("15496913,11181995") # # list = [15496913, 11181995] # Bio::NCBI::REST::EFetch.pubmed(list) # Bio::NCBI::REST::EFetch.pubmed(list, "abstract") # Bio::NCBI::REST::EFetch.pubmed(list, "citation") # Bio::NCBI::REST::EFetch.pubmed(list, "medline") # Bio::NCBI::REST::EFetch.pubmed(list, "xml") # # ncbi = Bio::NCBI::REST::EFetch.new # ncbi.pubmed(list) # ncbi.pubmed(list, "abstract") # ncbi.pubmed(list, "citation") # ncbi.pubmed(list, "medline") # ncbi.pubmed(list, "xml") # # --- # # *Arguments*: # * _ids_: list of PubMed entry IDs (required) # * _format_: "abstract", "citation", "medline", "xml" # *Returns*:: String def pubmed(ids, format = "medline", hash = {}) case format when "xml" format = "medline" mode = "xml" else mode = "text" end opts = { "db" => "pubmed", "rettype" => format, "retmode" => mode } opts.update(hash) Bio::NCBI::REST.efetch(ids, opts) end # Retrieve PubMed Central entries by given IDs using E-Utils (efetch). 
      #
      # * https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch
      #
      # == Usage
      #
      #   Bio::NCBI::REST::EFetch.pmc(1360101)
      #   Bio::NCBI::REST::EFetch.pmc("1360101,534663")
      #
      #   list = [1360101, 534663]
      #   Bio::NCBI::REST::EFetch.pmc(list)
      #   Bio::NCBI::REST::EFetch.pmc(list, "xml")
      #
      #   ncbi = Bio::NCBI::REST::EFetch.new
      #   ncbi.pmc(list)
      #   ncbi.pmc(list, "xml")
      #
      # ---
      #
      # *Arguments*:
      # * _ids_: list of PubMed Central entry IDs (required)
      # * _format_: "docsum", "xml"
      # *Returns*:: String
      def pmc(ids, format = "docsum", hash = {})
        case format
        when "xml"
          format = "medline"
          mode = "xml"
        else
          mode = "text"
        end
        opts = { "db" => "pmc", "rettype" => format, "retmode" => mode }
        # merge caller-supplied options, as the sibling methods do
        opts.update(hash)
        Bio::NCBI::REST.efetch(ids, opts)
      end

      # Retrieve journal entries by given IDs using E-Utils (efetch).
      #
      # * https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch
      #
      # == Usage
      #
      #   Bio::NCBI::REST::EFetch.journal(21854)
      #
      #   list = [21854, 21855]
      #   Bio::NCBI::REST::EFetch.journal(list)
      #   Bio::NCBI::REST::EFetch.journal(list, "xml")
      #
      #   ncbi = Bio::NCBI::REST::EFetch.new
      #   ncbi.journal(list)
      #   ncbi.journal(list, "xml")
      #
      # ---
      #
      # *Arguments*:
      # * _ids_: list of journal entry IDs (required)
      # * _format_: "full", "xml"
      # *Returns*:: String
      def journal(ids, format = "full", hash = {})
        case format
        when "xml"
          format = "full"
          mode = "xml"
        else
          mode = "text"
        end
        opts = { "db" => "journals", "rettype" => format, "retmode" => mode }
        opts.update(hash)
        Bio::NCBI::REST.efetch(ids, opts)
      end

      # Retrieve OMIM entries by given IDs using E-Utils (efetch).
      #
      # * https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch
      #
      # == Usage
      #
      #   Bio::NCBI::REST::EFetch.omim(143100)
      #
      #   list = [143100, 602260]
      #   Bio::NCBI::REST::EFetch.omim(list)
      #   Bio::NCBI::REST::EFetch.omim(list, "xml")
      #
      #   ncbi = Bio::NCBI::REST::EFetch.new
      #   ncbi.omim(list)
      #   ncbi.omim(list, "xml")
      #
      # ---
      #
      # *Arguments*:
      # * _ids_: list of OMIM entry IDs (required)
      # * _format_: "docsum", "synopsis", "variants", "detailed", "linkout", "xml"
      # *Returns*:: String
      def omim(ids, format = "detailed", hash = {})
        case format
        when "xml"
          format = "full"
          mode = "xml"
        when "linkout"
          format = "ExternalLink"
          mode = "text"
        else
          mode = "text"
        end
        opts = { "db" => "omim", "rettype" => format, "retmode" => mode }
        opts.update(hash)
        Bio::NCBI::REST.efetch(ids, opts)
      end

      # Retrieve taxonomy entries by given IDs using E-Utils (efetch).
      #
      # * https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch
      #
      # == Usage
      #
      #   Bio::NCBI::REST::EFetch.taxonomy(42241)
      #
      #   list = [232323, 290179, 286681]
      #   Bio::NCBI::REST::EFetch.taxonomy(list)
      #   Bio::NCBI::REST::EFetch.taxonomy(list, "xml")
      #
      #   ncbi = Bio::NCBI::REST::EFetch.new
      #   ncbi.taxonomy(list)
      #   ncbi.taxonomy(list, "xml")
      #
      # ---
      #
      # *Arguments*:
      # * _ids_: list of Taxonomy entry IDs (required)
      # * _format_: "brief", "docsum", "xml"
      # *Returns*:: String
      def taxonomy(ids, format = "docsum", hash = {})
        case format
        when "xml"
          format = "full"
          mode = "xml"
        else
          mode = "text"
        end
        opts = { "db" => "taxonomy", "rettype" => format, "retmode" => mode }
        # merge caller-supplied options, as the sibling methods do
        opts.update(hash)
        Bio::NCBI::REST.efetch(ids, opts)
      end

    end # Methods

    include Methods
    extend Methods

  end # EFetch

end # REST
end # NCBI
end # Bio
bio-2.0.3/lib/bio/io/togows.rb0000644000175000017500000003426214141516614015422 0ustar nileshnilesh
#
# = bio/io/togows.rb - REST interface for TogoWS
#
# Copyright:: Copyright (C) 2009 Naohisa Goto
# License:: The Ruby License
#
#
# Bio::TogoWS is a set of clients for the TogoWS web services
# (http://togows.dbcls.jp/).
#
# * Bio::TogoWS::REST is a REST client for the TogoWS.
# * Bio::TogoWS::SOAP will be implemented in the future.
#

require 'uri'
require 'cgi'
require 'bio/version'
require 'bio/command'

module Bio

  # Bio::TogoWS is a namespace for the TogoWS web services.
  module TogoWS

    # Internal Use Only.
    #
    # Bio::TogoWS::AccessWait is a module to implement a
    # private method for access.
    module AccessWait

      # common default access wait for TogoWS services
      TOGOWS_ACCESS_WAIT = 1

      # Maximum waiting time to avoid deadlock.
      # When exceeding this value, (max/2) + rand(max) is used,
      # to randomize access.
      # This means real maximum waiting time is (max * 1.5).
      TOGOWS_ACCESS_WAIT_MAX = 60

      # Sleeps if needed.
      # It sleeps about TOGOWS_ACCESS_WAIT * (number of waiting processes).
      #
      # ---
      # *Returns*:: (Numeric) time slept
      def togows_access_wait
        w_min = TOGOWS_ACCESS_WAIT
        debug = defined?(@debug) && @debug

        # initializing class variable
        @@togows_last_access ||= nil

        # determines waiting time
        wait = 0
        if last = @@togows_last_access then
          elapsed = Time.now - last
          if elapsed < w_min then
            wait = w_min - elapsed
          end
        end

        # If wait is too long, it is truncated to TOGOWS_ACCESS_WAIT_MAX.
        if wait > TOGOWS_ACCESS_WAIT_MAX then
          orig_wait = wait
          wait = TOGOWS_ACCESS_WAIT_MAX
          wait = wait / 2 + rand(wait)
          if debug then
            $stderr.puts "TogoWS: sleeping time #{orig_wait} is too long and set to #{wait} to avoid dead lock."
          end
          newlast = Time.now + TOGOWS_ACCESS_WAIT_MAX
        else
          newlast = Time.now + wait
        end

        # put expected end time of sleeping
        if !@@togows_last_access or @@togows_last_access < newlast then
          @@togows_last_access = newlast
        end

        # sleeping if needed
        if wait > 0 then
          $stderr.puts "TogoWS: sleeping #{wait} second" if debug
          sleep(wait)
        end
        # returns waited time
        wait
      end
      private :togows_access_wait

      # (private) resets last access.
      # Should be used only for debugging purposes.
def reset_togows_access_wait @@togows_last_access = nil end private :reset_togows_access_wait end #module AccessWait # == Description # # Bio::TogoWS::REST is a REST client for the TogoWS web service. # # Details of the service are described in the following URI. # # * http://togows.dbcls.jp/site/en/rest.html # # == Examples # # For light users, class methods can be used. # # print Bio::TogoWS::REST.entry('ncbi-nucleotide', 'AF237819') # print Bio::TogoWS::REST.search('uniprot', 'lung cancer') # # For heavy users, an instance of the REST class can be created, and # using the instance is more efficient than using class methods. # # t = Bio::TogoWS::REST.new # print t.entry('ncbi-nucleotide', 'AF237819') # print t.search('uniprot', 'lung cancer') # # == References # # * http://togows.dbcls.jp/site/en/rest.html # class REST include AccessWait # URI of the TogoWS REST service BASE_URI = 'http://togows.dbcls.jp/'.freeze # preset default databases used by the retrieve method. # DEFAULT_RETRIEVAL_DATABASES = %w( genbank uniprot embl ddbj dad ) # Creates a new object. # --- # *Arguments*: # * (optional) _uri_: String or URI object # *Returns*:: new object def initialize(uri = BASE_URI) uri = URI.parse(uri) unless uri.kind_of?(URI) @pathbase = uri.path @pathbase = '/' + @pathbase unless /\A\// =~ @pathbase @pathbase = @pathbase + '/' unless /\/\z/ =~ @pathbase @http = Bio::Command.new_http(uri.host, uri.port) @header = { 'User-Agent' => "BioRuby/#{Bio::BIORUBY_VERSION_ID}" } @debug = false end # If true, shows debug information to $stderr. attr_accessor :debug # Debug purpose only. # Returns Net::HTTP object used inside the object. # The method will be changed in the future if the implementation # of this class is changed. def internal_http @http end # Intelligent version of the entry method. # If two or more databases are specified, sequentially tries # them until valid entry is obtained. # # If database is not specified, preset default databases are used.
# See DEFAULT_RETRIEVAL_DATABASES for details. # # When multiple IDs and multiple databases are specified, sequentially # tries each ID. Note that results with no hits found or with server # errors are regarded as void strings. Also note that data format of # the result entries can differ from entry to entry. # # --- # *Arguments*: # * (required) _ids_: (String) an entry ID, or # (Array containing String) IDs. Note that strings containing "," # are regarded as multiple IDs. # * (optional) _hash_: (Hash) options below can be passed as a hash. # * (optional) :database: (String) database name, or # (Array containing String) database names. # * (optional) :format: (String) format # * (optional) :field: (String) gets only the specified field # *Returns*:: String or nil def retrieve(ids, hash = {}) begin a = ids.to_ary rescue NoMethodError ids = ids.to_s end ids = a.join(',') if a ids = ids.split(',') dbs = hash[:database] || DEFAULT_RETRIEVAL_DATABASES begin dbs.to_ary rescue NoMethodError dbs = dbs.to_s.empty? ? [] : [ dbs.to_s ] end return nil if dbs.empty? or ids.empty? if dbs.size == 1 then return entry(dbs[0], ids, hash[:format], hash[:field]) end results = [] ids.each do |idstr| dbs.each do |dbstr| r = entry(dbstr, idstr, hash[:format], hash[:field]) if r and !r.strip.empty? then results.push r break end end #dbs.each end #ids.each results.join('') end #def retrieve # Retrieves entries corresponding to the specified IDs. # # Example: # t = Bio::TogoWS::REST.new # kuma = t.entry('ncbi-nucleotide', 'AF237819') # # multiple IDs at a time # misc = t.entry('ncbi-nucleotide', [ 'AF237819', 'AF237820' ]) # # with format change # p53 = t.entry('uniprot', 'P53_HUMAN', 'fasta') # # --- # *Arguments*: # * (required) _database_: (String) database name # * (required) _ids_: (String) an entry ID, or # (Array containing String) IDs. Note that strings containing "," # are regarded as multiple IDs. # * (optional) _format_: (String) format.
nil means the default format # (differs depending on the database). # * (optional) _field_: (String) gets only the specified field if not nil # *Returns*:: String or nil def entry(database, ids, format = nil, field = nil) begin a = ids.to_ary rescue NoMethodError ids = ids.to_s end arg = [ 'entry', database ] if a then b = a.dup (a.size - 1).downto(1) { |i| b.insert(i, :",") } arg.concat b else arg.push ids end arg.push field if field arg[-1] = "#{arg[-1]}.#{format}" if format response = get(*arg) prepare_return_value(response) end # Database search. # Format of the search term string follows the Common Query Language. # * http://en.wikipedia.org/wiki/Common_Query_Language # # Example: # t = Bio::TogoWS::REST.new # print t.search('uniprot', 'lung cancer') # # only get the 10th and 11th hit ID # print t.search('uniprot', 'lung cancer', 10, 2) # # with json format # print t.search('uniprot', 'lung cancer', 10, 2, 'json') # # --- # *Arguments*: # * (required) _database_: (String) database name # * (required) _query_: (String) query string # * (optional) _offset_: (Integer) offset in search results. # * (optional) _limit_: (Integer) max. number of returned results. # If offset is not nil and the limit is nil, it is set to 1. # * (optional) _format_: (String) format. nil means the default format. # *Returns*:: String or nil def search(database, query, offset = nil, limit = nil, format = nil) arg = [ 'search', database, query ] if offset then limit ||= 1 arg.concat [ "#{offset}", :",", "#{limit}" ] end arg[-1] = "#{arg[-1]}.#{format}" if format response = get(*arg) prepare_return_value(response) end # Data format conversion. 
# # Example: # t = Bio::TogoWS::REST.new # blast_string = File.read('test.blastn') # t.convert(blast_string, 'blast', 'gff') # # --- # *Arguments*: # * (required) _data_: (String) input data # * (required) _inputformat_: (String) data source format # * (required) _format_: (String) output format # *Returns*:: String or nil def convert(data, inputformat, format) response = post_data(data, 'convert', "#{inputformat}.#{format}") prepare_return_value(response) end # Returns list of available databases in the entry service. # --- # *Returns*:: Array containing String def entry_database_list database_list('entry') end # Returns list of available databases in the search service. # --- # *Returns*:: Array containing String def search_database_list database_list('search') end #-- # class methods #++ # The same as Bio::TogoWS::REST#entry. def self.entry(*arg) self.new.entry(*arg) end # The same as Bio::TogoWS::REST#search. def self.search(*arg) self.new.search(*arg) end # The same as Bio::TogoWS::REST#convert. def self.convert(*arg) self.new.convert(*arg) end # The same as Bio::TogoWS::REST#retrieve. def self.retrieve(*arg) self.new.retrieve(*arg) end # The same as Bio::TogoWS::REST#entry_database_list def self.entry_database_list(*arg) self.new.entry_database_list(*arg) end # The same as Bio::TogoWS::REST#search_database_list def self.search_database_list(*arg) self.new.search_database_list(*arg) end private # Access to the TogoWS by using GET method. # # Example 1: # get('entry', 'ncbi-nucleotide', 'AF209156') # Example 2: # get('search', 'uniprot', 'lung cancer') # # --- # *Arguments*: # * (optional) _path_: String # *Returns*:: Net::HTTPResponse object def get(*paths) path = make_path(paths) if @debug then $stderr.puts "TogoWS: HTTP#get(#{path.inspect}, #{@header.inspect})" end togows_access_wait @http.get(path, @header) end # Access to the TogoWS by using GET method. # Always adds '/' at the end of the path.
# # Example 1: # get_dir('entry') # # --- # *Arguments*: # * (optional) _path_: String # *Returns*:: Net::HTTPResponse object def get_dir(*paths) path = make_path(paths) path += '/' unless /\/\z/ =~ path if @debug then $stderr.puts "TogoWS: HTTP#get(#{path.inspect}, #{@header.inspect})" end togows_access_wait @http.get(path, @header) end # Access to the TogoWS by using POST method. # Mime type is 'application/octet-stream'. # --- # *Arguments*: # * (required) _data_: String # * (optional) _path_: String # *Returns*:: Net::HTTPResponse object def post_data(data, *paths) path = make_path(paths) if @debug then $stderr.puts "TogoWS: Bio::Command.http_post(#{path.inspect}, data(#{data.size} bytes), #{@header.inspect})" end togows_access_wait Bio::Command.http_post(@http, path, data, @header) end # Generates path string from the given paths. # Symbol objects are not URL-escaped. # String objects are joined with '/'. # Symbol objects are joined directly without '/'. # # --- # *Arguments*: # * (required) _paths_: Array containing String or Symbol objects # *Returns*:: String def make_path(paths) flag_sep = false a = paths.collect do |x| case x when Symbol # without URL escape flag_sep = false str = x.to_s else str = CGI.escape(x.to_s) str = '/' + str if flag_sep flag_sep = true end str end @pathbase + a.join('') end # If response.code == "200", returns body as a String. # Otherwise, returns nil. 
def prepare_return_value(response) if @debug then $stderr.puts "TogoWS: #{response.inspect}" end if response.code == "200" then response.body else nil end end # Returns list of available databases # --- # *Arguments*: # * (required) _service_: String # *Returns*:: Array containing String def database_list(service) response = get_dir(service) str = prepare_return_value(response) if str then str.chomp.split(/\r?\n/) else raise 'Unexpected server response' end end end #class REST end #module TogoWS end #module Bio bio-2.0.3/lib/bio/io/fetch.rb0000644000175000017500000002074114141516614015166 0ustar nileshnilesh# # = bio/io/biofetch.rb - BioFetch access module # # Copyright:: Copyright (C) 2002, 2005 Toshiaki Katayama , # Copyright (C) 2006 Jan Aerts # License:: The Ruby License # # == DESCRIPTION # # Using EBI Dbfetch server # # ebi_server = Bio::Fetch::EBI.new # puts ebi_server.fetch('embl', 'J00231') # puts ebi_server.fetch('embl', 'J00231', 'raw') # puts ebi_server.fetch('embl', 'J00231', 'html') # # Getting metadata from EBI Dbfetch server # # puts ebi_server.databases # puts ebi_server.formats('embl') # puts ebi_server.maxids # # Using EBI Dbfetch server without creating a Bio::Fetch::EBI instance # # puts Bio::Fetch::EBI.query('ena_sequence', 'J00231') # puts Bio::Fetch::EBI.query('ena_sequence', 'J00231', 'raw', 'fasta') # # Using a BioFetch server with specifying URL # # server = Bio::Fetch.new('http://www.ebi.ac.uk/Tools/dbfetch/dbfetch') # puts server.fetch('ena_sequence', 'J00231') # puts server.fetch('ena_sequence', 'J00231', 'raw', 'fasta') # require 'uri' require 'cgi' require 'bio/command' module Bio # = DESCRIPTION # The Bio::Fetch class provides an interface to dbfetch servers. Given # a database name and an accession number, these servers return the associated # record. For example, for the embl database on the EBI, that would be a # nucleic or amino acid sequence. 
# # Possible dbfetch servers include: # * http://www.ebi.ac.uk/Tools/dbfetch/dbfetch # # Note that the old URL http://www.ebi.ac.uk/cgi-bin/dbfetch still works, # probably for compatibility, but using the new URL is recommended. # # Historically, there were other dbfetch servers including: # * http://bioruby.org/cgi-bin/biofetch.rb (default before BioRuby 1.4) # But they are unavailable now. # # # If you're behind a proxy server, be sure to set your HTTP_PROXY # environment variable accordingly. # # = USAGE # require 'bio' # # # Retrieve the sequence of accession number M33388 from the EMBL # # database. # server = Bio::Fetch::EBI.new #uses EBI server # puts server.fetch('ena_sequence','M33388') # # # database name "embl" can also be used though it is not officially listed # puts server.fetch('embl','M33388') # # # Do the same thing with explicitly giving the URL. # server = Bio::Fetch.new(Bio::Fetch::EBI::URL) #uses EBI server # puts server.fetch('ena_sequence','M33388') # # # Do the same thing without creating a Bio::Fetch::EBI object. # puts Bio::Fetch::EBI.query('ena_sequence','M33388') # # # To know what databases are available on the dbfetch server: # server = Bio::Fetch::EBI.new # puts server.databases # # # Some databases provide their data in different formats (e.g. 'fasta', # # 'genbank' or 'embl'). To check which formats are supported by a given # # database: # puts server.formats('embl') # class Fetch # Bio::Fetch::EBI is a client of EBI Dbfetch # (http://www.ebi.ac.uk/Tools/dbfetch/dbfetch). # # An instance of this class works the same as: # obj = Bio::Fetch.new("http://www.ebi.ac.uk/Tools/dbfetch/dbfetch") # obj.database = "ena_sequence" # # See the documents of Bio::Fetch for more details. class EBI < Fetch # EBI Dbfetch server URL URL = "http://www.ebi.ac.uk/Tools/dbfetch/dbfetch".freeze # For the usage, see the document of Bio::Fetch.new. def initialize(url = URL) @database = "ena_sequence" super end # Shortcut for using EBI Dbfetch server.
You can fetch an entry # without creating an instance of Bio::Fetch::EBI. This method uses # EBI Dbfetch server http://www.ebi.ac.uk/Tools/dbfetch/dbfetch . # # Example: # puts Bio::Fetch::EBI.query('refseq','NM_123456') # puts Bio::Fetch::EBI.query('ena_sequence','J00231') # # --- # *Arguments*: # * _database_: name of database to query (see Bio::Fetch#databases to get list of supported databases) # * _id_: single ID or ID list separated by commas or white space # * _style_: [raw|html] (default = 'raw') # * _format_: name of output format (see Bio::Fetch#formats) def self.query(*args) self.new.fetch(*args) end end #class EBI # Create a new Bio::Fetch server object that can subsequently be queried # using the Bio::Fetch#fetch method. # # You must specify _url_ of a server. # The preset default server is deprecated. # # If you want to use a server without explicitly specifying the URL, # use Bio::Fetch::EBI.new that uses EBI Dbfetch server. # # --- # *Arguments*: # * _url_: URL of dbfetch server. (no default value) # *Returns*:: Bio::Fetch object def initialize(url = nil) unless url then raise ArgumentError, "No server URL is given in Bio::Fetch.new. The default server URL value has been deprecated. You must explicitly specify the url or use Bio::Fetch::EBI for using EBI Dbfetch." end @url = url end # The default database to query #-- # This will be used by the get_by_id method #++ attr_accessor :database # Get raw database entry by id. This method lets the Bio::Registry class # use Bio::Fetch objects. def get_by_id(id) fetch(@database, id) end # Fetch a database entry as specified by database (db), entry id (id), # 'raw' text or 'html' (style), and format.
# # Examples: # server = Bio::Fetch.new('http://www.ebi.ac.uk/Tools/dbfetch/dbfetch') # puts server.fetch('embl','M33388','raw','fasta') # puts server.fetch('refseq','NM_12345','html','embl') # --- # *Arguments*: # * _database_: name of database to query (see Bio::Fetch#databases to get list of supported databases) # * _id_: single ID or ID list separated by commas or white space # * _style_: [raw|html] (default = 'raw') # * _format_: name of output format (see Bio::Fetch#formats) def fetch(db, id, style = 'raw', format = nil) query = [ [ 'db', db ], [ 'id', id ], [ 'style', style ] ] query.push([ 'format', format ]) if format _get(query) end # Using this method, the user can ask a dbfetch server what databases # it supports. This would normally be the first step you'd take when # you use a dbfetch server for the first time. # Example: # server = Bio::Fetch::EBI.new # puts server.databases # returns "aa aax bl cpd dgenes dr ec eg emb ..." # # This method works for EBI Dbfetch server (and for the bioruby dbfetch # server). Not all servers support this method. # --- # *Returns*:: array of database names def databases _get_single('info', 'dbs').strip.split(/\s+/) end # Lists the formats that are available for a given database. Like the # Bio::Fetch#databases method, not all servers support this method. # This method is available on the EBI Dbfetch server (and on the bioruby # dbfetch server). # # Example: # server = Bio::Fetch::EBI.new() # puts server.formats('embl') # returns [ "default", "annot", ... ] # --- # *Arguments*: # * _database_: name of database you want the supported formats for # *Returns*:: array of formats def formats(database = @database) if database query = [ [ 'info', 'formats' ], [ 'db', database ] ] _get(query).strip.split(/\s+/) end end # A dbfetch server will only return entries up to a given maximum number. # This method retrieves that number from the server. As for the databases # and formats methods, not all servers support the maxids method.
# This method is available on the EBI Dbfetch server (and on the bioruby # dbfetch server). # # Example: # server = Bio::Fetch::EBI.new # puts server.maxids # currently returns 200 # --- # *Arguments*: none # *Returns*:: number def maxids _get_single('info', 'maxids').to_i end private # (private) query to the server. # ary must be nested array, e.g. [ [ key0, val0 ], [ key1, val1 ], ... ] def _get(ary) query = ary.collect do |a| "#{CGI.escape(a[0])}=#{CGI.escape(a[1])}" end.join('&') Bio::Command.read_uri(@url + '?' + query) end # (private) query with single parameter def _get_single(key, val) query = "#{CGI.escape(key)}=#{CGI.escape(val)}" Bio::Command.read_uri(@url + '?' + query) end end end # module Bio bio-2.0.3/lib/bio/io/flatfile/0000755000175000017500000000000014141516614015332 5ustar nileshnileshbio-2.0.3/lib/bio/io/flatfile/buffer.rb0000644000175000017500000002041314141516614017130 0ustar nileshnilesh# # = bio/io/flatfile/buffer.rb - Input stream buffer for FlatFile # # Copyright (C) 2001-2006 Naohisa Goto # # License:: The Ruby License # # # # See documents for Bio::FlatFile::BufferedInputStream and Bio::FlatFile. # require 'bio/io/flatfile' module Bio class FlatFile # Wrapper for an IO (or IO-like) object. # It can input with a buffer. class BufferedInputStream # Creates a new input stream wrapper def initialize(io, path) @io = io @path = path # initialize prefetch buffer @buffer = '' end # Creates a new input stream wrapper from the given IO object. def self.for_io(io) begin path = io.path rescue NameError path = nil end self.new(io, path) end # Creates a new input stream wrapper to open file _filename_ # by using File.open. # *arg is passed to File.open. # # Like File.open, a block can be accepted. # # Unlike File.open, the default is binary mode, unless text mode # is explicitly specified in mode.
def self.open_file(filename, *arg) params = _parse_file_open_arg(*arg) if params[:textmode] or /t/ =~ params[:fmode_string].to_s then textmode = true else textmode = false end if block_given? then File.open(filename, *arg) do |fobj| fobj.binmode unless textmode yield self.new(fobj, filename) end else fobj = File.open(filename, *arg) fobj.binmode unless textmode self.new(fobj, filename) end end # Parses file open mode parameter. # mode must be an Integer or a String. def self._parse_file_open_mode(mode) modeint = nil modestr = nil begin modeint = mode.to_int rescue NoMethodError end unless modeint then begin modestr = mode.to_str rescue NoMethodError end end if modeint then return { :fmode_integer => modeint } end if modestr then fmode, ext_enc, int_enc = modestr.split(/\:/) ret = { :fmode_string => fmode } ret[:external_encoding] = ext_enc if ext_enc ret[:internal_encoding] = int_enc if int_enc return ret end nil end private_class_method :_parse_file_open_mode # Parses file open arguments def self._parse_file_open_arg(*arg) fmode_hash = nil perm = nil elem = arg.shift if elem then fmode_hash = _parse_file_open_mode(elem) if fmode_hash then elem = arg.shift if elem then begin perm = elem.to_int rescue NoMethodError end end elem = arg.shift if perm end end if elem.kind_of?(Hash) then opt = elem.dup else opt = {} end if elem = opt[:mode] then fmode_hash = _parse_file_open_mode(elem) end fmode_hash ||= {} fmode_hash[:perm] = perm if perm unless (enc = opt[:encoding].to_s).empty? then ext_enc, int_enc = enc.split(/\:/) fmode_hash[:external_encoding] = ext_enc if ext_enc fmode_hash[:internal_encoding] = int_enc if int_enc end [ :external_encoding, :internal_encoding, :textmode, :binmode, :autoclose, :perm ].each do |key| val = opt[key] fmode_hash[key] = val if val end fmode_hash end private_class_method :_parse_file_open_arg # Creates a new input stream wrapper from URI specified as _uri_. # by using OpenURI.open_uri or URI#open. # _uri_ must be a String or URI object.
# *arg is passed to OpenURI.open_uri or URI#open. # # Like OpenURI.open_uri, it can accept a block. def self.open_uri(uri, *arg) if uri.kind_of?(URI) if block_given? uri.open(*arg) do |fobj| yield self.new(fobj, uri.to_s) end else fobj0 = uri.open(*arg) self.new(fobj0, uri.to_s) end else if block_given? OpenURI.open_uri(uri, *arg) do |fobj| yield self.new(fobj, uri) end else fobj0 = OpenURI.open_uri(uri, *arg) self.new(fobj0, uri) end end end # Pathname, filename or URI to open the object. # Like File#path, returned value isn't normalized. attr_reader :path # Converts to IO object if possible def to_io @io.to_io end # Closes the IO object if possible def close @io.close end # Rewinds the IO object if possible # Internal buffer in this wrapper is cleared. def rewind r = @io.rewind @buffer = '' r end # Returns current file position def pos @io.pos - @buffer.size end # Sets current file position if possible # Internal buffer in this wrapper is cleared. def pos=(p) r = (@io.pos = p) @buffer = '' r end # Returns true if end-of-file. Otherwise, returns false. # # Note that it returns false if the internal buffer in this wrapper # is not empty. def eof? if @buffer.size > 0 false else @io.eof? end end # Same as IO#gets. # # Compatibility note: the behavior of paragraph mode (io_rs = '') # may differ from that of IO#gets(''). def gets(io_rs = $/) if @buffer.size > 0 if io_rs == nil then r = @buffer + @io.gets(nil).to_s @buffer = '' else if io_rs == '' then # io_rs.empty? sp_rs = /((?:\r?\n){2,})/n else sp_rs = io_rs end a = @buffer.split(sp_rs, 2) if a.size > 1 then r = a.shift r += (io_rs.empty? ? a.shift : io_rs) @buffer = a.shift.to_s else @buffer << @io.gets(io_rs).to_s a = @buffer.split(sp_rs, 2) if a.size > 1 then r = a.shift r += (io_rs.empty? ? a.shift : io_rs) @buffer = a.shift.to_s else r = @buffer @buffer = '' end end end r else @io.gets(io_rs) end end # Pushes back given str to the internal buffer. # Returns nil.
# str must be read previously with the wrapper object. # # Note that in current implementation, the str can be everything, # but please don't depend on it. # def ungets(str) @buffer = str + @buffer nil end # Same as IO#getc. def getc if @buffer.size > 0 then r = @buffer[0] @buffer = @buffer[1..-1] else r = @io.getc end r end # Pushes back one character into the internal buffer. # Unlike IO#ungetc, it can be called more than one time. def ungetc(c) @buffer = sprintf("%c", c) + @buffer nil end # Gets current prefetch buffer def prefetch_buffer @buffer end # It does @io.gets, and adds returned string # to the internal buffer, and returns the string. def prefetch_gets(*arg) r = @io.gets(*arg) @buffer << r if r r end # It does @io.readpartial, and adds returned string # to the internal buffer, and returns the string. def prefetch_readpartial(*arg) r = @io.readpartial(*arg) @buffer << r if r r end # Skips space characters in the stream. # returns nil. def skip_spaces ws = { ?\s => true, ?\n => true, ?\r => true, ?\t => true } while r = self.getc unless ws[r] then self.ungetc(r) break end end nil end end #class BufferedInputStream end #class FlatFile end #module Bio bio-2.0.3/lib/bio/io/flatfile/index.rb0000644000175000017500000010302414141516614016766 0ustar nileshnilesh# # = bio/io/flatfile/index.rb - OBDA flatfile index # # Copyright:: Copyright (C) 2002 # GOTO Naohisa # License:: The Ruby License # # $Id: index.rb,v 1.19 2007/04/05 23:35:41 trevor Exp $ # # = About Bio::FlatFileIndex # # Please refer to the documents of the following classes. # Classes/modules marked '#' are internal use only.
# # == Classes/modules in index.rb # * class Bio::FlatFileIndex # * class Bio::FlatFileIndex::Results # * module Bio::FlatFileIndex::DEBUG # * #module Bio::FlatFileIndex::Template # * #class Bio::FlatFileIndex::Template::NameSpace # * #class Bio::FlatFileIndex::FileID # * #class Bio::FlatFileIndex::FileIDs # * #module Bio::FlatFileIndex::Flat_1 # * #class Bio::FlatFileIndex::Flat_1::Record # * #class Bio::FlatFileIndex::Flat_1::FlatMappingFile # * #class Bio::FlatFileIndex::Flat_1::PrimaryNameSpace # * #class Bio::FlatFileIndex::Flat_1::SecondaryNameSpace # * #class Bio::FlatFileIndex::NameSpaces # * #class Bio::FlatFileIndex::DataBank # # == Classes/modules in indexer.rb # * module Bio::FlatFileIndex::Indexer # * #class Bio::FlatFileIndex::Indexer::NameSpace # * #class Bio::FlatFileIndex::Indexer::NameSpaces # * #module Bio::FlatFileIndex::Indexer::Parser # * #class Bio::FlatFileIndex::Indexer::Parser::TemplateParser # * #class Bio::FlatFileIndex::Indexer::Parser::GenBankParser # * #class Bio::FlatFileIndex::Indexer::Parser::GenPeptParser # * #class Bio::FlatFileIndex::Indexer::Parser::EMBLParser # * #class Bio::FlatFileIndex::Indexer::Parser::SPTRParser # * #class Bio::FlatFileIndex::Indexer::Parser::FastaFormatParser # * #class Bio::FlatFileIndex::Indexer::Parser::MaXMLSequenceParser # * #class Bio::FlatFileIndex::Indexer::Parser::MaXMLClusterParser # * #class Bio::FlatFileIndex::Indexer::Parser::BlastDefaultParser # * #class Bio::FlatFileIndex::Indexer::Parser::PDBChemicalComponentParser # # == Classes/modules in bdb.rb # * #module Bio::FlatFileIndex::BDBDefault # * #class Bio::FlatFileIndex::BDBWrapper # * #module Bio::FlatFileIndex::BDB_1 # * #class Bio::FlatFileIndex::BDB_1::BDBMappingFile # * #class Bio::FlatFileIndex::BDB_1::PrimaryNameSpace # * #class Bio::FlatFileIndex::BDB_1::SecondaryNameSpace # # = References # * (()) # * (()) # require 'bio/io/flatfile/indexer' module Bio # Bio::FlatFileIndex is a class for OBDA flatfile index. 
class FlatFileIndex autoload :Indexer, 'bio/io/flatfile/indexer' autoload :BDBdefault, 'bio/io/flatfile/bdb' autoload :BDBwrapper, 'bio/io/flatfile/bdb' autoload :BDB_1, 'bio/io/flatfile/bdb' # magic string for flat/1 index MAGIC_FLAT = 'flat/1' # magic string for BerkeleyDB/1 index MAGIC_BDB = 'BerkeleyDB/1' ######################################################### # Opens existing databank. Databank is a directory which contains # indexed files and configuration files. The type of the databank # (flat or BerkeleyDB) is determined automatically. # # If block is given, the databank object is passed to the block. # The databank will be automatically closed when the block terminates. # def self.open(name) if block_given? then begin i = self.new(name) r = yield i ensure if i then begin i.close rescue IOError end end end else r = self.new(name) end r end # Opens existing databank. Databank is a directory which contains # indexed files and configuration files. The type of the databank # (flat or BerkeleyDB) is determined automatically. # # Unlike +FlatFileIndex.open+, block is not allowed. # def initialize(name) @db = DataBank.open(name) end # common interface defined in registry.rb # Searching databank and returns entry (or entries) as a string. # Multiple entries (concatenated to one string) may be returned. # Returns empty string if not found. # def get_by_id(key) search(key).to_s end #-- # original methods #++ # Closes the databank. # Returns nil. def close check_closed? @db.close @db = nil end # Returns true if already closed. Otherwise, returns false. def closed? if @db then false else true end end # Set default namespaces. # default_namespaces = nil # means all namespaces in the databank. # # default_namespaces= [ str1, str2, ... ] # means set default namespaces to str1, str2, ... # # Default namespaces specified in this method only affect # #get_by_id, #search, and #include? methods.
# # Default of default namespaces is nil (that is, all namespaces # are search destinations by default). # def default_namespaces=(names) if names then @names = [] names.each { |x| @names.push(x.dup) } else @names = nil end end # Returns default namespaces. # Returns an array of strings or nil. # nil means all namespaces. def default_namespaces @names end # Searching databank and returns a Bio::FlatFileIndex::Results object. def search(key) check_closed? if @names then @db.search_namespaces(key, *@names) else @db.search_all(key) end end # Searching only specified namespaces. # Returns a Bio::FlatFileIndex::Results object. # def search_namespaces(key, *names) check_closed? @db.search_namespaces(key, *names) end # Searching only primary namespace. # Returns a Bio::FlatFileIndex::Results object. # def search_primary(key) check_closed? @db.search_primary(key) end # Searching databank. # If some entries are found, returns an array of # unique IDs (primary identifiers). # If nothing is found, returns nil. # # This method is useful when search result is very large and # #search method is very slow. # def include?(key) check_closed? if @names then r = @db.search_namespaces_get_unique_id(key, *@names) else r = @db.search_all_get_unique_id(key) end if r.empty? then nil else r end end # Same as #include?, but searching only specified namespaces. # def include_in_namespaces?(key, *names) check_closed? r = @db.search_namespaces_get_unique_id(key, *names) if r.empty? then nil else r end end # Same as #include?, but searching only primary namespace. # def include_in_primary?(key) check_closed? r = @db.search_primary_get_unique_id(key) if r.empty? then nil else r end end # Returns names of namespaces defined in the databank. # (example: [ 'LOCUS', 'ACCESSION', 'VERSION' ] ) # def namespaces check_closed? r = secondary_namespaces r.unshift primary_namespace r end # Returns name of primary namespace as a string. def primary_namespace check_closed?
@db.primary.name end # Returns names of secondary namespaces as an array of strings. def secondary_namespaces check_closed? @db.secondary.names end # Check consistency between the databank(index) and original flat files. # # If the original flat files are changed after creating # the databank, raises RuntimeError. # # Note that this check only compares file sizes as # described in the OBDA specification. # def check_consistency check_closed? @db.check_consistency end # If true is given, consistency checks will be performed every time # accessing flatfiles. If nil/false, no checks are performed. # # By default, always_check_consistency is true. # def always_check_consistency=(bool) @db.always_check=(bool) end # If true, consistency checks will be performed every time # accessing flatfiles. If nil/false, no checks are performed. # # By default, always_check_consistency is true. # def always_check_consistency @db.always_check end #-- # private methods #++ # If the databank is closed, raises IOError. def check_closed? @db or raise IOError, 'closed databank' end private :check_closed? #-- ######################################################### #++ # Results stores search results created by # Bio::FlatFileIndex methods. # # Currently, this class inherits Hash, but internal # structure of this class may be changed anytime. # Using only the methods described below is strongly recommended. # class Results < Hash # Add search results. # "a + b" means "a OR b". # * Example # # I want to search 'ADH_IRON_1' OR 'ADH_IRON_2' # db = Bio::FlatFileIndex.new(location) # a1 = db.search('ADH_IRON_1') # a2 = db.search('ADH_IRON_2') # # a1 and a2 are Bio::FlatFileIndex::Results objects. # print a1 + a2 # def +(a) raise 'argument must be Results class' unless a.is_a?(self.class) res = self.dup res.update(a) res end # Returns set intersection of results. # "a * b" means "a AND b".
# * Example # # I want to search 'HIS_KIN' AND 'human' # db = Bio::FlatFileIndex.new(location) # hk = db.search('HIS_KIN') # hu = db.search('human') # # hk and hu are Bio::FlatFileIndex::Results objects. # print hk * hu # def *(a) raise 'argument must be Results class' unless a.is_a?(self.class) res = self.class.new a.each_key { |x| res.store(x, a[x]) if self[x] } res end # Returns a string. (concatenated if multiple results exist). # Same as to_a.join(''). # def to_s self.values.join end #-- #alias each_orig each #++ # alias for each_value. alias each each_value # Iterates over each result (string). # Same as to_a.each. def each(&x) #:yields: str each_value(&x) end if false #dummy for RDoc #-- #alias to_a_orig to_a #++ # alias for to_a. alias to_a values # Returns an array of strings. # If no search results exist, returns an empty array. # def to_a; values; end if false #dummy for RDoc # Returns number of results. # Same as to_a.size. def size; end if false #dummy for RDoc end #class Results ######################################################### # Module for output debug messages. # Default setting: If $DEBUG or $VERBOSE is true, output debug # messages to $stderr; Otherwise, don't output messages. # module DEBUG @@out = $stderr @@flag = nil # Set debug messages output destination. # If true is given, outputs to $stderr. # If nil is given, outputs nothing. # This method affects ALL of FlatFileIndex related objects/methods. # def self.out=(io) if io then @@out = io @@out = $stderr if io == true @@flag = true else @@out = nil @@flag = nil end @@out end # get current debug messages output destination def self.out @@out end # prints debug messages def self.print(*arg) @@flag = true if $DEBUG or $VERBOSE @@out.print(*arg) if @@out and @@flag end end #module DEBUG ######################################################### # Templates # # Internal use only. module Template # templates of namespace # # Internal use only.
class NameSpace def filename # should be redefined in child class raise NotImplementedError, "should be redefined in child class" end def mapping(filename) # should be redefined in child class raise NotImplementedError, "should be redefined in child class" #Flat_1::FlatMappingFile.new(filename) end def initialize(dbname, name) @dbname = dbname @name = name.dup @name.freeze @file = mapping(filename) end attr_reader :dbname, :name, :file def search(key) @file.open @file.search(key) end def close @file.close end def include?(key) r = search(key) unless r.empty? then key else nil end end end #class NameSpace end #module Template # FileID class. # # Internal use only. class FileID def self.new_from_string(str) a = str.split("\t", 2) a[1] = a[1].to_i if a[1] self.new(a[0], a[1]) end def initialize(filename, filesize = nil) @filename = filename @filesize = filesize @io = nil end attr_reader :filename, :filesize def check begin fsize = File.size(@filename) r = ( fsize == @filesize) rescue Errno::ENOENT fsize = -1 r = nil end DEBUG.print "FileID: File.size(#{@filename.inspect}) = ", fsize, (r ? ' == ' : ' != ') , @filesize, (r ? '' : ' bad!'), "\n" r end def recalc @filesize = File.size(@filename) end def to_s(i = nil) if i then str = "fileid_#{i}\t" else str = '' end str << "#{@filename}\t#{@filesize}" str end def open unless @io then DEBUG.print "FileID: open #{@filename}\n" @io = File.open(@filename, 'rb') true else nil end end def close if @io then DEBUG.print "FileID: close #{@filename}\n" @io.close @io = nil nil else true end end def seek(*arg) @io.seek(*arg) end def read(size) @io.read(size) end def get(pos, length) open seek(pos, IO::SEEK_SET) data = read(length) close data end end #class FileID # FileIDs class. # # Internal use only.
class FileIDs < Array def initialize(prefix, hash) @hash = hash @prefix = prefix end def [](n) r = super(n) if r then r else data = @hash["#{@prefix}#{n}"] if data then self[n] = data end super(n) end end def []=(n, data) if data.is_a?(FileID) then super(n, data) elsif data then super(n, FileID.new_from_string(data)) else # data is nil super(n, nil) end self[n] end def add(*arg) arg.each do |filename| self << FileID.new(filename) end end def cache_all a = @hash.keys.collect do |k| if k =~ /\A#{Regexp.escape(@prefix)}(\d+)/ then $1.to_i else nil end end a.compact! a.each do |i| self[i] end a end def each (0...self.size).each do |i| x = self[i] yield(x) if x end self end def each_with_index (0...self.size).each do |i| x = self[i] yield(x, i) if x end self end def keys self.cache_all a = [] (0...self.size).each do |i| a << i if self[i] end a end def filenames self.cache_all a = [] self.each do |x| a << x.filename end a end def check_all self.cache_all r = true self.each do |x| r = x.check break unless r end r end alias check check_all def close_all self.each do |x| x.close end nil end alias close close_all def recalc_all self.cache_all self.each do |x| x.recalc end true end alias recalc recalc_all end #class FileIDs # module for flat/1 databank # # Internal use only. module Flat_1 # Record class. # # Internal use only. class Record def initialize(str, size = nil) a = str.split("\t") a.each { |x| x.to_s.gsub!(/[\000 ]+\z/, '') } @key = a.shift.to_s @val = a @size = (size or str.length) #DEBUG.print "key=#{@key.inspect},val=#{@val.inspect},size=#{@size}\n" end attr_reader :key, :val, :size def to_s self.class.to_string(@size, @key, @val) end def self.to_string(size, key, val) sprintf("%-*s", size, key + "\t" + val.join("\t")) end def self.create(size, key, val) self.new(self.to_string(size, key, val)) end def ==(x) self.to_s == x.to_s end end #class Record # FlatMappingFile class. # # Internal use only. 
class FlatMappingFile @@recsize_width = 4 @@recsize_regex = /\A\d{4}\z/ def self.open(*arg) self.new(*arg) end def initialize(filename, mode = 'rb') @filename = filename @mode = mode @file = nil #@file = File.open(filename, mode) @record_size = nil @records = nil end attr_accessor :mode attr_reader :filename def open unless @file then DEBUG.print "FlatMappingFile: open #{@filename}\n" @file = File.open(@filename, @mode) true else nil end end def close if @file then DEBUG.print "FlatMappingFile: close #{@filename}\n" @file.close @file = nil end nil end def record_size unless @record_size then open @file.seek(0, IO::SEEK_SET) s = @file.read(@@recsize_width) raise 'strange record size' unless s =~ @@recsize_regex @record_size = s.to_i DEBUG.print "FlatMappingFile: record_size: #{@record_size}\n" end @record_size end def get_record(i) rs = record_size seek(i) str = @file.read(rs) #DEBUG.print "get_record(#{i})=#{str.inspect}\n" str end def seek(i) rs = record_size @file.seek(@@recsize_width + rs * i) end def records unless @records then rs = record_size @records = (@file.stat.size - @@recsize_width) / rs DEBUG.print "FlatMappingFile: records: #{@records}\n" end @records end alias size records # methods for writing file def write_record(str) rs = record_size rec = sprintf("%-*s", rs, str)[0, rs] @file.write(rec) end def add_record(str) n = records rs = record_size @file.seek(0, IO::SEEK_END) write_record(str) @records += 1 end def put_record(i, str) n = records rs = record_size if i >= n then @file.seek(0, IO::SEEK_END) @file.write(sprintf("%-*s", rs, '') * (i - n)) @records = i + 1 else seek(i) end write_record(str) end def init(rs) unless 0 < rs and rs < 10 ** @@recsize_width then raise 'record size out of range' end open @record_size = rs str = sprintf("%0*d", @@recsize_width, rs) @file.truncate(0) @file.seek(0, IO::SEEK_SET) @file.write(str) @records = 0 end # export/import/edit data def each n = records seek(0) (0...n).each do |i| yield Record.new(get_record(i))
end self end def export_tsv(stream) self.each do |x| stream << "#{x.to_s}\n" end stream end def init_with_sorted_tsv_file(filename, flag_primary = false) rec_size = 1 f = File.open(filename) f.each do |y| rec_size = y.chomp.length if rec_size < y.chomp.length end self.init(rec_size) prev = nil f.rewind if flag_primary then f.each do |y| x = Record.new(y.chomp, rec_size) if prev then if x.key == prev.key DEBUG.print "Warning: overwrote unique id #{x.key.inspect}\n" else self.add_record(prev.to_s) end end prev = x end self.add_record(prev.to_s) if prev else f.each do |y| x = Record.new(y.chomp, rec_size) self.add_record(x.to_s) if x != prev prev = x end end f.close self end def self.external_sort_proc(sort_program = [ '/usr/bin/env', 'LC_ALL=C', '/usr/bin/sort' ]) Proc.new do |out, in1, *files| cmd = sort_program + [ '-o', out, in1, *files ] system(*cmd) end end def self.external_merge_sort_proc(sort_program = [ '/usr/bin/env', 'LC_ALL=C', '/usr/bin/sort' ]) Proc.new do |out, in1, *files| # (in1 may be sorted) tf_all = [] tfn_all = [] files.each do |fn| tf = Tempfile.open('sort') tf.close(false) cmd = sort_program + [ '-o', tf.path, fn ] system(*cmd) tf_all << tf tfn_all << tf.path end cmd_fin = sort_program + [ '-m', '-o', out, in1, *tfn_all ] system(*cmd_fin) tf_all.each do |tf| tf.close(true) end end end def self.external_merge_proc(sort_program = [ '/usr/bin/env', 'LC_ALL=C', '/usr/bin/sort' ]) Proc.new do |out, in1, *files| # files (and in1) must be sorted cmd = sort_program + [ '-m', '-o', out, in1, *files ] system(*cmd) end end def self.internal_sort_proc Proc.new do |out, in1, *files| a = IO.readlines(in1) files.each do |fn| IO.foreach(fn) do |x| a << x end end a.sort!
of = File.open(out, 'w') a.each { |x| of << x } of.close end end def import_tsv_files(flag_primary, mode, sort_proc, *files) require 'tempfile' tmpfile1 = Tempfile.open('flat') self.export_tsv(tmpfile1) unless mode == :new tmpfile1.close(false) tmpfile0 = Tempfile.open('sorted') tmpfile0.close(false) sort_proc.call(tmpfile0.path, tmpfile1.path, *files) tmpmap = self.class.new(self.filename + ".#{$$}.tmp~", 'wb+') tmpmap.init_with_sorted_tsv_file(tmpfile0.path, flag_primary) tmpmap.close self.close begin File.rename(self.filename, self.filename + ".#{$$}.bak~") rescue Errno::ENOENT end File.rename(tmpmap.filename, self.filename) begin File.delete(self.filename + ".#{$$}.bak~") rescue Errno::ENOENT end tmpfile0.close(true) tmpfile1.close(true) self end # methods for searching def search(key) n = records return [] if n <= 0 i = n / 2 i_prev = nil DEBUG.print "binary search starts...\n" begin rec = Record.new(get_record(i)) i_prev = i if key < rec.key then n = i i = i / 2 elsif key > rec.key then i = (i + n) / 2 else # key == rec.key result = [ rec.val ] j = i - 1 while j >= 0 and (rec = Record.new(get_record(j))).key == key result << rec.val j = j - 1 end result.reverse! j = i + 1 while j < n and (rec = Record.new(get_record(j))).key == key result << rec.val j = j + 1 end DEBUG.print "#{result.size} hits found!!\n" return result end end until i_prev == i DEBUG.print "no hits found\n" #nil [] end end #class FlatMappingFile # primary name space # # Internal use only. class PrimaryNameSpace < Template::NameSpace def mapping(filename) FlatMappingFile.new(filename) end def filename File.join(dbname, "key_#{name}.key") end end #class PrimaryNameSpace # secondary name space # # Internal use only. class SecondaryNameSpace < Template::NameSpace def mapping(filename) FlatMappingFile.new(filename) end def filename File.join(dbname, "id_#{name}.index") end def search(key) r = super(key) file.close r.flatten! 
r end end #class SecondaryNameSpace end #module Flat_1 # namespaces # # Internal use only. class NameSpaces < Hash def initialize(dbname, nsclass, arg) @dbname = dbname @nsclass = nsclass if arg.is_a?(String) then a = arg.split("\t") else a = arg end a.each do |x| self[x] = @nsclass.new(@dbname, x) end self end def each_names self.names.each do |x| yield x end end def each_files self.values.each do |x| yield x end end def names keys end def close_all values.each { |x| x.file.close } end alias close close_all def search(key) r = [] values.each do |ns| r.concat ns.search(key) end r.sort! r.uniq! r end def search_names(key, *names) r = [] names.each do |x| ns = self[x] raise "undefined namespace #{x.inspect}" unless ns r.concat ns.search(key) end r end def to_s names.join("\t") end end #class NameSpaces # databank # # Internal use only. class DataBank def self.file2hash(fileobj) hash = {} fileobj.each do |line| line.chomp! a = line.split("\t", 2) hash[a[0]] = a[1] end hash end private_class_method :file2hash def self.filename(dbname) File.join(dbname, 'config.dat') end def self.read(name, mode = 'rb', *bdbarg) f = File.open(filename(name), mode) hash = file2hash(f) f.close db = self.new(name, nil, hash) db.bdb_open(*bdbarg) db end def self.open(*arg) self.read(*arg) end def initialize(name, idx_type = nil, hash = {}) @dbname = name.dup @dbname.freeze @bdb = nil @always_check = true self.index_type = (hash['index'] or idx_type) if @bdb then @config = BDBwrapper.new(@dbname, 'config') @bdb_fileids = BDBwrapper.new(@dbname, 'fileids') @nsclass_pri = BDB_1::PrimaryNameSpace @nsclass_sec = BDB_1::SecondaryNameSpace else @config = hash @nsclass_pri = Flat_1::PrimaryNameSpace @nsclass_sec = Flat_1::SecondaryNameSpace end true end attr_reader :dbname, :index_type def index_type=(str) case str when MAGIC_BDB @index_type = MAGIC_BDB @bdb = true unless defined?(BDB) raise RuntimeError, "Berkeley DB support not found" end when MAGIC_FLAT, '', nil, false @index_type = MAGIC_FLAT 
@bdb = false else raise 'unknown or unsupported index type' end end def to_s a = "" a << "index\t#{@index_type}\n" unless @bdb then a << "format\t#{@format}\n" @fileids.each_with_index do |x, i| a << "#{x.to_s(i)}\n" end a << "primary_namespace\t#{@primary.name}\n" a << "secondary_namespaces\t" a << @secondary.names.join("\t") a << "\n" end a end def bdb_open(*bdbarg) if @bdb then @config.close @config.open(*bdbarg) @bdb_fileids.close @bdb_fileids.open(*bdbarg) true else nil end end def write(mode = 'wb', *bdbarg) unless FileTest.directory?(@dbname) then Dir.mkdir(@dbname) end f = File.open(self.class.filename(@dbname), mode) f.write self.to_s f.close if @bdb then bdb_open(*bdbarg) @config['format'] = format @config['primary_namespace'] = @primary.name @config['secondary_namespaces'] = @secondary.names.join("\t") @bdb_fileids.writeback_array('', fileids, *bdbarg) end true end def close DEBUG.print "DataBank: close #{@dbname}\n" primary.close secondary.close fileids.close if @bdb then @config.close @bdb_fileids.close end nil end ##parameters def primary unless @primary then self.primary = @config['primary_namespace'] end @primary end def primary=(pri_name) if !pri_name or pri_name.empty? 
then pri_name = 'UNIQUE' end @primary = @nsclass_pri.new(@dbname, pri_name) @primary end def secondary unless @secondary then self.secondary = @config['secondary_namespaces'] end @secondary end def secondary=(sec_names) if !sec_names then sec_names = [] end @secondary = NameSpaces.new(@dbname, @nsclass_sec, sec_names) @secondary end def format=(str) @format = str.to_s.dup end def format unless @format then self.format = @config['format'] end @format end def fileids unless @fileids then init_fileids end @fileids end def init_fileids if @bdb then @fileids = FileIDs.new('', @bdb_fileids) else @fileids = FileIDs.new('fileid_', @config) end @fileids end # high level methods def always_check=(bool) if bool then @always_check = true else @always_check = false end end attr_reader :always_check def get_flatfile_data(f, pos, length) fi = fileids[f.to_i] if @always_check then raise "flatfile #{fi.filename.inspect} may be modified" unless fi.check end fi.get(pos.to_i, length.to_i) end def search_all_get_unique_id(key) s = secondary.search(key) p = primary.include?(key) s.push p if p s.sort! s.uniq! s end def search_primary(*arg) r = Results.new arg.each do |x| a = primary.search(x) # a is empty or a.size==1 because primary key must be unique r.store(x, get_flatfile_data(*a[0])) unless a.empty? end r end def search_all(key) s = search_all_get_unique_id(key) search_primary(*s) end def search_primary_get_unique_id(key) s = [] p = primary.include?(key) s.push p if p s end def search_namespaces_get_unique_id(key, *names) if names.include?(primary.name) then n2 = names.dup n2.delete(primary.name) p = primary.include?(key) else n2 = names p = nil end s = secondary.search_names(key, *n2) s.push p if p s.sort! s.uniq! 
s end def search_namespaces(key, *names) s = search_namespaces_get_unique_id(key, *names) search_primary(*s) end def check_consistency fileids.check_all end end #class DataBank end #class FlatFileIndex end #module Bio bio-2.0.3/lib/bio/io/flatfile/splitter.rb0000644000175000017500000002036414141516614017532 0ustar nileshnilesh# # = bio/io/flatfile/splitter.rb - input data splitter for FlatFile # # Copyright (C) 2001-2008 Naohisa Goto # # License:: The Ruby License # # $Id:$ # # # See documents for Bio::FlatFile::Splitter and Bio::FlatFile. # require 'bio/io/flatfile' module Bio class FlatFile # The Bio::FlatFile::Splitter is a namespace for flatfile splitters. # Each splitter is a class to get entries from a buffered input stream. # # It is internally called in Bio::FlatFile. # Normally, users do not need to use it directly. module Splitter # This is a template of splitter. class Template # Creates a new splitter. def initialize(klass, bstream) @dbclass = klass @stream = bstream @entry_pos_flag = nil end # skips leader of the entry. def skip_leader raise NotImplementedError end # rewind the stream def rewind @stream.rewind end # Gets entry as a string. (String) def get_entry raise NotImplementedError end # Gets entry as a data class's object def get_parsed_entry ent = get_entry if ent then self.parsed_entry = dbclass.new(ent) else self.parsed_entry = ent end parsed_entry end # the last entry string read from the stream (String) attr_reader :entry # The last parsed entry read from the stream (entry data class). # Note that it is valid only after get_parsed_entry is called, # and the get_entry may not affect the parsed_entry attribute. 
attr_reader :parsed_entry # a flag to write down entry start and end positions attr_accessor :entry_pos_flag # start position of the entry attr_reader :entry_start_pos # (end position of the entry) + 1 attr_reader :entry_ended_pos #-- #private # ## to prevent warning message "warning: private attribute?", ## private attributes are explicitly declared. #++ # entry data class attr_reader :dbclass private :dbclass # input stream attr_reader :stream private :stream # the last entry string read from the stream attr_writer :entry private :entry= # the last entry as a parsed data object attr_writer :parsed_entry private :parsed_entry= # start position of the entry attr_writer :entry_start_pos private :entry_start_pos= # (end position of the entry) + 1 attr_writer :entry_ended_pos private :entry_ended_pos= # Returns stream.pos if entry_pos_flag is true. # Otherwise, returns nil. def stream_pos entry_pos_flag ? stream.pos : nil end private :stream_pos end #class Template # Default splitter. # It looks up the following constants in the given class. # DELIMITER:: (String) delimiter indicating the end of an entry. # FLATFILE_HEADER:: (String) start of an entry, located at the head of a line. # DELIMITER_OVERRUN:: (Integer) excess read size included in DELIMITER. # class Default < Template # Creates a new splitter. # klass:: database class # bstream:: input stream. It must be a BufferedInputStream object. def initialize(klass, bstream) super(klass, bstream) @delimiter = klass::DELIMITER rescue nil @header = klass::FLATFILE_HEADER rescue nil # for specific classes' benefit unless header if (defined?(Bio::GenBank) and klass == Bio::GenBank) or (defined?(Bio::GenPept) and klass == Bio::GenPept) @header = 'LOCUS ' end end @delimiter_overrun = klass::DELIMITER_OVERRUN rescue nil end # (String) delimiter indicating the end of an entry. attr_accessor :delimiter # (String) start of an entry, located at the head of a line. attr_accessor :header # (Integer) excess read data size included in delimiter.
attr_accessor :delimiter_overrun # Skips leader of the entry. # # If @header is not nil, it reads till the contents of @header # comes at the head of a line. # If correct FLATFILE_HEADER is found, returns true. # Otherwise, returns nil. def skip_leader if @header then data = '' while s = stream.gets(@header) data << s if data.split(/[\r\n]+/)[-1] == @header then stream.ungets(@header) return true end end # @header was not found. For safety, # pushes back data after removing leading white space. data.sub!(/\A\s+/, '') stream.ungets(data) return nil else stream.skip_spaces return nil end end # gets an entry def get_entry p0 = stream_pos() e = stream.gets(@delimiter) if e and @delimiter_overrun then if e[-@delimiter.size, @delimiter.size ] == @delimiter then overrun = e[-@delimiter_overrun, @delimiter_overrun] e[-@delimiter_overrun, @delimiter_overrun] = '' stream.ungets(overrun) end end p1 = stream_pos() self.entry_start_pos = p0 self.entry = e self.entry_ended_pos = p1 return entry end end #class Default # A splitter for line oriented text data. # # The given class's object must have the following methods. # Klass#add_header_line(line) # Klass#add_line(line) # where 'line' is a string. They normally return self. # If the line is not suitable to add to the current entry, # nil or false should be returned. # Then, the line is treated as (for add_header_line) the entry data # or (for add_line) the next entry's data. # class LineOriented < Template # Creates a new splitter. # klass:: database class # bstream:: input stream. It must be a BufferedInputStream object.
def initialize(klass, bstream) super(klass, bstream) self.flag_to_fetch_header = true end # do nothing def skip_leader nil end # get an entry and return the entry as a string def get_entry if e = get_parsed_entry then entry else e end end # get an entry and return the entry as a data class object def get_parsed_entry p0 = stream_pos() ent = @dbclass.new() lines = [] line_overrun = nil if flag_to_fetch_header then while line = stream.gets("\n") unless ent.add_header_line(line) then line_overrun = line break end lines.push line end stream.ungets(line_overrun) if line_overrun line_overrun = nil self.flag_to_fetch_header = false end while line = stream.gets("\n") unless ent.add_line(line) then line_overrun = line break end lines.push line end stream.ungets(line_overrun) if line_overrun p1 = stream_pos() return nil if lines.empty? self.entry_start_pos = p0 self.entry = lines.join('') self.parsed_entry = ent self.entry_ended_pos = p1 return ent end # rewinds the stream def rewind ret = super self.flag_to_fetch_header = true ret end #-- #private methods / attributes #++ # flag to fetch header attr_accessor :flag_to_fetch_header private :flag_to_fetch_header private :flag_to_fetch_header= end #class LineOriented end #module Splitter end #class FlatFile end #module Bio bio-2.0.3/lib/bio/io/flatfile/indexer.rb0000644000175000017500000006545114141516614017330 0ustar nileshnilesh# # = bio/io/flatfile/indexer.rb - OBDA flatfile indexer # # Copyright:: Copyright (C) 2002 GOTO Naohisa # License:: The Ruby License # # $Id: indexer.rb,v 1.26 2007/12/11 15:13:32 ngoto Exp $ # require 'bio/io/flatfile/index' module Bio class FlatFileIndex module Indexer class NameSpace def initialize(name, method) @name = name @proc = method end attr_reader :name, :proc end #class NameSpace class NameSpaces < Hash def initialize(*arg) super() arg.each do |x| self.store(x.name, x) end end def names self.keys end def <<(x) self.store(x.name, x) end def add(x) self.store(x.name, x) end #alias each_orig 
each alias each each_value end module Parser def self.new(format, *arg) case format.to_s when 'embl', 'Bio::EMBL' EMBLParser.new(*arg) when 'swiss', 'Bio::SPTR', 'Bio::TrEMBL', 'Bio::SwissProt' SPTRParser.new(*arg) when 'genbank', 'Bio::GenBank', 'Bio::RefSeq', 'Bio::DDBJ' GenBankParser.new(*arg) when 'Bio::GenPept' GenPeptParser.new(*arg) when 'fasta', 'Bio::FastaFormat' FastaFormatParser.new(*arg) when 'Bio::FANTOM::MaXML::Sequence' MaXMLSequenceParser.new(*arg) when 'Bio::FANTOM::MaXML::Cluster' MaXMLClusterParser.new(*arg) when 'Bio::Blast::Default::Report' BlastDefaultParser.new(Bio::Blast::Default::Report, *arg) when 'Bio::Blast::Default::Report_TBlast' BlastDefaultParser.new(Bio::Blast::Default::Report_TBlast, *arg) when 'Bio::Blast::WU::Report' BlastDefaultParser.new(Bio::Blast::WU::Report, *arg) when 'Bio::Blast::WU::Report_TBlast' BlastDefaultParser.new(Bio::Blast::WU::Report_TBlast, *arg) when 'Bio::PDB::ChemicalComponent' PDBChemicalComponentParser.new(Bio::PDB::ChemicalComponent, *arg) else raise 'unknown or unsupported format' end #case format.to_s end class TemplateParser NAMESTYLE = NameSpaces.new def initialize @namestyle = self.class::NAMESTYLE @secondary = NameSpaces.new @errorlog = [] end attr_reader :primary, :secondary, :format, :dbclass attr_reader :errorlog def set_primary_namespace(name) DEBUG.print "set_primary_namespace: #{name.inspect}\n" if name.is_a?(NameSpace) then @primary = name else @primary = @namestyle[name] end raise 'unknown primary namespace' unless @primary @primary end def add_secondary_namespaces(*names) DEBUG.print "add_secondary_namespaces: #{names.inspect}\n" names.each do |x| # accept NameSpace objects as well as names, as set_primary_namespace does if x.is_a?(NameSpace) then y = x else y = @namestyle[x] end raise 'unknown secondary namespace' unless y @secondary << y end true end # administration of a single flatfile def open_flatfile(fileid, file) @fileid = fileid @flatfilename = file DEBUG.print "fileid=#{fileid} file=#{@flatfilename.inspect}\n" @flatfile = Bio::FlatFile.open(@dbclass, file,
'rb') @flatfile.raw = nil @flatfile.entry_pos_flag = true @entry = nil end attr_reader :fileid def each @flatfile.each do |x| @entry = x pos = @flatfile.entry_start_pos len = @flatfile.entry_ended_pos - @flatfile.entry_start_pos begin yield pos, len rescue RuntimeError, NameError => evar DEBUG.print "Caught error: #{evar.inspect}\n" DEBUG.print "in #{@flatfilename.inspect} position #{pos}\n" DEBUG.print "===begin===\n" DEBUG.print @flatfile.entry_raw.to_s.chomp DEBUG.print "\n===end===\n" @errorlog << [ evar, @flatfilename, pos ] if @fatal then DEBUG.print "Fatal error occurred, stop creating index...\n" raise evar else DEBUG.print "This entry shall be incorrectly indexed.\n" end end #rescue end end def parse_primary r = self.primary.proc.call(@entry) unless r.is_a?(String) and r.length > 0 #@fatal = true raise 'primary id must be a non-void string (skipped this entry)' end r end def parse_secondary self.secondary.each do |x| p = x.proc.call(@entry) p.each do |y| yield x.name, y if y.length > 0 end end end def close_flatfile DEBUG.print "close flatfile #{@flatfilename.inspect}\n" @flatfile.close end protected attr_writer :format, :dbclass end #class TemplateParser class GenBankParser < TemplateParser NAMESTYLE = NameSpaces.new( NameSpace.new( 'VERSION', Proc.new { |x| x.acc_version } ), NameSpace.new( 'LOCUS', Proc.new { |x| x.entry_id } ), NameSpace.new( 'ACCESSION', Proc.new { |x| x.accessions } ), NameSpace.new( 'GI', Proc.new { |x| x.gi.to_s.gsub(/\AGI\:/, '') } ) ) PRIMARY = 'VERSION' def initialize(pri_name = nil, sec_names = nil) super() self.format = 'genbank' self.dbclass = Bio::GenBank self.set_primary_namespace((pri_name or PRIMARY)) unless sec_names then sec_names = [] @namestyle.each_value do |x| sec_names << x.name if x.name != self.primary.name end end self.add_secondary_namespaces(*sec_names) end end #class GenBankParser class GenPeptParser < GenBankParser def initialize(*arg) super(*arg) self.dbclass = Bio::GenPept end end #class GenPeptParser 
class EMBLParser < TemplateParser NAMESTYLE = NameSpaces.new( NameSpace.new( 'ID', Proc.new { |x| x.entry_id } ), NameSpace.new( 'AC', Proc.new { |x| x.accessions } ), NameSpace.new( 'SV', Proc.new { |x| x.sv } ), NameSpace.new( 'DR', Proc.new { |x| y = [] x.dr.each_value { |z| y << z } y.flatten! y.find_all { |z| z.length > 1 } } ) ) PRIMARY = 'ID' SECONDARY = [ 'AC', 'SV' ] def initialize(pri_name = nil, sec_names = nil) super() self.format = 'embl' self.dbclass = Bio::EMBL self.set_primary_namespace((pri_name or PRIMARY)) unless sec_names then sec_names = self.class::SECONDARY end self.add_secondary_namespaces(*sec_names) end end #class EMBLParser class SPTRParser < EMBLParser SECONDARY = [ 'AC' ] def initialize(*arg) super(*arg) self.format = 'swiss' self.dbclass = Bio::SPTR end end #class SPTRParser class FastaFormatParser < TemplateParser NAMESTYLE = NameSpaces.new( NameSpace.new( 'UNIQUE', nil ), NameSpace.new( 'entry_id', Proc.new { |x| x.entry_id } ), NameSpace.new( 'accession', Proc.new { |x| x.accessions } ), NameSpace.new( 'id_string', Proc.new { |x| x.identifiers.id_strings }), NameSpace.new( 'word', Proc.new { |x| x.identifiers.words }) ) PRIMARY = 'UNIQUE' SECONDARY = [ 'entry_id', 'accession', 'id_string', 'word' ] def unique_primary_key r = "#{@flatfilename}:#{@count}" @count += 1 r end private :unique_primary_key def parse_primary if p = self.primary.proc then r = p.call(@entry) unless r.is_a?(String) and r.length > 0 #@fatal = true raise 'primary id must be a non-void string (skipped this entry)' end r else unique_primary_key end end def initialize(pri_name = nil, sec_names = nil) super() self.format = 'fasta' self.dbclass = Bio::FastaFormat self.set_primary_namespace((pri_name or PRIMARY)) unless sec_names then sec_names = self.class::SECONDARY end self.add_secondary_namespaces(*sec_names) end def open_flatfile(fileid, file) super @count = 1 @flatfilename_base = File.basename(@flatfilename) @flatfile.pos = 0 begin pos = @flatfile.pos line = 
@flatfile.gets end until (!line or line =~ /^\>/) @flatfile.pos = pos end end #class FastaFormatParser class MaXMLSequenceParser < TemplateParser NAMESTYLE = NameSpaces.new( NameSpace.new( 'id', Proc.new { |x| x.entry_id } ), NameSpace.new( 'altid', Proc.new { |x| x.id_strings } ), NameSpace.new( 'gene_ontology', Proc.new { |x| x.annotations.get_all_by_qualifier('gene_ontology').collect { |y| y.anntext } }), NameSpace.new( 'datasrc', Proc.new { |x| a = [] x.annotations.each { |y| y.datasrc.each { |z| a << z.split('|',2)[-1] a << z } } a.sort! a.uniq! a }) ) PRIMARY = 'id' SECONDARY = [ 'altid', 'gene_ontology', 'datasrc' ] def initialize(pri_name = nil, sec_names = nil) super() self.format = 'raw' self.dbclass = Bio::FANTOM::MaXML::Sequence self.set_primary_namespace((pri_name or PRIMARY)) unless sec_names then sec_names = self.class::SECONDARY end self.add_secondary_namespaces(*sec_names) end end #class MaXMLSequenceParser class MaXMLClusterParser < TemplateParser NAMESTYLE = NameSpaces.new( NameSpace.new( 'id', Proc.new { |x| x.entry_id } ), NameSpace.new( 'altid', Proc.new { |x| x.sequences.id_strings } ), NameSpace.new( 'datasrc', Proc.new { |x| a = x.sequences.collect { |y| MaXMLSequenceParser::NAMESTYLE['datasrc'].proc.call(y) } a.flatten! a.sort! a.uniq! a }), NameSpace.new( 'gene_ontology', Proc.new { |x| a = x.sequences.collect { |y| MaXMLSequenceParser::NAMESTYLE['gene_ontology'].proc.call(y) } a.flatten! a.sort! a.uniq! 
a }) ) PRIMARY = 'id' SECONDARY = [ 'altid', 'gene_ontology', 'datasrc' ] def initialize(pri_name = nil, sec_names = nil) super() self.format = 'raw' self.dbclass = Bio::FANTOM::MaXML::Cluster self.set_primary_namespace((pri_name or PRIMARY)) unless sec_names then sec_names = self.class::SECONDARY end self.add_secondary_namespaces(*sec_names) end end #class MaXMLClusterParser class BlastDefaultParser < TemplateParser NAMESTYLE = NameSpaces.new( NameSpace.new( 'QUERY', Proc.new { |x| x.query_def } ), NameSpace.new( 'query_id', Proc.new { |x| a = Bio::FastaDefline.new(x.query_def.to_s).id_strings a << x.query_def.to_s.split(/\s+/,2)[0] a } ), NameSpace.new( 'hit', Proc.new { |x| a = x.hits.collect { |y| b = Bio::FastaDefline.new(y.definition.to_s).id_strings b << y.definition b << y.definition.to_s.split(/\s+/,2)[0] b } a.flatten! a } ) ) PRIMARY = 'QUERY' SECONDARY = [ 'query_id', 'hit' ] def initialize(klass, pri_name = nil, sec_names = nil) super() self.format = 'raw' self.dbclass = klass self.set_primary_namespace((pri_name or PRIMARY)) unless sec_names then sec_names = [] @namestyle.each_value do |x| sec_names << x.name if x.name != self.primary.name end end self.add_secondary_namespaces(*sec_names) end def open_flatfile(fileid, file) super @flatfile.rewind @flatfile.dbclass = nil @flatfile.autodetect @flatfile.dbclass = self.dbclass unless @flatfile.dbclass @flatfile.rewind begin pos = @flatfile.pos line = @flatfile.gets end until (!line or line =~ /^T?BLAST/) @flatfile.pos = pos end end #class BlastDefaultParser class PDBChemicalComponentParser < TemplateParser NAMESTYLE = NameSpaces.new( NameSpace.new( 'UNIQUE', Proc.new { |x| x.entry_id } ) ) PRIMARY = 'UNIQUE' def initialize(klass, pri_name = nil, sec_names = nil) super() self.format = 'raw' self.dbclass = Bio::PDB::ChemicalComponent self.set_primary_namespace((pri_name or PRIMARY)) unless sec_names then sec_names = [] @namestyle.each_value do |x| sec_names << x.name if x.name != self.primary.name
end end self.add_secondary_namespaces(*sec_names) end def open_flatfile(fileid, file) super @flatfile.pos = 0 begin pos = @flatfile.pos line = @flatfile.gets end until (!line or line =~ /^RESIDUE /) @flatfile.pos = pos end end #class PDBChemicalComponentParser end #module Parser def self.makeindexBDB(name, parser, options, *files) # options are not used in this method unless defined?(BDB) raise RuntimeError, "Berkeley DB support not found" end DEBUG.print "making BDB DataBank...\n" db = DataBank.new(name, MAGIC_BDB) db.format = parser.format db.fileids.add(*files) db.fileids.recalc db.primary = parser.primary.name db.secondary = parser.secondary.names DEBUG.print "writing config.dat, config, fileids ...\n" db.write('wb', BDBdefault::flag_write) DEBUG.print "reading files...\n" addindex_bdb(db, BDBdefault::flag_write, (0...(files.size)), parser, options) db.close true end #def def self.addindex_bdb(db, flag, need_update, parser, options) DEBUG.print "reading files...\n" pn = db.primary pn.file.close pn.file.flag = flag db.secondary.each_files do |x| x.file.close x.file.flag = flag x.file.open x.file.close end need_update.each do |fileid| filename = db.fileids[fileid].filename parser.open_flatfile(fileid, filename) parser.each do |pos, len| p = parser.parse_primary #pn.file.add_exclusive(p, [ fileid, pos, len ]) pn.file.add_overwrite(p, [ fileid, pos, len ]) #DEBUG.print "#{p} #{fileid} #{pos} #{len}\n" parser.parse_secondary do |sn, sp| db.secondary[sn].file.add_nr(sp, p) #DEBUG.print "#{sp} #{p}\n" end end parser.close_flatfile end true end #def def self.makeindexFlat(name, parser, options, *files) DEBUG.print "making flat/1 DataBank using temporary files...\n" db = DataBank.new(name, nil) db.format = parser.format db.fileids.add(*files) db.primary = parser.primary.name db.secondary = parser.secondary.names db.fileids.recalc DEBUG.print "writing DataBank...\n" db.write('wb') addindex_flat(db, :new, (0...(files.size)), parser, options) db.close true end #def def
self.addindex_flat(db, mode, need_update, parser, options) require 'tempfile' prog = options['sort_program'] env = options['env_program'] env_args = options['env_program_arguments'] return false if need_update.to_a.size == 0 DEBUG.print "prepare temporary files...\n" tempbase = "bioflat#{rand(10000)}-" pfile = Tempfile.open(tempbase + 'primary-') DEBUG.print "open temporary file #{pfile.path.inspect}\n" sfiles = {} parser.secondary.names.each do |x| sfiles[x] = Tempfile.open(tempbase + 'secondary-') DEBUG.print "open temporary file #{sfiles[x].path.inspect}\n" end DEBUG.print "reading files...\n" need_update.each do |fileid| filename = db.fileids[fileid].filename parser.open_flatfile(fileid, filename) parser.each do |pos, len| p = parser.parse_primary pfile << "#{p}\t#{fileid}\t#{pos}\t#{len}\n" #DEBUG.print "#{p} #{fileid} #{pos} #{len}\n" parser.parse_secondary do |sn, sp| sfiles[sn] << "#{sp}\t#{p}\n" #DEBUG.print "#{sp} #{p}\n" end end parser.close_flatfile fileid += 1 end sort_proc = chose_sort_proc(prog, mode, env, env_args) pfile.close(false) DEBUG.print "sorting primary (#{parser.primary.name})...\n" db.primary.file.import_tsv_files(true, mode, sort_proc, pfile.path) pfile.close(true) parser.secondary.names.each do |x| DEBUG.print "sorting secondary (#{x})...\n" sfiles[x].close(false) db.secondary[x].file.import_tsv_files(false, mode, sort_proc, sfiles[x].path) sfiles[x].close(true) end true end #def # default sort program DEFAULT_SORT = '/usr/bin/sort' # default env program (run a program in a modified environment) DEFAULT_ENV = '/usr/bin/env' # default arguments for env program DEFAULT_ENV_ARGS = [ 'LC_ALL=C' ] def self.chose_sort_proc(prog, mode = :new, env = nil, env_args = nil) case prog when /^builtin$/i, /^hs$/i, /^lm$/i DEBUG.print "sort: internal sort routine\n" sort_proc = Flat_1::FlatMappingFile::internal_sort_proc when nil, '' if FileTest.executable?(DEFAULT_SORT) return chose_sort_proc(DEFAULT_SORT, mode, env, env_args) else DEBUG.print "sort: 
internal sort routine\n"
            sort_proc = Flat_1::FlatMappingFile::internal_sort_proc
          end
        else
          env_args ||= DEFAULT_ENV_ARGS
          if env == '' or env == false then # do not use the env program
            prefixes = [ prog ]
          elsif env then # use the given env program
            prefixes = [ env ] + env_args + [ prog ]
          else # env == nil; use the default env program if possible
            if FileTest.executable?(DEFAULT_ENV)
              prefixes = [ DEFAULT_ENV ] + env_args + [ prog ]
            else
              prefixes = [ prog ]
            end
          end
          DEBUG.print "sort: #{prefixes.join(' ')}\n"
          if mode == :new then
            sort_proc = Flat_1::FlatMappingFile::external_sort_proc(prefixes)
          else
            sort_proc = Flat_1::FlatMappingFile::external_merge_sort_proc(prefixes)
          end
        end
        sort_proc
      end

      def self.update_index(name, parser, options, *files)
        db = DataBank.open(name)
        if parser then
          raise 'file format mismatch' if db.format != parser.format
        else
          begin
            dbclass_orig = Bio::FlatFile.autodetect_file(db.fileids[0].filename)
          rescue TypeError, Errno::ENOENT
          end
          begin
            dbclass_new = Bio::FlatFile.autodetect_file(files[0])
          rescue TypeError, Errno::ENOENT
          end

          case db.format
          when 'swiss', 'embl'
            parser = Parser.new(db.format)
            if dbclass_new and dbclass_new != parser.dbclass
              raise 'file format mismatch'
            end
          when 'genbank'
            # Parentheses are required: "x = a or b" parses as "(x = a) or b",
            # so dbclass_new would never be used as the fallback.
            dbclass = (dbclass_orig or dbclass_new)
            if dbclass == Bio::GenBank or dbclass == Bio::GenPept
              parser = Parser.new(dbclass)
            elsif !dbclass then
              raise 'cannot determine format. please specify manually.'
            else
              raise 'file format mismatch'
            end
            if dbclass_new and dbclass_new != parser.dbclass
              raise 'file format mismatch'
            end
          else
            raise 'unsupported format'
          end
        end

        parser.set_primary_namespace(db.primary.name)
        parser.add_secondary_namespaces(*db.secondary.names)

        if options['renew'] then
          newfiles = db.fileids.filenames.find_all do |x|
            FileTest.exist?(x)
          end
          newfiles.concat(files)
          newfiles2 = newfiles.sort
          newfiles2.uniq!
          newfiles3 = []
          newfiles.each do |x|
            newfiles3 << x if newfiles2.delete(x)
          end
          t = db.index_type
          db.close
          case t
          when MAGIC_BDB
            Indexer::makeindexBDB(name, parser, options, *newfiles3)
          when MAGIC_FLAT
            Indexer::makeindexFlat(name, parser, options, *newfiles3)
          else
            raise 'Unsupported index type'
          end
          return true
        end

        need_update = []
        newfiles = files.dup
        db.fileids.cache_all
        db.fileids.each_with_index do |f, i|
          need_update << i unless f.check
          newfiles.delete(f.filename)
        end

        b = db.fileids.size
        begin
          db.fileids.recalc
        rescue Errno::ENOENT => evar
          DEBUG.print "Error: #{evar}\n"
          DEBUG.print "assumed --renew option\n"
          db.close
          options = options.dup
          options['renew'] = true
          update_index(name, parser, options, *files)
          return true
        end
        # add new files
        db.fileids.add(*newfiles)
        db.fileids.recalc

        need_update.concat((b...(b + newfiles.size)).to_a)

        DEBUG.print "writing DataBank...\n"
        db.write('wb', BDBdefault::flag_append)

        case db.index_type
        when MAGIC_BDB
          addindex_bdb(db, BDBdefault::flag_append,
                       need_update, parser, options)
        when MAGIC_FLAT
          addindex_flat(db, :add, need_update, parser, options)
        else
          raise 'Unsupported index type'
        end

        db.close
        true
      end #def
    end #module Indexer

    ##############################################################
    def self.formatstring2class(format_string)
      case format_string
      when /genbank/i
        dbclass = Bio::GenBank
      when /genpept/i
        dbclass = Bio::GenPept
      when /embl/i
        dbclass = Bio::EMBL
      when /sptr/i
        dbclass = Bio::SPTR
      when /fasta/i
        dbclass = Bio::FastaFormat
      else
        raise "Unsupported format : #{format_string}"
      end
    end

    def self.makeindex(is_bdb, dbname, format, options, *files)
      if format then
        dbclass = formatstring2class(format)
      else
        dbclass = Bio::FlatFile.autodetect_file(files[0])
        raise "Cannot determine format" unless dbclass
        DEBUG.print "file format is #{dbclass}\n"
      end

      options = {} unless options
      pns = options['primary_namespace']
      sns = options['secondary_namespaces']

      parser = Indexer::Parser.new(dbclass, pns, sns)

      #if /(EMBL|SPTR)/ =~ dbclass.to_s then
        #a = [ 'DR' ]
        #parser.add_secondary_namespaces(*a)
      #end
      if sns = options['additional_secondary_namespaces'] then
        parser.add_secondary_namespaces(*sns)
      end

      if is_bdb then
        Indexer::makeindexBDB(dbname, parser, options, *files)
      else
        Indexer::makeindexFlat(dbname, parser, options, *files)
      end
    end #def makeindex

    def self.update_index(dbname, format, options, *files)
      if format then
        # Resolve the format string to a database class, as makeindex does.
        parser = Indexer::Parser.new(formatstring2class(format))
      else
        parser = nil
      end
      Indexer::update_index(dbname, parser, options, *files)
    end #def update_index

  end #class FlatFileIndex
end #module Bio

=begin

= Bio::FlatFileIndex

--- Bio::FlatFileIndex.makeindex(is_bdb, dbname, format, options, *files)

      Create index files (called a databank) of given files.

--- Bio::FlatFileIndex.update_index(dbname, format, options, *files)

      Add entries to databank.

=end
bio-2.0.3/lib/bio/io/flatfile/bdb.rb
#
# bio/io/flatfile/bdb.rb - OBDA flatfile index by Berkeley DB
#
# Copyright:: Copyright (C) 2002 GOTO Naohisa
# License:: The Ruby License
#
# $Id: bdb.rb,v 1.10 2007/04/05 23:35:41 trevor Exp $
#

begin
  require 'bdb'
rescue LoadError,NotImplementedError
end

require 'bio/io/flatfile/index'
require 'bio/io/flatfile/indexer'

module Bio
  class FlatFileIndex

    module BDBdefault
      def permission
        (0666 & (0777 ^ File.umask))
      end
      module_function :permission

      def flag_read
        BDB::RDONLY
      end
      module_function :flag_read

      def flag_write
        (BDB::CREATE | BDB::TRUNCATE)
      end
      module_function :flag_write

      def flag_append
        'r+'
      end
      module_function :flag_append
    end #module BDBdefault

    class BDBwrapper
      def initialize(name, filename, *arg)
        @dbname = name
        @file = nil
        @filename = filename
        #self.open(*arg)
      end

      def filename
        File.join(@dbname, @filename)
      end

      def open(flag = BDBdefault.flag_read,
               permission = BDBdefault.permission)
        unless @file then
          DEBUG.print "BDBwrapper: open #{filename}\n"
          @file = BDB::Btree.open(filename, nil, flag, permission)
        end
        true
      end

      def close
        if @file
          DEBUG.print "BDBwrapper: close #{filename}\n"
@file.close @file = nil end nil end def [](arg) #self.open if @file then @file[arg] else nil end end def []=(key, val) #self.open @file[key.to_s] = val.to_s end def writeback_array(prefix, array, *arg) self.close self.open(*arg) array.each_with_index do |val, key| @file["#{prefix}#{key}"] = val.to_s end end def keys if @file then @file.keys else [] end end end #class BDBwrapper module BDB_1 class BDBMappingFile def self.open(*arg) self.new(*arg) end def initialize(filename, flag = BDBdefault.flag_read, permission = BDBdefault.permission) @filename = filename @flag = flag @permission = permission #@bdb = BDB::Btree.open(@filename, nil, @flag, @permission) end attr_reader :filename attr_accessor :flag, :permission def open unless @bdb then DEBUG.print "BDBMappingFile: open #{@filename}\n" @bdb = BDB::Btree.open(@filename, nil, @flag, @permission) true else nil end end def close if @bdb then DEBUG.print "BDBMappingFile: close #{@filename}\n" @bdb.close @bdb = nil end nil end def records @bdb.size end alias size records # methods for writing def add(key, val) open val = val.to_a.join("\t") s = @bdb[key] if s then s << "\t" s << val val = s end @bdb[key] = val #DEBUG.print "add: key=#{key.inspect}, val=#{val.inspect}\n" val end def add_exclusive(key, val) open val = val.to_a.join("\t") s = @bdb[key] if s then raise RuntimeError, "keys must be unique, but key #{key.inspect} already exists" end @bdb[key] = val #DEBUG.print "add_exclusive: key=#{key.inspect}, val=#{val.inspect}\n" val end def add_overwrite(key, val) open val = val.to_a.join("\t") s = @bdb[key] if s then DEBUG.print "Warining: overwrote unique id #{key.inspect}\n" end @bdb[key] = val #DEBUG.print "add_overwrite: key=#{key.inspect}, val=#{val.inspect}\n" val end def add_nr(key, val) open s = @bdb[key] if s then a = s.split("\t") else a = [] end a.concat val.to_a a.sort! a.uniq! 
str = a.join("\t") @bdb[key] = str #DEBUG.print "add_nr: key=#{key.inspect}, val=#{str.inspect}\n" str end # methods for searching def search(key) open s = @bdb[key] if s then a = s.split("\t") a else [] end end end #class BDBMappingFile class PrimaryNameSpace < Template::NameSpace def mapping(filename) BDBMappingFile.new(filename) end def filename File.join(dbname, "key_#{name}") end def search(key) r = super(key) unless r.empty? then [ r ] else r end end end #class PrimaryNameSpace class SecondaryNameSpace < Template::NameSpace def mapping(filename) BDBMappingFile.new(filename) end def filename File.join(dbname, "id_#{name}") end #class SecondaryNameSpaces def search(key) r = super(key) file.close r end end #class SecondaryNameSpace end #module BDB_1 end #class FlatFileIndex end #module Bio =begin * Classes/modules in this file are internal use only. =end bio-2.0.3/lib/bio/io/flatfile/autodetection.rb0000644000175000017500000004103314141516614020527 0ustar nileshnilesh# # = bio/io/flatfile/autodetection.rb - file format auto-detection # # Copyright (C) 2001-2006 Naohisa Goto # # License:: The Ruby License # # $Id:$ # # # See documents for Bio::FlatFile::AutoDetect and Bio::FlatFile. # require 'tsort' require 'bio/io/flatfile' module Bio class FlatFile # AutoDetect automatically determines database class of given data. class AutoDetect include TSort # Array to store autodetection rules. # This is defined only for inspect. class RulesArray < Array # visualize contents def inspect "[#{self.collect { |e| e.name.inspect }.join(' ')}]" end end #class RulesArray # Template of a single rule of autodetection class RuleTemplate # Creates a new element. def self.[](*arg) self.new(*arg) end # Creates a new element. def initialize @higher_priority_elements = RulesArray.new @lower_priority_elements = RulesArray.new @name = nil end # self is prior to the _elem_. 
def is_prior_to(elem) return nil if self == elem elem.higher_priority_elements << self self.lower_priority_elements << elem true end # higher priority elements attr_reader :higher_priority_elements # lower priority elements attr_reader :lower_priority_elements # database classes attr_reader :dbclasses # unique name of the element attr_accessor :name # If given text (and/or meta information) is known, returns # the database class. # Otherwise, returns nil or false. # # _text_ will be a String. # _meta_ will be a Hash. # _meta_ may contain following keys. # :path => pathname, filename or uri. def guess(text, meta) nil end private # Gets constant from constant name given as a string. def str2const(str) const = Object str.split(/\:\:/).each do |x| const = const.const_get(x) end const end # Gets database class from given object. # Current implementation is: # if _obj_ is kind of String, regarded as a constant. # Otherwise, returns _obj_ as is. def get_dbclass(obj) obj.kind_of?(String) ? str2const(obj) : obj end end #class Rule_Template # RuleDebug is a class for debugging autodetect classes/methods class RuleDebug < RuleTemplate # Creates a new instance. def initialize(name) super() @name = name end # prints information to the $stderr. def guess(text, meta) $stderr.puts @name $stderr.puts text.inspect $stderr.puts meta.inspect nil end end #class RuleDebug # Special element that is always top or bottom priority. class RuleSpecial < RuleTemplate def initialize(name) #super() @name = name end # modification of @name is inhibited. def name=(x) raise 'cannot modify name' end # always returns void array def higher_priority_elements [] end # always returns void array def lower_priority_elements [] end end #class RuleSpecial # Special element that is always top priority. TopRule = RuleSpecial.new('top') # Special element that is always bottom priority. 
BottomRule = RuleSpecial.new('bottom') # A autodetection rule to use a regular expression class RuleRegexp < RuleTemplate # Creates a new instance. def initialize(dbclass, re) super() @re = re @name = dbclass.to_s @dbclass = nil @dbclass_lazy = dbclass end # database class (lazy evaluation) def dbclass unless @dbclass @dbclass = get_dbclass(@dbclass_lazy) end @dbclass end private :dbclass # returns database classes def dbclasses [ dbclass ] end # If given text matches the regexp, returns the database class. # Otherwise, returns nil or false. # _meta_ is ignored. def guess(text, meta) @re =~ text ? dbclass : nil end end #class RuleRegexp # A autodetection rule to use more than two regular expressions. # If given string matches one of the regular expressions, # returns the database class. class RuleRegexp2 < RuleRegexp # Creates a new instance. def initialize(dbclass, *regexps) super(dbclass, nil) @regexps = regexps end # If given text matches one of the regexp, returns the database class. # Otherwise, returns nil or false. # _meta_ is ignored. def guess(text, meta) @regexps.each do |re| return dbclass if re =~ text end nil end end #class RuleRegexp # A autodetection rule that passes data to the proc object. class RuleProc < RuleTemplate # Creates a new instance. def initialize(*dbclasses, &proc) super() @proc = proc @dbclasses = nil @dbclasses_lazy = dbclasses @name = dbclasses.collect { |x| x.to_s }.join('|') end # database classes (lazy evaluation) def dbclasses unless @dbclasses @dbclasses = @dbclasses_lazy.collect { |x| get_dbclass(x) } end @dbclasses end # If given text (and/or meta information) is known, returns # the database class. # Otherwise, returns nil or false. # # Refer RuleTemplate#guess for _meta_. def guess(text, meta) @proc.call(text) end end #class RuleProc # Creates a new Autodetect object def initialize # stores autodetection rules. 
@rules = Hash.new # stores elements (cache) @elements = nil self.add(TopRule) self.add(BottomRule) end # Adds a new element. # Returns _elem_. def add(elem) raise 'element name conflicts' if @rules[elem.name] @elements = nil @rules[elem.name] = elem elem end # (required by TSort.) # For all elements, yields each element. def tsort_each_node(&x) @rules.each_value(&x) end # (required by TSort.) # For a given element, yields each child # (= lower priority elements) of the element. def tsort_each_child(elem) if elem == TopRule then @rules.each_value do |e| yield e unless e == TopRule or e.lower_priority_elements.index(TopRule) end elsif elem == BottomRule then @rules.each_value do |e| yield e if e.higher_priority_elements.index(BottomRule) end else elem.lower_priority_elements.each do |e| yield e if e != BottomRule end unless elem.higher_priority_elements.index(BottomRule) yield BottomRule end end end # Returns current elements as an array # whose order fulfills all elements' priorities. def elements unless @elements ary = tsort ary.reverse! @elements = ary end @elements end # rebuilds the object and clears internal cache. def rehash @rules.rehash @elements = nil end # visualizes the object (mainly for debug) def inspect "<#{self.class.to_s} " + self.elements.collect { |e| e.name.inspect }.join(' ') + ">" end # Iterates over each element. def each_rule(&x) #:yields: elem elements.each(&x) end # Autodetect from the text. # Returns a database class if succeeded. # Returns nil if failed. def autodetect(text, meta = {}) r = nil elements.each do |e| #$stderr.puts e.name r = e.guess(text, meta) break if r end r end # autodetect from the FlatFile object. # Returns a database class if succeeded. # Returns nil if failed. 
def autodetect_flatfile(ff, lines = 31) meta = {} stream = ff.instance_eval { @stream } begin path = stream.path rescue NameError end if path then meta[:path] = path # call autodetect onece with meta and without any read action if r = self.autodetect(stream.prefetch_buffer, meta) return r end end # reading stream 1.upto(lines) do |x| break unless line = stream.prefetch_gets if line.strip.size > 0 then if r = self.autodetect(stream.prefetch_buffer, meta) return r end end end return nil end # default autodetect object for class method @default = nil # returns the default autodetect object def self.default unless @default then @default = self.make_default end @default end # sets the default autodetect object. def self.default=(ad) @default = ad end # make a new autodetect object def self.[](*arg) a = self.new arg.each { |e| a.add(e) } a end # make a default of default autodetect object def self.make_default a = self[ genbank = RuleRegexp[ 'Bio::GenBank', /^LOCUS .+ bp .*[a-z]*[DR]?NA/ ], genpept = RuleRegexp[ 'Bio::GenPept', /^LOCUS .+ aa .+/ ], medline = RuleRegexp[ 'Bio::MEDLINE', /^PMID\- [0-9]+$/ ], embl = RuleRegexp[ 'Bio::EMBL', /^ID .+\; .*(DNA|RNA|XXX)\;/ ], sptr = RuleRegexp2[ 'Bio::SPTR', /^ID .+\; *PRT\;/, /^ID [-A-Za-z0-9_\.]+ .+\; *[0-9]+ *AA\./ ], prosite = RuleRegexp[ 'Bio::PROSITE', /^ID [-A-Za-z0-9_\.]+\; (PATTERN|RULE|MATRIX)\.$/ ], transfac = RuleRegexp[ 'Bio::TRANSFAC', /^AC [-A-Za-z0-9_\.]+$/ ], aaindex = RuleProc.new('Bio::AAindex1', 'Bio::AAindex2') do |text| if /^H [-A-Z0-9_\.]+$/ =~ text then if text =~ /^M [rc]/ then Bio::AAindex2 elsif text =~ /^I A\/L/ then Bio::AAindex1 else false #fail to determine end else nil end end, litdb = RuleRegexp[ 'Bio::LITDB', /^CODE [0-9]+$/ ], pathway_module = RuleRegexp[ 'Bio::KEGG::MODULE', /^ENTRY .+ Pathway\s+Module\s*/ ], pathway = RuleRegexp[ 'Bio::KEGG::PATHWAY', /^ENTRY .+ Pathway\s*/ ], brite = RuleRegexp[ 'Bio::KEGG::BRITE', /^Entry [A-Z0-9]+/ ], orthology = RuleRegexp[ 'Bio::KEGG::ORTHOLOGY', 
/^ENTRY .+ KO\s*/ ], drug = RuleRegexp[ 'Bio::KEGG::DRUG', /^ENTRY .+ Drug\s*/ ], glycan = RuleRegexp[ 'Bio::KEGG::GLYCAN', /^ENTRY .+ Glycan\s*/ ], enzyme = RuleRegexp2[ 'Bio::KEGG::ENZYME', /^ENTRY EC [0-9\.]+$/, /^ENTRY .+ Enzyme\s*/ ], compound = RuleRegexp2[ 'Bio::KEGG::COMPOUND', /^ENTRY C[A-Za-z0-9\._]+$/, /^ENTRY .+ Compound\s*/ ], reaction = RuleRegexp2[ 'Bio::KEGG::REACTION', /^ENTRY R[A-Za-z0-9\._]+$/, /^ENTRY .+ Reaction\s*/ ], genes = RuleRegexp[ 'Bio::KEGG::GENES', /^ENTRY .+ (CDS|gene|.*RNA|Contig) / ], genome = RuleRegexp[ 'Bio::KEGG::GENOME', /^ENTRY [a-z]+$/ ], fantom = RuleProc.new('Bio::FANTOM::MaXML::Cluster', 'Bio::FANTOM::MaXML::Sequence') do |text| if /\<\!DOCTYPE\s+maxml\-(sequences|clusters)\s+SYSTEM/ =~ text case $1 when 'clusters' Bio::FANTOM::MaXML::Cluster when 'sequences' Bio::FANTOM::MaXML::Sequence else nil #unknown end else nil end end, pdb = RuleRegexp[ 'Bio::PDB', /^HEADER .{40}\d\d\-[A-Z]{3}\-\d\d [0-9A-Z]{4}/ ], het = RuleRegexp[ 'Bio::PDB::ChemicalComponent', /^RESIDUE +.+ +\d+\s*$/ ], clustal = RuleRegexp2[ 'Bio::ClustalW::Report', /^CLUSTAL .*\(.*\).*sequence +alignment/, /^CLUSTAL FORMAT for T-COFFEE/ ], gcg_msf = RuleRegexp[ 'Bio::GCG::Msf', /^!!(N|A)A_MULTIPLE_ALIGNMENT .+/ ], gcg_seq = RuleRegexp[ 'Bio::GCG::Seq', /^!!(N|A)A_SEQUENCE .+/ ], blastxml = RuleRegexp[ 'Bio::Blast::Report', /\<\!DOCTYPE BlastOutput PUBLIC / ], wublast = RuleRegexp[ 'Bio::Blast::WU::Report', /^BLAST.? +[\-\.\w]+\-WashU +\[[\-\.\w ]+\]/ ], wutblast = RuleRegexp[ 'Bio::Blast::WU::Report_TBlast', /^TBLAST.? +[\-\.\w]+\-WashU +\[[\-\.\w ]+\]/ ], blast = RuleRegexp[ 'Bio::Blast::Default::Report', /^BLAST.? +[\-\.\w]+ +\[[\-\.\w ]+\]/ ], tblast = RuleRegexp[ 'Bio::Blast::Default::Report_TBlast', /^TBLAST.? +[\-\.\w]+ +\[[\-\.\w ]+\]/ ], rpsblast = RuleRegexp[ 'Bio::Blast::RPSBlast::Report', /^RPS\-BLAST.? 
+[\-\.\w]+ +\[[\-\.\w ]+\]/ ], blat = RuleRegexp[ 'Bio::Blat::Report', /^psLayout version \d+/ ], spidey = RuleRegexp[ 'Bio::Spidey::Report', /^\-\-SPIDEY version .+\-\-$/ ], hmmer = RuleRegexp[ 'Bio::HMMER::Report', /^HMMER +\d+\./ ], sim4 = RuleRegexp[ 'Bio::Sim4::Report', /^seq1 \= .*\, \d+ bp(\r|\r?\n)seq2 \= .*\, \d+ bp(\r|\r?\n)/ ], fastq = RuleRegexp[ 'Bio::Fastq', /^\@.+(?:\r|\r?\n)(?:[^\@\+].*(?:\r|\r?\n))+/ ], fastaformat = RuleProc.new('Bio::FastaFormat', 'Bio::NBRF', 'Bio::FastaNumericFormat') do |text| if /^>.+$/ =~ text case text when /^>([PF]1|[DR][LC]|N[13]|XX)\;.+/ Bio::NBRF when /^>.+$\s+(^\#.*$\s*)*^\s*\d*\s*[-a-zA-Z_\.\[\]\(\)\*\+\$]+/ Bio::FastaFormat when /^>.+$\s+^\s*\d+(\s+\d+)*\s*$/ Bio::FastaNumericFormat else false end else nil end end ] # dependencies # NCBI genbank.is_prior_to genpept # EMBL/UniProt embl.is_prior_to sptr sptr.is_prior_to prosite prosite.is_prior_to transfac # KEGG #aaindex.is_prior_to litdb #litdb.is_prior_to brite pathway_module.is_prior_to pathway pathway.is_prior_to brite brite.is_prior_to orthology orthology.is_prior_to drug drug.is_prior_to glycan glycan.is_prior_to enzyme enzyme.is_prior_to compound compound.is_prior_to reaction reaction.is_prior_to genes genes.is_prior_to genome # PDB pdb.is_prior_to het # BLAST wublast.is_prior_to wutblast wutblast.is_prior_to blast blast.is_prior_to tblast # Fastq BottomRule.is_prior_to(fastq) fastq.is_prior_to(fastaformat) # FastaFormat BottomRule.is_prior_to(fastaformat) # for debug #debug_first = RuleDebug.new('debug_first') #a.add(debug_first) #debug_first.is_prior_to(TopRule) ## for debug #debug_last = RuleDebug.new('debug_last') #a.add(debug_last) #BottomRule.is_prior_to(debug_last) #fastaformat.is_prior_to(debug_last) ## for suppressing warnings p medline, aaindex, litdb, fantom, clustal, gcg_msf, gcg_seq, blastxml, rpsblast, blat, spidey, hmmer, sim4 if false a.rehash return a end end #class AutoDetect end #class FlatFile end #module Bio 
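=begin
The priority handling in Bio::FlatFile::AutoDetect above (is_prior_to
plus TSort) can be illustrated with a much smaller, self-contained sketch.
MiniRule and MiniDetect are hypothetical names used only for this
illustration; they are not part of BioRuby, and the sketch leaves out the
TopRule/BottomRule handling and the _meta_ argument of the real class.

```ruby
require 'tsort'

# Hypothetical, simplified stand-in for a RuleRegexp-like rule.
class MiniRule
  attr_reader :name, :lower_priority_rules

  def initialize(name, regexp)
    @name = name
    @regexp = regexp
    @lower_priority_rules = []
  end

  # Declares that self must be tried before _other_.
  def is_prior_to(other)
    return nil if equal?(other)
    @lower_priority_rules << other
    true
  end

  # Returns the rule name if the text matches, otherwise nil.
  def guess(text)
    @regexp =~ text ? @name : nil
  end
end

# Hypothetical, simplified stand-in for Bio::FlatFile::AutoDetect.
class MiniDetect
  include TSort

  def initialize
    @rules = {}
  end

  # Registers a rule and returns it.
  def add(rule)
    raise 'rule name conflicts' if @rules[rule.name]
    @rules[rule.name] = rule
  end

  # (required by TSort) yields every registered rule.
  def tsort_each_node(&block)
    @rules.each_value(&block)
  end

  # (required by TSort) yields the lower priority rules of _rule_.
  def tsort_each_child(rule, &block)
    rule.lower_priority_rules.each(&block)
  end

  # TSort#tsort lists children (lower priority) first, so reverse it
  # to get the rules in the order they should be tried.
  def ordered_rules
    tsort.reverse
  end

  # Returns the name of the first matching rule, or nil.
  def autodetect(text)
    ordered_rules.each do |rule|
      result = rule.guess(text)
      return result if result
    end
    nil
  end
end

detector = MiniDetect.new
fasta   = detector.add(MiniRule.new('fasta',   /^>/))
genpept = detector.add(MiniRule.new('genpept', /^LOCUS .+ aa /))
genbank = detector.add(MiniRule.new('genbank', /^LOCUS .+ bp /))
genbank.is_prior_to(genpept)
genpept.is_prior_to(fasta)

# prints "genbank genpept fasta"
puts detector.ordered_rules.map { |r| r.name }.join(' ')
# prints "genbank"
puts detector.autodetect("LOCUS       AB000001   450 bp    mRNA")
```

Registration order does not decide which rule is tried first; only the
is_prior_to declarations do, which is why the more specific rules can be
added in any order.
=end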
bio-2.0.3/lib/bio/io/registry.rb
#
# = bio/io/registry.rb - OBDA BioRegistry module
#
# Copyright:: Copyright (C) 2002, 2003, 2004, 2005
#             Toshiaki Katayama
# License:: The Ruby License
#
# $Id:$
#
# == Description
#
# BioRegistry reads the OBDA (Open Bio Database Access) configuration file
# (seqdatabase.ini) and creates a registry object.  OBDA was created during
# the BioHackathon held in Tucson and South Africa in 2002 as a
# project-independent set of protocols to access biological databases.
# The spec was refined at the BioHackathon 2003 held in Singapore.
#
# By using OBDA, a user can access a database with the get_database method
# without knowing where and how the database is stored, and each database
# has a get_by_id method to obtain a sequence entry.
#
# A sample configuration file is distributed with the BioRuby package.
# It consists of stanza-format entries such as the following:
#
#   VERSION=1.00
#
#   [myembl]
#   protocol=biofetch
#   location=http://www.ebi.ac.uk/cgi-bin/dbfetch
#   dbname=embl
#
#   [mysp]
#   protocol=biosql
#   location=db.bioruby.org
#   dbname=biosql
#   driver=mysql
#   user=root
#   pass=
#   biodbname=swissprot
#
# The first line means that this configuration file is version 1.00.
#
# The [myembl] line defines a user-defined database name 'myembl', and the
# following block indicates how the database can be accessed.
# In this example, the 'myembl' database is accessed via OBDA's
# BioFetch protocol through the dbfetch server at EBI, where the EMBL
# database is accessed by the name 'embl' on the server side.
#
# The [mysp] line defines another database 'mysp' which accesses the
# RDB (Relational Database) at db.bioruby.org via OBDA's
# BioSQL protocol.  This BioSQL server runs MySQL as its backend
# and stores the SwissProt database under the name 'swissprot',
# which can be accessed by the 'root' user without a password.
# Note that the db.bioruby.org server is a dummy used for explanation.
#
# The configuration file is searched for in the following order.
#
#   1. Local file name given to Bio::Registry.new(filename).
#
#   2. Remote or local file list given by the environment variable
#      'OBDA_SEARCH_PATH', which is a '+' separated string of
#      remote (HTTP) and/or local files.
#
#        e.g. OBDA_SEARCH_PATH="http://example.org/obda.ini+$HOME/lib/myobda.ini"
#
#   3. Local file "$HOME/.bioinformatics/seqdatabase.ini" in the user's
#      home directory.
#
#   4. Local file "/etc/bioinformatics/seqdatabase.ini" in the system
#      configuration directory.
#
# The file given in 1 is loaded first when it is specified.  If
# OBDA_SEARCH_PATH is set and not empty, the files listed in 2 are
# loaded next; otherwise the files in 3 and 4 are loaded.  If there are
# database definitions having the same name, the first one is used.
#
# If no database has been defined after reading the files in 3 and 4,
# Bio::Registry.new will try to use the
# http://www.open-bio.org/registry/seqdatabase.ini file.
#
# == References
#
# * http://obda.open-bio.org/
# * http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/obda-specs/?cvsroot=obf-common
# * http://www.open-bio.org/registry/seqdatabase.ini
#

require 'uri'
require 'net/http'
require 'bio/command'

module Bio

  autoload :Fetch,         'bio/io/fetch'
  autoload :SQL,           'bio/io/sql'
  autoload :FlatFile,      'bio/io/flatfile'
  autoload :FlatFileIndex, 'bio/io/flatfile/index'

  class Registry

    def initialize(file = nil)
      @spec_version = nil
      @databases = Array.new
      read_local(file) if file
      env_path = ENV['OBDA_SEARCH_PATH']
      if env_path and env_path.size > 0
        read_env(env_path)
      else
        read_local("#{ENV['HOME']}/.bioinformatics/seqdatabase.ini")
        read_local("/etc/bioinformatics/seqdatabase.ini")
        if @databases.empty?
          read_remote("http://www.open-bio.org/registry/seqdatabase.ini")
        end
      end
    end

    # Version string of the first configuration file
    attr_reader :spec_version

    # List of databases (Array of Bio::Registry::DB)
    attr_reader :databases

    # Returns a database handle (Bio::SQL, Bio::Fetch etc.) or nil
    # if not found (case insensitive).
    # The handles should have a get_by_id method.
def get_database(dbname) @databases.each do |db| if db.database == dbname.downcase case db.protocol when 'biofetch' return serv_biofetch(db) when 'biosql' return serv_biosql(db) when 'flat', 'index-flat', 'index-berkeleydb' return serv_flat(db) when 'bsane-corba', 'biocorba' raise NotImplementedError when 'xembl' raise NotImplementedError end end end return nil end alias db get_database # Returns a Registry::DB object corresponding to the first dbname # entry in the registry records (case insensitive). def query(dbname) @databases.each do |db| return db if db.database == dbname.downcase end end private def read_env(path) path.split('+').each do |elem| if /:/.match(elem) read_remote(elem) else read_local(elem) end end end def read_local(file) if File.readable?(file) stanza = File.read(file) parse_stanza(stanza) end end def read_remote(url) schema, user, host, port, reg, path, = URI.split(url) Bio::Command.start_http(host, port) do |http| response = http.get(path) parse_stanza(response.body) end end def parse_stanza(stanza) return unless stanza if stanza[/.*/] =~ /VERSION\s*=\s*(\S+)/ @spec_version ||= $1 # for internal use (may differ on each file) stanza[/.*/] = '' # remove VERSION line end stanza.each_line do |line| case line when /^\[(.*)\]/ dbname = $1.downcase db = Bio::Registry::DB.new($1) @databases.push(db) when /=/ tag, value = line.chomp.split(/\s*=\s*/) @databases.last[tag] = value end end end def serv_biofetch(db) serv = Bio::Fetch.new(db.location) serv.database = db.dbname return serv end def serv_biosql(db) location, port = db.location.split(':') port = db.port unless port case db.driver when /mysql/i driver = 'Mysql' when /pg|postgres/i driver = 'Pg' when /oracle/ when /sybase/ when /sqlserver/ when /access/ when /csv/ when /informix/ when /odbc/ when /rdb/ end dbi = [ "dbi", driver, db.dbname, location ].compact.join(':') dbi += ';port=' + port if port serv = Bio::SQL.new(dbi, db.user, db.pass) # We can not manage biodbname (for name space) in BioSQL 
yet. # use db.biodbname here!! return serv end def serv_flat(db) path = db.location path = File.join(path, db.dbname) if db.dbname serv = Bio::FlatFileIndex.open(path) return serv end class DB def initialize(dbname) @database = dbname @property = Hash.new end attr_reader :database def method_missing(meth_id) @property[meth_id.id2name] end def []=(tag, value) @property[tag] = value end end end # class Registry end # module Bio if __FILE__ == $0 begin require 'pp' alias p pp rescue end # Usually, you don't need to pass ARGV. reg = Bio::Registry.new(ARGV[0]) p reg p reg.query('genbank_biosql') serv = reg.get_database('genbank_biofetch') puts serv.get_by_id('AA2CG') serv = reg.get_database('genbank_biosql') puts serv.get_by_id('AA2CG') serv = reg.get_database('swissprot_biofetch') puts serv.get_by_id('CYC_BOVIN') serv = reg.get_database('swissprot_biosql') puts serv.get_by_id('CYC_BOVIN') end bio-2.0.3/lib/bio/io/flatfile.rb0000644000175000017500000003247514141516614015672 0ustar nileshnilesh# # = bio/io/flatfile.rb - flatfile access wrapper class # # Copyright (C) 2001-2006 Naohisa Goto # # License:: The Ruby License # # $Id:$ # # # Bio::FlatFile is a helper and wrapper class to read a biological data file. # It acts like a IO object. # It can automatically detect data format, and users do not need to tell # the class what the data is. # module Bio # Bio::FlatFile is a helper and wrapper class to read a biological data file. # It acts like a IO object. # It can automatically detect data format, and users do not need to tell # the class what the data is. class FlatFile autoload :AutoDetect, 'bio/io/flatfile/autodetection' autoload :Splitter, 'bio/io/flatfile/splitter' autoload :BufferedInputStream, 'bio/io/flatfile/buffer' include Enumerable # # Bio::FlatFile.open(file, *arg) # Bio::FlatFile.open(dbclass, file, *arg) # # Creates a new Bio::FlatFile object to read a file or a stream # which contains _dbclass_ data. # # _dbclass_ should be a class (or module) or nil. 
# e.g. Bio::GenBank, Bio::FastaFormat. # # If _file_ is a filename (which doesn't have gets method), # the method opens a local file named _file_ # with File.open(filename, *arg). # # When _dbclass_ is omitted or nil is given to _dbclass_, # the method tries to determine database class # (file format) automatically. # When it fails to determine, dbclass is set to nil # and FlatFile#next_entry would fail. # You can still set dbclass using FlatFile#dbclass= method. # # * Example 1 # Bio::FlatFile.open(Bio::GenBank, "genbank/gbest40.seq") # * Example 2 # Bio::FlatFile.open(nil, "embl/est_hum17.dat") # * Example 3 # Bio::FlatFile.open("genbank/gbest40.seq") # # * Example 4 # Bio::FlatFile.open(Bio::GenBank, $stdin) # # If it is called with a block, the block will be executed with # a new Bio::FlatFile object. If filename is given, # the file is automatically closed when leaving the block. # # * Example 5 # Bio::FlatFile.open(nil, 'test4.fst') do |ff| # ff.each { |e| print e.definition, "\n" } # end # # * Example 6 # Bio::FlatFile.open('test4.fst') do |ff| # ff.each { |e| print e.definition, "\n" } # end # # Compatibility Note: # *arg is completely passed to the File.open # and you cannot specify ":raw => true" or ":raw => false". # def self.open(*arg, &block) # FlatFile.open(dbclass, file, mode, perm) # FlatFile.open(file, mode, perm) if arg.size <= 0 raise ArgumentError, 'wrong number of arguments (0 for 1)' end x = arg.shift if x.is_a?(Module) then # FlatFile.open(dbclass, filename_or_io, ...) dbclass = x elsif x.nil? then # FlatFile.open(nil, filename_or_io, ...) dbclass = nil else # FlatFile.open(filename, ...) dbclass = nil arg.unshift(x) end if arg.size <= 0 raise ArgumentError, 'wrong number of arguments (1 for 2)' end file = arg.shift # check if file is filename or IO object unless file.respond_to?(:gets) # 'file' is a filename _open_file(dbclass, file, *arg, &block) else # 'file' is a IO object ff = self.new(dbclass, file) block_given? ? 
(yield ff) : ff end end # Same as Bio::FlatFile.open(nil, filename_or_stream, mode, perm, options). # # * Example 1 # Bio::FlatFile.auto(ARGF) # * Example 2 # Bio::FlatFile.auto("embl/est_hum17.dat") # * Example 3 # Bio::FlatFile.auto(IO.popen("gzip -dc nc1101.flat.gz")) # def self.auto(*arg, &block) self.open(nil, *arg, &block) end # Same as FlatFile.auto(filename_or_stream, *arg).to_a # # (This method might be OBSOLETED in the future.) def self.to_a(*arg) self.auto(*arg) do |ff| raise 'cannot determine file format' unless ff.dbclass ff.to_a end end # Same as FlatFile.auto(filename, *arg), # except that it only accept filename and doesn't accept IO object. # File format is automatically determined. # # It can accept a block. # If a block is given, it returns the block's return value. # Otherwise, it returns a new FlatFile object. # def self.open_file(filename, *arg) _open_file(nil, filename, *arg) end # Same as FlatFile.open(dbclass, filename, *arg), # except that it only accept filename and doesn't accept IO object. # # It can accept a block. # If a block is given, it returns the block's return value. # Otherwise, it returns a new FlatFile object. # def self._open_file(dbclass, filename, *arg) if block_given? then BufferedInputStream.open_file(filename, *arg) do |stream| yield self.new(dbclass, stream) end else stream = BufferedInputStream.open_file(filename, *arg) self.new(dbclass, stream) end end private_class_method :_open_file # Opens URI specified as _uri_. # _uri_ must be a String or URI object. # *arg is passed to OpenURI.open_uri or URI#open. # # Like FlatFile#open, it can accept a block. # # Note that you MUST explicitly require 'open-uri'. # Because open-uri.rb modifies existing class, # it isn't required by default. # def self.open_uri(uri, *arg) if block_given? 
then BufferedInputStream.open_uri(uri, *arg) do |stream| yield self.new(nil, stream) end else stream = BufferedInputStream.open_uri(uri, *arg) self.new(nil, stream) end end # Executes the block for every entry in the stream. # Same as FlatFile.open(*arg) { |ff| ff.each { |entry| ... }}. # # * Example # Bio::FlatFile.foreach('test.fst') { |e| puts e.definition } # def self.foreach(*arg) self.open(*arg) do |flatfileobj| flatfileobj.each do |entry| yield entry end end end # Same as FlatFile.open, except that 'stream' should be a opened # stream object (IO, File, ..., who have the 'gets' method). # # * Example 1 # Bio::FlatFile.new(Bio::GenBank, ARGF) # * Example 2 # Bio::FlatFile.new(Bio::GenBank, IO.popen("gzip -dc nc1101.flat.gz")) # # Compatibility Note: # Now, you cannot specify ":raw => true" or ":raw => false". # Below styles are DEPRECATED. # # * Example 3 (deprecated) # # Bio::FlatFile.new(nil, $stdin, :raw=>true) # => ERROR # # Please rewrite as below. # ff = Bio::FlatFile.new(nil, $stdin) # ff.raw = true # * Example 3 in old style (deprecated) # # Bio::FlatFile.new(nil, $stdin, true) # => ERROR # # Please rewrite as below. # ff = Bio::FlatFile.new(nil, $stdin) # ff.raw = true # def initialize(dbclass, stream) # 2nd arg: IO object if stream.kind_of?(BufferedInputStream) @stream = stream else @stream = BufferedInputStream.for_io(stream) end # 1st arg: database class (or file format autodetection) if dbclass then self.dbclass = dbclass else autodetect end # @skip_leader_mode = :firsttime @firsttime_flag = true # default raw mode is false self.raw = false end # The mode how to skip leader of the data. # :firsttime :: (DEFAULT) only head of file (= first time to read) # :everytime :: everytime to read entry # nil :: never skip attr_accessor :skip_leader_mode # (DEPRECATED) IO object in the flatfile object. # # Compatibility Note: Bio::FlatFile#io is deprecated. # Please use Bio::FlatFile#to_io instead. def io warn "Bio::FlatFile#io is deprecated." 
@stream.to_io end # IO object in the flatfile object. # # Compatibility Note: Bio::FlatFile#io is deprecated. def to_io @stream.to_io end # Pathname, filename or URI (or nil). def path @stream.path end # Exception class to be raised when data format hasn't been specified. class UnknownDataFormatError < IOError end # Get next entry. def next_entry raise UnknownDataFormatError, 'file format auto-detection failed?' unless @dbclass if @skip_leader_mode and ((@firsttime_flag and @skip_leader_mode == :firsttime) or @skip_leader_mode == :everytime) @splitter.skip_leader end if raw then r = @splitter.get_entry else r = @splitter.get_parsed_entry end @firsttime_flag = false return nil unless r if raw then r else @entry = r @entry end end attr_reader :entry # Returns the last raw entry as a string. def entry_raw @splitter.entry end # a flag to write down entry start and end positions def entry_pos_flag @splitter.entry_pos_flag end # Sets flag to write down entry start and end positions def entry_pos_flag=(x) @splitter.entry_pos_flag = x end # start position of the last entry def entry_start_pos @splitter.entry_start_pos end # (end position of the last entry) + 1 def entry_ended_pos @splitter.entry_ended_pos end # Iterates over each entry in the flatfile. # # * Example # include Bio # ff = FlatFile.open(GenBank, "genbank/gbhtg14.seq") # ff.each_entry do |x| # puts x.definition # end def each_entry while e = self.next_entry yield e end end alias :each :each_entry # Resets file pointer to the start of the flatfile. # (similar to IO#rewind) def rewind r = (@splitter || @stream).rewind @firsttime_flag = true r end # Closes input stream. # (similar to IO#close) def close @stream.close end # Returns current position of input stream. # If the input stream is not a normal file, # the result is not guaranteed. # It is similar to IO#pos. # Note that it will not be equal to io.pos, # because FlatFile has its own internal buffer. def pos @stream.pos end # (Not recommended to use it.) 
# Sets position of input stream. # If the input stream is not a normal file, # the result is not guaranteed. # It is similar to IO#pos=. # Note that it will not be equal to io.pos=, # because FlatFile has its own internal buffer. def pos=(p) @stream.pos=(p) end # Returns true if input stream is end-of-file. # Otherwise, returns false. # (Similar to IO#eof?, but may not be equal to io.eof?, # because FlatFile has its own internal buffer.) def eof? @stream.eof? end # If true is given, the next_entry method returns # an entry as text, whereas if false, returns it as a parsed object. def raw=(bool) @raw = (bool ? true : false) end # If true, raw mode. attr_reader :raw # Similar to IO#gets. # Internal use only. Users should not call it directly. def gets(*arg) @stream.gets(*arg) end # Sets database class. Please use only if autodetect fails. def dbclass=(klass) if klass then @dbclass = klass begin @splitter = @dbclass.flatfile_splitter(@dbclass, @stream) rescue NameError, NoMethodError begin splitter_class = @dbclass::FLATFILE_SPLITTER rescue NameError splitter_class = Splitter::Default end @splitter = splitter_class.new(klass, @stream) end else @dbclass = nil @splitter = nil end end # Returns database class which is automatically detected or # given in FlatFile#initialize. attr_reader :dbclass # Performs determination of database class (file format). # Pre-reads +lines+ lines for format determination (default 31 lines). # If it fails, returns nil or false. Otherwise, returns database class. # # The method can be called anytime if you want (but not recommended). # This might be useful if the input file is a mixture of multiple data formats. def autodetect(lines = 31, ad = AutoDetect.default) if r = ad.autodetect_flatfile(self, lines) self.dbclass = r else self.dbclass = nil unless self.dbclass end r end # Detects database class (== file format) of given file. # If fails to determine, returns nil. 
def self.autodetect_file(filename) self.open_file(filename).dbclass end # Detects database class (== file format) of given input stream. # If fails to determine, returns nil. # Caution: the method reads some data from the input stream, # and the data will be lost. def self.autodetect_io(io) self.new(nil, io).dbclass end # This is OBSOLETED. Please use autodetect_io(io) instead. def self.autodetect_stream(io) $stderr.print "Bio::FlatFile.autodetect_stream will be deprecated." if $VERBOSE self.autodetect_io(io) end # Detects database class (== file format) of given string. # If fails to determine, returns false or nil. def self.autodetect(text) AutoDetect.default.autodetect(text) end end #class FlatFile end #module Bio bio-2.0.3/lib/bio/io/pubmed.rb0000644000175000017500000001443714141516614015356 0ustar nileshnilesh# # = bio/io/pubmed.rb - NCBI Entrez/PubMed client module # # Copyright:: Copyright (C) 2001, 2007, 2008 Toshiaki Katayama # Copyright:: Copyright (C) 2006 Jan Aerts # License:: The Ruby License # require 'bio/io/ncbirest' module Bio # == Description # # The Bio::PubMed class provides several ways to retrieve bibliographic # information from the PubMed database at NCBI. # # Basically, two types of queries are possible: # # * searching for PubMed IDs given a query string: # * Bio::PubMed#esearch (recommended) # * Bio::PubMed#search (only retrieves top 20 hits; will be deprecated) # # * retrieving the MEDLINE text (i.e. authors, journal, abstract, ...) # given a PubMed ID # * Bio::PubMed#efetch (recommended) # * Bio::PubMed#query (will be deprecated) # * Bio::PubMed#pmfetch (will be deprecated) # # Since BioRuby 1.5, all implementations use the NCBI E-Utilities services. # The different methods within the same group still remain because # their argument specifications and/or return values differ. # The search, query, and pmfetch methods will be made obsolete in the future. 
# # Additional information about the MEDLINE format and PubMed programmable # APIs can be found on the following websites: # # * PubMed Tutorial: # http://www.nlm.nih.gov/bsd/disted/pubmedtutorial/index.html # * E-utilities Quick Start: # http://www.ncbi.nlm.nih.gov/books/NBK25500/ # * Creating a Web Link to PubMed: # http://www.ncbi.nlm.nih.gov/books/NBK3862/ # # == Usage # # require 'bio' # # # If you don't know the pubmed ID: # Bio::PubMed.esearch("(genome AND analysis) OR bioinformatics").each do |x| # p x # end # # Bio::PubMed.search("(genome AND analysis) OR bioinformatics").each do |x| # p x # end # # # To retrieve the MEDLINE entry for a given PubMed ID: # Bio::PubMed.efetch("10592173").each { |x| puts x } # puts Bio::PubMed.query("10592173") # puts Bio::PubMed.pmfetch("10592173") # # # To retrieve MEDLINE entries for given PubMed IDs: # Bio::PubMed.efetch([ "10592173", "14693808" ]).each { |x| puts x } # puts Bio::PubMed.query("10592173", "14693808") # returns a String # # # This can be converted into a Bio::MEDLINE object: # manuscript = Bio::PubMed.query("10592173") # medline = Bio::MEDLINE.new(manuscript) # class PubMed < Bio::NCBI::REST # Search the PubMed database by given keywords using E-Utils and returns # an array of PubMed IDs. # # For information on the possible arguments, see # http://eutils.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html#PubMed # --- # *Arguments*: # * _str_: query string (required) # * _hash_: hash of E-Utils options # * _"retmode"_: "xml", "html", ... # * _"rettype"_: "medline", ... # * _"retmax"_: integer (default 100) # * _"retstart"_: integer # * _"field"_ # * _"reldate"_ # * _"mindate"_ # * _"maxdate"_ # * _"datetype"_ # *Returns*:: array of PubMed IDs or a number of results def esearch(str, hash = {}) opts = { "db" => "pubmed" } opts.update(hash) super(str, opts) end # Retrieve PubMed entry by PMID and returns MEDLINE formatted string using # entrez efetch. 
Multiple PubMed IDs can be provided: # Bio::PubMed.efetch(123) # Bio::PubMed.efetch([123,456,789]) # --- # *Arguments*: # * _ids_: list of PubMed IDs (required) # * _hash_: hash of E-Utils options # * _"retmode"_: "xml", "html", ... # * _"rettype"_: "medline", ... # * _"retmax"_: integer (default 100) # * _"retstart"_: integer # * _"field"_ # * _"reldate"_ # * _"mindate"_ # * _"maxdate"_ # * _"datetype"_ # *Returns*:: Array of MEDLINE formatted Strings def efetch(ids, hash = {}) opts = { "db" => "pubmed", "rettype" => "medline" } opts.update(hash) result = super(ids, opts) if !opts["retmode"] or opts["retmode"] == "text" result = result.split(/\n\n+/) end result end # This method will be DEPRECATED in the future. # # Search the PubMed database by given keywords using entrez query and returns # an array of PubMed IDs. # # Caution: this method returns the first 20 hits only. # # Use of the 'esearch' method is strongly recommended instead. # # Implementation details: Since BioRuby 1.5, this method internally uses # NCBI EUtils with retmax=20 by using the Bio::PubMed#esearch method. # # --- # *Arguments*: # * _str_: query string (required) # *Returns*:: array of PubMed IDs def search(str) warn "Bio::PubMed#search is now a subset of Bio::PubMed#esearch. Using Bio::PubMed#esearch is recommended." if $VERBOSE esearch(str, { "retmax" => 20 }) end # This method will be DEPRECATED in the future. # # Retrieve PubMed entry by PMID and returns MEDLINE formatted string using # entrez query. # --- # *Arguments*: # * _ids_: one or more PubMed IDs (required) # *Returns*:: MEDLINE formatted String def query(*ids) warn "Bio::PubMed#query internally uses Bio::PubMed#efetch. Using Bio::PubMed#efetch is recommended." if $VERBOSE ret = efetch(ids) if ret && ret.size > 0 then ret.join("\n\n") + "\n" else "" end end # This method will be DEPRECATED in the future. # # Retrieve PubMed entry by PMID and returns MEDLINE formatted string. 
# # --- # *Arguments*: # * _id_: PubMed ID (required) # *Returns*:: MEDLINE formatted String def pmfetch(id) warn "Bio::PubMed#pmfetch internally uses Bio::PubMed#efetch. Using Bio::PubMed#efetch is recommended." if $VERBOSE ret = efetch(id) if ret && ret.size > 0 then ret.join("\n\n") + "\n" else "" end end # The same as Bio::PubMed.new.esearch(*args). def self.esearch(*args) self.new.esearch(*args) end # The same as Bio::PubMed.new.efetch(*args). def self.efetch(*args) self.new.efetch(*args) end # This method will be DEPRECATED. Use the esearch method. # # The same as Bio::PubMed.new.search(*args). def self.search(*args) self.new.search(*args) end # This method will be DEPRECATED. Use the efetch method. # # The same as Bio::PubMed.new.query(*args). def self.query(*args) self.new.query(*args) end # This method will be DEPRECATED. Use the efetch method. # # The same as Bio::PubMed.new.pmfetch(*args). def self.pmfetch(*args) self.new.pmfetch(*args) end end # PubMed end # Bio bio-2.0.3/lib/bio/io/fastacmd.rb0000644000175000017500000000731514141516614015661 0ustar nileshnilesh# # = bio/io/fastacmd.rb - NCBI fastacmd wrapper class # # Copyright:: Copyright (C) 2005, 2006 # Shuji SHIGENOBU, # Toshiaki Katayama, # Mitsuteru C. Nakao, # Jan Aerts # License:: The Ruby License # # require 'bio/db/fasta' require 'bio/io/flatfile' require 'bio/command' module Bio class Blast # = DESCRIPTION # # Retrieves FASTA formatted sequences from a blast database using # the NCBI fastacmd command. # # This class requires the 'fastacmd' command and a blast database # (formatted using the '-o' option of 'formatdb'). 
# # = USAGE # require 'bio' # # fastacmd = Bio::Blast::Fastacmd.new("/db/myblastdb") # # entry = fastacmd.get_by_id("sp:128U_DROME") # fastacmd.fetch("sp:128U_DROME") # fastacmd.fetch(["sp:1433_SPIOL", "sp:1432_MAIZE"]) # # fastacmd.fetch(["sp:1433_SPIOL", "sp:1432_MAIZE"]).each do |fasta| # puts fasta # end # # = REFERENCES # # * NCBI tool # ftp://ftp.ncbi.nih.gov/blast/executables/LATEST/ncbi.tar.gz # # * fastacmd.html # http://biowulf.nih.gov/apps/blast/doc/fastacmd.html # class Fastacmd include Enumerable # Database file path. attr_accessor :database # fastacmd command file path. attr_accessor :fastacmd # This method provides a handle to a BLASTable database, which you can then # use to retrieve sequences. # # Prerequisites: # * You have created a BLASTable database with the '-o T' option. # * You have the NCBI fastacmd tool installed. # # For example, suppose the original input file looks like: # >my_seq_1 # ACCGACCTCCGGAACGGATAGCCCGACCTACG # >my_seq_2 # TCCGACCTTTCCTACCGCACACCTACGCCATCAC # ... # and you've created a BLASTable database from that with the command # cd /my_dir/ # formatdb -i my_input_file -t Test -n Test -o T # then you can get a handle to this database with the command # fastacmd = Bio::Blast::Fastacmd.new("/my_dir/Test") # --- # *Arguments*: # * _database_:: path and name of BLASTable database def initialize(blast_database_file_path) @database = blast_database_file_path @fastacmd = 'fastacmd' end # Get the sequence of a specific entry in the BLASTable database. # For example: # entry = fastacmd.get_by_id("sp:128U_DROME") # --- # *Arguments*: # * _id_: id of an entry in the BLAST database # *Returns*:: a Bio::FastaFormat object def get_by_id(entry_id) fetch(entry_id).shift end # Get the sequence for a _list_ of IDs in the database. # # For example: # p fastacmd.fetch(["sp:1433_SPIOL", "sp:1432_MAIZE"]) # # This method always returns an array of Bio::FastaFormat objects, even when # the result is a single entry. 
# --- # *Arguments*: # * _ids_: list of IDs to retrieve from the database # *Returns*:: array of Bio::FastaFormat objects def fetch(list) if list.respond_to?(:join) entry_id = list.join(",") else entry_id = list end cmd = [ @fastacmd, '-d', @database, '-s', entry_id ] Bio::Command.call_command(cmd) do |io| io.close_write Bio::FlatFile.new(Bio::FastaFormat, io).to_a end end # Iterates over _all_ sequences in the database. # # fastacmd.each_entry do |fasta| # p [ fasta.definition[0..30], fasta.seq.size ] # end # --- # *Returns*:: a Bio::FastaFormat object for each iteration def each_entry cmd = [ @fastacmd, '-d', @database, '-D', '1' ] Bio::Command.call_command(cmd) do |io| io.close_write Bio::FlatFile.open(Bio::FastaFormat, io) do |f| f.each_entry do |entry| yield entry end end end self end alias each each_entry end # class Fastacmd end # class Blast end # module Bio bio-2.0.3/lib/bio/location.rb0000644000175000017500000006471414141516614015306 0ustar nileshnilesh# # = bio/location.rb - Locations/Location class (GenBank location format) # # Copyright:: Copyright (C) 2001, 2005 Toshiaki Katayama # 2006 Jan Aerts # 2008 Naohisa Goto # License:: The Ruby License # # $Id:$ # module Bio # == Description # # The Bio::Location class describes the position of a genomic locus. # Typically, Bio::Location objects are created automatically when the # user creates a Bio::Locations object, instead of being initialized directly. # # == Usage # # location = Bio::Location.new('500..550') # puts "start=" + location.from.to_s + ";end=" + location.to.to_s # # # or, better, through Bio::Locations # locations = Bio::Locations.new('500..550') # locations.each do |location| # puts "start=" + location.from.to_s + ";end=" + location.to.to_s # end # class Location include Comparable # Parses a 'location' segment, which can be 'ID:' + ('n' or 'n..m' or 'n^m' # or "seq") with '<' or '>', and returns a Bio::Location object. 
# # location = Bio::Location.new('500..550') # # --- # *Arguments*: # * (required) _str_: GenBank style position string (see Bio::Locations # documentation) # *Returns*:: the Bio::Location object def initialize(location = nil) if location if location =~ /:/ # (G) ID:location xref_id, location = location.split(':') end if location =~ /</ lt = true end if location =~ />/ gt = true end end # s : start base, e : end base => from, to case location when /^[<>]?(\d+)$/ # (A, I) n s = e = $1.to_i when /^[<>]?(\d+)\.\.[<>]?(\d+)$/ # (B, I) n..m s = $1.to_i e = $2.to_i if e - s < 0 # raise "Error: invalid range : #{location}" $stderr.puts "[Warning] invalid range : #{location}" if $DEBUG end when /^[<>]?(\d+)\^[<>]?(\d+)$/ # (C, I) n^m s = $1.to_i e = $2.to_i carat = true if e - s != 1 or e != 1 # assert n^n+1 or n^1 # raise "Error: invalid range : #{location}" $stderr.puts "[Warning] invalid range : #{location}" if $DEBUG end when /^"?([ATGCatgc]+)"?$/ # (H) literal sequence sequence = $1.downcase s = e = nil when nil ; else raise "Error: unknown location format : #{location}" end @from = s # start position of the location @to = e # end position of the location @strand = 1 # strand direction of the location # forward => 1 or complement => -1 @sequence = sequence # literal sequence of the location @lt = lt # true if the position contains '<' @gt = gt # true if the position contains '>' @xref_id = xref_id # link to the external entry as GenBank ID @carat = carat # true if the location indicates the site # between two adjoining nucleotides end # (Integer) start position of the location attr_accessor :from # (Integer) end position of the location attr_accessor :to # (Integer) strand direction of the location # (forward => 1 or complement => -1) attr_accessor :strand # (String) literal sequence of the location attr_accessor :sequence # (true, false or nil) true if the position contains '<' attr_accessor :lt # (true, false or nil) true if the position contains '>' attr_accessor :gt 
# (String) link to the external entry as GenBank ID attr_accessor :xref_id # (true, false or nil) true if the location indicates the site # between two adjoining nucleotides attr_accessor :carat # Complements the sequence location (i.e. alternates the strand). # Note that it is a destructive method (i.e. modifies itself), # but it does not modify the "sequence" attribute. # --- # *Returns*:: the Bio::Location object def complement @strand *= -1 self # return Location object end # Replaces the sequence of the location. # --- # *Arguments*: # * (required) _sequence_: sequence to be used to replace the sequence # at the location # *Returns*:: the Bio::Location object def replace(sequence) @sequence = sequence.downcase self # return Location object end # Returns the range (from..to) of the location as a Range object. def range @from..@to end # Check where a Bio::Location object is located compared to another # Bio::Location object (mainly to facilitate the use of Comparable). # A location A is upstream of location B if the start position of # location A is smaller than the start position of location B. If # they're the same, the end positions are checked. # --- # *Arguments*: # * (required) _other location_: a Bio::Location object # *Returns*:: # * -1 if self < other location # * 1 if self > other location # * 0 if both locations are the same # * nil if the argument is not a Bio::Location object def <=>(other) if ! other.kind_of?(Bio::Location) return nil end if @from.to_f < other.from.to_f return -1 elsif @from.to_f > other.from.to_f return 1 end if @to.to_f < other.to.to_f return -1 elsif @to.to_f > other.to.to_f return 1 end return 0 end # If _other_ is equal to self, returns true. # Otherwise, returns false. 
# --- # *Arguments*: # * (required) _other_: any object # *Returns*:: true or false def ==(other) return true if super(other) return false unless other.instance_of?(self.class) flag = false [ :from, :to, :strand, :sequence, :lt, :gt, :xref_id, :carat ].each do |m| begin flag = (self.__send__(m) == other.__send__(m)) rescue NoMethodError, ArgumentError, NameError flag = false end break unless flag end flag end end # Location # == Description # # The Bio::Locations class is a container for Bio::Location objects: # creating a Bio::Locations object (based on a GenBank style position string) # will spawn an array of Bio::Location objects. # # == Usage # # locations = Bio::Locations.new('join(complement(500..550), 600..625)') # locations.each do |loc| # puts "class = " + loc.class.to_s # puts "range = #{loc.from}..#{loc.to} (strand = #{loc.strand})" # end # # Output would be: # # class = Bio::Location # # range = 500..550 (strand = -1) # # class = Bio::Location # # range = 600..625 (strand = 1) # # # For the following three location strings, print the span and range # ['one-of(898,900)..983', # 'one-of(5971..6308,5971..6309)', # '8050..one-of(10731,10758,10905,11242)'].each do |loc| # location = Bio::Locations.new(loc) # puts location.span # puts location.range # end # # === GenBank location descriptor classification # # ==== Definition of the position notation of the GenBank location format # # According to the GenBank manual 'gbrel.txt', position notations were # classified into 10 patterns - (A) to (J). # # 3.4.12.2 Feature Location # # The second column of the feature descriptor line designates the # location of the feature in the sequence. The location descriptor # begins at position 22. Several conventions are used to indicate # sequence location. # # Base numbers in location descriptors refer to numbering in the entry, # which is not necessarily the same as the numbering scheme used in the # published report. 
The first base in the presented sequence is numbered # base 1. Sequences are presented in the 5' to 3' direction. # # Location descriptors can be one of the following: # # (A) 1. A single base; # # (B) 2. A contiguous span of bases; # # (C) 3. A site between two bases; # # (D) 4. A single base chosen from a range of bases; # # (E) 5. A single base chosen from among two or more specified bases; # # (F) 6. A joining of sequence spans; # # (G) 7. A reference to an entry other than the one to which the feature # belongs (i.e., a remote entry), followed by a location descriptor # referring to the remote sequence; # # (H) 8. A literal sequence (a string of bases enclosed in quotation marks). # # ==== Description commented with pattern IDs. # # (C) A site between two residues, such as an endonuclease cleavage site, is # indicated by listing the two bases separated by a carat (e.g., 23^24). # # (D) A single residue chosen from a range of residues is indicated by the # number of the first and last bases in the range separated by a single # period (e.g., 23.79). The symbols < and > indicate that the end point # (I) of the range is beyond the specified base number. # # (B) A contiguous span of bases is indicated by the number of the first and # last bases in the range separated by two periods (e.g., 23..79). The # (I) symbols < and > indicate that the end point of the range is beyond the # specified base number. Starting and ending positions can be indicated # by base number or by one of the operators described below. # # Operators are prefixes that specify what must be done to the indicated # sequence to locate the feature. The following are the operators # available, along with their most common format and a description. # # (J) complement (location): The feature is complementary to the location # indicated. Complementary strands are read 5' to 3'. # # (F) join (location, location, .. 
location): The indicated elements should # be placed end to end to form one contiguous sequence. # # (F) order (location, location, .. location): The elements are found in the # specified order in the 5' to 3' direction, but nothing is implied about # the rationality of joining them. # # (F) group (location, location, .. location): The elements are related and # should be grouped together, but no order is implied. # # (E) one-of (location, location, .. location): The element can be any one, # but only one, of the items listed. # # === Reduction strategy of the position notations # # * (A) Location n # * (B) Location n..m # * (C) Location n^m # * (D) (n.m) => Location n # * (E) # * one-of(n,m,..) => Location n # * one-of(n..m,..) => Location n..m # * (F) # * order(loc,loc,..) => join(loc, loc,..) # * group(loc,loc,..) => join(loc, loc,..) # * join(loc,loc,..) => Sequence # * (G) ID:loc => Location with ID # * (H) "atgc" => Location only with Sequence # * (I) # * <n => Location n with lt flag # * >n => Location n with gt flag # * <n..m => Location n..m with lt flag # * n..>m => Location n..m with gt flag # * <n..>m => Location n..m with lt, gt flag # * (J) complement(loc) => Sequence # * (K) replace(loc, str) => Location with replacement Sequence # class Locations include Enumerable # Parses a GenBank style position string and returns a Bio::Locations # object, which contains a list of Bio::Location objects. # # locations = Bio::Locations.new('join(complement(500..550), 600..625)') # # --- # *Arguments*: # * (required) _str_: GenBank style position string # *Returns*:: Bio::Locations object def initialize(position) @operator = nil if position.is_a? Array @locations = position else position = gbl_cleanup(position) # preprocessing @locations = gbl_pos2loc(position) # create an Array of Bio::Location objects end end # (Array) An Array of Bio::Location objects attr_accessor :locations # (Symbol or nil) Operator. # nil (means :join), :order, or :group (obsolete). 
attr_accessor :operator # Evaluate equality of Bio::Locations object. def equals?(other) if ! other.kind_of?(Bio::Locations) return nil end if self.sort == other.sort return true else return false end end # If _other_ is equal with the self, returns true. # Otherwise, returns false. # --- # *Arguments*: # * (required) _other_: any object # *Returns*:: true or false def ==(other) return true if super(other) return false unless other.instance_of?(self.class) if self.locations == other.locations and self.operator == other.operator then true else false end end # Iterates on each Bio::Location object. def each @locations.each do |x| yield(x) end end # Returns nth Bio::Location object. def [](n) @locations[n] end # Returns first Bio::Location object. def first @locations.first end # Returns last Bio::Location object. def last @locations.last end # Returns an Array containing overall min and max position [min, max] # of this Bio::Locations object. def span span_min = @locations.min { |a,b| a.from <=> b.from } span_max = @locations.max { |a,b| a.to <=> b.to } return span_min.from, span_max.to end # Similar to span, but returns a Range object min..max def range min, max = span min..max end # Returns a length of the spliced RNA. def length len = 0 @locations.each do |x| if x.sequence len += x.sequence.size else len += (x.to - x.from + 1) end end len end alias size length # Converts absolute position in the whole of the DNA sequence to relative # position in the locus. # # This method can for example be used to relate positions in a DNA-sequence # with those in RNA. In this use, the optional ':aa'-flag returns the # position of the associated amino-acid rather than the nucleotide. 
# # loc = Bio::Locations.new('complement(12838..13533)') # puts loc.relative(13524) # => 10 # puts loc.relative(13506, :aa) # => 3 # # --- # *Arguments*: # * (required) _position_: nucleotide position within whole of the sequence # * _:aa_: flag that lets method return position in aminoacid coordinates # *Returns*:: position within the location def relative(n, type = nil) case type when :location ; when :aa if n = abs2rel(n) (n - 1) / 3 + 1 else nil end else abs2rel(n) end end # Converts relative position in the locus to position in the whole of the # DNA sequence. # # This method can for example be used to relate positions in a DNA-sequence # with those in RNA. In this use, the optional ':aa'-flag returns the # position of the associated amino-acid rather than the nucleotide. # # loc = Bio::Locations.new('complement(12838..13533)') # puts loc.absolute(10) # => 13524 # puts loc.absolute(10, :aa) # => 13506 # # --- # *Arguments*: # * (required) _position_: nucleotide position within locus # * _:aa_: flag to be used if _position_ is a aminoacid position rather than # a nucleotide position # *Returns*:: position within the whole of the sequence def absolute(n, type = nil) case type when :location ; when :aa n = (n - 1) * 3 + 1 rel2abs(n) else rel2abs(n) end end # String representation. # # Note: In some cases, it fails to detect whether # "complement(join(...))" or "join(complement(..))", and whether # "complement(order(...))" or "order(complement(..))". # # --- # *Returns*:: String def to_s return '' if @locations.empty? complement_join = false locs = @locations if locs.size >= 2 and locs.inject(true) do |flag, loc| # check if each location is complement (flag && (loc.strand == -1) && !loc.xref_id) end and locs.inject(locs[0].from) do |pos, loc| if pos then (pos >= loc.from) ? loc.from : false else false end end then locs = locs.reverse complement_join = true end locs = locs.collect do |loc| lt = loc.lt ? '<' : '' gt = loc.gt ? 
'>' : '' str = if loc.from == loc.to then "#{lt}#{gt}#{loc.from.to_i}" elsif loc.carat then "#{lt}#{loc.from.to_i}^#{gt}#{loc.to.to_i}" else "#{lt}#{loc.from.to_i}..#{gt}#{loc.to.to_i}" end if loc.xref_id and !loc.xref_id.empty? then str = "#{loc.xref_id}:#{str}" end if loc.strand == -1 and !complement_join then str = "complement(#{str})" end if loc.sequence then str = "replace(#{str},\"#{loc.sequence}\")" end str end if locs.size >= 2 then op = (self.operator || 'join').to_s result = "#{op}(#{locs.join(',')})" else result = locs[0] end if complement_join then result = "complement(#{result})" end result end private # Preprocessing to clean up the position notation. def gbl_cleanup(position) # sometimes position contains white spaces... position = position.gsub(/\s+/, '') # select one base # (D) n.m # .. n m : # $1 ( $2 $3 not ) position.gsub!(/(\.{2})?\(?([<>\d]+)\.([<>\d]+)(?!:)\)?/) do |match| if $1 $1 + $3 # ..(n.m) => ..m else $2 # (?n.m)? => n end end # select the 1st location # (E) one-of() # .. one-of ($2 ,$3 ) position.gsub!(/(\.{2})?one-of\(([^,]+),([^)]+)\)/) do |match| if $1 $1 + $3.gsub(/.*,(.*)/, '\1') # ..one-of(n,m) => ..m else $2 # one-of(n,m) => n end end ## substitute order(), group() by join() # (F) group(), order() #position.gsub!(/(order|group)/, 'join') return position end # Parse position notation and create Location objects. 
def gbl_pos2loc(position) ary = [] case position when /^(join|order|group)\((.*)\)$/ # (F) join() if $1 != "join" then @operator = $1.intern end position = $2 join_list = [] # sub positions to join bracket = [] # position with bracket s_count = 0 # stack counter position.split(',').each do |sub_pos| case sub_pos when /\(.*\)/ join_list << sub_pos when /\(/ s_count += 1 bracket << sub_pos when /\)/ s_count -= 1 bracket << sub_pos if s_count == 0 join_list << bracket.join(',') end else if s_count == 0 join_list << sub_pos else bracket << sub_pos end end end join_list.each do |pos| ary << gbl_pos2loc(pos) end when /^complement\((.*)\)$/ # (J) complement() position = $1 gbl_pos2loc(position).reverse_each do |location| ary << location.complement end when /^replace\(([^,]+),"?([^"]*)"?\)/ # (K) replace() position = $1 sequence = $2 ary << gbl_pos2loc(position).first.replace(sequence) else # (A, B, C, G, H, I) ary << Location.new(position) end return ary.flatten end # Convert the relative position to the absolute position def rel2abs(n) return nil unless n > 0 # out of range cursor = 0 @locations.each do |x| if x.sequence len = x.sequence.size else len = x.to - x.from + 1 end if n > cursor + len cursor += len else if x.strand < 0 return x.to - (n - cursor - 1) else return x.from + (n - cursor - 1) end end end return nil # out of range end # Convert the absolute position to the relative position def abs2rel(n) return nil unless n > 0 # out of range cursor = 0 @locations.each do |x| if x.sequence len = x.sequence.size else len = x.to - x.from + 1 end if n < x.from or n > x.to then cursor += len else if x.strand < 0 then return x.to - (n - cursor - 1) else return n + cursor + 1 - x.from end end end return nil # out of range end end # Locations end # Bio # === GenBank location examples # # (C) n^m # # * [AB015179] 754^755 # * [AF179299] complement(53^54) # * [CELXOL1ES] replace(4480^4481,"") # * [ECOUW87] replace(4792^4793,"a") # * [APLPCII] 
replace(1905^1906,"acaaagacaccgccctacgcc") # # (D) (n.m) # # * [HACSODA] 157..(800.806) # * [HALSODB] (67.68)..(699.703) # * [AP001918] (45934.45974)..46135 # * [BACSPOJ] <180..(731.761) # * [BBU17998] (88.89)..>1122 # * [ECHTGA] complement((1700.1708)..(1715.1721)) # * [ECPAP17] complement(<22..(255.275)) # * [LPATOVGNS] complement((64.74)..1525) # * [PIP404CG] join((8298.8300)..10206,1..855) # * [BOVMHDQBY4] join(M30006.1:(392.467)..575,M30005.1:415..681,M30004.1:129..410,M30004.1:907..1017,521..534) # * [HUMMIC2A] replace((651.655)..(651.655),"") # * [HUMSOD102] order(L44135.1:(454.445)..>538,<1..181) # # (E) one-of # # * [ECU17136] one-of(898,900)..983 # * [CELCYT1A] one-of(5971..6308,5971..6309) # * [DMU17742] 8050..one-of(10731,10758,10905,11242) # * [PFU27807] one-of(623,627,632)..one-of(628,633,637) # * [BTBAINH1] one-of(845,953,963,1078,1104)..1354 # * [ATU39449] join(one-of(969..1094,970..1094,995..1094,1018..1094),1518..1587,1726..2119,2220..2833,2945..3215) # # (F) join, order, group # # * [AB037374S2] join(AB037374.1:1..177,1..807) # * [AP000001] join(complement(1..61),complement(AP000007.1:252907..253505)) # * [ASNOS11] join(AF130124.1:<2563..2964,AF130125.1:21..157,AF130126.1:12..174,AF130127.1:21..112,AF130128.1:21..162,AF130128.1:281..595,AF130128.1:661..842,AF130128.1:916..1030,AF130129.1:21..115,AF130130.1:21..165,AF130131.1:21..125,AF130132.1:21..428,AF130132.1:492..746,AF130133.1:21..168,AF130133.1:232..401,AF130133.1:475..906,AF130133.1:970..1107,AF130133.1:1176..1367,21..>128) # # * [AARPOB2] order(AF194507.1:<1..510,1..>871) # * [AF006691] order(912..1918,20410..21416) # * [AF024666] order(complement(18919..19224),complement(13965..14892)) # * [AF264948] order(27066..27076,27089..27099,27283..27314,27330..27352) # * [D63363] order(3..26,complement(964..987)) # * [ECOCURLI2] order(complement(1009..>1260),complement(AF081827.1:<1..177)) # * [S72388S2] order(join(S72388.1:757..911,S72388.1:609..1542),1..>139) # * [HEYRRE07] 
order(complement(1..38),complement(M82666.1:1..140),complement(M82665.1:1..176),complement(M82664.1:1..215),complement(M82663.1:1..185),complement(M82662.1:1..49),complement(M82661.1:1..133)) # * [COL11A1G34] order(AF101079.1:558..1307,AF101080.1:1..749,AF101081.1:1..898,AF101082.1:1..486,AF101083.1:1..942,AF101084.1:1..1734,AF101085.1:1..2385,AF101086.1:1..1813,AF101087.1:1..2287,AF101088.1:1..1073,AF101089.1:1..989,AF101090.1:1..5017,AF101091.1:1..3401,AF101092.1:1..1225,AF101093.1:1..1072,AF101094.1:1..989,AF101095.1:1..1669,AF101096.1:1..918,AF101097.1:1..1114,AF101098.1:1..1074,AF101099.1:1..1709,AF101100.1:1..986,AF101101.1:1..1934,AF101102.1:1..1699,AF101103.1:1..940,AF101104.1:1..2330,AF101105.1:1..4467,AF101106.1:1..1876,AF101107.1:1..2465,AF101108.1:1..1150,AF101109.1:1..1170,AF101110.1:1..1158,AF101111.1:1..1193,1..611) # # group() are found in the COMMENT field only (in GenBank 122.0) # # gbpat2.seq: FT repeat_region group(598..606,611..619) # gbpat2.seq: FT repeat_region group(8..16,1457..1464). # gbpat2.seq: FT variation group(t1,t2) # gbpat2.seq: FT variation group(t1,t3) # gbpat2.seq: FT variation group(t1,t2,t3) # gbpat2.seq: FT repeat_region group(11..202,203..394) # gbpri9.seq:COMMENT Residues reported = 'group(1..2145);'. 
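#
# A minimal sketch of how the comma-separated members of a join(), order()
# or group() expression, such as those listed under (F), can be separated
# without breaking nested complement(...) or (n.m) groups. This is an
# illustration only: the helper names split_top_level and unwrap_operator
# are hypothetical, and the sketch uses a parenthesis depth counter rather
# than the split-and-rejoin bookkeeping of gbl_pos2loc.

```ruby
# Split a location list on commas that are not enclosed in parentheses.
# (Hypothetical helper, not part of Bio::Locations.)
def split_top_level(list)
  parts = []
  current = String.new
  depth = 0
  list.each_char do |ch|
    case ch
    when '(' then depth += 1
    when ')' then depth -= 1
    end
    if ch == ',' && depth.zero?
      parts << current          # a top-level comma closes the current member
      current = String.new
    else
      current << ch
    end
  end
  parts << current unless current.empty?
  parts
end

# Return [operator, members]; operator is nil when the position is not
# wrapped in join(), order() or group(). (Hypothetical helper.)
def unwrap_operator(position)
  m = /\A(join|order|group)\((.*)\)\z/.match(position)
  m ? [m[1], split_top_level(m[2])] : [nil, [position]]
end

op, members = unwrap_operator('order(join(S72388.1:757..911,S72388.1:609..1542),1..>139)')
puts op        # order
puts members   # the nested join(...) stays in one piece, followed by 1..>139
```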
# # (G) ID:location # # * [AARPOB2] order(AF194507.1:<1..510,1..>871) # * [AF178221S4] join(AF178221.1:<1..60,AF178222.1:1..63,AF178223.1:1..42,1..>90) # * [BOVMHDQBY4] join(M30006.1:(392.467)..575,M30005.1:415..681,M30004.1:129..410,M30004.1:907..1017,521..534) # * [HUMSOD102] order(L44135.1:(454.445)..>538,<1..181) # * [SL16SRRN1] order(<1..>267,X67092.1:<1..>249,X67093.1:<1..>233) # # (I) <, > # # * [A5U48871] <1..>318 # * [AA23SRRNP] <1..388 # * [AA23SRRNP] 503..>1010 # * [AAM5961] complement(<1..229) # * [AAM5961] complement(5231..>5598) # * [AF043934] join(<1,60..99,161..241,302..370,436..594,676..887,993..1141,1209..1329,1387..1559,1626..1646,1708..>1843) # * [BACSPOJ] <180..(731.761) # * [BBU17998] (88.89)..>1122 # * [AARPOB2] order(AF194507.1:<1..510,1..>871) # * [SL16SRRN1] order(<1..>267,X67092.1:<1..>249,X67093.1:<1..>233) # # (J) complement # # * [AF179299] complement(53^54) <= hoge insertion site etc. # * [AP000001] join(complement(1..61),complement(AP000007.1:252907..253505)) # * [AF209868S2] order(complement(1..>308),complement(AF209868.1:75..336)) # * [AP000001] join(complement(1..61),complement(AP000007.1:252907..253505)) # * [CPPLCG] complement(<1..(1093.1098)) # * [D63363] order(3..26,complement(964..987)) # * [ECHTGA] complement((1700.1708)..(1715.1721)) # * [ECOUXW] order(complement(1658..1663),complement(1636..1641)) # * [LPATOVGNS] complement((64.74)..1525) # * [AF129075] complement(join(71606..71829,75327..75446,76039..76203,76282..76353,76914..77029,77114..77201,77276..77342,78138..78316,79755..79892,81501..81562,81676..81856,82341..82490,84208..84287,85032..85122,88316..88403)) # * [ZFDYST2] join(AF137145.1:<1..18,complement(<1..99)) # # (K) replace # # * [CSU27710] replace(64,"A") # * [CELXOL1ES] replace(5256,"t") # * [ANICPC] replace(1..468,"") # * [CSU27710] replace(67..68,"GC") # * [CELXOL1ES] replace(4480^4481,"") <= ? 
only one case in GenBank 122.0 # * [ECOUW87] replace(4792^4793,"a") # * [CEU34893] replace(1..22,"ggttttaacccagttactcaag") # * [APLPCII] replace(1905^1906,"acaaagacaccgccctacgcc") # * [MBDR3S1] replace(1400..>9281,"") # * [HUMMHDPB1F] replace(complement(36..37),"ttc") # * [HUMMIC2A] replace((651.655)..(651.655),"") # * [LEIMDRPGP] replace(1..1554,"L01572") # * [TRBND3] replace(376..395,"atttgtgtgtggtaatta") # * [TRBND3] replace(376..395,"atttgtgtgggtaatttta") # * [TRBND3] replace(376..395,"attttgttgttgttttgttttgaatta") # * [TRBND3] replace(376..395,"atgtgtggtgaatta") # * [TRBND3] replace(376..395,"atgtgtgtggtaatta") # * [TRBND3] replace(376..395,"gatttgttgtggtaatttta") # * [MSU09460] replace(193, <= replace(193, "t") # * [HUMMAGE12X] replace(3002..3003, <= replace(3002..3003, "GC") # * [ADR40FIB] replace(510..520, <= replace(510..520, "taatcctaccg") # * [RATDYIIAAB] replace(1306..1443,"aagaacatccacggagtcagaactgggctcttcacgccggatttggcgttcgaggccattgtgaaaaagcaggcaatgcaccagcaagctcagttcctacccctgcgtggacctggttatccaggagctaatcagtacagttaggtggtcaagctgaaagagccctgtctgaaa") # bio-2.0.3/lib/bio/util/0000755000175000017500000000000014141516614014112 5ustar nileshnileshbio-2.0.3/lib/bio/util/color_scheme.rb0000644000175000017500000001170414141516614017104 0ustar nileshnilesh# # bio/util/color_scheme.rb - Popular color codings for nucleic and amino acids # # Author:: Trevor Wennblom # Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com) # License:: The Ruby License # # $Id:$ # module Bio # # bio/util/color_scheme.rb - Popular color codings for nucleic and amino acids # # Author:: Trevor Wennblom # Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com) # License:: The Ruby License # # # = Description # # The Bio::ColorScheme module contains classes that return popular color codings # for nucleic and amino acids in RGB hex format suitable for HTML code. 
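#
# A minimal sketch of what "RGB hex format" means for the score-based
# schemes, assuming a score is first scaled to a fraction of its [min, max]
# range and each color channel is then written as two hex digits. The helper
# names score_to_fraction, channel_hex and rgb_hex are hypothetical; the
# numbers are the Buried values for 'A' shipped with this package.

```ruby
# Scale a score to 0.0..1.0 relative to [min, max], clamping outliers.
def score_to_fraction(score, min, max)
  fraction = (score.to_f - min) / (max.to_f - min)
  fraction.clamp(0.0, 1.0)
end

# Two uppercase hex digits for one channel (0.0 => "00", 1.0 => "FF").
def channel_hex(fraction)
  format('%02X', (fraction * 255.0).floor)
end

# Concatenate the three channels into an HTML-style color code.
def rgb_hex(red, green, blue)
  channel_hex(red) + channel_hex(green) + channel_hex(blue)
end

f = score_to_fraction(0.66, 0.05, 4.6)   # score, min, max
puts rgb_hex(0.0, 1.0 - f, f)            # 00DC22
```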
#
# The current schemes supported are:
# * Buried - Buried index
# * Helix - Helix propensity
# * Hydropathy - Hydrophobicity
# * Nucleotide - Nucleotide color coding
# * Strand - Strand propensity
# * Taylor - Taylor color coding
# * Turn - Turn propensity
# * Zappo - Zappo color coding
#
# Planned color schemes include:
# * BLOSUM62
# * ClustalX
# * Percentage Identity (PID)
#
# Color schemes BLOSUM62, ClustalX, and Percentage Identity are all dependent
# on the alignment consensus.
#
# This data is currently referenced from the JalView alignment editor.
# Clamp, M., Cuff, J., Searle, S. M. and Barton, G. J. (2004),
# "The Jalview Java Alignment Editor," Bioinformatics, 20(3), 426-7
# http://www.jalview.org
#
# Currently the score data for things such as hydropathy, helix, turn, etc. are contained
# here but should be moved to bio/data/aa once a good reference is found for these
# values.
#
#
# = Usage
#
#   require 'bio'
#
#   seq = 'gattaca'
#   scheme = Bio::ColorScheme::Zappo
#   postfix = '</span>'
#   html = ''
#   seq.each_byte do |c|
#     color = scheme[c.chr]
#     prefix = %Q(<span style="background:\##{color};">)
#     html += prefix + c.chr + postfix
#   end
#
#   puts html
#
#
# == Accessing colors
#
#   puts Bio::ColorScheme::Buried['A']     # 00DC22
#   puts Bio::ColorScheme::Buried[:c]      # 00BF3F
#   puts Bio::ColorScheme::Buried[nil]     # nil
#   puts Bio::ColorScheme::Buried['-']     # FFFFFF
#   puts Bio::ColorScheme::Buried[7]       # FFFFFF
#   puts Bio::ColorScheme::Buried['junk']  # FFFFFF
#   puts Bio::ColorScheme::Buried['t']     # 00CC32
#
module ColorScheme
  cs_location = File.join(File.dirname(File.expand_path(__FILE__)), 'color_scheme')

  # Score sub-classes
  autoload :Buried, File.join(cs_location, 'buried')
  autoload :Helix, File.join(cs_location, 'helix')
  autoload :Hydropathy, File.join(cs_location, 'hydropathy')
  autoload :Strand, File.join(cs_location, 'strand')
  autoload :Turn, File.join(cs_location, 'turn')

  # Simple sub-classes
  autoload :Nucleotide, File.join(cs_location, 'nucleotide')
  autoload :Taylor, File.join(cs_location, 'taylor')
  autoload :Zappo, File.join(cs_location, 'zappo')

  # Consensus sub-classes
  # NOTE todo
  # BLOSUM62
  # ClustalX
  # PID

  # A very basic class template for color code referencing.
  class Simple #:nodoc:
    def self.[](x)
      return if x.nil?
      # accept symbols and any case
      @colors[x.to_s.upcase]
    end

    def self.colors()
      @colors
    end

    #######
    private
    #######

    # Example
    @colors = {
      'A' => '64F73F',
    }
    @colors.default = 'FFFFFF' # return white by default
  end

  # A class template for color code referencing of color schemes
  # that are score based. This template is expected to change
  # when the scores are moved into bio/data/aa
  class Score #:nodoc:
    def self.[](x)
      return if x.nil?
      # accept symbols and any case
      @colors[x.to_s.upcase]
    end

    def self.min(x)
      @min
    end

    def self.max(x)
      @max
    end

    def self.scores()
      @scores
    end

    def self.colors()
      @colors
    end

    #########
    protected
    #########

    def self.percent_to_hex(percent)
      percent = percent.to_f if percent.is_a?(String)
      # Test for nil first so that a nil argument raises the message below
      # instead of a NoMethodError from the comparisons.
      if percent.nil? or (percent > 1.0) or (percent < 0.0)
        raise 'Percentage must be between 0.0 and 1.0'
      end
      "%02X" % (percent * 255.0)
    end

    def self.rgb_percent_to_hex(red, green, blue)
      percent_to_hex(red) + percent_to_hex(green) + percent_to_hex(blue)
    end

    def self.score_to_percent(score, min, max)
      # .to_f to ensure every operation is float-aware
      percent = (score.to_f - min) / (max.to_f - min)
      percent = 1.0 if percent > 1.0
      percent = 0.0 if percent < 0.0
      percent
    end

    #######
    private
    #######

    # Example
    def self.score_to_rgb_hex(score, min, max)
      percent = score_to_percent(score, min, max)
      rgb_percent_to_hex(percent, 0.0, 1.0-percent)
    end

    @colors = {}
    @scores = {
      'A' => 0.83,
    }
    @min = 0.37
    @max = 1.7
    @scores.each { |k,s| @colors[k] = score_to_rgb_hex(s, @min, @max) }
    @colors.default = 'FFFFFF' # return white by default
  end

  # TODO
  class Consensus #:nodoc:
  end

end # module ColorScheme
end # module Bio
bio-2.0.3/lib/bio/util/contingency_table.rb0000644000175000017500000002673214141516614020140 0ustar nileshnilesh
#
# bio/util/contingency_table.rb - Statistical contingency table analysis for aligned sequences
#
# Author:: Trevor Wennblom
# Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com)
# License:: The Ruby License
#
# $Id:$
#

module Bio

#
# bio/util/contingency_table.rb - Statistical contingency table analysis for aligned sequences
#
# Author:: Trevor Wennblom
# Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com)
# License:: The Ruby License
#
# = Description
#
# The Bio::ContingencyTable class provides basic statistical contingency table
# analysis for two positions within aligned sequences.
#
# When ContingencyTable is instantiated the set of characters in the
# aligned sequences may be passed to it as an array. This is
# important since it uses these characters to create the table's rows
# and columns. If this array is not passed it will use its default
# of an amino acid and nucleotide alphabet in lowercase along with the
# clustal spacer '-'.
#
# To get data from the table the most used functions will be
# chi_square and contingency_coefficient:
#
#   ctable = Bio::ContingencyTable.new()
#   ctable.table['a']['t'] += 1
#   # .. put more values into the table
#   puts ctable.chi_square
#   puts ctable.contingency_coefficient # between 0.0 and 1.0
#
# The contingency_coefficient represents the degree of correlation of
# change between two sequence positions in a multiple-sequence
# alignment. 0.0 indicates no correlation, 1.0 is the maximum
# correlation.
#
#
# = Further Reading
#
# * http://en.wikipedia.org/wiki/Contingency_table
# * http://www.physics.csbsju.edu/stats/exact.details.html
# * Numerical Recipes in C by Press, Flannery, Teukolsky, and Vetterling
#
# = Usage
#
# What follows is an example of ContingencyTable in typical usage
# analyzing results from a clustal alignment.
#
#   require 'bio'
#
#   seqs = {}
#   max_length = 0
#   Bio::ClustalW::Report.new( IO.read('sample.aln') ).to_a.each do |entry|
#     data = entry.data.strip
#     seqs[entry.definition] = data.downcase
#     max_length = data.size if max_length == 0
#     raise "Aligned sequences must be the same length!" unless data.size == max_length
#   end
#
#   VERBOSE = true
#   puts "i\tj\tchi_square\tcontingency_coefficient" if VERBOSE
#   correlations = {}
#
#   0.upto(max_length - 1) do |i|
#     (i+1).upto(max_length - 1) do |j|
#       ctable = Bio::ContingencyTable.new()
#       seqs.each_value { |seq| ctable.table[ seq[i].chr ][ seq[j].chr ] += 1 }
#
#       chi_square = ctable.chi_square
#       contingency_coefficient = ctable.contingency_coefficient
#       puts [(i+1), (j+1), chi_square, contingency_coefficient].join("\t") if VERBOSE
#
#       correlations["#{i+1},#{j+1}"] = contingency_coefficient
#       correlations["#{j+1},#{i+1}"] = contingency_coefficient # Both ways are accurate
#     end
#   end
#
#   require 'yaml'
#   # File.open (not File.new) yields the file to the block and closes it afterwards
#   File.open('results.yml', 'a+') { |f| f.puts correlations.to_yaml }
#
#
# = Tutorial
#
# ContingencyTable returns the statistical significance of change
# between two positions in an alignment. If you would like to see how
# every possible combination of positions in your alignment compares
# to one another you must set this up yourself. Hopefully the
# provided examples will help you get started without too much
# trouble.
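#
# A minimal sketch of the arithmetic behind chi_square and
# contingency_coefficient for one pair of alignment columns, assuming only
# plain Ruby hashes. The helper names pair_counts, chi_square and
# contingency_coefficient are hypothetical stand-ins and not the
# Bio::ContingencyTable API. Only characters that actually occur are
# tabulated, since an unused row or column has an expected value of zero
# and adds nothing to the sum.

```ruby
# Count how often each (column i, column j) character pair occurs.
def pair_counts(seqs, i, j)
  counts = Hash.new { |h, k| h[k] = Hash.new(0) }
  seqs.each { |s| counts[s[i]][s[j]] += 1 }
  counts
end

# Sum of (observed - expected)^2 / expected over every cell,
# where expected = (row sum / N) * column sum.
def chi_square(counts)
  row_sums = counts.transform_values { |row| row.values.sum }
  col_sums = Hash.new(0)
  counts.each_value { |row| row.each { |c, v| col_sums[c] += v } }
  n = row_sums.values.sum.to_f
  total = 0.0
  row_sums.each do |r, rs|
    col_sums.each do |c, cs|
      expected = rs / n * cs
      next if expected.zero?
      observed = counts[r].fetch(c, 0)
      total += (observed - expected)**2 / expected
    end
  end
  total
end

# sqrt(chi^2 / (N + chi^2))
def contingency_coefficient(counts)
  x2 = chi_square(counts)
  n = counts.values.sum { |row| row.values.sum }
  Math.sqrt(x2 / (n + x2))
end

seqs = %w[abcde abcde aacje aacae]
counts = pair_counts(seqs, 1, 3)                  # positions 2 and 4, zero-based
puts chi_square(counts)                           # 4.0
puts contingency_coefficient(counts).round(4)     # 0.7071
```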
# # def lite_example(sequences, max_length, characters) # # %w{i j chi_square contingency_coefficient}.each { |x| print x.ljust(12) } # puts # # 0.upto(max_length - 1) do |i| # (i+1).upto(max_length - 1) do |j| # ctable = Bio::ContingencyTable.new( characters ) # sequences.each do |seq| # i_char = seq[i].chr # j_char = seq[j].chr # ctable.table[i_char][j_char] += 1 # end # chi_square = ctable.chi_square # contingency_coefficient = ctable.contingency_coefficient # [(i+1), (j+1), chi_square, contingency_coefficient].each { |x| print x.to_s.ljust(12) } # puts # end # end # # end # # allowed_letters = Array.new # allowed_letters = 'abcdefghijk'.split('') # # seqs = Array.new # seqs << 'abcde' # seqs << 'abcde' # seqs << 'aacje' # seqs << 'aacae' # # length_of_every_sequence = seqs[0].size # 5 letters long # # lite_example(seqs, length_of_every_sequence, allowed_letters) # # # Producing the following results: # # i j chi_square contingency_coefficient # 1 2 0.0 0.0 # 1 3 0.0 0.0 # 1 4 0.0 0.0 # 1 5 0.0 0.0 # 2 3 0.0 0.0 # 2 4 4.0 0.707106781186548 # 2 5 0.0 0.0 # 3 4 0.0 0.0 # 3 5 0.0 0.0 # 4 5 0.0 0.0 # # The position i=2 and j=4 has a high contingency coefficient # indicating that the changes at these positions are related. Note # that i and j are arbitrary, this could be represented as i=4 and j=2 # since they both refer to position two and position four in the # alignment. 
Here are some more examples: # # seqs = Array.new # seqs << 'abcde' # seqs << 'abcde' # seqs << 'aacje' # seqs << 'aacae' # seqs << 'akcfe' # seqs << 'akcfe' # # length_of_every_sequence = seqs[0].size # 5 letters long # # lite_example(seqs, length_of_every_sequence, allowed_letters) # # # Results: # # i j chi_square contingency_coefficient # 1 2 0.0 0.0 # 1 3 0.0 0.0 # 1 4 0.0 0.0 # 1 5 0.0 0.0 # 2 3 0.0 0.0 # 2 4 12.0 0.816496580927726 # 2 5 0.0 0.0 # 3 4 0.0 0.0 # 3 5 0.0 0.0 # 4 5 0.0 0.0 # # Here we can see that the strength of the correlation of change has # increased when more data is added with correlated changes at the # same positions. # # seqs = Array.new # seqs << 'abcde' # seqs << 'abcde' # seqs << 'kacje' # changed first letter # seqs << 'aacae' # seqs << 'akcfa' # changed last letter # seqs << 'akcfe' # # length_of_every_sequence = seqs[0].size # 5 letters long # # lite_example(seqs, length_of_every_sequence, allowed_letters) # # # Results: # # i j chi_square contingency_coefficient # 1 2 2.4 0.534522483824849 # 1 3 0.0 0.0 # 1 4 6.0 0.707106781186548 # 1 5 0.24 0.196116135138184 # 2 3 0.0 0.0 # 2 4 12.0 0.816496580927726 # 2 5 2.4 0.534522483824849 # 3 4 0.0 0.0 # 3 5 0.0 0.0 # 4 5 2.4 0.534522483824849 # # With random changes it becomes more difficult to identify correlated # changes, yet positions two and four still have the highest # correlation as indicated by the contingency coefficient. The best # way to improve the accuracy of your results, as is often the case # with statistics, is to increase the sample size. # # # = A Note on Efficiency # # ContingencyTable is slow. It involves many calculations for even a # seemingly small five-string data set. Even worse, it's very # dependent on matrix traversal, and this is done with two dimensional # hashes which dashes any hope of decent speed. # # Finally, half of the matrix is redundant and positions could be # summed with their companion position to reduce calculations. 
For # example the positions (5,2) and (2,5) could both have their values # added together and just stored in (2,5) while (5,2) could be an # illegal position. Also, positions (1,1), (2,2), (3,3), etc. will # never be used. # # The purpose of this package is flexibility and education. The code # is short and to the point in aims of achieving that purpose. If the # BioRuby project moves towards C extensions in the future a # professional caliber version will likely be created. # class ContingencyTable # Since we're making this math-notation friendly here is the layout of @table: # * @table[row][column] # * @table[i][j] # * @table[y][x] attr_accessor :table attr_reader :characters # Create a ContingencyTable that has characters_in_sequence.size rows and # characters_in_sequence.size columns for each row # # --- # *Arguments* # * +characters_in_sequences+: (_optional_) The allowable characters that will be present in the aligned sequences. # *Returns*:: +ContingencyTable+ object to be filled with values and calculated upon def initialize(characters_in_sequences = nil) @characters = ( characters_in_sequences or %w{a c d e f g h i k l m n p q r s t v w y - x u} ) tmp = Hash[*@characters.collect { |v| [v, 0] }.flatten] @table = Hash[*@characters.collect { |v| [v, tmp.dup] }.flatten] end # Report the sum of all values in a given row # # --- # *Arguments* # * +i+: Row to sum # *Returns*:: +Integer+ sum of row def row_sum(i) total = 0 @table[i].each { |k, v| total += v } total end # Report the sum of all values in a given column # # --- # *Arguments* # * +j+: Column to sum # *Returns*:: +Integer+ sum of column def column_sum(j) total = 0 @table.each { |row_key, column| total += column[j] } total end # Report the sum of all values in all columns. # # * This is the same thing as asking for the sum of all values in the table. 
# # --- # *Arguments* # * _none_ # *Returns*:: +Integer+ sum of all columns def column_sum_all total = 0 @characters.each { |j| total += column_sum(j) } total end # Report the sum of all values in all rows. # # * This is the same thing as asking for the sum of all values in the table. # # --- # *Arguments* # * _none_ # *Returns*:: +Integer+ sum of all rows def row_sum_all total = 0 @characters.each { |i| total += row_sum(i) } total end alias table_sum_all row_sum_all # Calculate _e_, the _expected_ value. # # --- # *Arguments* # * +i+: row # * +j+: column # *Returns*:: +Float+ e(sub:ij) = (r(sub:i)/N) * (c(sub:j)) def expected(i, j) (row_sum(i).to_f / table_sum_all) * column_sum(j) end # Report the chi square of the entire table # # --- # *Arguments* # * _none_ # *Returns*:: +Float+ chi square value def chi_square total = 0 @characters.each do |i| # Loop through every row in the ContingencyTable @characters.each do |j| # Loop through every column in the ContingencyTable total += chi_square_element(i, j) end end total end # Report the chi-square relation of two elements in the table # # --- # *Arguments* # * +i+: row # * +j+: column # *Returns*:: +Float+ chi-square of an intersection def chi_square_element(i, j) eij = expected(i, j) return 0 if eij == 0 ( @table[i][j] - eij )**2 / eij end # Report the contingency coefficient of the table # # --- # *Arguments* # * _none_ # *Returns*:: +Float+ contingency_coefficient of the table def contingency_coefficient c_s = chi_square Math.sqrt(c_s / (table_sum_all + c_s) ) end end # ContingencyTable end # Bio bio-2.0.3/lib/bio/util/color_scheme/0000755000175000017500000000000014141516614016554 5ustar nileshnileshbio-2.0.3/lib/bio/util/color_scheme/turn.rb0000644000175000017500000000237114141516614020074 0ustar nileshnilesh# # bio/util/color_scheme/turn.rb - Color codings for turn propensity # # Author:: Trevor Wennblom # Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com) # License:: The 
Ruby License # # $Id: turn.rb,v 1.4 2007/04/05 23:35:41 trevor Exp $ # require 'bio/util/color_scheme' module Bio::ColorScheme class Turn < Score #:nodoc: ######### protected ######### def self.score_to_rgb_hex(score, min, max) percent = score_to_percent(score, min, max) rgb_percent_to_hex(percent, 1.0-percent, 1.0-percent) end @colors = {} @scores = { 'A' => 0.66, 'C' => 1.19, 'D' => 1.46, 'E' => 0.74, 'F' => 0.6, 'G' => 1.56, 'H' => 0.95, 'I' => 0.47, 'K' => 1.01, 'L' => 0.59, 'M' => 0.6, 'N' => 1.56, 'P' => 1.52, 'Q' => 0.98, 'R' => 0.95, 'S' => 1.43, 'T' => 0.96, 'U' => 0, 'V' => 0.5, 'W' => 0.96, 'Y' => 1.14, 'B' => 1.51, 'X' => 1.0, 'Z' => 0.86, } @min = 0.47 @max = 1.56 @scores.each { |k,s| @colors[k] = score_to_rgb_hex(s, @min, @max) } @colors.default = 'FFFFFF' # return white by default end end bio-2.0.3/lib/bio/util/color_scheme/zappo.rb0000644000175000017500000000210714141516614020232 0ustar nileshnilesh# # bio/util/color_scheme/zappo.rb - Zappo color codings for amino acids # # Author:: Trevor Wennblom # Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com) # License:: The Ruby License # # $Id: zappo.rb,v 1.4 2007/04/05 23:35:41 trevor Exp $ # require 'bio/util/color_scheme' module Bio::ColorScheme class Zappo < Simple #:nodoc: ######### protected ######### @colors = { 'A' => 'FFAFAF', 'C' => 'FFFF00', 'D' => 'FF0000', 'E' => 'FF0000', 'F' => 'FFC800', 'G' => 'FF00FF', 'H' => 'FF0000', 'I' => 'FFAFAF', 'K' => '6464FF', 'L' => 'FFAFAF', 'M' => 'FFAFAF', 'N' => '00FF00', 'P' => 'FF00FF', 'Q' => '00FF00', 'R' => '6464FF', 'S' => '00FF00', 'T' => '00FF00', 'U' => 'FFFFFF', 'V' => 'FFAFAF', 'W' => 'FFC800', 'Y' => 'FFC800', 'B' => 'FFFFFF', 'X' => 'FFFFFF', 'Z' => 'FFFFFF', } @colors.default = 'FFFFFF' # return white by default end end bio-2.0.3/lib/bio/util/color_scheme/taylor.rb0000644000175000017500000000211314141516614020410 0ustar nileshnilesh# # bio/util/color_scheme/taylor.rb - Taylor color codings for amino acids 
# # Author:: Trevor Wennblom # Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com) # License:: The Ruby License # # $Id: taylor.rb,v 1.4 2007/04/05 23:35:41 trevor Exp $ # require 'bio/util/color_scheme' module Bio::ColorScheme class Taylor < Simple #:nodoc: ######### protected ######### @colors = { 'A' => 'CCFF00', 'C' => 'FFFF00', 'D' => 'FF0000', 'E' => 'FF0066', 'F' => '00FF66', 'G' => 'FF9900', 'H' => '0066FF', 'I' => '66FF00', 'K' => '6600FF', 'L' => '33FF00', 'M' => '00FF00', 'N' => 'CC00FF', 'P' => 'FFCC00', 'Q' => 'FF00CC', 'R' => '0000FF', 'S' => 'FF3300', 'T' => 'FF6600', 'U' => 'FFFFFF', 'V' => '99FF00', 'W' => '00CCFF', 'Y' => '00FFCC', 'B' => 'FFFFFF', 'X' => 'FFFFFF', 'Z' => 'FFFFFF', } @colors.default = 'FFFFFF' # return white by default end end bio-2.0.3/lib/bio/util/color_scheme/hydropathy.rb0000644000175000017500000000254114141516614021276 0ustar nileshnilesh# # bio/util/color_scheme/hydropathy.rb - Color codings for hydrophobicity # # Author:: Trevor Wennblom # Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com) # License:: The Ruby License # # $Id: hydropathy.rb,v 1.4 2007/04/05 23:35:41 trevor Exp $ # require 'bio/util/color_scheme' module Bio::ColorScheme # Hydropathy index # Kyte, J., and Doolittle, R.F., J. Mol. Biol. 
  # 157, 105-132, 1982
  class Hydropathy < Score #:nodoc:

    #########
    protected
    #########

    def self.score_to_rgb_hex(score, min, max)
      percent = score_to_percent(score, min, max)
      rgb_percent_to_hex(percent, 0.0, 1.0-percent)
    end

    @colors = {}
    @scores = {
      'A' => 1.8,
      'C' => 2.5,
      'D' => -3.5,
      'E' => -3.5,
      'F' => 2.8,
      'G' => -0.4,
      'H' => -3.2,
      'I' => 4.5,
      'K' => -3.9,
      'L' => 3.8,
      'M' => 1.9,
      'N' => -3.5,
      'P' => -1.6,
      'Q' => -3.5,
      'R' => -4.5,
      'S' => -0.8,
      'T' => -0.7,
      'U' => 0.0,
      'V' => 4.2,
      'W' => -0.9,
      'Y' => -1.3,
      'B' => -3.5,
      'X' => -0.49,
      'Z' => -3.5,
    }
    @min = -3.9
    @max = 4.5
    @scores.each { |k,s| @colors[k] = score_to_rgb_hex(s, @min, @max) }
    @colors.default = 'FFFFFF' # return white by default
  end
end
bio-2.0.3/lib/bio/util/color_scheme/nucleotide.rb0000644000175000017500000000126214141516614021235 0ustar nileshnilesh
#
# bio/util/color_scheme/nucleotide.rb - Color codings for nucleotides
#
# Author:: Trevor Wennblom
# Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com)
# License:: The Ruby License
#
# $Id: nucleotide.rb,v 1.4 2007/04/05 23:35:41 trevor Exp $
#

require 'bio/util/color_scheme'

module Bio::ColorScheme
  class Nucleotide < Simple #:nodoc:

    #########
    protected
    #########

    @colors = {
      'A' => '64F73F',
      'C' => 'FFB340',
      'G' => 'EB413C',
      'T' => '3C88EE',
      'U' => '3C88EE',
    }
    @colors.default = 'FFFFFF' # return white by default
  end
  NA = Nuc = Nucleotide
end
bio-2.0.3/lib/bio/util/color_scheme/buried.rb0000644000175000017500000000237114141516614020356 0ustar nileshnilesh
#
# bio/util/color_scheme/buried.rb - Color codings for buried amino acids
#
# Author:: Trevor Wennblom
# Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com)
# License:: The Ruby License
#
# $Id: buried.rb,v 1.4 2007/04/05 23:35:41 trevor Exp $
#

require 'bio/util/color_scheme'

module Bio::ColorScheme
  class Buried < Score #:nodoc:

    #########
    protected
    #########

    def self.score_to_rgb_hex(score, min, max)
      percent =
score_to_percent(score, min, max) rgb_percent_to_hex(0.0, 1.0-percent, percent) end @colors = {} @scores = { 'A' => 0.66, 'C' => 1.19, 'D' => 1.46, 'E' => 0.74, 'F' => 0.6, 'G' => 1.56, 'H' => 0.95, 'I' => 0.47, 'K' => 1.01, 'L' => 0.59, 'M' => 0.6, 'N' => 1.56, 'P' => 1.52, 'Q' => 0.98, 'R' => 0.95, 'S' => 1.43, 'T' => 0.96, 'U' => 0, 'V' => 0.5, 'W' => 0.96, 'Y' => 1.14, 'B' => 1.51, 'X' => 1.0, 'Z' => 0.86, } @min = 0.05 @max = 4.6 @scores.each { |k,s| @colors[k] = score_to_rgb_hex(s, @min, @max) } @colors.default = 'FFFFFF' # return white by default end end bio-2.0.3/lib/bio/util/color_scheme/helix.rb0000644000175000017500000000237414141516614020220 0ustar nileshnilesh# # bio/util/color_scheme/helix.rb - Color codings for helix propensity # # Author:: Trevor Wennblom # Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com) # License:: The Ruby License # # $Id: helix.rb,v 1.4 2007/04/05 23:35:41 trevor Exp $ # require 'bio/util/color_scheme' module Bio::ColorScheme class Helix < Score #:nodoc: ######### protected ######### def self.score_to_rgb_hex(score, min, max) percent = score_to_percent(score, min, max) rgb_percent_to_hex(percent, 1.0-percent, percent) end @colors = {} @scores = { 'A' => 1.42, 'C' => 0.7, 'D' => 1.01, 'E' => 1.51, 'F' => 1.13, 'G' => 0.57, 'H' => 1.0, 'I' => 1.08, 'K' => 1.16, 'L' => 1.21, 'M' => 1.45, 'N' => 0.67, 'P' => 0.57, 'Q' => 1.11, 'R' => 0.98, 'S' => 0.77, 'T' => 0.83, 'U' => 0.0, 'V' => 1.06, 'W' => 1.08, 'Y' => 0.69, 'B' => 0.84, 'X' => 1.0, 'Z' => 1.31, } @min = 0.57 @max = 1.51 @scores.each { |k,s| @colors[k] = score_to_rgb_hex(s, @min, @max) } @colors.default = 'FFFFFF' # return white by default end end bio-2.0.3/lib/bio/util/color_scheme/strand.rb0000644000175000017500000000237514141516614020403 0ustar nileshnilesh# # bio/util/color_scheme/strand.rb - Color codings for strand propensity # # Author:: Trevor Wennblom # Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC 
(http://midwinterlabs.com) # License:: The Ruby License # # $Id: strand.rb,v 1.5 2007/04/05 23:35:41 trevor Exp $ # require 'bio/util/color_scheme' module Bio::ColorScheme class Strand < Score #:nodoc: ######### protected ######### def self.score_to_rgb_hex(score, min, max) percent = score_to_percent(score, min, max) rgb_percent_to_hex(percent, percent, 1.0-percent) end @colors = {} @scores = { 'A' => 0.83, 'C' => 1.19, 'D' => 0.54, 'E' => 0.37, 'F' => 1.38, 'G' => 0.75, 'H' => 0.87, 'I' => 1.6, 'K' => 0.74, 'L' => 1.3, 'M' => 1.05, 'N' => 0.89, 'P' => 0.55, 'Q' => 1.1, 'R' => 0.93, 'S' => 0.75, 'T' => 1.19, 'U' => 0.0, 'V' => 1.7, 'W' => 1.37, 'Y' => 1.47, 'B' => 0.72, 'X' => 1.0, 'Z' => 0.74, } @min = 0.37 @max = 1.7 @scores.each { |k,s| @colors[k] = score_to_rgb_hex(s, @min, @max) } @colors.default = 'FFFFFF' # return white by default end end bio-2.0.3/lib/bio/util/sirna.rb0000644000175000017500000002111714141516614015555 0ustar nileshnilesh# # = bio/util/sirna.rb - Class for designing small inhibitory RNAs # # Copyright:: Copyright (C) 2004-2013 # Itoshi NIKAIDO # Yuki NAITO # License:: The Ruby License # # $Id:$ # # == Bio::SiRNA - Designing siRNA. # # This class implements the selection rules described by Kumiko Ui-Tei # et al. (2004) and Reynolds et al. (2004). # # == Example # # seq = Bio::Sequence::NA.new(ARGF.read) # # sirna = Bio::SiRNA.new(seq) # pairs = sirna.design # # pairs.each do |pair| # puts pair.report # shrna = Bio::SiRNA::ShRNA.new(pair) # shrna.design # puts shrna.report # # puts shrna.top_strand.dna # puts shrna.bottom_strand.dna # end # # == References # # * Kumiko Ui-Tei et al. Guidelines for the selection of highly effective # siRNA sequences for mammalian and chick RNA interference. # Nucleic Acids Res. 2004 32: 936-948. # # * Angela Reynolds et al. Rational siRNA design for RNA interference. # Nat. Biotechnol. 2004 22: 326-330. # require 'bio/sequence' module Bio # = Bio::SiRNA # Designing siRNA. 
# # This class implements the selection rules described by Kumiko Ui-Tei # et al. (2004) and Reynolds et al. (2004). class SiRNA # A parameter of size of antisense. attr_accessor :antisense_size # A parameter of maximal %GC. attr_accessor :max_gc_percent # A parameter of minimum %GC. attr_accessor :min_gc_percent # Input is a Bio::Sequence::NA object (the target sequence). # Output is a list of Bio::SiRNA::Pair object. def initialize(seq, antisense_size = 21, max_gc_percent = 60.0, min_gc_percent = 40.0) @seq = seq.rna! @pairs = Array.new @antisense_size = antisense_size @max_gc_percent = max_gc_percent @min_gc_percent = min_gc_percent end # Ui-Tei's rule. def uitei?(target) return false if target.length != 23 # 21 nt target + 2 nt overhang seq19 = target[2..20] # 19 nt double-stranded region of siRNA # criteria i return false unless seq19[18..18].match(/[AU]/i) # criteria ii return false unless seq19[0..0].match(/[GC]/i) # criteria iii au_number = seq19[12..18].scan(/[AU]/i).size return false unless au_number >= 4 # criteria iv return false if seq19.match(/[GC]{10}/i) return true end # Reynolds' rule. def reynolds?(target) return false if target.length != 23 # 21 nt target + 2 nt overhang seq19 = target[2..20] # 19 nt double-stranded region of siRNA score = 0 # criteria I gc_number = seq19.scan(/[GC]/i).size score += 1 if (7 <= gc_number and gc_number <= 10) # criteria II au_number = seq19[14..18].scan(/[AU]/i).size score += au_number # criteria III # NotImpremented: Tm # criteria IV score += 1 if seq19[18..18].match(/A/i) # criteria V score += 1 if seq19[2..2].match(/A/i) # criteria VI score += 1 if seq19[9..9].match(/[U]/i) # criteria VII score -= 1 if seq19[18..18].match(/[GC]/i) # criteria VIII score -= 1 if seq19[12..12].match(/G/i) if score >= 6 return score else return false end end # same as design('uitei'). def uitei design('uitei') end # same as design('reynolds'). def reynolds design('reynolds') end # rule can be one of 'uitei' (default) and 'reynolds'. 
def design(rule = 'uitei') @target_size = @antisense_size + 2 target_start = 0 @seq.window_search(@target_size) do |target| antisense = target.subseq(1, @target_size - 2).complement.rna sense = target.subseq(3, @target_size) target_start += 1 target_stop = target_start + @target_size antisense_gc_percent = antisense.gc_percent next if antisense_gc_percent > @max_gc_percent next if antisense_gc_percent < @min_gc_percent case rule when 'uitei' next unless uitei?(target) when 'reynolds' next unless reynolds?(target) else raise NotImplementedError end pair = Bio::SiRNA::Pair.new(target, sense, antisense, target_start, target_stop, rule, antisense_gc_percent) @pairs.push(pair) end return @pairs end # = Bio::SiRNA::Pair class Pair attr_accessor :target attr_accessor :sense attr_accessor :antisense attr_accessor :start attr_accessor :stop attr_accessor :rule attr_accessor :gc_percent def initialize(target, sense, antisense, start, stop, rule, gc_percent) @target = target @sense = sense @antisense = antisense @start = start @stop = stop @rule = rule @gc_percent = gc_percent end # human readable report def report report = "### siRNA\n" report << 'Start: ' + @start.to_s + "\n" report << 'Stop: ' + @stop.to_s + "\n" report << 'Rule: ' + @rule.to_s + "\n" report << 'GC %: ' + @gc_percent.to_s + "\n" report << 'Target: ' + @target.upcase + "\n" report << 'Sense: ' + ' ' + @sense.upcase + "\n" report << 'Antisense: ' + @antisense.reverse.upcase + "\n" end # computer parsable report #def to_s # [ @antisense, @start, @stop ].join("\t") #end end # class Pair # = Bio::SiRNA::ShRNA # Designing shRNA. class ShRNA # Bio::Sequence::NA attr_accessor :top_strand # Bio::Sequence::NA attr_accessor :bottom_strand # Input is a Bio::SiRNA::Pair object (the target sequence). def initialize(pair) @pair = pair end # only the 'BLOCK-iT' rule is implemented for now. 
def design(method = 'BLOCK-iT') case method when 'BLOCK-iT' block_it else raise NotImplementedError end end # same as design('BLOCK-iT'). # method can be one of 'piGENE' (default) and 'BLOCK-iT'. def block_it(method = 'piGENE') top = Bio::Sequence::NA.new('CACC') # top_strand_shrna_overhang bot = Bio::Sequence::NA.new('AAAA') # bottom_strand_shrna_overhang fwd = @pair.sense rev = @pair.sense.complement case method when 'BLOCK-iT' # From BLOCK-iT's manual loop_fwd = Bio::Sequence::NA.new('CGAA') loop_rev = loop_fwd.complement when 'piGENE' # From piGENE document loop_fwd = Bio::Sequence::NA.new('GTGTGCTGTCC') loop_rev = loop_fwd.complement else raise NotImplementedError end if /^G/i =~ fwd @top_strand = top + fwd + loop_fwd + rev @bottom_strand = bot + fwd + loop_rev + rev else @top_strand = top + 'G' + fwd + loop_fwd + rev @bottom_strand = bot + fwd + loop_rev + rev + 'C' end end # human readable report def report # raise NomethodError for compatibility raise NoMethodError if !defined?(@top_strand) || !@top_strand report = "### shRNA\n" report << "Top strand shRNA (#{@top_strand.length} nt):\n" report << " 5'-#{@top_strand.upcase}-3'\n" report << "Bottom strand shRNA (#{@bottom_strand.length} nt):\n" report << " 3'-#{@bottom_strand.reverse.upcase}-5'\n" end end # class ShRNA end # class SiRNA end # module Bio =begin = ChangeLog 2013/04/03 Yuki NAITO Modified siRNA design rules: - Ui-Tei's rule: - Restricted target length to 23 nt (21 nt plus 2 nt overhang) for selecting functional siRNAs. - Avoided contiguous GCs 10 nt or more. (not 9 nt or more) - Reynolds' rule: - Restricted target length to 23 nt (21 nt plus 2 nt overhang) for selecting functional siRNAs. - Reynolds' rule does not require to fulfill all the criteria simultaneously. Total score of eight criteria is calculated and used for the siRNA efficacy prediction. This change may significantly alter an output. - Returns total score of eight criteria for functional siRNA, instead of returning 'true'. 
- Returns 'false' for non-functional siRNA, as usual. 2005/03/21 Itoshi NIKAIDO Bio::SiRNA#ShRNA_designer method was changed design method. 2004/06/25 Bio::ShRNA class was added. 2004/06/17 Itoshi NIKAIDO We can use shRNA loop sequence from piGene document. =end bio-2.0.3/lib/bio/util/restriction_enzyme/0000755000175000017500000000000014141516614020046 5ustar nileshnileshbio-2.0.3/lib/bio/util/restriction_enzyme/analysis_basic.rb0000644000175000017500000002032714141516614023363 0ustar nileshnilesh# # bio/util/restriction_enzyme/analysis_basic.rb - Does the work of fragmenting the DNA from the enzymes # # Author:: Trevor Wennblom # Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com) # License:: The Ruby License # require 'set' # for method create_enzyme_actions require 'bio/sequence' module Bio require 'bio/util/restriction_enzyme' unless const_defined?(:RestrictionEnzyme) class RestrictionEnzyme class Analysis # See cut_without_permutations instance method def self.cut_without_permutations( sequence, *args ) self.new.cut_without_permutations( sequence, *args ) end # See main documentation for Bio::RestrictionEnzyme # # Bio::RestrictionEnzyme.cut is preferred over this! # # USE AT YOUR OWN RISK # # This is a simpler version of method +cut+. +cut+ takes into account # permutations of cut variations based on competitiveness of enzymes for an # enzyme cutsite or enzyme bindsite on a sequence. This does not take into # account those possibilities and is therefore faster, but less likely to be # accurate. # # This code is mainly included as an academic example # without having to wade through the extra layer of complexity added by the # permutations. 
# # Example: # # FIXME add output # # Bio::RestrictionEnzyme::Analysis.cut_without_permutations('gaattc', 'EcoRI') # # _same as:_ # # Bio::RestrictionEnzyme::Analysis.cut_without_permutations('gaattc', 'g^aattc') # --- # *Arguments* # * +sequence+: +String+ kind of object that will be used as a nucleic acid sequence. # * +args+: Series of enzyme names, enzymes sequences with cut marks, or RestrictionEnzyme objects. # *Returns*:: Bio::RestrictionEnzyme::Fragments object populated with Bio::RestrictionEnzyme::Fragment objects. (Note: unrelated to Bio::RestrictionEnzyme::Range::SequenceRange::Fragments) def cut_without_permutations( sequence, *args ) return fragments_for_display( {} ) if !sequence.kind_of?(String) or sequence.empty? sequence = Bio::Sequence::NA.new( sequence ) # create_enzyme_actions returns two seperate array elements, they're not # needed separated here so we put them into one array enzyme_actions = create_enzyme_actions( sequence, *args ).flatten return fragments_for_display( {} ) if enzyme_actions.empty? # Primary and complement strands are both measured from '0' to 'sequence.size-1' here sequence_range = Bio::RestrictionEnzyme::Range::SequenceRange.new( 0, 0, sequence.size-1, sequence.size-1 ) # Add the cuts to the sequence_range from each enzyme_action enzyme_actions.each do |enzyme_action| enzyme_action.cut_ranges.each do |cut_range| sequence_range.add_cut_range(cut_range) end end # Fill in the source sequence for sequence_range so it knows what bases # to use sequence_range.fragments.primary = sequence sequence_range.fragments.complement = sequence.forward_complement # Format the fragments for the user fragments_for_display( {0 => sequence_range} ) end ######### protected ######### # Take the fragments from SequenceRange objects generated from add_cut_range # and return unique results as a Bio::RestrictionEnzyme::Analysis::Fragment object. # # --- # *Arguments* # * +hsh+: +Hash+ Keys are a permutation ID, if any. 
Values are SequenceRange objects that have cuts applied. # *Returns*:: Bio::RestrictionEnzyme::Analysis::Fragments object populated with Bio::RestrictionEnzyme::Analysis::Fragment objects. def fragments_for_display( hsh, view_ranges=false ) ary = Fragments.new return ary unless hsh hsh.each do |permutation_id, sequence_range| sequence_range.fragments.for_display.each do |fragment| if view_ranges ary << Bio::RestrictionEnzyme::Fragment.new(fragment.primary, fragment.complement, fragment.p_left, fragment.p_right, fragment.c_left, fragment.c_right) else ary << Bio::RestrictionEnzyme::Fragment.new(fragment.primary, fragment.complement) end end end ary.uniq! unless view_ranges ary end # Creates an array of EnzymeActions based on the DNA sequence and supplied enzymes. # # --- # *Arguments* # * +sequence+: The string of DNA to match the enzyme recognition sites against # * +args+:: The enzymes to use. # *Returns*:: +Array+ with the first element being an array of EnzymeAction objects that +sometimes_cut+, and are subject to competition. The second is an array of EnzymeAction objects that +always_cut+ and are not subject to competition. def create_enzyme_actions( sequence, *args ) all_enzyme_actions = [] args.each do |enzyme| enzyme = Bio::RestrictionEnzyme.new(enzyme) unless enzyme.class == Bio::RestrictionEnzyme::DoubleStranded # make sure pattern is the proper size # for more info see the internal documentation of # Bio::RestrictionEnzyme::DoubleStranded.create_action_at pattern = Bio::Sequence::NA.new( Bio::RestrictionEnzyme::DoubleStranded::AlignedStrands.align( enzyme.primary, enzyme.complement ).primary ).to_re find_match_locations( sequence, pattern ).each do |offset| all_enzyme_actions << enzyme.create_action_at( offset ) end end # FIXME VerticalCutRange should really be called VerticalAndHorizontalCutRange # * all_enzyme_actions is now full of EnzymeActions at specific locations across # the sequence. 
# * all_enzyme_actions will now be examined to see if any EnzymeActions may # conflict with one another, and if they do they'll be made note of in # indicies_of_sometimes_cut. They will then be remove FIXME # * a conflict occurs if another enzyme's bind site is compromised do due # to another enzyme's cut. Enzyme's bind sites may overlap and not be # competitive, however neither bind site may be part of the other # enzyme's cut or else they do become competitive. # # Take current EnzymeAction's entire bind site and compare it to all other # EzymeAction's cut ranges. Only look for vertical cuts as boundaries # since trailing horizontal cuts would have no influence on the bind site. # # If example Enzyme A makes this cut pattern (cut range 2..5): # # 0 1 2|3 4 5 6 7 # +-----+ # 0 1 2 3 4 5|6 7 # # Then the bind site (and EnzymeAction range) for Enzyme B would need it's # right side to be at index 2 or less, or it's left side to be 6 or greater. competition_indexes = Set.new all_enzyme_actions[0..-2].each_with_index do |current_enzyme_action, i| next if competition_indexes.include? i next if current_enzyme_action.cut_ranges.empty? # no cuts, some enzymes are like this (ex. CjuI) all_enzyme_actions[i+1..-1].each_with_index do |comparison_enzyme_action, j| j += (i + 1) next if competition_indexes.include? j next if comparison_enzyme_action.cut_ranges.empty? # no cuts if (current_enzyme_action.right <= comparison_enzyme_action.cut_ranges.min_vertical) or (current_enzyme_action.left > comparison_enzyme_action.cut_ranges.max_vertical) # no conflict else competition_indexes += [i, j] # merge both indexes into the flat set end end end sometimes_cut = all_enzyme_actions.values_at( *competition_indexes ) always_cut = all_enzyme_actions always_cut.delete_if {|x| sometimes_cut.include? x } [sometimes_cut, always_cut] end # Returns an +Array+ of the match indicies of a +RegExp+ to a string. 
# # Example: # # find_match_locations('abccdefeg', /[ce]/) # => [2,3,5,7] # # --- # *Arguments* # * +string+: The string to scan # * +re+: A RegExp to use # *Returns*:: +Array+ with indicies of match locations def find_match_locations( string, re ) md = string.match( re ) locations = [] counter = 0 while md # save the match index relative to the original string locations << (counter += md.begin(0)) # find the next match md = string[ (counter += 1)..-1 ].match( re ) end locations end end # Analysis end # RestrictionEnzyme end # Bio bio-2.0.3/lib/bio/util/restriction_enzyme/single_strand_complement.rb0000644000175000017500000000127514141516614025457 0ustar nileshnilesh# # bio/util/restriction_enzyme/single_strand_complement.rb - Single strand restriction enzyme sequence in complement orientation # # Author:: Trevor Wennblom # Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com) # License:: The Ruby License # module Bio require 'bio/util/restriction_enzyme' unless const_defined?(:RestrictionEnzyme) class RestrictionEnzyme # A single strand of restriction enzyme sequence pattern with a 3' to 5' orientation. 
# class SingleStrandComplement < SingleStrand # Orientation of the strand, 3' to 5' def orientation; [3, 5]; end end # SingleStrandComplement end # RestrictionEnzyme end # Bio bio-2.0.3/lib/bio/util/restriction_enzyme/single_strand/0000755000175000017500000000000014141516614022702 5ustar nileshnileshbio-2.0.3/lib/bio/util/restriction_enzyme/single_strand/cut_locations_in_enzyme_notation.rb0000644000175000017500000000767114141516614032100 0ustar nileshnilesh# # bio/util/restriction_enzyme/single_strand/cut_locations_in_enzyme_notation.rb - The cut locations, in enzyme notation # # Author:: Trevor Wennblom # Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com) # License:: The Ruby License # module Bio require 'bio/util/restriction_enzyme' unless const_defined?(:RestrictionEnzyme) class RestrictionEnzyme class SingleStrand # Stores the cut location in thier enzyme index notation # # May be initialized with a series of cuts or an enzyme pattern marked # with cut symbols. # # Enzyme index notation:: 1.._n_, value before 1 is -1 # # example:: [-3][-2][-1][1][2][3][4][5] # # Negative values are used to indicate when a cut may occur at a specified # distance before the sequence begins. This would be padded with 'n' # nucleotides to represent wildcards. # # Notes: # * 0 is invalid as it does not refer to any index # * +nil+ is not allowed here as it has no meaning # * +nil+ values are kept track of in DoubleStranded::CutLocations as they # need a reference point on the correlating strand. In # DoubleStranded::CutLocations +nil+ represents no cut or a partial # digestion. 
# class CutLocationsInEnzymeNotation < Array include CutSymbol extend CutSymbol # First cut, in enzyme-index notation attr_reader :min # Last cut, in enzyme-index notation attr_reader :max # Constructor for CutLocationsInEnzymeNotation # # --- # *Arguments* # * +a+: Locations of cuts represented as a string with cuts or an array of values # Examples: # * n^ng^arraxt^n # * 2 # * -1, 5 # * [-1, 5] # *Returns*:: nothing def initialize(*a) a.flatten! # in case an array was passed as an argument if a.size == 1 and a[0].kind_of? String and a[0] =~ re_cut_symbol # Initialize with a cut symbol pattern such as 'n^ng^arraxt^n' s = a[0] a = [] i = -( s.tr(cut_symbol, '') =~ %r{[^n]} ) # First character that's not 'n' s.each_byte { |c| (a << i; next) if c.chr == cut_symbol; i += 1 } a.collect! { |n| n <= 0 ? n-1 : n } # 0 is not a valid enzyme index, decrement from 0 and all negative else a.collect! { |n| n.to_i } # Cut locations are always integers end validate_cut_locations( a ) super(a) self.sort! @min = self.first @max = self.last self.freeze end # Transform the cut locations from enzyme index notation to 0-based index # notation. # # input -> output # [ 1, 2, 3 ] -> [ 0, 1, 2 ] # [ 1, 3, 5 ] -> [ 0, 2, 4 ] # [ -1, 1, 2 ] -> [ 0, 1, 2 ] # [ -2, 1, 3 ] -> [ 0, 2, 4 ] # # --- # *Arguments* # * _none_ # *Returns*:: +Array+ of cuts in 0-based index notation def to_array_index return [] if @min == nil if @min < 0 calc = lambda do |n| n -= 1 unless n < 0 n + @min.abs end else calc = lambda { |n| n - 1 } end self.collect(&calc) end ######### protected ######### def validate_cut_locations( input_cut_locations ) unless input_cut_locations == input_cut_locations.uniq err = "The cut locations supplied contain duplicate values. Redundant / undefined meaning.\n" err += "cuts: #{input_cut_locations.inspect}\n" err += "unique: #{input_cut_locations.uniq.inspect}" raise ArgumentError, err end if input_cut_locations.include?(nil) err = "The cut locations supplied contained a nil. 
nil has no index for enzyme notation, alternative meaning is 'no cut'.\n" err += "cuts: #{input_cut_locations.inspect}" raise ArgumentError, err end if input_cut_locations.include?(0) err = "The cut locations supplied contained a '0'. '0' has no index for enzyme notation, alternative meaning is 'no cut'.\n" err += "cuts: #{input_cut_locations.inspect}" raise ArgumentError, err end end end # CutLocationsInEnzymeNotation end # SingleStrand end # RestrictionEnzyme end # Bio bio-2.0.3/lib/bio/util/restriction_enzyme/range/0000755000175000017500000000000014141516614021142 5ustar nileshnileshbio-2.0.3/lib/bio/util/restriction_enzyme/range/sequence_range/0000755000175000017500000000000014141516614024126 5ustar nileshnileshbio-2.0.3/lib/bio/util/restriction_enzyme/range/sequence_range/fragment.rb0000644000175000017500000000237414141516614026264 0ustar nileshnilesh# # bio/util/restriction_enzyme/range/sequence_range/fragment.rb - # # Author:: Trevor Wennblom # Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com) # License:: The Ruby License # module Bio require 'bio/util/restriction_enzyme' unless const_defined?(:RestrictionEnzyme) class RestrictionEnzyme class Range class SequenceRange class Fragment attr_reader :size def initialize( primary_bin, complement_bin ) @primary_bin = primary_bin @complement_bin = complement_bin end DisplayFragment = Struct.new(:primary, :complement, :p_left, :p_right, :c_left, :c_right) def for_display(p_str=nil, c_str=nil) df = DisplayFragment.new df.primary = '' df.complement = '' both_bins = @primary_bin + @complement_bin both_bins.each do |item| @primary_bin.include?(item) ? df.primary << p_str[item] : df.primary << ' ' @complement_bin.include?(item) ? 
df.complement << c_str[item] : df.complement << ' ' end df.p_left = @primary_bin.first df.p_right = @primary_bin.last df.c_left = @complement_bin.first df.c_right = @complement_bin.last df end end # Fragment end # SequenceRange end # Range end # RestrictionEnzyme end # Bio bio-2.0.3/lib/bio/util/restriction_enzyme/range/sequence_range/fragments.rb0000644000175000017500000000164714141516614026451 0ustar nileshnilesh# # bio/util/restriction_enzyme/analysis/fragments.rb - # # Author:: Trevor Wennblom # Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com) # License:: The Ruby License # module Bio require 'bio/util/restriction_enzyme' unless const_defined?(:RestrictionEnzyme) class RestrictionEnzyme class Range class SequenceRange class Fragments < Array attr_accessor :primary attr_accessor :complement def initialize(primary, complement) @primary = primary @complement = complement end DisplayFragment = Struct.new(:primary, :complement) def for_display(p_str=nil, c_str=nil) p_str ||= @primary c_str ||= @complement pretty_fragments = [] self.each { |fragment| pretty_fragments << fragment.for_display(p_str, c_str) } pretty_fragments end end # Fragments end # SequenceRange end # Range end # RestrictionEnzyme end # Bio bio-2.0.3/lib/bio/util/restriction_enzyme/range/sequence_range/calculated_cuts.rb0000644000175000017500000002203214141516614027611 0ustar nileshnilesh# # bio/util/restriction_enzyme/range/sequence_range/calculated_cuts.rb - # # Author:: Trevor Wennblom # Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com) # License:: The Ruby License # module Bio require 'bio/util/restriction_enzyme' unless const_defined?(:RestrictionEnzyme) class RestrictionEnzyme class Range class SequenceRange # cc = CalculatedCuts.new(@size) # cc.add_cuts_from_cut_ranges(@cut_ranges) # cc.remove_incomplete_cuts # # 1 2 3 4 5 6 7 # G A|T T A C A # +-----+ # C T A A T|G T # 1 2 3 4 5 6 7 # # Primary cut = 2 # 
Complement cut = 5 # Horizontal cuts = 3, 4, 5 # class CalculatedCuts include CutSymbol include StringFormatting # +Array+ of vertical cuts on the primary strand in 0-based index notation def vc_primary #$stderr.puts caller[0].inspect ###DEBUG @vc_primary.to_a end # Returns the same contents as vc_primary, but returns original data # structure used in the class. def vc_primary_as_original_class @vc_primary end # +Array+ of vertical cuts on the complementary strand in 0-based index notation def vc_complement #$stderr.puts caller[0].inspect ###DEBUG @vc_complement.to_a end # Returns the same contents as vc_complement, but returns original data # structure used in the class. def vc_complement_as_original_class @vc_complement end # +Array+ of horizontal cuts between strands in 0-based index notation def hc_between_strands #$stderr.puts caller[0].inspect ###DEBUG @hc_between_strands.to_a end # Returns the same contents as hc_between_strands, but returns original data # structure used in the class. def hc_between_strands_as_original_class @hc_between_strands end # Set to +true+ if the fragment CalculatedCuts is working on is circular attr_accessor :circular #-- ## An +Array+ with the primary strand with vertical cuts, the horizontal cuts, and the complementary strand with vertical cuts. #attr_reader :strands_for_display #++ # If +false+ the strands_for_display method needs to be called to update the contents # of @strands_for_display. Becomes out of date whenever add_cuts_from_cut_ranges is called. attr_reader :strands_for_display_current # Size of the sequence being digested. attr_reader :size def initialize(size=nil, circular=false) @size = size @circular = circular @vc_primary = SortedNumArray[] @vc_complement = SortedNumArray[] @hc_between_strands = SortedNumArray[] end # Accepts an +Array+ of CutRange type objects and applies them to # @vc_complement, @vc_primary, and @hc_between_strands. 
# # --- # *Arguments* # * +cut_ranges+: An +Array+ of HorizontalCutRange or VerticalCutRange objects # *Returns*:: nothing def add_cuts_from_cut_ranges(cut_ranges) @strands_for_display_current = false @vc_primary = @vc_primary.dup @vc_complement = @vc_complement.dup cut_ranges.each do |cut_range| @vc_primary.concat [cut_range.p_cut_left, cut_range.p_cut_right] @vc_complement.concat [cut_range.c_cut_left, cut_range.c_cut_right] # Add horizontal cut ranges. This may happen from cuts made inbetween a # VerticalCutRange or may be specifically defined by a HorizontalCutRange. if cut_range.class == VerticalCutRange ( cut_range.min + 1 ).upto( cut_range.max ){|i| @hc_between_strands << i} if cut_range.min < cut_range.max elsif cut_range.class == HorizontalCutRange ( cut_range.hcuts.first ).upto( cut_range.hcuts.last ){|i| @hc_between_strands << i} end end clean_all #return end # There may be incomplete cuts made, this method removes the cuts that don't # create sub-sequences for easier processing. # # For example, stray horizontal cuts that do not end with a left # and right separation: # # G A T T A C A # +-- --- # C T|A A T G T # # Or stray vertical cuts: # # G A T T A C A # +-- + # C T|A A T|G T # # However note that for non-circular sequences this would be a successful # cut which would result in a floating 'GT' sub-sequence: # # G A T T A C A # +--- # C T A A T|G T # # Blunt cuts are also complete cuts. # --- # *Arguments* # * +size+: (_optional_) Size of the sequence being digested. Defined here or during initalization of CalculatedCuts. # *Returns*:: nothing def remove_incomplete_cuts(size=nil) @strands_for_display_current = false @size = size if size raise IndexError, "Size of the strand must be provided here or during initalization." 
if !@size.kind_of?(Integer) and not @circular vcuts = @vc_primary + @vc_complement hcuts = @hc_between_strands last_index = @size - 1 good_hcuts = SortedNumArray[] potential_hcuts = [] if @circular # NOTE # if it's circular we should start at the beginning of a cut for orientation, # scan for it, hack off the first set of hcuts and move them to the back else vcuts.unshift(-1) unless vcuts.include?(-1) vcuts.push(last_index) unless vcuts.include?(last_index) end hcuts.each do |hcut| raise IndexError if hcut < -1 or hcut > last_index # skipped a nucleotide potential_hcuts.clear if !potential_hcuts.empty? and (hcut - potential_hcuts.last).abs > 1 if potential_hcuts.empty? if vcuts.include?( hcut ) and vcuts.include?( hcut - 1 ) good_hcuts << hcut elsif vcuts.include?( hcut - 1 ) potential_hcuts << hcut end else if vcuts.include?( hcut ) good_hcuts.concat(potential_hcuts) good_hcuts << hcut potential_hcuts.clear else potential_hcuts << hcut end end end check_vc = lambda do |vertical_cuts, opposing_vcuts| # opposing_vcuts is here only to check for blunt cuts, so there shouldn't # be any out-of-order problems with this good_vc = SortedNumArray[] vertical_cuts.each { |vc| good_vc << vc if good_hcuts.include?( vc ) or good_hcuts.include?( vc + 1 ) or opposing_vcuts.include?( vc ) } good_vc end @vc_primary = check_vc.call(@vc_primary, @vc_complement) @vc_complement = check_vc.call(@vc_complement, @vc_primary) @hc_between_strands = good_hcuts clean_all end # Sets @strands_for_display_current to +true+ and populates @strands_for_display. # # --- # *Arguments* # * +str1+: (_optional_) For displaying a primary strand. If +nil+ a numbered sequence will be used in place. # * +str2+: (_optional_) For displaying a complementary strand. If +nil+ a numbered sequence will be used in place. # * +vcp+: (_optional_) An array of vertical cut locations on the primary strand. If +nil+ the contents of @vc_primary is used. 
# * +vcc+: (_optional_) An array of vertical cut locations on the complementary strand. If +nil+ the contents of @vc_complementary is used. # * +hc+: (_optional_) An array of horizontal cut locations between strands. If +nil+ the contents of @hc_between_strands is used. # *Returns*:: +Array+ An array with the primary strand with vertical cuts, the horizontal cuts, and the complementary strand with vertical cuts. # def strands_for_display(str1 = nil, str2 = nil, vcp=nil, vcc=nil, hc=nil) return @strands_for_display if @strands_for_display_current vcs = '|' # Vertical cut symbol hcs = '-' # Horizontal cut symbol vhcs = '+' # Intersection of vertical and horizontal cut symbol num_txt_repeat = lambda { num_txt = '0123456789'; (num_txt * (@size.div(num_txt.size) + 1))[0..@size-1] } (str1 == nil) ? a = num_txt_repeat.call : a = str1.dup (str2 == nil) ? b = num_txt_repeat.call : b = str2.dup if vcp and !vcp.is_a?(SortedNumArray) then vcp = SortedNumArray.new.concat(vcp) end if vcc and !vcc.is_a?(SortedNumArray) then vcc = SortedNumArray.new.concat(vcc) end if hc and !hc.is_a?(SortedNumArray) then hc = SortedNumArray.new.concat(hc) end vcp = @vc_primary if vcp==nil vcc = @vc_complement if vcc==nil hc = @hc_between_strands if hc==nil vcp.reverse_each { |c| a.insert(c+1, vcs) } vcc.reverse_each { |c| b.insert(c+1, vcs) } between = ' ' * @size hc.each {|hcut| between[hcut,1] = hcs } s_a = add_spacing(a, vcs) s_b = add_spacing(b, vcs) s_bet = add_spacing(between) # NOTE watch this for circular i = 0 0.upto( s_a.size-1 ) do if (s_a[i,1] == vcs) or (s_b[i,1] == vcs) s_bet[i] = vhcs elsif i != 0 and s_bet[i-1,1] == hcs and s_bet[i+1,1] == hcs s_bet[i] = hcs end i+=1 end @strands_for_display_current = true @strands_for_display = [s_a, s_bet, s_b] end ######### protected ######### # remove nil values, remove duplicate values, and # sort @vc_primary, @vc_complement, and @hc_between_strands def clean_all [@vc_primary, @vc_complement, @hc_between_strands].collect { |a| a.delete(nil); 
a.uniq!; a.sort! } end end # CalculatedCuts end # SequenceRange end # Range end # RestrictionEnzyme end # Bio bio-2.0.3/lib/bio/util/restriction_enzyme/range/cut_range.rb0000644000175000017500000000110314141516614023431 0ustar nileshnilesh# # bio/util/restriction_enzyme/range/cut_range.rb - Abstract base class for HorizontalCutRange and VerticalCutRange # # Author:: Trevor Wennblom # Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com) # License:: The Ruby License # module Bio require 'bio/util/restriction_enzyme' unless const_defined?(:RestrictionEnzyme) class RestrictionEnzyme class Range # Abstract base class for HorizontalCutRange and VerticalCutRange # class CutRange end # CutRange end # Range end # RestrictionEnzyme end # Bio bio-2.0.3/lib/bio/util/restriction_enzyme/range/horizontal_cut_range.rb0000644000175000017500000000341714141516614025714 0ustar nileshnilesh# # bio/util/restriction_enzyme/range/horizontal_cut_range.rb - # # Author:: Trevor Wennblom # Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com) # License:: The Ruby License # module Bio require 'bio/util/restriction_enzyme' unless const_defined?(:RestrictionEnzyme) class RestrictionEnzyme class Range class HorizontalCutRange < CutRange attr_reader :p_cut_left, :p_cut_right attr_reader :c_cut_left, :c_cut_right attr_reader :min, :max attr_reader :hcuts def initialize( left, right=left ) raise "left > right" if left > right # The 'range' here is actually off by one on the left # side in relation to a normal CutRange, so using the normal # variables from CutRange would result in bad behavior. # # See below - the first horizontal cut is the primary cut plus one. 
    #
    #    1 2 3 4 5 6 7
    #    G A|T T A C A
    #       +-----+
    #    C T A A T|G T
    #    1 2 3 4 5 6 7
    #
    # Primary cut = 2
    # Complement cut = 5
    # Horizontal cuts = 3, 4, 5

    @p_cut_left = nil
    @p_cut_right = nil
    @c_cut_left = nil
    @c_cut_right = nil
    @min = left  # NOTE this used to be 'nil', make sure all tests work
    @max = right # NOTE this used to be 'nil', make sure all tests work
    @range = (@min..@max) unless @min == nil or @max == nil # NOTE this used to be 'nil', make sure all tests work
    @hcuts = (left..right)
  end

  # Check if a location falls within the minimum or maximum values of this
  # range.
  #
  # ---
  # *Arguments*
  # * +i+: Location to check if it is included in the range
  # *Returns*:: +true+ _or_ +false+
  def include?(i)
    @range.include?(i)
  end
end # HorizontalCutRange
end # Range
end # RestrictionEnzyme
end # Bio

#
# bio/util/restriction_enzyme/range/cut_ranges.rb - Container for many CutRange objects or CutRange child objects.
#
# Author::    Trevor Wennblom
# Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com)
# License::   The Ruby License
#

module Bio

require 'bio/util/restriction_enzyme' unless const_defined?(:RestrictionEnzyme)

class RestrictionEnzyme
class Range

# Container for many CutRange objects or CutRange child objects. Inherits from array.
#
class CutRanges < Array
  def min; self.collect{|a| a.min}.flatten.sort.first; end
  def max; self.collect{|a| a.max}.flatten.sort.last; end
  def include?(i); self.collect{|a| a.include?(i)}.include?(true); end

  def min_vertical
    vertical_min_max_helper( :min )
  end

  def max_vertical
    vertical_min_max_helper( :max )
  end

  protected

  def vertical_min_max_helper( sym_which )
    tmp = []
    self.each do |a|
      next unless a.class == Bio::RestrictionEnzyme::Range::VerticalCutRange
      tmp << a.send( sym_which )
    end
    z = (sym_which == :max) ?
      :last : :first
    tmp.flatten.sort.send(z)
  end
end # CutRanges
end # Range
end # RestrictionEnzyme
end # Bio

#
# bio/util/restriction_enzyme/range/vertical_cut_range.rb -
#
# Author::    Trevor Wennblom
# Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com)
# License::   The Ruby License
#

module Bio

require 'bio/util/restriction_enzyme' unless const_defined?(:RestrictionEnzyme)

class RestrictionEnzyme
class Range

# FIXME docs are kind of out of date. Change this to VerticalAndHorizontalCutRange
class VerticalCutRange < CutRange
  attr_reader :p_cut_left, :p_cut_right
  attr_reader :c_cut_left, :c_cut_right
  attr_reader :min, :max
  attr_reader :range

  # VerticalCutRange provides an extremely raw, yet precise, method of
  # defining the location of cuts on primary and complementary sequences.
  #
  # Many VerticalCutRange objects are used with HorizontalCutRange objects
  # to be contained in CutRanges to define the cut pattern that a
  # specific enzyme may make.
  #
  # VerticalCutRange takes up to four possible cuts, two on the primary
  # strand and two on the complementary strand. In typical usage
  # you will want to make a single cut on the primary strand and a single
  # cut on the complementary strand.
  #
  # However, you can construct it with whatever cuts you desire to accommodate
  # the most eccentric of imaginary restriction enzymes.
  #
  # ---
  # *Arguments*
  # * +p_cut_left+: (_optional_) Left-most cut on the primary strand. +nil+ to skip
  # * +p_cut_right+: (_optional_) Right-most cut on the primary strand. +nil+ to skip
  # * +c_cut_left+: (_optional_) Left-most cut on the complementary strand. +nil+ to skip
  # * +c_cut_right+: (_optional_) Right-most cut on the complementary strand.
  #   +nil+ to skip
  # *Returns*:: nothing
  def initialize( p_cut_left=nil, p_cut_right=nil, c_cut_left=nil, c_cut_right=nil )
    @p_cut_left = p_cut_left
    @p_cut_right = p_cut_right
    @c_cut_left = c_cut_left
    @c_cut_right = c_cut_right

    a = [@p_cut_left, @c_cut_left, @p_cut_right, @c_cut_right]
    a.delete(nil)
    a.sort!
    @min = a.first
    @max = a.last

    @range = nil
    @range = (@min..@max) unless @min == nil or @max == nil
    return
  end

  # Check if a location falls within the minimum or maximum values of this
  # range.
  #
  # ---
  # *Arguments*
  # * +i+: Location to check if it is included in the range
  # *Returns*:: +true+ _or_ +false+
  def include?(i)
    return false if @range == nil
    @range.include?(i)
  end
end # VerticalCutRange
end # Range
end # RestrictionEnzyme
end # Bio

#
# bio/util/restriction_enzyme/range/sequence_range.rb - A defined range over a nucleotide sequence
#
# Author::    Trevor Wennblom
# Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com)
# License::   The Ruby License
#

module Bio

require 'bio/util/restriction_enzyme' unless const_defined?(:RestrictionEnzyme)

class RestrictionEnzyme
class Range

autoload :CutRange, 'bio/util/restriction_enzyme/range/cut_range'
autoload :CutRanges, 'bio/util/restriction_enzyme/range/cut_ranges'
autoload :HorizontalCutRange, 'bio/util/restriction_enzyme/range/horizontal_cut_range'
autoload :VerticalCutRange, 'bio/util/restriction_enzyme/range/vertical_cut_range'

# A defined range over a nucleotide sequence.
#
# This class accommodates having cuts defined on a sequence and returning the
# fragments made by those cuts.
class SequenceRange autoload :Fragment, 'bio/util/restriction_enzyme/range/sequence_range/fragment' autoload :Fragments, 'bio/util/restriction_enzyme/range/sequence_range/fragments' autoload :CalculatedCuts, 'bio/util/restriction_enzyme/range/sequence_range/calculated_cuts' # Left-most index of primary strand attr_reader :p_left # Right-most index of primary strand attr_reader :p_right # Left-most index of complementary strand attr_reader :c_left # Right-most index of complementary strand attr_reader :c_right # Left-most index of DNA sequence attr_reader :left # Right-most index of DNA sequence attr_reader :right # Size of DNA sequence attr_reader :size # CutRanges in this SequenceRange attr_reader :cut_ranges def initialize( p_left = nil, p_right = nil, c_left = nil, c_right = nil ) raise ArgumentError if p_left == nil and c_left == nil raise ArgumentError if p_right == nil and c_right == nil (raise ArgumentError unless p_left <= p_right) unless p_left == nil or p_right == nil (raise ArgumentError unless c_left <= c_right) unless c_left == nil or c_right == nil @p_left, @p_right, @c_left, @c_right = p_left, p_right, c_left, c_right @left = [p_left, c_left].compact.sort.first @right = [p_right, c_right].compact.sort.last @size = (@right - @left) + 1 unless @left == nil or @right == nil @cut_ranges = CutRanges.new @__fragments_current = false end # If the first object is HorizontalCutRange or VerticalCutRange, that is # added to the SequenceRange. Otherwise this method # builds a VerticalCutRange object and adds it to the SequenceRange. # # Note: # Cut occurs immediately after the index supplied. # For example, a cut at '0' would mean a cut occurs between bases 0 and 1. # # --- # *Arguments* # * +p_cut_left+: (_optional_) Left-most cut on the primary strand *or* a CutRange object. +nil+ to skip # * +p_cut_right+: (_optional_) Right-most cut on the primary strand. +nil+ to skip # * +c_cut_left+: (_optional_) Left-most cut on the complementary strand. 
+nil+ to skip # * +c_cut_right+: (_optional_) Right-most cut on the complementary strand. +nil+ to skip # *Returns*:: nothing def add_cut_range( p_cut_left=nil, p_cut_right=nil, c_cut_left=nil, c_cut_right=nil ) @__fragments_current = false if p_cut_left.kind_of? CutRange # shortcut @cut_ranges << p_cut_left else [p_cut_left, p_cut_right, c_cut_left, c_cut_right].each { |n| (raise IndexError unless n >= @left and n <= @right) unless n == nil } @cut_ranges << VerticalCutRange.new( p_cut_left, p_cut_right, c_cut_left, c_cut_right ) end end # Add a series of CutRange objects (HorizontalCutRange or VerticalCutRange). # # --- # *Arguments* # * +cut_ranges+: A series of CutRange objects # *Returns*:: nothing def add_cut_ranges(*cut_ranges) cut_ranges.flatten.each do |cut_range| raise TypeError, "Not of type CutRange" unless cut_range.kind_of? CutRange self.add_cut_range( cut_range ) end end # Builds a HorizontalCutRange object and adds it to the SequenceRange. # # --- # *Arguments* # * +left+: Left-most cut # * +right+: (_optional_) Right side - by default this equals the left side, default is recommended. # *Returns*:: nothing def add_horizontal_cut_range( left, right=left ) @__fragments_current = false @cut_ranges << HorizontalCutRange.new( left, right ) end # A Bio::RestrictionEnzyme::Range::SequenceRange::Bin holds an +Array+ of # indexes for the primary and complement strands (+p+ and +c+ accessors). # # Example hash with Bin values: # {0=>#, # 2=>#, # 3=>#, # 4=>#} # # Note that the bin cannot be easily stored as a range since there may be # nucleotides excised in the middle of a range. # # TODO: Perhaps store the bins as one-or-many ranges since missing # nucleotides due to enzyme cutting is a special case. 
Bin = Struct.new(:c, :p) # Calculates the fragments over this sequence range as defined after using # the methods add_cut_range, add_cut_ranges, and/or add_horizontal_cut_range # # Example return value: # [#, # #, # #, # #] # # --- # *Arguments* # * _none_ # *Returns*:: Bio::RestrictionEnzyme::Range::SequenceRange::Fragments def fragments return @__fragments if @__fragments_current == true @__fragments_current = true num_txt = '0123456789' num_txt_repeat = (num_txt * ( @size.div(num_txt.size) + 1))[0..@size-1] fragments = Fragments.new(num_txt_repeat, num_txt_repeat) cc = Bio::RestrictionEnzyme::Range::SequenceRange::CalculatedCuts.new(@size) cc.add_cuts_from_cut_ranges(@cut_ranges) cc.remove_incomplete_cuts create_bins(cc).sort.each { |k, bin| fragments << Fragment.new( bin.p, bin.c ) } @__fragments = fragments return fragments end ######### protected ######### # Example: # cc = Bio::RestrictionEnzyme::Range::SequenceRange::CalculatedCuts.new(@size) # cc.add_cuts_from_cut_ranges(@cut_ranges) # cc.remove_incomplete_cuts # bins = create_bins(cc) # # Example return value: # {0=>#, # 2=>#, # 3=>#, # 4=>#} # # --- # *Arguments* # * +cc+: Bio::RestrictionEnzyme::Range::SequenceRange::CalculatedCuts # *Returns*:: +Hash+ Keys are unique, values are Bio::RestrictionEnzyme::Range::SequenceRange::Bin objects filled with indexes of the sequence locations they represent. def create_bins(cc) p_cut = cc.vc_primary_as_original_class c_cut = cc.vc_complement_as_original_class h_cut = cc.hc_between_strands_as_original_class if (defined? 
@circular) && @circular # NOTE # if it's circular we should start at the beginning of a cut for orientation # scan for it, hack off the first set of hcuts and move them to the back unique_id = 0 else p_cut.unshift(-1) unless p_cut.include?(-1) c_cut.unshift(-1) unless c_cut.include?(-1) unique_id = -1 end p_bin_id = c_bin_id = unique_id bins = {} setup_new_bin(bins, unique_id) -1.upto(@size-1) do |idx| # NOTE - circular, for the future - should '-1' be replace with 'unique_id'? # if bin_ids are out of sync but the strands are attached if (p_bin_id != c_bin_id) and !h_cut.include?(idx) min_id, max_id = [p_bin_id, c_bin_id].sort bins.delete(max_id) p_bin_id = c_bin_id = min_id end bins[ p_bin_id ].p << idx bins[ c_bin_id ].c << idx if p_cut.include? idx p_bin_id = (unique_id += 1) setup_new_bin(bins, p_bin_id) end if c_cut.include? idx # repetition c_bin_id = (unique_id += 1) # repetition setup_new_bin(bins, c_bin_id) # repetition end # repetition end # Bin "-1" is an easy way to indicate the start of a strand just in case # there is a horizontal cut at position 0 bins.delete(-1) unless ((defined? @circular) && @circular) bins end # Modifies bins in place by creating a new element with key bin_id and # initializing the bin. 
  def setup_new_bin(bins, bin_id)
    bins[ bin_id ] = Bin.new
    bins[ bin_id ].p = DenseIntArray[] #could be replaced by SortedNumArray[]
    bins[ bin_id ].c = DenseIntArray[] #could be replaced by SortedNumArray[]
  end

end # SequenceRange
end # Range
end # RestrictionEnzyme
end # Bio

#
# bio/util/restriction_enzyme/single_strand.rb - Single strand of a restriction enzyme sequence
#
# Author::    Trevor Wennblom
# Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com)
# License::   The Ruby License
#

require 'bio/sequence'

module Bio

require 'bio/util/restriction_enzyme' unless const_defined?(:RestrictionEnzyme)

class RestrictionEnzyme

# A single strand of restriction enzyme sequence pattern with a 5' to 3'
# orientation.
#
# DoubleStranded puts the SingleStrand and SingleStrandComplement together to
# create the sequence pattern with cuts on both strands.
#
class SingleStrand < Bio::Sequence::NA

  autoload :CutLocationsInEnzymeNotation, 'bio/util/restriction_enzyme/single_strand/cut_locations_in_enzyme_notation'

  include CutSymbol
  include StringFormatting

  # The cut locations in enzyme notation. Contains a
  # CutLocationsInEnzymeNotation object set when the SingleStrand
  # object is initialized.
  attr_reader :cut_locations_in_enzyme_notation

  # The cut locations transformed from enzyme index notation to 0-based
  # array index notation. Contains an Array.
  attr_reader :cut_locations

  # Orientation of the strand, 5' to 3'
  def orientation; [5,3]; end

  # Constructor for a Bio::RestrictionEnzyme::SingleStrand object.
  #
  # A single strand of restriction enzyme sequence pattern with a 5' to 3' orientation.
  #
  # ---
  # *Arguments*
  # * +sequence+: (_required_) The enzyme sequence.
  # * +c+: (_optional_) Cut locations in enzyme notation.
# See Bio::RestrictionEnzyme::SingleStrand::CutLocationsInEnzymeNotation # # *Constraints* # * +sequence+ cannot contain immediately adjacent cut symbols (ex. atg^^c). # * +c+ is in enzyme index notation and therefore cannot contain a 0. # * If +c+ is omitted, +sequence+ must contain a cut symbol. # * You cannot provide both a sequence with cut symbols and provide cut locations - ambiguous. # # +sequence+ must be a kind of: # * String # * Bio::Sequence::NA # * Bio::RestrictionEnzyme::SingleStrand # # +c+ must be a kind of: # * Bio::RestrictionEnzyme::SingleStrand::CutLocationsInEnzymeNotation # * Integer, one or more # * Array # # *Returns*:: nothing def initialize( sequence, *c ) c.flatten! # if an array was supplied as an argument # NOTE t| 2009-09-19 commented out for library efficiency # validate_args(sequence, c) sequence = sequence.downcase if sequence =~ re_cut_symbol @cut_locations_in_enzyme_notation = CutLocationsInEnzymeNotation.new( strip_padding(sequence) ) else @cut_locations_in_enzyme_notation = CutLocationsInEnzymeNotation.new( c ) end @stripped = Bio::Sequence::NA.new( strip_cuts_and_padding( sequence ) ) super( pattern ) @cut_locations = @cut_locations_in_enzyme_notation.to_array_index return end # Returns true if this enzyme is palindromic with its reverse complement. # Does not report if the +cut_locations+ are palindromic or not. # # Examples: # * This would be palindromic: # 5' - ATGCAT - 3' # TACGTA # # * This would not be palindromic: # 5' - ATGCGTA - 3' # TACGCAT # # --- # *Arguments* # * _none_ # *Returns*:: +true+ _or_ +false+ def palindromic? @stripped.reverse_complement == @stripped end # Sequence pattern with no cut symbols and no 'n' padding. # * SingleStrand.new('garraxt', [-2, 1, 7]).stripped # => "garraxt" attr_reader :stripped # The sequence with 'n' padding and cut symbols. 
# * SingleStrand.new('garraxt', [-2, 1, 7]).with_cut_symbols # => "n^ng^arraxt^n" # # --- # *Arguments* # * _none_ # *Returns*:: The sequence with 'n' padding and cut symbols. def with_cut_symbols s = pattern @cut_locations_in_enzyme_notation.to_array_index.sort.reverse.each { |c| s.insert(c+1, cut_symbol) } s end # The sequence with 'n' padding on the left and right for cuts larger than the sequence. # * SingleStrand.new('garraxt', [-2, 1, 7]).pattern # => "nngarraxtn" # # --- # *Arguments* # * _none_ # *Returns*:: The sequence with 'n' padding on the left and right for cuts larger than the sequence. def pattern return stripped if @cut_locations_in_enzyme_notation.min == nil left = (@cut_locations_in_enzyme_notation.min < 0 ? 'n' * @cut_locations_in_enzyme_notation.min.abs : '') # Add one more 'n' if a cut is at the last position right = ( (@cut_locations_in_enzyme_notation.max >= @stripped.length) ? ('n' * (@cut_locations_in_enzyme_notation.max - @stripped.length + 1)) : '') [left, stripped, right].join('') end # The sequence with 'n' pads, cut symbols, and spacing for alignment. # * SingleStrand.new('garraxt', [-2, 1, 7]).with_spaces # => "n^n g^a r r a x t^n" # # --- # *Arguments* # * _none_ # *Returns*:: The sequence with 'n' pads, cut symbols, and spacing for alignment. def with_spaces add_spacing( with_cut_symbols ) end ######### protected ######### def validate_args( input_pattern, input_cut_locations ) unless input_pattern.kind_of?(String) err = "input_pattern is not a String, Bio::Sequence::NA, or Bio::RestrictionEnzyme::SingleStrand object\n" err += "pattern: #{input_pattern}\n" err += "class: #{input_pattern.class}" raise ArgumentError, err end if ( input_pattern =~ re_cut_symbol ) and !input_cut_locations.empty? err = "Cut symbol found in sequence, but cut locations were also supplied. 
Ambiguous.\n"
      err += "pattern: #{input_pattern}\n"
      err += "symbol: #{cut_symbol}\n"
      err += "locations: #{input_cut_locations.inspect}"
      raise ArgumentError, err
    end

    input_pattern.each_byte do |c|
      c = c.chr.downcase
      unless Bio::NucleicAcid::NAMES.has_key?(c) or c == 'x' or c == 'X' or c == cut_symbol
        err = "Invalid character in pattern.\n"
        err += "Not a nucleotide or representation of possible nucleotides. See Bio::NucleicAcid::NAMES for more information.\n"
        err += "char: #{c}\n"
        err += "input_pattern: #{input_pattern}"
        raise ArgumentError, err
      end
    end
  end

  # Tadayoshi Funaba's method as discussed in Programming Ruby 2ed, p390
  def self.once(*ids)
    for id in ids
      module_eval <<-"end;"
        alias_method :__#{id.__id__}__, :#{id.to_s}
        private :__#{id.__id__}__
        def #{id.to_s}(*args, &block)
          (@__#{id.__id__}__ ||= [__#{id.__id__}__(*args, &block)])[0]
        end
      end;
    end
  end

  private_class_method :once

  once :pattern, :with_cut_symbols, :with_spaces, :to_re

end # SingleStrand
end # RestrictionEnzyme
end # Bio

#
# bio/util/restriction_enzyme/sorted_num_array.rb - Internal data storage for Bio::RestrictionEnzyme::Range::SequenceRange
#
# Copyright:: Copyright (C) 2011
#             Naohisa Goto
# License::   The Ruby License
#

module Bio

require 'bio/util/restriction_enzyme' unless const_defined?(:RestrictionEnzyme)

class RestrictionEnzyme

# a class to store sorted numerics.
#
# Bio::RestrictionEnzyme internal use only.
# Please do not create the instance outside Bio::RestrictionEnzyme.
class SortedNumArray # Same usage as Array.[] def self.[](*args) a = self.new args.each do |elem| a.push elem end a end # Creates a new object def initialize @hash = {} #clear_cache end # initialize copy def initialize_copy(other) super(other) @hash = @hash.dup end # sets internal hash object def internal_data_hash=(h) #clear_cache @hash = h self end protected :internal_data_hash= # gets internal hash object def internal_data_hash @hash end protected :internal_data_hash #--- ## clear the internal cache #def clear_cache # @sorted_keys = nil #end #protected :clear_cache #+++ # sorted keys def sorted_keys #@sorted_keys ||= @hash.keys.sort #@sorted_keys @hash.keys.sort end private :sorted_keys # adds a new element def push_element(n) #return if @hash.has_key?(n) #already existed; do nothing @hash.store(n, true) #if @sorted_keys then # if thelast = @sorted_keys[-1] and n > thelast then # @sorted_keys.push n # else # clear_cache # end #end nil end private :push_element # adds a new element in the beginning of the array def unshift_element(n) #return if @hash.has_key?(n) #already existed; do nothing @hash.store(n, true) #if @sorted_keys then # if thefirst = @sorted_keys[0] and n < thefirst then # @sorted_keys.unshift n # else # clear_cache # end #end nil end private :unshift_element # Same usage as Array#[] def [](*arg) #$stderr.puts "SortedNumArray#[]" sorted_keys[*arg] end # Not implemented def []=(*arg) raise NotImplementedError, 'SortedNumArray#[]= is not implemented.' end # Same usage as Array#each def each(&block) sorted_keys.each(&block) end # Same usage as Array#reverse_each def reverse_each(&block) sorted_keys.reverse_each(&block) end # Same usage as Array#+, but accepts only the same classes instance. 
def +(other) unless other.is_a?(self.class) then raise TypeError, 'unsupported data type' end new_hash = @hash.merge(other.internal_data_hash) result = self.class.new result.internal_data_hash = new_hash result end # Same usage as Array#== def ==(other) if r = super(other) then r elsif other.is_a?(self.class) then other.internal_data_hash == @hash else false end end # Same usage as Array#concat def concat(ary) ary.each { |elem| push_element(elem) } self end # Same usage as Array#push def push(*args) args.each do |elem| push_element(elem) end self end # Same usage as Array#unshift def unshift(*arg) arg.reverse_each do |elem| unshift_element(elem) end self end # Same usage as Array#<< def <<(elem) push_element(elem) self end # Same usage as Array#include? def include?(elem) @hash.has_key?(elem) end # Same usage as Array#first def first sorted_keys.first end # Same usage as Array#last def last sorted_keys.last end # Same usage as Array#size def size @hash.size end alias length size # Same usage as Array#delete def delete(elem) #clear_cache @hash.delete(elem) ? elem : nil end # Does nothing def sort!(&block) # does nothing self end # Does nothing def uniq! 
    # does nothing
    self
  end

  # Converts to an array
  def to_a
    #sorted_keys.dup
    sorted_keys
  end

end #class SortedNumArray
end #class RestrictionEnzyme
end #module Bio

#
# bio/util/restriction_enzyme/double_stranded/cut_locations.rb - Contains an Array of CutLocationPair objects
#
# Author::    Trevor Wennblom
# Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com)
# License::   The Ruby License
#

module Bio

require 'bio/util/restriction_enzyme' unless const_defined?(:RestrictionEnzyme)

class RestrictionEnzyme
class DoubleStranded

# Contains an +Array+ of CutLocationPair objects.
#
class CutLocations < Array

  # CutLocations constructor.
  #
  # Contains an +Array+ of CutLocationPair objects.
  #
  # Example:
  #   clp1 = CutLocationPair.new(3,2)
  #   clp2 = CutLocationPair.new(7,9)
  #   pairs = CutLocations.new(clp1, clp2)
  #
  # ---
  # *Arguments*
  # * +args+: Any number of +CutLocationPair+ objects
  # *Returns*:: nothing
  def initialize(*args)
    validate_args(args)
    super(args)
  end

  # Returns an +Array+ of locations of cuts on the primary strand
  #
  # ---
  # *Arguments*
  # * _none_
  # *Returns*:: +Array+ of locations of cuts on the primary strand
  def primary
    self.collect {|a| a[0]}
  end

  # Returns an +Array+ of locations of cuts on the complementary strand
  #
  # ---
  # *Arguments*
  # * _none_
  # *Returns*:: +Array+ of locations of cuts on the complementary strand
  def complement
    self.collect {|a| a[1]}
  end

  #########
  protected
  #########

  def validate_args(args)
    args.each do |a|
      unless a.class == Bio::RestrictionEnzyme::DoubleStranded::CutLocationPair
        err = "Not a CutLocationPair\n"
        err += "class: #{a.class}\n"
        err += "inspect: #{a.inspect}"
        raise ArgumentError, err
      end
    end
  end

end # CutLocations
end # DoubleStranded
end # RestrictionEnzyme
end # Bio

#
# bio/util/restriction_enzyme/double_stranded/cut_location_pair_in_enzyme_notation.rb - Inherits from DoubleStranded::CutLocationPair
#
# Author::    Trevor Wennblom
# Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com)
# License::   The Ruby License
#

module Bio

require 'bio/util/restriction_enzyme' unless const_defined?(:RestrictionEnzyme)

class RestrictionEnzyme
class DoubleStranded

# Inherits from DoubleStranded::CutLocationPair and stores the cut location
# pair in enzyme notation instead of 0-based index notation.
#
class CutLocationPairInEnzymeNotation < CutLocationPair

  #########
  protected
  #########

  def validate_2( a, b )
    if (a == 0) or (b == 0)
      raise ArgumentError, "Enzyme index notation only. 0 values are illegal."
    end

    if a == nil and b == nil
      raise ArgumentError, "Neither strand has a cut. Ambiguous."
    end
  end

end # CutLocationPairInEnzymeNotation
end # DoubleStranded
end # RestrictionEnzyme
end # Bio

#
# bio/util/restriction_enzyme/double_stranded/aligned_strands.rb - Align two SingleStrand objects
#
# Author::    Trevor Wennblom
# Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com)
# License::   The Ruby License
#

module Bio

require 'bio/util/restriction_enzyme' unless const_defined?(:RestrictionEnzyme)

class RestrictionEnzyme
class DoubleStranded

# Align two SingleStrand objects and return a Result
# object with +primary+ and +complement+ accessors.
#
class AlignedStrands
  extend CutSymbol
  extend StringFormatting

  # Creates a new object.
  # ---
  # *Returns*:: Bio::RestrictionEnzyme::DoubleStranded::AlignedStrands object
  def initialize; super; end

  # The object returned for alignments
  Result = Struct.new(:primary, :complement)

  # Pad and align two String objects without cut symbols.
  #
  # This will look for the sub-sequence without left and right 'n' padding
  # and re-apply 'n' padding to both strings on both sides equal to the
  # maximum previous padding on that side.
  #
  # The sub-sequences stripped of left and right 'n' padding must be of equal
  # length.
  #
  # Example:
  #   AlignedStrands.align('nngattacannnnn', 'nnnnnctaatgtnn') # =>
  #    #<struct Bio::RestrictionEnzyme::DoubleStranded::AlignedStrands::Result
  #     primary="nnnnngattacannnnn",
  #     complement="nnnnnctaatgtnnnnn">
  #
  # ---
  # *Arguments*
  # * +a+: Primary strand
  # * +b+: Complementary strand
  # *Returns*:: +Result+ object with equal padding on both strings
  def self.align(a, b)
    a = a.to_s
    b = b.to_s
    validate_input( strip_padding(a), strip_padding(b) )
    left = [left_padding(a), left_padding(b)].sort.last
    right = [right_padding(a), right_padding(b)].sort.last

    p = left + strip_padding(a) + right
    c = left + strip_padding(b) + right
    Result.new(p,c)
  end

  # Pad and align two String objects with cut symbols.
  #
  # Example:
  #   AlignedStrands.align_with_cuts('nngattacannnnn', 'nnnnnctaatgtnn', [0, 10, 12], [0, 2, 12]) # =>
  #    #<struct Bio::RestrictionEnzyme::DoubleStranded::AlignedStrands::Result
  #     primary="n n n n^n g a t t a c a n n^n n^n",
  #     complement="n^n n^n n c t a a t g t n^n n n n">
  #
  # Notes:
  # * To make room for the cut symbols each nucleotide is spaced out.
  # * This is meant to be able to handle multiple cuts and completely
  #   unrelated cutsites on the two strands, therefore no biological
  #   algorithm assumptions (shortcuts) are made.
  #
  # The sequences stripped of left and right 'n' padding must be of equal
  # length.
# # --- # *Arguments* # * +a+: Primary sequence # * +b+: Complementary sequence # * +a_cuts+: Primary strand cut locations in 0-based index notation # * +b_cuts+: Complementary strand cut locations in 0-based index notation # *Returns*:: +Result+ object with equal padding on both strings and spacing between bases def self.align_with_cuts(a,b,a_cuts,b_cuts) a = a.to_s b = b.to_s validate_input( strip_padding(a), strip_padding(b) ) a_left, a_right = left_padding(a), right_padding(a) b_left, b_right = left_padding(b), right_padding(b) left_diff = a_left.length - b_left.length right_diff = a_right.length - b_right.length (right_diff > 0) ? (b_right += 'n' * right_diff) : (a_right += 'n' * right_diff.abs) a_adjust = b_adjust = 0 if left_diff > 0 b_left += 'n' * left_diff b_adjust = left_diff else a_left += 'n' * left_diff.abs a_adjust = left_diff.abs end a = a_left + strip_padding(a) + a_right b = b_left + strip_padding(b) + b_right a_cuts.sort.reverse.each { |c| a.insert(c+1+a_adjust, cut_symbol) } b_cuts.sort.reverse.each { |c| b.insert(c+1+b_adjust, cut_symbol) } Result.new( add_spacing(a), add_spacing(b) ) end ######### protected ######### def self.validate_input(a,b) unless a.size == b.size err = "Result sequences are not the same size. 
Does not align sequences with differing lengths after strip_padding.\n"
      err += "#{a.size}, #{a.inspect}\n"
      err += "#{b.size}, #{b.inspect}"
      raise ArgumentError, err
    end
  end

end # AlignedStrands
end # DoubleStranded
end # RestrictionEnzyme
end # Bio

#
# bio/util/restriction_enzyme/double_stranded/cut_locations_in_enzyme_notation.rb - Inherits from DoubleStranded::CutLocations
#
# Author::    Trevor Wennblom
# Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com)
# License::   The Ruby License
#

module Bio

require 'bio/util/restriction_enzyme' unless const_defined?(:RestrictionEnzyme)

class RestrictionEnzyme
class DoubleStranded

# Inherits from DoubleStranded::CutLocations. Contains CutLocationPairInEnzymeNotation objects.
# Adds helper methods to convert from enzyme index notation to 0-based array index notation.
#
class CutLocationsInEnzymeNotation < CutLocations

  # Returns +Array+ of locations of cuts on the primary
  # strand in 0-based array index notation.
  #
  # ---
  # *Arguments*
  # * _none_
  # *Returns*:: +Array+ of locations of cuts on the primary strand in 0-based array index notation.
  def primary_to_array_index
    helper_for_to_array_index(self.primary)
  end

  # Returns +Array+ of locations of cuts on the complementary
  # strand in 0-based array index notation.
  #
  # ---
  # *Arguments*
  # * _none_
  # *Returns*:: +Array+ of locations of cuts on the complementary strand in 0-based array index notation.
  def complement_to_array_index
    helper_for_to_array_index(self.complement)
  end

  # Returns the contents of the present CutLocationsInEnzymeNotation object as
  # a CutLocations object with the contents converted from enzyme notation
  # to 0-based array index notation.
  #
  # ---
  # *Arguments*
  # * _none_
  # *Returns*:: +CutLocations+
  def to_array_index
    unless self.primary_to_array_index.size == self.complement_to_array_index.size
      err = "Primary and complement strand cut locations are not available in equal numbers.\n"
      err += "primary: #{self.primary_to_array_index.inspect}\n"
      err += "primary.size: #{self.primary_to_array_index.size}\n"
      err += "complement: #{self.complement_to_array_index.inspect}\n"
      err += "complement.size: #{self.complement_to_array_index.size}"
      raise IndexError, err
    end
    a = self.primary_to_array_index.zip(self.complement_to_array_index)
    CutLocations.new( *a.collect {|cl| CutLocationPair.new(cl)} )
  end

  #########
  protected
  #########

  def helper_for_to_array_index(a)
    minimum = (self.primary + self.complement).flatten
    minimum.delete(nil)
    minimum = minimum.sort.first

    return [] if minimum == nil # no elements

    if minimum < 0
      calc = lambda do |n|
        unless n == nil
          n -= 1 unless n < 0
          n += minimum.abs
        end
        n
      end
    else
      calc = lambda do |n|
        n -= 1 unless n == nil
        n
      end
    end

    a.collect(&calc)
  end

  def validate_args(args)
    args.each do |a|
      unless a.class == Bio::RestrictionEnzyme::DoubleStranded::CutLocationPairInEnzymeNotation
        err = "Not a CutLocationPairInEnzymeNotation\n"
        err += "class: #{a.class}\n"
        err += "inspect: #{a.inspect}"
        raise TypeError, err
      end
    end
  end

end # CutLocationsInEnzymeNotation
end # DoubleStranded
end # RestrictionEnzyme
end # Bio

#
# bio/util/restriction_enzyme/double_stranded/cut_location_pair.rb - Stores a cut location pair in 0-based index notation
#
# Author::    Trevor Wennblom
# Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com)
# License::   The Ruby License
#

module Bio

require 'bio/util/restriction_enzyme' unless const_defined?(:RestrictionEnzyme)

class RestrictionEnzyme
class DoubleStranded

# Stores a single cut location
pair in 0-based index notation for use with # DoubleStranded enzyme sequences. # class CutLocationPair < Array # Location of the cut on the primary strand. # Corresponds - or 'pairs' - to the complement cut. # A value of +nil+ is an explicit representation of 'no cut'. attr_reader :primary # Location of the cut on the complementary strand. # Corresponds - or 'pairs' - to the primary cut. # A value of +nil+ is an explicit representation of 'no cut'. attr_reader :complement # CutLocationPair constructor. # # Stores a single cut location pair in 0-based index notation for use with # DoubleStranded enzyme sequences. # # Example: # clp = CutLocationPair.new(3,2) # clp.primary # 3 # clp.complement # 2 # # --- # *Arguments* # * +pair+: May be two values represented as an Array, a Range, or a # combination of Integer and nil values. The first value # represents a cut on the primary strand, the second represents # a cut on the complement strand. # *Returns*:: nothing def initialize( *pair ) a = b = nil if pair[0].kind_of? Array a,b = init_with_array( pair[0] ) # no idea why this barfs without the second half during test/runner.rb # are there two Range objects running around? elsif pair[0].kind_of? Range or (pair[0].class.to_s == 'Range') #elsif pair[0].kind_of? Range a,b = init_with_array( [pair[0].first, pair[0].last] ) elsif pair[0].kind_of? Integer or pair[0].kind_of? NilClass a,b = init_with_array( [pair[0], pair[1]] ) else raise ArgumentError, "#{pair[0].class} is an invalid class type to initialize CutLocationPair." end super( [a,b] ) @primary = a @complement = b return end ######### protected ######### def init_with_array( ary ) validate_1(ary) a = ary.shift ary.empty? ? b = nil : b = ary.shift validate_2(a,b) [a,b] end def validate_1( ary ) unless ary.size == 1 or ary.size == 2 raise ArgumentError, "Must be one or two elements." end end def validate_2( a, b ) if (a != nil and a < 0) or (b != nil and b < 0) raise ArgumentError, "0-based index notation only.
Negative values are illegal." end if a == nil and b == nil raise ArgumentError, "Neither strand has a cut. Ambiguous." end end end # CutLocationPair end # DoubleStranded end # RestrictionEnzyme end # Bio bio-2.0.3/lib/bio/util/restriction_enzyme/analysis.rb0000644000175000017500000002322114141516614022216 0ustar nileshnilesh# # bio/util/restriction_enzyme/analysis.rb - Does the work of fragmenting the DNA from the enzymes # # Author:: Trevor Wennblom # Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com) # License:: The Ruby License # module Bio require 'bio/util/restriction_enzyme' unless const_defined?(:RestrictionEnzyme) class RestrictionEnzyme class Analysis #-- # require "analysis_basic.rb" here to avoid cyclic require #++ require 'bio/util/restriction_enzyme/analysis_basic' # See cut instance method def self.cut( sequence, *args ) self.new.cut( sequence, *args ) end # See main documentation for Bio::RestrictionEnzyme # # # +cut+ takes into account # permutations of cut variations based on competitiveness of enzymes for an # enzyme cutsite or enzyme bindsite on a sequence. # # Example: # # FIXME add output # # Bio::RestrictionEnzyme::Analysis.cut('gaattc', 'EcoRI') # # _same as:_ # # Bio::RestrictionEnzyme::Analysis.cut('gaattc', 'g^aattc') # --- # *Arguments* # * +sequence+: +String+ kind of object that will be used as a nucleic acid sequence. # * +args+: Series of enzyme names, enzymes sequences with cut marks, or RestrictionEnzyme objects. # *Returns*:: Bio::RestrictionEnzyme::Fragments object populated with Bio::RestrictionEnzyme::Fragment objects. 
(Note: unrelated to Bio::RestrictionEnzyme::Range::SequenceRange::Fragments) or a +Symbol+ containing an error code def cut( sequence, *args ) view_ranges = false args.select { |i| i.class == Hash }.each do |hsh| hsh.each do |key, value| if key == :view_ranges unless ( value.kind_of?(TrueClass) or value.kind_of?(FalseClass) ) raise ArgumentError, "view_ranges must be set to true or false, currently #{value.inspect}." end view_ranges = value end end end res = cut_and_return_by_permutations( sequence, *args ) return res if res.class == Symbol # Format the fragments for the user fragments_for_display( res, view_ranges ) end ######### protected ######### # See cut instance method # # --- # *Arguments* # * +sequence+: +String+ kind of object that will be used as a nucleic acid sequence. # * +args+: Series of enzyme names, enzyme sequences with cut marks, or RestrictionEnzyme objects. # May also supply a +Hash+ with the key ":max_permutations" to specify how many permutations are allowed - a value of 0 indicates no permutations are allowed. # *Returns*:: +Hash+ Keys are a permutation ID, values are SequenceRange objects that have cuts applied. # _also_ may return the +Symbol+ ':sequence_empty', ':no_cuts_found', or ':too_many_permutations' def cut_and_return_by_permutations( sequence, *args ) my_hash = {} maximum_permutations = nil hashes_in_args = args.select { |i| i.class == Hash } args.delete_if { |i| i.class == Hash } hashes_in_args.each do |hsh| hsh.each do |key, value| case key when :max_permutations, 'max_permutations', :maximum_permutations, 'maximum_permutations' maximum_permutations = value.to_i unless value == nil when :view_ranges else raise ArgumentError, "Received key #{key.inspect} in argument - I only know the key ':max_permutations' and ':view_ranges' currently. Hash passed: #{hsh.inspect}" end end end if !sequence.kind_of?(String) or sequence.empty? logger.warn "The supplied sequence is empty."
if defined?(logger) return :sequence_empty end sequence = Bio::Sequence::NA.new( sequence ) enzyme_actions, initial_cuts = create_enzyme_actions( sequence, *args ) if enzyme_actions.empty? and initial_cuts.empty? logger.warn "This enzyme does not make any cuts on this sequence." if defined?(logger) return :no_cuts_found end # * When enzyme_actions.size is equal to '1' that means there are no permutations. # * If enzyme_actions.size is equal to '2' there is one # permutation ("[0, 1]") # * If enzyme_actions.size is equal to '3' there are two # permutations ("[0, 1, 2]") # * and so on.. if maximum_permutations and enzyme_actions.size > 1 if (enzyme_actions.size - 1) > maximum_permutations.to_i logger.warn "More permutations than maximum, skipping. Found: #{enzyme_actions.size-1} Max: #{maximum_permutations.to_i}" if defined?(logger) return :too_many_permutations end end if enzyme_actions.size > 1 permutations = permute(enzyme_actions.size) permutations.each do |permutation| previous_cut_ranges = [] # Primary and complement strands are both measured from '0' to 'sequence.size-1' here sequence_range = Bio::RestrictionEnzyme::Range::SequenceRange.new( 0, 0, sequence.size-1, sequence.size-1 ) # Add the cuts to the sequence_range from each enzyme_action contained # in initial_cuts. These are the cuts that have no competition so are # not subject to permutations. initial_cuts.each do |enzyme_action| enzyme_action.cut_ranges.each do |cut_range| sequence_range.add_cut_range(cut_range) end end permutation.each do |id| enzyme_action = enzyme_actions[id] # conflict is false if the current enzyme action may cut in its range. # conflict is true if it cannot due to a previous enzyme action making # a cut where this enzyme action needs a whole recognition site.
conflict = false # If current size of enzyme_action overlaps with previous cut_range, don't cut # note that the enzyme action may fall in the middle of a previous enzyme action # so all cut locations must be checked that would fall underneath. previous_cut_ranges.each do |cut_range| next unless cut_range.class == Bio::RestrictionEnzyme::Range::VerticalCutRange # we aren't concerned with horizontal cuts previous_cut_left = cut_range.range.first previous_cut_right = cut_range.range.last # Keep in mind: # * The cut location is to the immediate right of the base located at the index. # ex: at^gc -- the cut location is at index 1 # * The enzyme action location is located at the base of the index. # ex: atgc -- 0 => 'a', 1 => 't', 2 => 'g', 3 => 'c' # method create_enzyme_actions has similar commentary if interested if (enzyme_action.right <= previous_cut_left) or (enzyme_action.left > previous_cut_right) or (enzyme_action.left > previous_cut_left and enzyme_action.right <= previous_cut_right) # in between cuts # no conflict else conflict = true end end next if conflict == true enzyme_action.cut_ranges.each { |cut_range| sequence_range.add_cut_range(cut_range) } previous_cut_ranges += enzyme_action.cut_ranges end # permutation.each # Fill in the source sequence for sequence_range so it knows what bases # to use sequence_range.fragments.primary = sequence sequence_range.fragments.complement = sequence.forward_complement my_hash[permutation] = sequence_range end # permutations.each else # if enzyme_actions.size == 1 # no permutations, just do it sequence_range = Bio::RestrictionEnzyme::Range::SequenceRange.new( 0, 0, sequence.size-1, sequence.size-1 ) initial_cuts.each { |enzyme_action| enzyme_action.cut_ranges.each { |cut_range| sequence_range.add_cut_range(cut_range) } } sequence_range.fragments.primary = sequence sequence_range.fragments.complement = sequence.forward_complement my_hash[0] = sequence_range end my_hash end # Returns permutation orders for a given number 
of elements. # # Examples: # permute(0) # => [[0]] # permute(1) # => [[0]] # permute(2) # => [[1, 0], [0, 1]] # permute(3) # => [[2, 1, 0], [2, 0, 1], [1, 2, 0], [0, 2, 1], [1, 0, 2], [0, 1, 2]] # permute(4) # => [[3, 2, 1, 0], # [3, 2, 0, 1], # [3, 1, 2, 0], # [3, 0, 2, 1], # [3, 1, 0, 2], # [3, 0, 1, 2], # [2, 3, 1, 0], # [2, 3, 0, 1], # [1, 3, 2, 0], # [0, 3, 2, 1], # [1, 3, 0, 2], # [0, 3, 1, 2], # [2, 1, 3, 0], # [2, 0, 3, 1], # [1, 2, 3, 0], # [0, 2, 3, 1], # [1, 0, 3, 2], # [0, 1, 3, 2], # [2, 1, 0, 3], # [2, 0, 1, 3], # [1, 2, 0, 3], # [0, 2, 1, 3], # [1, 0, 2, 3], # [0, 1, 2, 3]] # # --- # *Arguments* # * +count+: +Number+ of different elements to be permuted # * +permutations+: ignore - for the recursive algorithm # *Returns*:: +Array+ of +Array+ objects with different possible permutation orders. See examples. def permute(count, permutations = [[0]]) return permutations if count <= 1 new_arrays = [] new_array = [] (permutations[0].size + 1).times do |n| new_array.clear permutations.each { |a| new_array << a.dup } new_array.each { |e| e.insert(n, permutations[0].size) } new_arrays += new_array end permute(count-1, new_arrays) end end # Analysis end # RestrictionEnzyme end # Bio bio-2.0.3/lib/bio/util/restriction_enzyme/double_stranded.rb0000644000175000017500000002513314141516614023535 0ustar nileshnilesh# # bio/util/restriction_enzyme/double_stranded.rb - DoubleStranded restriction enzyme sequence # # Author:: Trevor Wennblom # Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com) # License:: The Ruby License # module Bio require 'bio/util/restriction_enzyme' unless const_defined?(:RestrictionEnzyme) class RestrictionEnzyme # A pair of SingleStrand and SingleStrandComplement objects with methods to # add utility to their relation. # # = Notes # * This is created by Bio::RestrictionEnzyme.new for convenience. # * The two strands accessible are +primary+ and +complement+. 
# * SingleStrand methods may be used on DoubleStranded and they will be passed to +primary+. # # # FIXME needs better docs class DoubleStranded autoload :AlignedStrands, 'bio/util/restriction_enzyme/double_stranded/aligned_strands' autoload :CutLocations, 'bio/util/restriction_enzyme/double_stranded/cut_locations' autoload :CutLocationPair, 'bio/util/restriction_enzyme/double_stranded/cut_location_pair' autoload :CutLocationsInEnzymeNotation, 'bio/util/restriction_enzyme/double_stranded/cut_locations_in_enzyme_notation' autoload :CutLocationPairInEnzymeNotation, 'bio/util/restriction_enzyme/double_stranded/cut_location_pair_in_enzyme_notation' include CutSymbol extend CutSymbol include StringFormatting extend StringFormatting # The primary strand attr_reader :primary # The complement strand attr_reader :complement # Cut locations in 0-based index format, DoubleStranded::CutLocations object attr_reader :cut_locations # Cut locations in enzyme index notation, DoubleStranded::CutLocationsInEnzymeNotation object attr_reader :cut_locations_in_enzyme_notation # [+erp+] One of three possible parameters: The name of an enzyme, a REBASE::EnzymeEntry object, or a nucleotide pattern with a cut mark. # [+raw_cut_pairs+] The cut locations in enzyme index notation. # # Enzyme index notation:: 1.._n_, value before 1 is -1 # # Examples of the allowable cut locations for +raw_cut_pairs+ follows. 'p' and # 'c' refer to a cut location on the 'p'rimary and 'c'omplement strands. # # 1, [3,2], [20,22], 57 # p, [p,c], [p, c], p # # Which is the same as: # # 1, (3..2), (20..22), 57 # p, (p..c), (p..c), p # # Examples of partial cuts: # 1, [nil,2], [20,nil], 57 # p, [p, c], [p, c], p # def initialize(erp, *raw_cut_pairs) # 'erp' : 'E'nzyme / 'R'ebase / 'P'attern k = erp.class if k == Bio::REBASE::EnzymeEntry # Passed a Bio::REBASE::EnzymeEntry object unless raw_cut_pairs.empty? err = "A Bio::REBASE::EnzymeEntry object was passed, however the cut locations contained values. 
Ambiguous or redundant.\n" err += "inspect = #{raw_cut_pairs.inspect}" raise ArgumentError, err end initialize_with_rebase( erp ) elsif erp.kind_of? String # Passed something that could be an enzyme pattern or an enzyme name # Decide if this String is an enzyme name or a pattern if Bio::RestrictionEnzyme.enzyme_name?( erp ) # FIXME we added this to rebase... # Check if it's a known name known_enzyme = false known_enzyme = true if Bio::RestrictionEnzyme.rebase[ erp ] # Try harder to find the enzyme unless known_enzyme re = %r"^#{erp}$"i Bio::RestrictionEnzyme.rebase.each { |name, v| (known_enzyme = true; erp = name; break) if name =~ re } end if known_enzyme initialize_with_rebase( Bio::RestrictionEnzyme.rebase[erp] ) else raise IndexError, "No entry found for enzyme named '#{erp}'" end else # Not an enzyme name, so a pattern is assumed if erp =~ re_cut_symbol initialize_with_pattern_and_cut_symbols( erp ) else initialize_with_pattern_and_cut_locations( erp, raw_cut_pairs ) end end elsif k == NilClass err = "Passed a nil value. Perhaps you tried to pass a Bio::REBASE::EnzymeEntry that does not exist?\n" err += "inspect = #{erp.inspect}" raise ArgumentError, err else err = "I don't know what to do with class #{k} for erp.\n" err += "inspect = #{erp.inspect}" raise ArgumentError, err end end # See AlignedStrands.align def aligned_strands AlignedStrands.align(@primary.pattern, @complement.pattern) end # See AlignedStrands.align_with_cuts def aligned_strands_with_cuts AlignedStrands.align_with_cuts(@primary.pattern, @complement.pattern, @primary.cut_locations, @complement.cut_locations) end # Returns +true+ if the cut pattern creates blunt fragments. # (opposite of sticky) def blunt? as = aligned_strands_with_cuts ary = [as.primary, as.complement] ary.collect! { |seq| seq.split( cut_symbol ) } # convert the cut sections to their lengths ary.each { |i| i.collect! { |c| c.length } } ary[0] == ary[1] end # Returns +true+ if the cut pattern creates sticky fragments.
# (opposite of blunt) def sticky? !blunt? end # Takes a numerical offset to the sequence and # returns an EnzymeAction for this enzyme at that position # # +offset+:: Numerical offset of where the enzyme action occurs on the sequence def create_action_at( offset ) # x is the size of the fully aligned sequence with maximum padding needed # to make a match on the primary and complement strand. # # For example - # Note how EcoRII needs extra padding on the beginning and ending of the # sequence 'ccagg' to make the match since the cut must occur between # two nucleotides and can not occur on the very end of the sequence. # # EcoRII: # :blunt: "0" # :c2: "5" # :c4: "0" # :c1: "-1" # :pattern: CCWGG # :len: "5" # :name: EcoRII # :c3: "0" # :ncuts: "2" # # -1 1 2 3 4 5 # 5' - n^c c w g g n - 3' # 3' - n g g w c c^n - 5' # # (w == [at]) x = aligned_strands.primary.size enzyme_action = EnzymeAction.new( offset, offset + x-1, offset, offset + x-1) @cut_locations.each do |cut_location_pair| # cut_location_pair is a DoubleStranded::CutLocationPair p, c = cut_location_pair.primary, cut_location_pair.complement if c >= p enzyme_action.add_cut_range(offset+p, nil, nil, offset+c) else enzyme_action.add_cut_range(nil, offset+p, offset+c, nil) end end enzyme_action end # An EnzymeAction is a way of representing a potential effect that a # RestrictionEnzyme may have on a nucleotide sequence, an 'action'. # # Multiple cuts in multiple locations on a sequence may occur in one # 'action' if it is done by a single enzyme. # # An EnzymeAction is a series of locations that represents where the restriction # enzyme will bind on the sequence, as well as what ranges are cut on the # sequence itself. The complexity is due to the fact that our virtual # restriction enzyme may create multiple segments from its cutting action, # on which another restriction enzyme may operate.
# # For example, the DNA sequence: # # 5' - G A A T A A A C G A - 3' # 3' - C T T A T T T G C T - 5' # # When mixed with the restriction enzyme with the following cut pattern: # # 5' - A|A T A A A C|G - 3' # +-+ + # 3' - T T|A T T T G|C - 5' # # And also mixed with the restriction enzyme of the following cut pattern: # # 5' - A A|A C - 3' # +-+ # 3' - T|T T G - 5' # # Would result in a DNA sequence with these cuts: # # 5' - G A|A T A A|A C|G A - 3' # +-+ +-+ + # 3' - C T T|A T|T T G|C T - 5' # # Or these separate "free-floating" sequences: # # 5' - G A - 3' # 3' - C T T - 5' # # 5' - A T A A - 3' # 3' - A T - 5' # # 5' - A C - 3' # 3' - T T G - 5' # # 5' - G A - 3' # 3' - C T - 5' # # This would be represented by two EnzymeActions - one for each # RestrictionEnzyme. # # This is, however, subject to competition. If the second enzyme reaches # the target first, then the first enzyme will not be able to find the # appropriate bind site. # # FIXME complete these docs # # To initialize an EnzymeAction you must first instantiate it with the # beginning and ending locations of where it will operate on a nucleotide # sequence. # # Next the ranges of cu # # An EnzymeAction is # Defines a single enzyme action, in this case being a range that correlates # to the DNA sequence that may contain its own internal cuts. class EnzymeAction < Bio::RestrictionEnzyme::Range::SequenceRange end ######### protected ######### def initialize_with_pattern_and_cut_symbols( s ) p_cl = SingleStrand::CutLocationsInEnzymeNotation.new( strip_padding(s) ) s = Bio::Sequence::NA.new( strip_cuts_and_padding(s) ) # * Reflect cuts that are in enzyme notation # * 0 is not a valid enzyme index, decrement 0 and all negative c_cl = p_cl.collect {|n| (n >= s.length or n < 1) ?
((s.length - n) - 1) : (s.length - n)} create_cut_locations( p_cl.zip(c_cl) ) create_primary_and_complement( s, p_cl, c_cl ) end def initialize_with_pattern_and_cut_locations( s, raw_cl ) create_cut_locations(raw_cl) create_primary_and_complement( Bio::Sequence::NA.new(s), @cut_locations_in_enzyme_notation.primary, @cut_locations_in_enzyme_notation.complement ) end def create_primary_and_complement(primary_seq, p_cuts, c_cuts) @primary = SingleStrand.new( primary_seq, p_cuts ) @complement = SingleStrandComplement.new( primary_seq.forward_complement, c_cuts ) end def create_cut_locations(raw_cl) @cut_locations_in_enzyme_notation = CutLocationsInEnzymeNotation.new( *raw_cl.collect {|cl| CutLocationPairInEnzymeNotation.new(cl)} ) @cut_locations = @cut_locations_in_enzyme_notation.to_array_index end def initialize_with_rebase( e ) p_cl = [e.primary_strand_cut1, e.primary_strand_cut2] c_cl = [e.complementary_strand_cut1, e.complementary_strand_cut2] # If there's no cut in REBASE it's represented as a 0. # 0 is an invalid index, it just means no cut. 
p_cl.delete(0) c_cl.delete(0) raise IndexError unless p_cl.size == c_cl.size initialize_with_pattern_and_cut_locations( e.pattern, p_cl.zip(c_cl) ) end end # DoubleStranded end # RestrictionEnzyme end # Bio bio-2.0.3/lib/bio/util/restriction_enzyme/enzymes.yaml0000644000175000017500000025705714141516614022444 0ustar nileshnilesh--- TspRI: :len: "5" :c1: "7" :c4: "0" :ncuts: "2" :pattern: CASTG :c2: "-3" :name: TspRI :blunt: "0" :c3: "0" MvnI: :len: "4" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CGCG :c2: "2" :name: MvnI :blunt: "1" :c3: "0" AclI: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: AACGTT :c2: "4" :name: AclI :blunt: "0" :c3: "0" SfuI: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: TTCGAA :c2: "4" :name: SfuI :blunt: "0" :c3: "0" ScrFI: :len: "5" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CCNGG :c2: "3" :name: ScrFI :blunt: "0" :c3: "0" EcoO109I: :len: "7" :c1: "2" :c4: "0" :ncuts: "2" :pattern: RGGNCCY :c2: "5" :name: EcoO109I :blunt: "0" :c3: "0" TssI: :len: "9" :c1: "0" :c4: "0" :ncuts: "0" :pattern: gagnnnctc :c2: "0" :name: TssI :blunt: "0" :c3: "0" PpiI: :len: "12" :c1: "-8" :c4: "20" :ncuts: "4" :pattern: GAACNNNNNCTC :c2: "-13" :name: PpiI :blunt: "0" :c3: "25" Mph1103I: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: ATGCAT :c2: "1" :name: Mph1103I :blunt: "0" :c3: "0" Eco81I: :len: "7" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CCTNAGG :c2: "5" :name: Eco81I :blunt: "0" :c3: "0" BspACI: :len: "4" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CCGC :c2: "3" :name: BspACI :blunt: "0" :c3: "0" Eco105I: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: TACGTA :c2: "3" :name: Eco105I :blunt: "1" :c3: "0" Eco24I: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GRGCYC :c2: "1" :name: Eco24I :blunt: "0" :c3: "0" BseRI: :len: "6" :c1: "16" :c4: "0" :ncuts: "2" :pattern: GAGGAG :c2: "14" :name: BseRI :blunt: "0" :c3: "0" AxyI: :len: "7" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CCTNAGG :c2: "5" :name: AxyI :blunt: "0" :c3: "0" SecI: :len: "6" :c1: "1" :c4: "0" 
:ncuts: "2" :pattern: ccnngg :c2: "5" :name: SecI :blunt: "0" :c3: "0" PmaCI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: CACGTG :c2: "3" :name: PmaCI :blunt: "1" :c3: "0" HgiJII: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: grgcyc :c2: "1" :name: HgiJII :blunt: "0" :c3: "0" CauII: :len: "5" :c1: "2" :c4: "0" :ncuts: "2" :pattern: ccsgg :c2: "3" :name: CauII :blunt: "0" :c3: "0" BssKI: :len: "5" :c1: "-1" :c4: "0" :ncuts: "2" :pattern: CCNGG :c2: "5" :name: BssKI :blunt: "0" :c3: "0" AarI: :len: "7" :c1: "11" :c4: "0" :ncuts: "2" :pattern: CACCTGC :c2: "15" :name: AarI :blunt: "0" :c3: "0" StsI: :len: "5" :c1: "15" :c4: "0" :ncuts: "2" :pattern: ggatg :c2: "19" :name: StsI :blunt: "0" :c3: "0" Rsr2I: :len: "7" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CGGWCCG :c2: "5" :name: Rsr2I :blunt: "0" :c3: "0" BbvI: :len: "5" :c1: "13" :c4: "0" :ncuts: "2" :pattern: GCAGC :c2: "17" :name: BbvI :blunt: "0" :c3: "0" MmeI: :len: "6" :c1: "26" :c4: "0" :ncuts: "2" :pattern: TCCRAC :c2: "24" :name: MmeI :blunt: "0" :c3: "0" FseI: :len: "8" :c1: "6" :c4: "0" :ncuts: "2" :pattern: GGCCGGCC :c2: "2" :name: FseI :blunt: "0" :c3: "0" SciI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: ctcgag :c2: "3" :name: SciI :blunt: "1" :c3: "0" PacI: :len: "8" :c1: "5" :c4: "0" :ncuts: "2" :pattern: TTAATTAA :c2: "3" :name: PacI :blunt: "0" :c3: "0" Bse21I: :len: "7" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CCTNAGG :c2: "5" :name: Bse21I :blunt: "0" :c3: "0" AcvI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: CACGTG :c2: "3" :name: AcvI :blunt: "1" :c3: "0" DsaI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: ccrygg :c2: "5" :name: DsaI :blunt: "0" :c3: "0" Bsp119I: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: TTCGAA :c2: "4" :name: Bsp119I :blunt: "0" :c3: "0" TliI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CTCGAG :c2: "5" :name: TliI :blunt: "0" :c3: "0" PpsI: :len: "5" :c1: "9" :c4: "0" :ncuts: "2" :pattern: GAGTC :c2: "10" :name: PpsI :blunt: "0" :c3: "0" 
Ksp22I: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: TGATCA :c2: "5" :name: Ksp22I :blunt: "0" :c3: "0" BccI: :len: "5" :c1: "9" :c4: "0" :ncuts: "2" :pattern: CCATC :c2: "10" :name: BccI :blunt: "0" :c3: "0" BtrI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: CACGTC :c2: "3" :name: BtrI :blunt: "1" :c3: "0" BptI: :len: "5" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CCWGG :c2: "3" :name: BptI :blunt: "0" :c3: "0" Bce83I: :len: "6" :c1: "22" :c4: "0" :ncuts: "2" :pattern: cttgag :c2: "20" :name: Bce83I :blunt: "0" :c3: "0" SmiI: :len: "8" :c1: "4" :c4: "0" :ncuts: "2" :pattern: ATTTAAAT :c2: "4" :name: SmiI :blunt: "1" :c3: "0" Sfr274I: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CTCGAG :c2: "5" :name: Sfr274I :blunt: "0" :c3: "0" PvuI: :len: "6" :c1: "4" :c4: "0" :ncuts: "2" :pattern: CGATCG :c2: "2" :name: PvuI :blunt: "0" :c3: "0" BslFI: :len: "5" :c1: "15" :c4: "0" :ncuts: "2" :pattern: GGGAC :c2: "19" :name: BslFI :blunt: "0" :c3: "0" AssI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: AGTACT :c2: "3" :name: AssI :blunt: "1" :c3: "0" VpaK11AI: :len: "5" :c1: "-1" :c4: "0" :ncuts: "2" :pattern: ggwcc :c2: "5" :name: VpaK11AI :blunt: "0" :c3: "0" TspDTI: :len: "5" :c1: "16" :c4: "0" :ncuts: "2" :pattern: ATGAA :c2: "14" :name: TspDTI :blunt: "0" :c3: "0" MslI: :len: "10" :c1: "5" :c4: "0" :ncuts: "2" :pattern: CAYNNNNRTG :c2: "5" :name: MslI :blunt: "1" :c3: "0" HindIII: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: AAGCTT :c2: "5" :name: HindIII :blunt: "0" :c3: "0" AlwNI: :len: "9" :c1: "6" :c4: "0" :ncuts: "2" :pattern: CAGNNNCTG :c2: "3" :name: AlwNI :blunt: "0" :c3: "0" BstBI: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: TTCGAA :c2: "4" :name: BstBI :blunt: "0" :c3: "0" BspDI: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: ATCGAT :c2: "4" :name: BspDI :blunt: "0" :c3: "0" Csp6I: :len: "4" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GTAC :c2: "3" :name: Csp6I :blunt: "0" :c3: "0" Aor13HI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" 
:pattern: TCCGGA :c2: "5" :name: Aor13HI :blunt: "0" :c3: "0" UbaF14I: :len: "11" :c1: "0" :c4: "0" :ncuts: "0" :pattern: ccannnnntcg :c2: "0" :name: UbaF14I :blunt: "0" :c3: "0" TaaI: :len: "5" :c1: "3" :c4: "0" :ncuts: "2" :pattern: ACNGT :c2: "2" :name: TaaI :blunt: "0" :c3: "0" SatI: :len: "5" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GCNGC :c2: "3" :name: SatI :blunt: "0" :c3: "0" MjaIV: :len: "6" :c1: "0" :c4: "0" :ncuts: "0" :pattern: gtnnac :c2: "0" :name: MjaIV :blunt: "0" :c3: "0" LpnI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: rgcgcy :c2: "3" :name: LpnI :blunt: "1" :c3: "0" BanI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GGYRCC :c2: "5" :name: BanI :blunt: "0" :c3: "0" FauNDI: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CATATG :c2: "4" :name: FauNDI :blunt: "0" :c3: "0" AspA2I: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CCTAGG :c2: "5" :name: AspA2I :blunt: "0" :c3: "0" Eco130I: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CCWWGG :c2: "5" :name: Eco130I :blunt: "0" :c3: "0" PalAI: :len: "8" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GGCGCGCC :c2: "6" :name: PalAI :blunt: "0" :c3: "0" MwoI: :len: "11" :c1: "7" :c4: "0" :ncuts: "2" :pattern: GCNNNNNNNGC :c2: "4" :name: MwoI :blunt: "0" :c3: "0" BstEII: :len: "7" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GGTNACC :c2: "6" :name: BstEII :blunt: "0" :c3: "0" Bsp120I: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GGGCCC :c2: "5" :name: Bsp120I :blunt: "0" :c3: "0" SspI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: AATATT :c2: "3" :name: SspI :blunt: "1" :c3: "0" PmlI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: CACGTG :c2: "3" :name: PmlI :blunt: "1" :c3: "0" MfeI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CAATTG :c2: "5" :name: MfeI :blunt: "0" :c3: "0" HpyCH4V: :len: "4" :c1: "2" :c4: "0" :ncuts: "2" :pattern: TGCA :c2: "2" :name: HpyCH4V :blunt: "1" :c3: "0" AvaIII: :len: "6" :c1: "0" :c4: "0" :ncuts: "0" :pattern: atgcat :c2: "0" :name: AvaIII :blunt: "0" 
:c3: "0" RcaI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: TCATGA :c2: "5" :name: RcaI :blunt: "0" :c3: "0" PsiI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: TTATAA :c2: "3" :name: PsiI :blunt: "1" :c3: "0" Hsp92II: :len: "4" :c1: "4" :c4: "0" :ncuts: "2" :pattern: CATG :c2: "-1" :name: Hsp92II :blunt: "0" :c3: "0" Alw21I: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GWGCWC :c2: "1" :name: Alw21I :blunt: "0" :c3: "0" BstENI: :len: "11" :c1: "5" :c4: "0" :ncuts: "2" :pattern: CCTNNNNNAGG :c2: "6" :name: BstENI :blunt: "0" :c3: "0" BstAPI: :len: "11" :c1: "7" :c4: "0" :ncuts: "2" :pattern: GCANNNNNTGC :c2: "4" :name: BstAPI :blunt: "0" :c3: "0" SbfI: :len: "8" :c1: "6" :c4: "0" :ncuts: "2" :pattern: CCTGCAGG :c2: "2" :name: SbfI :blunt: "0" :c3: "0" MaeII: :len: "4" :c1: "1" :c4: "0" :ncuts: "2" :pattern: ACGT :c2: "3" :name: MaeII :blunt: "0" :c3: "0" HapII: :len: "4" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CCGG :c2: "3" :name: HapII :blunt: "0" :c3: "0" BpuAI: :len: "6" :c1: "8" :c4: "0" :ncuts: "2" :pattern: GAAGAC :c2: "12" :name: BpuAI :blunt: "0" :c3: "0" DdeI: :len: "5" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CTNAG :c2: "4" :name: DdeI :blunt: "0" :c3: "0" Ama87I: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CYCGRG :c2: "5" :name: Ama87I :blunt: "0" :c3: "0" AbsI: :len: "8" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CCTCGAGG :c2: "6" :name: AbsI :blunt: "0" :c3: "0" SseBI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: AGGCCT :c2: "3" :name: SseBI :blunt: "1" :c3: "0" SlaI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CTCGAG :c2: "5" :name: SlaI :blunt: "0" :c3: "0" SgrDI: :len: "8" :c1: "0" :c4: "0" :ncuts: "0" :pattern: cgtcgacg :c2: "0" :name: SgrDI :blunt: "0" :c3: "0" Hpy188I: :len: "5" :c1: "3" :c4: "0" :ncuts: "2" :pattern: TCNGA :c2: "2" :name: Hpy188I :blunt: "0" :c3: "0" Hin4I: :len: "11" :c1: "-9" :c4: "19" :ncuts: "4" :pattern: GAYNNNNNVTC :c2: "-14" :name: Hin4I :blunt: "0" :c3: "24" EcoT22I: :len: "6" :c1: "5" :c4: "0" 
:ncuts: "2" :pattern: ATGCAT :c2: "1" :name: EcoT22I :blunt: "0" :c3: "0" BseAI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: TCCGGA :c2: "5" :name: BseAI :blunt: "0" :c3: "0" Alw26I: :len: "5" :c1: "6" :c4: "0" :ncuts: "2" :pattern: GTCTC :c2: "10" :name: Alw26I :blunt: "0" :c3: "0" BstAUI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: TGTACA :c2: "5" :name: BstAUI :blunt: "0" :c3: "0" Bsp143II: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: RGCGCY :c2: "1" :name: Bsp143II :blunt: "0" :c3: "0" Bpu14I: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: TTCGAA :c2: "4" :name: Bpu14I :blunt: "0" :c3: "0" BmrI: :len: "6" :c1: "11" :c4: "0" :ncuts: "2" :pattern: ACTGGG :c2: "10" :name: BmrI :blunt: "0" :c3: "0" BspNCI: :len: "5" :c1: "0" :c4: "0" :ncuts: "0" :pattern: ccaga :c2: "0" :name: BspNCI :blunt: "0" :c3: "0" BamHI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GGATCC :c2: "5" :name: BamHI :blunt: "0" :c3: "0" SfiI: :len: "13" :c1: "8" :c4: "0" :ncuts: "2" :pattern: GGCCNNNNNGGCC :c2: "5" :name: SfiI :blunt: "0" :c3: "0" Psp1406I: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: AACGTT :c2: "4" :name: Psp1406I :blunt: "0" :c3: "0" NdeII: :len: "4" :c1: "-1" :c4: "0" :ncuts: "2" :pattern: GATC :c2: "4" :name: NdeII :blunt: "0" :c3: "0" BstX2I: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: RGATCY :c2: "5" :name: BstX2I :blunt: "0" :c3: "0" XceI: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: RCATGY :c2: "1" :name: XceI :blunt: "0" :c3: "0" PssI: :len: "7" :c1: "5" :c4: "0" :ncuts: "2" :pattern: rggnccy :c2: "2" :name: PssI :blunt: "0" :c3: "0" Fsp4HI: :len: "5" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GCNGC :c2: "3" :name: Fsp4HI :blunt: "0" :c3: "0" ApeKI: :len: "5" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GCWGC :c2: "4" :name: ApeKI :blunt: "0" :c3: "0" BscGI: :len: "5" :c1: "0" :c4: "0" :ncuts: "0" :pattern: cccgt :c2: "0" :name: BscGI :blunt: "0" :c3: "0" BsaHI: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GRCGYC :c2: "4" :name: 
BsaHI :blunt: "0" :c3: "0" BbeI: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GGCGCC :c2: "1" :name: BbeI :blunt: "0" :c3: "0" Sth132I: :len: "4" :c1: "8" :c4: "0" :ncuts: "2" :pattern: cccg :c2: "12" :name: Sth132I :blunt: "0" :c3: "0" PvuII: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: CAGCTG :c2: "3" :name: PvuII :blunt: "1" :c3: "0" Hpy99I: :len: "5" :c1: "5" :c4: "0" :ncuts: "2" :pattern: CGWCG :c2: "-1" :name: Hpy99I :blunt: "0" :c3: "0" Fnu4HI: :len: "5" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GCNGC :c2: "3" :name: Fnu4HI :blunt: "0" :c3: "0" BspXI: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: ATCGAT :c2: "4" :name: BspXI :blunt: "0" :c3: "0" BsmBI: :len: "6" :c1: "7" :c4: "0" :ncuts: "2" :pattern: CGTCTC :c2: "11" :name: BsmBI :blunt: "0" :c3: "0" MaeIII: :len: "5" :c1: "-1" :c4: "0" :ncuts: "2" :pattern: GTNAC :c2: "5" :name: MaeIII :blunt: "0" :c3: "0" HhaI: :len: "4" :c1: "3" :c4: "0" :ncuts: "2" :pattern: GCGC :c2: "1" :name: HhaI :blunt: "0" :c3: "0" Cfr9I: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CCCGGG :c2: "5" :name: Cfr9I :blunt: "0" :c3: "0" TauI: :len: "5" :c1: "4" :c4: "0" :ncuts: "2" :pattern: GCSGC :c2: "1" :name: TauI :blunt: "0" :c3: "0" Cfr13I: :len: "5" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GGNCC :c2: "4" :name: Cfr13I :blunt: "0" :c3: "0" BsaMI: :len: "6" :c1: "7" :c4: "0" :ncuts: "2" :pattern: GAATGC :c2: "5" :name: BsaMI :blunt: "0" :c3: "0" BpcI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CTRYAG :c2: "5" :name: BpcI :blunt: "0" :c3: "0" NotI: :len: "8" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GCGGCCGC :c2: "6" :name: NotI :blunt: "0" :c3: "0" SacII: :len: "6" :c1: "4" :c4: "0" :ncuts: "2" :pattern: CCGCGG :c2: "2" :name: SacII :blunt: "0" :c3: "0" PdmI: :len: "10" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GAANNNNTTC :c2: "5" :name: PdmI :blunt: "1" :c3: "0" CstMI: :len: "6" :c1: "26" :c4: "0" :ncuts: "2" :pattern: aaggag :c2: "24" :name: CstMI :blunt: "0" :c3: "0" CjePI: :len: "12" :c1: "-8" :c4: "20" 
:ncuts: "4" :pattern: ccannnnnnntc :c2: "-14" :name: CjePI :blunt: "0" :c3: "26" DpnI: :len: "4" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GATC :c2: "2" :name: DpnI :blunt: "1" :c3: "0" XapI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: RAATTY :c2: "5" :name: XapI :blunt: "0" :c3: "0" NheI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GCTAGC :c2: "5" :name: NheI :blunt: "0" :c3: "0" BsiHKCI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CYCGRG :c2: "5" :name: BsiHKCI :blunt: "0" :c3: "0" BsePI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GCGCGC :c2: "5" :name: BsePI :blunt: "0" :c3: "0" BveI: :len: "6" :c1: "10" :c4: "0" :ncuts: "2" :pattern: ACCTGC :c2: "14" :name: BveI :blunt: "0" :c3: "0" BfmI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CTRYAG :c2: "5" :name: BfmI :blunt: "0" :c3: "0" DraII: :len: "7" :c1: "2" :c4: "0" :ncuts: "2" :pattern: RGGNCCY :c2: "5" :name: DraII :blunt: "0" :c3: "0" SacI: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GAGCTC :c2: "1" :name: SacI :blunt: "0" :c3: "0" AclWI: :len: "5" :c1: "9" :c4: "0" :ncuts: "2" :pattern: GGATC :c2: "10" :name: AclWI :blunt: "0" :c3: "0" AcoI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: YGGCCR :c2: "5" :name: AcoI :blunt: "0" :c3: "0" Bso31I: :len: "6" :c1: "7" :c4: "0" :ncuts: "2" :pattern: GGTCTC :c2: "11" :name: Bso31I :blunt: "0" :c3: "0" KspI: :len: "6" :c1: "4" :c4: "0" :ncuts: "2" :pattern: CCGCGG :c2: "2" :name: KspI :blunt: "0" :c3: "0" BfrI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CTTAAG :c2: "5" :name: BfrI :blunt: "0" :c3: "0" FalI: :len: "11" :c1: "-9" :c4: "19" :ncuts: "4" :pattern: AAGNNNNNCTT :c2: "-14" :name: FalI :blunt: "0" :c3: "24" BcefI: :len: "5" :c1: "17" :c4: "0" :ncuts: "2" :pattern: acggc :c2: "18" :name: BcefI :blunt: "0" :c3: "0" Mly113I: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GGCGCC :c2: "4" :name: Mly113I :blunt: "0" :c3: "0" HpyCH4IV: :len: "4" :c1: "1" :c4: "0" :ncuts: "2" :pattern: ACGT :c2: "3" :name: HpyCH4IV 
:blunt: "0" :c3: "0" FspBI: :len: "4" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CTAG :c2: "3" :name: FspBI :blunt: "0" :c3: "0" BspT104I: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: TTCGAA :c2: "4" :name: BspT104I :blunt: "0" :c3: "0" BssNI: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GRCGYC :c2: "4" :name: BssNI :blunt: "0" :c3: "0" Bst6I: :len: "6" :c1: "7" :c4: "0" :ncuts: "2" :pattern: CTCTTC :c2: "10" :name: Bst6I :blunt: "0" :c3: "0" BsiSI: :len: "4" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CCGG :c2: "3" :name: BsiSI :blunt: "0" :c3: "0" BsaWI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: WCCGGW :c2: "5" :name: BsaWI :blunt: "0" :c3: "0" BpmI: :len: "6" :c1: "22" :c4: "0" :ncuts: "2" :pattern: CTGGAG :c2: "20" :name: BpmI :blunt: "0" :c3: "0" BanIII: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: ATCGAT :c2: "4" :name: BanIII :blunt: "0" :c3: "0" AsuC2I: :len: "5" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CCSGG :c2: "3" :name: AsuC2I :blunt: "0" :c3: "0" CspI: :len: "7" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CGGWCCG :c2: "5" :name: CspI :blunt: "0" :c3: "0" Bsa29I: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: ATCGAT :c2: "4" :name: Bsa29I :blunt: "0" :c3: "0" AccII: :len: "4" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CGCG :c2: "2" :name: AccII :blunt: "1" :c3: "0" Sth302II: :len: "4" :c1: "2" :c4: "0" :ncuts: "2" :pattern: ccgg :c2: "2" :name: Sth302II :blunt: "1" :c3: "0" Hpy188III: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: TCNNGA :c2: "4" :name: Hpy188III :blunt: "0" :c3: "0" FaqI: :len: "5" :c1: "15" :c4: "0" :ncuts: "2" :pattern: GGGAC :c2: "19" :name: FaqI :blunt: "0" :c3: "0" EcoT14I: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CCWWGG :c2: "5" :name: EcoT14I :blunt: "0" :c3: "0" Acc36I: :len: "6" :c1: "10" :c4: "0" :ncuts: "2" :pattern: ACCTGC :c2: "14" :name: Acc36I :blunt: "0" :c3: "0" MseI: :len: "4" :c1: "1" :c4: "0" :ncuts: "2" :pattern: TTAA :c2: "3" :name: MseI :blunt: "0" :c3: "0" CviQI: :len: "4" :c1: "1" :c4: 
"0" :ncuts: "2" :pattern: GTAC :c2: "3" :name: CviQI :blunt: "0" :c3: "0" BsuRI: :len: "4" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GGCC :c2: "2" :name: BsuRI :blunt: "1" :c3: "0" BssT1I: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CCWWGG :c2: "5" :name: BssT1I :blunt: "0" :c3: "0" BssSI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CACGAG :c2: "5" :name: BssSI :blunt: "0" :c3: "0" ClaI: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: ATCGAT :c2: "4" :name: ClaI :blunt: "0" :c3: "0" BanII: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GRGCYC :c2: "1" :name: BanII :blunt: "0" :c3: "0" PceI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: AGGCCT :c2: "3" :name: PceI :blunt: "1" :c3: "0" HspAI: :len: "4" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GCGC :c2: "3" :name: HspAI :blunt: "0" :c3: "0" Csp45I: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: TTCGAA :c2: "4" :name: Csp45I :blunt: "0" :c3: "0" AflIII: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: ACRYGT :c2: "5" :name: AflIII :blunt: "0" :c3: "0" AcyI: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GRCGYC :c2: "4" :name: AcyI :blunt: "0" :c3: "0" BceAI: :len: "5" :c1: "17" :c4: "0" :ncuts: "2" :pattern: ACGGC :c2: "19" :name: BceAI :blunt: "0" :c3: "0" Ple19I: :len: "6" :c1: "4" :c4: "0" :ncuts: "2" :pattern: CGATCG :c2: "2" :name: Ple19I :blunt: "0" :c3: "0" McrI: :len: "6" :c1: "4" :c4: "0" :ncuts: "2" :pattern: cgrycg :c2: "2" :name: McrI :blunt: "0" :c3: "0" BshFI: :len: "4" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GGCC :c2: "2" :name: BshFI :blunt: "1" :c3: "0" BglII: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: AGATCT :c2: "5" :name: BglII :blunt: "0" :c3: "0" EcoT38I: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GRGCYC :c2: "1" :name: EcoT38I :blunt: "0" :c3: "0" DraIII: :len: "9" :c1: "6" :c4: "0" :ncuts: "2" :pattern: CACNNNGTG :c2: "3" :name: DraIII :blunt: "0" :c3: "0" UbaF12I: :len: "10" :c1: "0" :c4: "0" :ncuts: "0" :pattern: ctacnnngtc :c2: "0" :name: UbaF12I :blunt: 
"0" :c3: "0" SmlI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CTYRAG :c2: "5" :name: SmlI :blunt: "0" :c3: "0" SinI: :len: "5" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GGWCC :c2: "4" :name: SinI :blunt: "0" :c3: "0" BalI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: TGGCCA :c2: "3" :name: BalI :blunt: "1" :c3: "0" AhdI: :len: "11" :c1: "6" :c4: "0" :ncuts: "2" :pattern: GACNNNNNGTC :c2: "5" :name: AhdI :blunt: "0" :c3: "0" AfeI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: AGCGCT :c2: "3" :name: AfeI :blunt: "1" :c3: "0" DinI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: GGCGCC :c2: "3" :name: DinI :blunt: "1" :c3: "0" SsiI: :len: "4" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CCGC :c2: "3" :name: SsiI :blunt: "0" :c3: "0" PmeI: :len: "8" :c1: "4" :c4: "0" :ncuts: "2" :pattern: GTTTAAAC :c2: "4" :name: PmeI :blunt: "1" :c3: "0" NaeI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: GCCGGC :c2: "3" :name: NaeI :blunt: "1" :c3: "0" ItaI: :len: "5" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GCNGC :c2: "3" :name: ItaI :blunt: "0" :c3: "0" FmuI: :len: "5" :c1: "4" :c4: "0" :ncuts: "2" :pattern: ggncc :c2: "1" :name: FmuI :blunt: "0" :c3: "0" AccB7I: :len: "11" :c1: "7" :c4: "0" :ncuts: "2" :pattern: CCANNNNNTGG :c2: "4" :name: AccB7I :blunt: "0" :c3: "0" Vha464I: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CTTAAG :c2: "5" :name: Vha464I :blunt: "0" :c3: "0" MunI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CAATTG :c2: "5" :name: MunI :blunt: "0" :c3: "0" HpyCH4III: :len: "5" :c1: "3" :c4: "0" :ncuts: "2" :pattern: ACNGT :c2: "2" :name: HpyCH4III :blunt: "0" :c3: "0" GlaI: :len: "4" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GCGC :c2: "2" :name: GlaI :blunt: "1" :c3: "0" Bsh1236I: :len: "4" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CGCG :c2: "2" :name: Bsh1236I :blunt: "1" :c3: "0" BstMCI: :len: "6" :c1: "4" :c4: "0" :ncuts: "2" :pattern: CGRYCG :c2: "2" :name: BstMCI :blunt: "0" :c3: "0" BsrFI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: 
RCCGGY :c2: "5" :name: BsrFI :blunt: "0" :c3: "0" BspGI: :len: "6" :c1: "0" :c4: "0" :ncuts: "0" :pattern: ctggac :c2: "0" :name: BspGI :blunt: "0" :c3: "0" Tsp45I: :len: "5" :c1: "-1" :c4: "0" :ncuts: "2" :pattern: GTSAC :c2: "5" :name: Tsp45I :blunt: "0" :c3: "0" KpnI: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GGTACC :c2: "1" :name: KpnI :blunt: "0" :c3: "0" GsuI: :len: "6" :c1: "22" :c4: "0" :ncuts: "2" :pattern: CTGGAG :c2: "20" :name: GsuI :blunt: "0" :c3: "0" Bsp13I: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: TCCGGA :c2: "5" :name: Bsp13I :blunt: "0" :c3: "0" Esp3I: :len: "6" :c1: "7" :c4: "0" :ncuts: "2" :pattern: CGTCTC :c2: "11" :name: Esp3I :blunt: "0" :c3: "0" Pfl23II: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CGTACG :c2: "5" :name: Pfl23II :blunt: "0" :c3: "0" NciI: :len: "5" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CCSGG :c2: "3" :name: NciI :blunt: "0" :c3: "0" MstI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: tgcgca :c2: "3" :name: MstI :blunt: "1" :c3: "0" HgiCI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: ggyrcc :c2: "5" :name: HgiCI :blunt: "0" :c3: "0" BspLI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: GGNNCC :c2: "3" :name: BspLI :blunt: "1" :c3: "0" DrdII: :len: "6" :c1: "0" :c4: "0" :ncuts: "0" :pattern: gaacca :c2: "0" :name: DrdII :blunt: "0" :c3: "0" Eco52I: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CGGCCG :c2: "5" :name: Eco52I :blunt: "0" :c3: "0" Ksp632I: :len: "6" :c1: "7" :c4: "0" :ncuts: "2" :pattern: CTCTTC :c2: "10" :name: Ksp632I :blunt: "0" :c3: "0" BmcAI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: AGTACT :c2: "3" :name: BmcAI :blunt: "1" :c3: "0" BbvCI: :len: "7" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CCTCAGC :c2: "5" :name: BbvCI :blunt: "0" :c3: "0" Tth111II: :len: "6" :c1: "17" :c4: "0" :ncuts: "2" :pattern: caarca :c2: "15" :name: Tth111II :blunt: "0" :c3: "0" TaiI: :len: "4" :c1: "4" :c4: "0" :ncuts: "2" :pattern: ACGT :c2: "-1" :name: TaiI :blunt: "0" :c3: "0" 
Sse8387I: :len: "8" :c1: "6" :c4: "0" :ncuts: "2" :pattern: CCTGCAGG :c2: "2" :name: Sse8387I :blunt: "0" :c3: "0" SgrBI: :len: "6" :c1: "4" :c4: "0" :ncuts: "2" :pattern: CCGCGG :c2: "2" :name: SgrBI :blunt: "0" :c3: "0" RsrII: :len: "7" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CGGWCCG :c2: "5" :name: RsrII :blunt: "0" :c3: "0" PctI: :len: "6" :c1: "7" :c4: "0" :ncuts: "2" :pattern: GAATGC :c2: "5" :name: PctI :blunt: "0" :c3: "0" PauI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GCGCGC :c2: "5" :name: PauI :blunt: "0" :c3: "0" BetI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: wccggw :c2: "5" :name: BetI :blunt: "0" :c3: "0" BcuI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: ACTAGT :c2: "5" :name: BcuI :blunt: "0" :c3: "0" BsaAI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: YACGTR :c2: "3" :name: BsaAI :blunt: "1" :c3: "0" McaTI: :len: "6" :c1: "4" :c4: "0" :ncuts: "2" :pattern: gcgcgc :c2: "2" :name: McaTI :blunt: "0" :c3: "0" Eco57I: :len: "6" :c1: "22" :c4: "0" :ncuts: "2" :pattern: CTGAAG :c2: "20" :name: Eco57I :blunt: "0" :c3: "0" BstOI: :len: "5" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CCWGG :c2: "3" :name: BstOI :blunt: "0" :c3: "0" BspQI: :len: "7" :c1: "8" :c4: "0" :ncuts: "2" :pattern: GCTCTTC :c2: "11" :name: BspQI :blunt: "0" :c3: "0" BsmI: :len: "6" :c1: "7" :c4: "0" :ncuts: "2" :pattern: GAATGC :c2: "5" :name: BsmI :blunt: "0" :c3: "0" DraI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: TTTAAA :c2: "3" :name: DraI :blunt: "1" :c3: "0" BstV1I: :len: "5" :c1: "13" :c4: "0" :ncuts: "2" :pattern: GCAGC :c2: "17" :name: BstV1I :blunt: "0" :c3: "0" BtgZI: :len: "6" :c1: "16" :c4: "0" :ncuts: "2" :pattern: GCGATG :c2: "20" :name: BtgZI :blunt: "0" :c3: "0" CspCI: :len: "12" :c1: "-12" :c4: "22" :ncuts: "4" :pattern: CAANNNNNGTGG :c2: "-14" :name: CspCI :blunt: "0" :c3: "24" MhlI: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GDGCHC :c2: "1" :name: MhlI :blunt: "0" :c3: "0" MboI: :len: "4" :c1: "-1" :c4: "0" :ncuts: "2" 
:pattern: GATC :c2: "4" :name: MboI :blunt: "0" :c3: "0" HinfI: :len: "5" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GANTC :c2: "4" :name: HinfI :blunt: "0" :c3: "0" Eam1104I: :len: "6" :c1: "7" :c4: "0" :ncuts: "2" :pattern: CTCTTC :c2: "10" :name: Eam1104I :blunt: "0" :c3: "0" BseDI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CCNNGG :c2: "5" :name: BseDI :blunt: "0" :c3: "0" BmuI: :len: "6" :c1: "11" :c4: "0" :ncuts: "2" :pattern: ACTGGG :c2: "10" :name: BmuI :blunt: "0" :c3: "0" ApoI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: RAATTY :c2: "5" :name: ApoI :blunt: "0" :c3: "0" BfaI: :len: "4" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CTAG :c2: "3" :name: BfaI :blunt: "0" :c3: "0" TseI: :len: "5" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GCWGC :c2: "4" :name: TseI :blunt: "0" :c3: "0" BsrI: :len: "5" :c1: "6" :c4: "0" :ncuts: "2" :pattern: ACTGG :c2: "4" :name: BsrI :blunt: "0" :c3: "0" VspI: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: ATTAAT :c2: "4" :name: VspI :blunt: "0" :c3: "0" RsaI: :len: "4" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GTAC :c2: "2" :name: RsaI :blunt: "1" :c3: "0" PpuMI: :len: "7" :c1: "2" :c4: "0" :ncuts: "2" :pattern: RGGWCCY :c2: "5" :name: PpuMI :blunt: "0" :c3: "0" PfeI: :len: "5" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GAWTC :c2: "4" :name: PfeI :blunt: "0" :c3: "0" AccI: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GTMKAC :c2: "4" :name: AccI :blunt: "0" :c3: "0" BmgT120I: :len: "5" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GGNCC :c2: "3" :name: BmgT120I :blunt: "0" :c3: "0" TasI: :len: "4" :c1: "-1" :c4: "0" :ncuts: "2" :pattern: AATT :c2: "4" :name: TasI :blunt: "0" :c3: "0" SrfI: :len: "8" :c1: "4" :c4: "0" :ncuts: "2" :pattern: GCCCGGGC :c2: "4" :name: SrfI :blunt: "1" :c3: "0" LweI: :len: "5" :c1: "10" :c4: "0" :ncuts: "2" :pattern: GCATC :c2: "14" :name: LweI :blunt: "0" :c3: "0" BsuTUI: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: ATCGAT :c2: "4" :name: BsuTUI :blunt: "0" :c3: "0" AsiSI: :len: "8" :c1: "5" 
:c4: "0" :ncuts: "2" :pattern: GCGATCGC :c2: "3" :name: AsiSI :blunt: "0" :c3: "0" NspI: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: RCATGY :c2: "1" :name: NspI :blunt: "0" :c3: "0" BstMWI: :len: "11" :c1: "7" :c4: "0" :ncuts: "2" :pattern: GCNNNNNNNGC :c2: "4" :name: BstMWI :blunt: "0" :c3: "0" BstYI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: RGATCY :c2: "5" :name: BstYI :blunt: "0" :c3: "0" SplI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: cgtacg :c2: "5" :name: SplI :blunt: "0" :c3: "0" MabI: :len: "7" :c1: "1" :c4: "0" :ncuts: "2" :pattern: ACCWGGT :c2: "6" :name: MabI :blunt: "0" :c3: "0" FaeI: :len: "4" :c1: "4" :c4: "0" :ncuts: "2" :pattern: CATG :c2: "-1" :name: FaeI :blunt: "0" :c3: "0" XcmI: :len: "15" :c1: "8" :c4: "0" :ncuts: "2" :pattern: CCANNNNNNNNNTGG :c2: "7" :name: XcmI :blunt: "0" :c3: "0" TsoI: :len: "6" :c1: "17" :c4: "0" :ncuts: "2" :pattern: TARCCA :c2: "15" :name: TsoI :blunt: "0" :c3: "0" NdeI: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CATATG :c2: "4" :name: NdeI :blunt: "0" :c3: "0" BsiHKAI: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GWGCWC :c2: "1" :name: BsiHKAI :blunt: "0" :c3: "0" BseNI: :len: "5" :c1: "6" :c4: "0" :ncuts: "2" :pattern: ACTGG :c2: "4" :name: BseNI :blunt: "0" :c3: "0" VneI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GTGCAC :c2: "5" :name: VneI :blunt: "0" :c3: "0" TspGWI: :len: "5" :c1: "16" :c4: "0" :ncuts: "2" :pattern: ACGGA :c2: "14" :name: TspGWI :blunt: "0" :c3: "0" HaeII: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: RGCGCY :c2: "1" :name: HaeII :blunt: "0" :c3: "0" EcoHI: :len: "5" :c1: "-1" :c4: "0" :ncuts: "2" :pattern: ccsgg :c2: "5" :name: EcoHI :blunt: "0" :c3: "0" Bsh1285I: :len: "6" :c1: "4" :c4: "0" :ncuts: "2" :pattern: CGRYCG :c2: "2" :name: Bsh1285I :blunt: "0" :c3: "0" Tsp509I: :len: "4" :c1: "-1" :c4: "0" :ncuts: "2" :pattern: AATT :c2: "4" :name: Tsp509I :blunt: "0" :c3: "0" PfoI: :len: "7" :c1: "1" :c4: "0" :ncuts: "2" :pattern: TCCNGGA :c2: "6" 
:name: PfoI :blunt: "0" :c3: "0" AseI: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: ATTAAT :c2: "4" :name: AseI :blunt: "0" :c3: "0" Bsp1286I: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GDGCHC :c2: "1" :name: Bsp1286I :blunt: "0" :c3: "0" Bsp24I: :len: "12" :c1: "-9" :c4: "19" :ncuts: "4" :pattern: gacnnnnnntgg :c2: "-14" :name: Bsp24I :blunt: "0" :c3: "24" TstI: :len: "12" :c1: "-9" :c4: "19" :ncuts: "4" :pattern: CACNNNNNNTCC :c2: "-14" :name: TstI :blunt: "0" :c3: "24" MlyI: :len: "5" :c1: "10" :c4: "0" :ncuts: "2" :pattern: GAGTC :c2: "10" :name: MlyI :blunt: "1" :c3: "0" BseSI: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GKGCMC :c2: "1" :name: BseSI :blunt: "0" :c3: "0" CviJI: :len: "4" :c1: "2" :c4: "0" :ncuts: "2" :pattern: RGCY :c2: "2" :name: CviJI :blunt: "1" :c3: "0" Psp03I: :len: "5" :c1: "4" :c4: "0" :ncuts: "2" :pattern: ggwcc :c2: "1" :name: Psp03I :blunt: "0" :c3: "0" NlaIV: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: GGNNCC :c2: "3" :name: NlaIV :blunt: "1" :c3: "0" AasI: :len: "12" :c1: "7" :c4: "0" :ncuts: "2" :pattern: GACNNNNNNGTC :c2: "5" :name: AasI :blunt: "0" :c3: "0" EcoO65I: :len: "7" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GGTNACC :c2: "6" :name: EcoO65I :blunt: "0" :c3: "0" Sfr303I: :len: "6" :c1: "4" :c4: "0" :ncuts: "2" :pattern: CCGCGG :c2: "2" :name: Sfr303I :blunt: "0" :c3: "0" MalI: :len: "4" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GATC :c2: "2" :name: MalI :blunt: "1" :c3: "0" BfuI: :len: "6" :c1: "12" :c4: "0" :ncuts: "2" :pattern: GTATCC :c2: "11" :name: BfuI :blunt: "0" :c3: "0" Eco47III: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: AGCGCT :c2: "3" :name: Eco47III :blunt: "1" :c3: "0" Bse3DI: :len: "6" :c1: "8" :c4: "0" :ncuts: "2" :pattern: GCAATG :c2: "6" :name: Bse3DI :blunt: "0" :c3: "0" Psp124BI: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GAGCTC :c2: "1" :name: Psp124BI :blunt: "0" :c3: "0" PaeR7I: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CTCGAG :c2: "5" :name: PaeR7I :blunt: 
"0" :c3: "0" MscI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: TGGCCA :c2: "3" :name: MscI :blunt: "1" :c3: "0" ChaI: :len: "4" :c1: "4" :c4: "0" :ncuts: "2" :pattern: gatc :c2: "-1" :name: ChaI :blunt: "0" :c3: "0" BstDSI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CCRYGG :c2: "5" :name: BstDSI :blunt: "0" :c3: "0" Bse118I: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: RCCGGY :c2: "5" :name: Bse118I :blunt: "0" :c3: "0" BseXI: :len: "5" :c1: "13" :c4: "0" :ncuts: "2" :pattern: GCAGC :c2: "17" :name: BseXI :blunt: "0" :c3: "0" BspT107I: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GGYRCC :c2: "5" :name: BspT107I :blunt: "0" :c3: "0" MspA1I: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: CMGCKG :c2: "3" :name: MspA1I :blunt: "1" :c3: "0" HindII: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: GTYRAC :c2: "3" :name: HindII :blunt: "1" :c3: "0" EcoRI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GAATTC :c2: "5" :name: EcoRI :blunt: "0" :c3: "0" Asp718I: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GGTACC :c2: "5" :name: Asp718I :blunt: "0" :c3: "0" XhoII: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: RGATCY :c2: "5" :name: XhoII :blunt: "0" :c3: "0" Van91I: :len: "11" :c1: "7" :c4: "0" :ncuts: "2" :pattern: CCANNNNNTGG :c2: "4" :name: Van91I :blunt: "0" :c3: "0" StyI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CCWWGG :c2: "5" :name: StyI :blunt: "0" :c3: "0" BspANI: :len: "4" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GGCC :c2: "2" :name: BspANI :blunt: "1" :c3: "0" BaeI: :len: "11" :c1: "-11" :c4: "18" :ncuts: "4" :pattern: ACNNNNGTAYC :c2: "-16" :name: BaeI :blunt: "0" :c3: "23" FatI: :len: "4" :c1: "-1" :c4: "0" :ncuts: "2" :pattern: CATG :c2: "4" :name: FatI :blunt: "0" :c3: "0" PspXI: :len: "8" :c1: "2" :c4: "0" :ncuts: "2" :pattern: VCTCGAGB :c2: "6" :name: PspXI :blunt: "0" :c3: "0" Ppu10I: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: atgcat :c2: "5" :name: Ppu10I :blunt: "0" :c3: "0" BtsI: :len: "6" :c1: "8" 
:c4: "0" :ncuts: "2" :pattern: GCAGTG :c2: "6" :name: BtsI :blunt: "0" :c3: "0" CjeI: :len: "11" :c1: "-9" :c4: "20" :ncuts: "4" :pattern: ccannnnnngt :c2: "-15" :name: CjeI :blunt: "0" :c3: "26" Sau96I: :len: "5" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GGNCC :c2: "4" :name: Sau96I :blunt: "0" :c3: "0" SapI: :len: "7" :c1: "8" :c4: "0" :ncuts: "2" :pattern: GCTCTTC :c2: "11" :name: SapI :blunt: "0" :c3: "0" HgaI: :len: "5" :c1: "10" :c4: "0" :ncuts: "2" :pattern: GACGC :c2: "15" :name: HgaI :blunt: "0" :c3: "0" BtsCI: :len: "5" :c1: "7" :c4: "0" :ncuts: "2" :pattern: GGATG :c2: "5" :name: BtsCI :blunt: "0" :c3: "0" CviAII: :len: "4" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CATG :c2: "3" :name: CviAII :blunt: "0" :c3: "0" BstNSI: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: RCATGY :c2: "1" :name: BstNSI :blunt: "0" :c3: "0" Tru1I: :len: "4" :c1: "1" :c4: "0" :ncuts: "2" :pattern: TTAA :c2: "3" :name: Tru1I :blunt: "0" :c3: "0" NlaIII: :len: "4" :c1: "4" :c4: "0" :ncuts: "2" :pattern: CATG :c2: "-1" :name: NlaIII :blunt: "0" :c3: "0" BsaI: :len: "6" :c1: "7" :c4: "0" :ncuts: "2" :pattern: GGTCTC :c2: "11" :name: BsaI :blunt: "0" :c3: "0" Bsc4I: :len: "11" :c1: "7" :c4: "0" :ncuts: "2" :pattern: CCNNNNNNNGG :c2: "4" :name: Bsc4I :blunt: "0" :c3: "0" FbaI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: TGATCA :c2: "5" :name: FbaI :blunt: "0" :c3: "0" VpaK11BI: :len: "5" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GGWCC :c2: "4" :name: VpaK11BI :blunt: "0" :c3: "0" PinAI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: ACCGGT :c2: "5" :name: PinAI :blunt: "0" :c3: "0" NspV: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: TTCGAA :c2: "4" :name: NspV :blunt: "0" :c3: "0" FspI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: TGCGCA :c2: "3" :name: FspI :blunt: "1" :c3: "0" BstMAI: :len: "5" :c1: "6" :c4: "0" :ncuts: "2" :pattern: GTCTC :c2: "10" :name: BstMAI :blunt: "0" :c3: "0" Eco57MI: :len: "6" :c1: "22" :c4: "0" :ncuts: "2" :pattern: CTGRAG :c2: "20" :name: 
Eco57MI :blunt: "0" :c3: "0" AccIII: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: TCCGGA :c2: "5" :name: AccIII :blunt: "0" :c3: "0" BsrDI: :len: "6" :c1: "8" :c4: "0" :ncuts: "2" :pattern: GCAATG :c2: "6" :name: BsrDI :blunt: "0" :c3: "0" BspEI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: TCCGGA :c2: "5" :name: BspEI :blunt: "0" :c3: "0" ZrmI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: AGTACT :c2: "3" :name: ZrmI :blunt: "1" :c3: "0" Sse9I: :len: "4" :c1: "-1" :c4: "0" :ncuts: "2" :pattern: AATT :c2: "4" :name: Sse9I :blunt: "0" :c3: "0" SmoI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CTYRAG :c2: "5" :name: SmoI :blunt: "0" :c3: "0" SauI: :len: "7" :c1: "2" :c4: "0" :ncuts: "2" :pattern: cctnagg :c2: "5" :name: SauI :blunt: "0" :c3: "0" AleI: :len: "10" :c1: "5" :c4: "0" :ncuts: "2" :pattern: CACNNNNGTG :c2: "5" :name: AleI :blunt: "1" :c3: "0" BcnI: :len: "5" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CCSGG :c2: "3" :name: BcnI :blunt: "0" :c3: "0" SstII: :len: "6" :c1: "4" :c4: "0" :ncuts: "2" :pattern: CCGCGG :c2: "2" :name: SstII :blunt: "0" :c3: "0" HgiEII: :len: "12" :c1: "0" :c4: "0" :ncuts: "0" :pattern: accnnnnnnggt :c2: "0" :name: HgiEII :blunt: "0" :c3: "0" HgiAI: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: gwgcwc :c2: "1" :name: HgiAI :blunt: "0" :c3: "0" HaeIV: :len: "11" :c1: "-8" :c4: "20" :ncuts: "4" :pattern: gaynnnnnrtc :c2: "-14" :name: HaeIV :blunt: "0" :c3: "25" Bsp1720I: :len: "7" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GCTNAGC :c2: "5" :name: Bsp1720I :blunt: "0" :c3: "0" Eco31I: :len: "6" :c1: "7" :c4: "0" :ncuts: "2" :pattern: GGTCTC :c2: "11" :name: Eco31I :blunt: "0" :c3: "0" BssNAI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: GTATAC :c2: "3" :name: BssNAI :blunt: "1" :c3: "0" BshNI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GGYRCC :c2: "5" :name: BshNI :blunt: "0" :c3: "0" Cac8I: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: GCNNGC :c2: "3" :name: Cac8I :blunt: "1" :c3: "0" Bse8I: :len: 
"10" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GATNNNNATC :c2: "5" :name: Bse8I :blunt: "1" :c3: "0" BmiI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: GGNNCC :c2: "3" :name: BmiI :blunt: "1" :c3: "0" BglI: :len: "11" :c1: "7" :c4: "0" :ncuts: "2" :pattern: GCCNNNNNGGC :c2: "4" :name: BglI :blunt: "0" :c3: "0" UnbI: :len: "5" :c1: "-1" :c4: "0" :ncuts: "2" :pattern: ggncc :c2: "5" :name: UnbI :blunt: "0" :c3: "0" SspD5I: :len: "5" :c1: "13" :c4: "0" :ncuts: "2" :pattern: ggtga :c2: "13" :name: SspD5I :blunt: "1" :c3: "0" SdaI: :len: "8" :c1: "6" :c4: "0" :ncuts: "2" :pattern: CCTGCAGG :c2: "2" :name: SdaI :blunt: "0" :c3: "0" OliI: :len: "10" :c1: "5" :c4: "0" :ncuts: "2" :pattern: CACNNNNGTG :c2: "5" :name: OliI :blunt: "1" :c3: "0" Msp20I: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: TGGCCA :c2: "3" :name: Msp20I :blunt: "1" :c3: "0" BstSCI: :len: "5" :c1: "-1" :c4: "0" :ncuts: "2" :pattern: CCNGG :c2: "5" :name: BstSCI :blunt: "0" :c3: "0" BspLU11I: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: ACATGT :c2: "5" :name: BspLU11I :blunt: "0" :c3: "0" Bme1580I: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GKGCMC :c2: "1" :name: Bme1580I :blunt: "0" :c3: "0" AspLEI: :len: "4" :c1: "3" :c4: "0" :ncuts: "2" :pattern: GCGC :c2: "1" :name: AspLEI :blunt: "0" :c3: "0" Asp700I: :len: "10" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GAANNNNTTC :c2: "5" :name: Asp700I :blunt: "1" :c3: "0" EagI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CGGCCG :c2: "5" :name: EagI :blunt: "0" :c3: "0" Sse232I: :len: "8" :c1: "2" :c4: "0" :ncuts: "2" :pattern: cgccggcg :c2: "6" :name: Sse232I :blunt: "0" :c3: "0" PasI: :len: "7" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CCCWGGG :c2: "5" :name: PasI :blunt: "0" :c3: "0" EclHKI: :len: "11" :c1: "6" :c4: "0" :ncuts: "2" :pattern: GACNNNNNGTC :c2: "5" :name: EclHKI :blunt: "0" :c3: "0" AhlI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: ACTAGT :c2: "5" :name: AhlI :blunt: "0" :c3: "0" AsiGI: :len: "6" :c1: "1" :c4: "0" 
:ncuts: "2" :pattern: ACCGGT :c2: "5" :name: AsiGI :blunt: "0" :c3: "0" BssHII: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GCGCGC :c2: "5" :name: BssHII :blunt: "0" :c3: "0" Zsp2I: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: ATGCAT :c2: "1" :name: Zsp2I :blunt: "0" :c3: "0" SfeI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: ctryag :c2: "5" :name: SfeI :blunt: "0" :c3: "0" Bpu10I: :len: "7" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CCTNAGC :c2: "5" :name: Bpu10I :blunt: "0" :c3: "0" CspAI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: ACCGGT :c2: "5" :name: CspAI :blunt: "0" :c3: "0" BmgBI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: CACGTC :c2: "3" :name: BmgBI :blunt: "1" :c3: "0" SnaI: :len: "6" :c1: "0" :c4: "0" :ncuts: "0" :pattern: gtatac :c2: "0" :name: SnaI :blunt: "0" :c3: "0" MluNI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: TGGCCA :c2: "3" :name: MluNI :blunt: "1" :c3: "0" AloI: :len: "13" :c1: "-8" :c4: "20" :ncuts: "4" :pattern: GAACNNNNNNTCC :c2: "-13" :name: AloI :blunt: "0" :c3: "25" BseBI: :len: "5" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CCWGG :c2: "3" :name: BseBI :blunt: "0" :c3: "0" BstZ17I: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: GTATAC :c2: "3" :name: BstZ17I :blunt: "1" :c3: "0" StyD4I: :len: "5" :c1: "-1" :c4: "0" :ncuts: "2" :pattern: CCNGG :c2: "5" :name: StyD4I :blunt: "0" :c3: "0" SfaNI: :len: "5" :c1: "10" :c4: "0" :ncuts: "2" :pattern: GCATC :c2: "14" :name: SfaNI :blunt: "0" :c3: "0" PshAI: :len: "10" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GACNNNNGTC :c2: "5" :name: PshAI :blunt: "1" :c3: "0" NarI: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GGCGCC :c2: "4" :name: NarI :blunt: "0" :c3: "0" KspAI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: GTTAAC :c2: "3" :name: KspAI :blunt: "1" :c3: "0" BseX3I: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CGGCCG :c2: "5" :name: BseX3I :blunt: "0" :c3: "0" EcoRV: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: GATATC :c2: "3" :name: EcoRV 
:blunt: "1" :c3: "0" BsrSI: :len: "5" :c1: "6" :c4: "0" :ncuts: "2" :pattern: ACTGG :c2: "4" :name: BsrSI :blunt: "0" :c3: "0" BspTI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CTTAAG :c2: "5" :name: BspTI :blunt: "0" :c3: "0" NsiI: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: ATGCAT :c2: "1" :name: NsiI :blunt: "0" :c3: "0" BciVI: :len: "6" :c1: "12" :c4: "0" :ncuts: "2" :pattern: GTATCC :c2: "11" :name: BciVI :blunt: "0" :c3: "0" AjuI: :len: "14" :c1: "-8" :c4: "20" :ncuts: "4" :pattern: GAANNNNNNNTTGG :c2: "-13" :name: AjuI :blunt: "0" :c3: "25" CelII: :len: "7" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GCTNAGC :c2: "5" :name: CelII :blunt: "0" :c3: "0" DrdI: :len: "12" :c1: "7" :c4: "0" :ncuts: "2" :pattern: GACNNNNNNGTC :c2: "5" :name: DrdI :blunt: "0" :c3: "0" XmaI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CCCGGG :c2: "5" :name: XmaI :blunt: "0" :c3: "0" XagI: :len: "11" :c1: "5" :c4: "0" :ncuts: "2" :pattern: CCTNNNNNAGG :c2: "6" :name: XagI :blunt: "0" :c3: "0" TaqI: :len: "4" :c1: "1" :c4: "0" :ncuts: "2" :pattern: TCGA :c2: "3" :name: TaqI :blunt: "0" :c3: "0" SpeI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: ACTAGT :c2: "5" :name: SpeI :blunt: "0" :c3: "0" PstI: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: CTGCAG :c2: "1" :name: PstI :blunt: "0" :c3: "0" MnlI: :len: "4" :c1: "11" :c4: "0" :ncuts: "2" :pattern: CCTC :c2: "10" :name: MnlI :blunt: "0" :c3: "0" BsiEI: :len: "6" :c1: "4" :c4: "0" :ncuts: "2" :pattern: CGRYCG :c2: "2" :name: BsiEI :blunt: "0" :c3: "0" BseGI: :len: "5" :c1: "7" :c4: "0" :ncuts: "2" :pattern: GGATG :c2: "5" :name: BseGI :blunt: "0" :c3: "0" Ppu21I: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: YACGTR :c2: "3" :name: Ppu21I :blunt: "1" :c3: "0" BsoBI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CYCGRG :c2: "5" :name: BsoBI :blunt: "0" :c3: "0" Bsp1407I: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: TGTACA :c2: "5" :name: Bsp1407I :blunt: "0" :c3: "0" SfoI: :len: "6" :c1: "3" :c4: "0" 
:ncuts: "2" :pattern: GGCGCC :c2: "3" :name: SfoI :blunt: "1" :c3: "0" PflMI: :len: "11" :c1: "7" :c4: "0" :ncuts: "2" :pattern: CCANNNNNTGG :c2: "4" :name: PflMI :blunt: "0" :c3: "0" PdiI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: GCCGGC :c2: "3" :name: PdiI :blunt: "1" :c3: "0" Hpy178III: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: tcnnga :c2: "4" :name: Hpy178III :blunt: "0" :c3: "0" AspS9I: :len: "5" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GGNCC :c2: "4" :name: AspS9I :blunt: "0" :c3: "0" DriI: :len: "11" :c1: "6" :c4: "0" :ncuts: "2" :pattern: GACNNNNNGTC :c2: "5" :name: DriI :blunt: "0" :c3: "0" AviII: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: TGCGCA :c2: "3" :name: AviII :blunt: "1" :c3: "0" PsyI: :len: "9" :c1: "4" :c4: "0" :ncuts: "2" :pattern: GACNNNGTC :c2: "5" :name: PsyI :blunt: "0" :c3: "0" PleI: :len: "5" :c1: "9" :c4: "0" :ncuts: "2" :pattern: GAGTC :c2: "10" :name: PleI :blunt: "0" :c3: "0" MroI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: TCCGGA :c2: "5" :name: MroI :blunt: "0" :c3: "0" BlfI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: TCCGGA :c2: "5" :name: BlfI :blunt: "0" :c3: "0" BfiI: :len: "6" :c1: "11" :c4: "0" :ncuts: "2" :pattern: ACTGGG :c2: "10" :name: BfiI :blunt: "0" :c3: "0" BmeT110I: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CYCGRG :c2: "4" :name: BmeT110I :blunt: "0" :c3: "0" BseLI: :len: "11" :c1: "7" :c4: "0" :ncuts: "2" :pattern: CCNNNNNNNGG :c2: "4" :name: BseLI :blunt: "0" :c3: "0" SnaBI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: TACGTA :c2: "3" :name: SnaBI :blunt: "1" :c3: "0" PspGI: :len: "5" :c1: "-1" :c4: "0" :ncuts: "2" :pattern: CCWGG :c2: "5" :name: PspGI :blunt: "0" :c3: "0" BmeRI: :len: "11" :c1: "6" :c4: "0" :ncuts: "2" :pattern: GACNNNNNGTC :c2: "5" :name: BmeRI :blunt: "0" :c3: "0" BstPAI: :len: "10" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GACNNNNGTC :c2: "5" :name: BstPAI :blunt: "1" :c3: "0" SduI: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GDGCHC :c2: "1" 
:name: SduI :blunt: "0" :c3: "0" MaeI: :len: "4" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CTAG :c2: "3" :name: MaeI :blunt: "0" :c3: "0" LguI: :len: "7" :c1: "8" :c4: "0" :ncuts: "2" :pattern: GCTCTTC :c2: "11" :name: LguI :blunt: "0" :c3: "0" AscI: :len: "8" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GGCGCGCC :c2: "6" :name: AscI :blunt: "0" :c3: "0" AccBSI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: CCGCTC :c2: "3" :name: AccBSI :blunt: "1" :c3: "0" Bst2UI: :len: "5" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CCWGG :c2: "3" :name: Bst2UI :blunt: "0" :c3: "0" CjuII: :len: "11" :c1: "0" :c4: "0" :ncuts: "0" :pattern: caynnnnnctc :c2: "0" :name: CjuII :blunt: "0" :c3: "0" BtgI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CCRYGG :c2: "5" :name: BtgI :blunt: "0" :c3: "0" BpiI: :len: "6" :c1: "8" :c4: "0" :ncuts: "2" :pattern: GAAGAC :c2: "12" :name: BpiI :blunt: "0" :c3: "0" PspLI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CGTACG :c2: "5" :name: PspLI :blunt: "0" :c3: "0" MvrI: :len: "6" :c1: "4" :c4: "0" :ncuts: "2" :pattern: CGATCG :c2: "2" :name: MvrI :blunt: "0" :c3: "0" Aor51HI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: AGCGCT :c2: "3" :name: Aor51HI :blunt: "1" :c3: "0" StrI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CTCGAG :c2: "5" :name: StrI :blunt: "0" :c3: "0" MspR9I: :len: "5" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CCNGG :c2: "3" :name: MspR9I :blunt: "0" :c3: "0" HphI: :len: "5" :c1: "13" :c4: "0" :ncuts: "2" :pattern: GGTGA :c2: "12" :name: HphI :blunt: "0" :c3: "0" Hin1II: :len: "4" :c1: "4" :c4: "0" :ncuts: "2" :pattern: CATG :c2: "-1" :name: Hin1II :blunt: "0" :c3: "0" AhaIII: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: tttaaa :c2: "3" :name: AhaIII :blunt: "1" :c3: "0" BbuI: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GCATGC :c2: "1" :name: BbuI :blunt: "0" :c3: "0" EheI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: GGCGCC :c2: "3" :name: EheI :blunt: "1" :c3: "0" PspPPI: :len: "7" :c1: "2" :c4: "0" 
:ncuts: "2" :pattern: RGGWCCY :c2: "5" :name: PspPPI :blunt: "0" :c3: "0" NmuCI: :len: "5" :c1: "-1" :c4: "0" :ncuts: "2" :pattern: GTSAC :c2: "5" :name: NmuCI :blunt: "0" :c3: "0" BsaXI: :len: "11" :c1: "-10" :c4: "18" :ncuts: "4" :pattern: ACNNNNNCTCC :c2: "-13" :name: BsaXI :blunt: "0" :c3: "21" BlpI: :len: "7" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GCTNAGC :c2: "5" :name: BlpI :blunt: "0" :c3: "0" BspMAI: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: CTGCAG :c2: "1" :name: BspMAI :blunt: "0" :c3: "0" Eco147I: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: AGGCCT :c2: "3" :name: Eco147I :blunt: "1" :c3: "0" AspEI: :len: "11" :c1: "6" :c4: "0" :ncuts: "2" :pattern: GACNNNNNGTC :c2: "5" :name: AspEI :blunt: "0" :c3: "0" Eco47I: :len: "5" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GGWCC :c2: "4" :name: Eco47I :blunt: "0" :c3: "0" SgfI: :len: "8" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GCGATCGC :c2: "3" :name: SgfI :blunt: "0" :c3: "0" SchI: :len: "5" :c1: "10" :c4: "0" :ncuts: "2" :pattern: GAGTC :c2: "10" :name: SchI :blunt: "1" :c3: "0" PabI: :len: "4" :c1: "3" :c4: "0" :ncuts: "2" :pattern: gtac :c2: "1" :name: PabI :blunt: "0" :c3: "0" AcuI: :len: "6" :c1: "22" :c4: "0" :ncuts: "2" :pattern: CTGAAG :c2: "20" :name: AcuI :blunt: "0" :c3: "0" Bbv12I: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GWGCWC :c2: "1" :name: Bbv12I :blunt: "0" :c3: "0" ZraI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: GACGTC :c2: "3" :name: ZraI :blunt: "1" :c3: "0" PciSI: :len: "7" :c1: "8" :c4: "0" :ncuts: "2" :pattern: GCTCTTC :c2: "11" :name: PciSI :blunt: "0" :c3: "0" FinI: :len: "5" :c1: "0" :c4: "0" :ncuts: "0" :pattern: gggac :c2: "0" :name: FinI :blunt: "0" :c3: "0" Bme1390I: :len: "5" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CCNGG :c2: "3" :name: Bme1390I :blunt: "0" :c3: "0" Bsu36I: :len: "7" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CCTNAGG :c2: "5" :name: Bsu36I :blunt: "0" :c3: "0" FokI: :len: "5" :c1: "14" :c4: "0" :ncuts: "2" :pattern: GGATG :c2: "18" :name: 
FokI :blunt: "0" :c3: "0" CviRI: :len: "4" :c1: "2" :c4: "0" :ncuts: "2" :pattern: tgca :c2: "2" :name: CviRI :blunt: "1" :c3: "0" BsiYI: :len: "11" :c1: "7" :c4: "0" :ncuts: "2" :pattern: CCNNNNNNNGG :c2: "4" :name: BsiYI :blunt: "0" :c3: "0" Bst1107I: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: GTATAC :c2: "3" :name: Bst1107I :blunt: "1" :c3: "0" SelI: :len: "4" :c1: "-1" :c4: "0" :ncuts: "2" :pattern: cgcg :c2: "4" :name: SelI :blunt: "0" :c3: "0" PagI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: TCATGA :c2: "5" :name: PagI :blunt: "0" :c3: "0" Bbr7I: :len: "6" :c1: "13" :c4: "0" :ncuts: "2" :pattern: gaagac :c2: "17" :name: Bbr7I :blunt: "0" :c3: "0" BfuAI: :len: "6" :c1: "10" :c4: "0" :ncuts: "2" :pattern: ACCTGC :c2: "14" :name: BfuAI :blunt: "0" :c3: "0" AfaI: :len: "4" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GTAC :c2: "2" :name: AfaI :blunt: "1" :c3: "0" Bse1I: :len: "5" :c1: "6" :c4: "0" :ncuts: "2" :pattern: ACTGG :c2: "4" :name: Bse1I :blunt: "0" :c3: "0" BcgI: :len: "12" :c1: "-11" :c4: "22" :ncuts: "4" :pattern: CGANNNNNNTGC :c2: "-13" :name: BcgI :blunt: "0" :c3: "24" BsrBI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: CCGCTC :c2: "3" :name: BsrBI :blunt: "1" :c3: "0" UbaF13I: :len: "13" :c1: "0" :c4: "0" :ncuts: "0" :pattern: gagnnnnnnctgg :c2: "0" :name: UbaF13I :blunt: "0" :c3: "0" ApaI: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GGGCCC :c2: "1" :name: ApaI :blunt: "0" :c3: "0" BclI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: TGATCA :c2: "5" :name: BclI :blunt: "0" :c3: "0" FriOI: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GRGCYC :c2: "1" :name: FriOI :blunt: "0" :c3: "0" PscI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: ACATGT :c2: "5" :name: PscI :blunt: "0" :c3: "0" MspI: :len: "4" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CCGG :c2: "3" :name: MspI :blunt: "0" :c3: "0" HinP1I: :len: "4" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GCGC :c2: "3" :name: HinP1I :blunt: "0" :c3: "0" CjeNII: :len: "10" :c1: "0" 
:c4: "0" :ncuts: "0" :pattern: gagnnnnngt :c2: "0" :name: CjeNII :blunt: "0" :c3: "0" CfoI: :len: "4" :c1: "3" :c4: "0" :ncuts: "2" :pattern: GCGC :c2: "1" :name: CfoI :blunt: "0" :c3: "0" BmgI: :len: "6" :c1: "0" :c4: "0" :ncuts: "0" :pattern: gkgccc :c2: "0" :name: BmgI :blunt: "0" :c3: "0" Sau3AI: :len: "4" :c1: "-1" :c4: "0" :ncuts: "2" :pattern: GATC :c2: "4" :name: Sau3AI :blunt: "0" :c3: "0" NruI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: TCGCGA :c2: "3" :name: NruI :blunt: "1" :c3: "0" Bme18I: :len: "5" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GGWCC :c2: "4" :name: Bme18I :blunt: "0" :c3: "0" BsrGI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: TGTACA :c2: "5" :name: BsrGI :blunt: "0" :c3: "0" BspHI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: TCATGA :c2: "5" :name: BspHI :blunt: "0" :c3: "0" EaeI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: YGGCCR :c2: "5" :name: EaeI :blunt: "0" :c3: "0" ApaBI: :len: "11" :c1: "8" :c4: "0" :ncuts: "2" :pattern: gcannnnntgc :c2: "3" :name: ApaBI :blunt: "0" :c3: "0" AjiI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: CACGTC :c2: "3" :name: AjiI :blunt: "1" :c3: "0" XhoI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CTCGAG :c2: "5" :name: XhoI :blunt: "0" :c3: "0" Tru9I: :len: "4" :c1: "1" :c4: "0" :ncuts: "2" :pattern: TTAA :c2: "3" :name: Tru9I :blunt: "0" :c3: "0" PspOMI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GGGCCC :c2: "5" :name: PspOMI :blunt: "0" :c3: "0" Kzo9I: :len: "4" :c1: "-1" :c4: "0" :ncuts: "2" :pattern: GATC :c2: "4" :name: Kzo9I :blunt: "0" :c3: "0" GdiII: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: cggccr :c2: "5" :name: GdiII :blunt: "0" :c3: "0" BseMII: :len: "5" :c1: "15" :c4: "0" :ncuts: "2" :pattern: CTCAG :c2: "13" :name: BseMII :blunt: "0" :c3: "0" Bpu1102I: :len: "7" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GCTNAGC :c2: "5" :name: Bpu1102I :blunt: "0" :c3: "0" BspMI: :len: "6" :c1: "10" :c4: "0" :ncuts: "2" :pattern: ACCTGC :c2: "14" :name: BspMI 
:blunt: "0" :c3: "0" BstF5I: :len: "5" :c1: "7" :c4: "0" :ncuts: "2" :pattern: GGATG :c2: "5" :name: BstF5I :blunt: "0" :c3: "0" BsiI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: cacgag :c2: "5" :name: BsiI :blunt: "0" :c3: "0" BinI: :len: "5" :c1: "9" :c4: "0" :ncuts: "2" :pattern: ggatc :c2: "10" :name: BinI :blunt: "0" :c3: "0" Eco72I: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: CACGTG :c2: "3" :name: Eco72I :blunt: "1" :c3: "0" SfcI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CTRYAG :c2: "5" :name: SfcI :blunt: "0" :c3: "0" Psp6I: :len: "5" :c1: "-1" :c4: "0" :ncuts: "2" :pattern: CCWGG :c2: "5" :name: Psp6I :blunt: "0" :c3: "0" NsbI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: TGCGCA :c2: "3" :name: NsbI :blunt: "1" :c3: "0" Kpn2I: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: TCCGGA :c2: "5" :name: Kpn2I :blunt: "0" :c3: "0" BstSFI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CTRYAG :c2: "5" :name: BstSFI :blunt: "0" :c3: "0" CpoI: :len: "7" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CGGWCCG :c2: "5" :name: CpoI :blunt: "0" :c3: "0" EciI: :len: "6" :c1: "17" :c4: "0" :ncuts: "2" :pattern: GGCGGA :c2: "15" :name: EciI :blunt: "0" :c3: "0" Eco91I: :len: "7" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GGTNACC :c2: "6" :name: Eco91I :blunt: "0" :c3: "0" Bsp143I: :len: "4" :c1: "-1" :c4: "0" :ncuts: "2" :pattern: GATC :c2: "4" :name: Bsp143I :blunt: "0" :c3: "0" SstI: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GAGCTC :c2: "1" :name: SstI :blunt: "0" :c3: "0" Hin4II: :len: "5" :c1: "11" :c4: "0" :ncuts: "2" :pattern: ccttc :c2: "10" :name: Hin4II :blunt: "0" :c3: "0" Bst4CI: :len: "5" :c1: "3" :c4: "0" :ncuts: "2" :pattern: ACNGT :c2: "2" :name: Bst4CI :blunt: "0" :c3: "0" BscAI: :len: "5" :c1: "9" :c4: "0" :ncuts: "2" :pattern: gcatc :c2: "11" :name: BscAI :blunt: "0" :c3: "0" BsaBI: :len: "10" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GATNNNNATC :c2: "5" :name: BsaBI :blunt: "1" :c3: "0" BisI: :len: "5" :c1: "2" :c4: "0" :ncuts: 
"2" :pattern: GCNGC :c2: "3" :name: BisI :blunt: "0" :c3: "0" AjnI: :len: "5" :c1: "-1" :c4: "0" :ncuts: "2" :pattern: CCWGG :c2: "5" :name: AjnI :blunt: "0" :c3: "0" Bsp19I: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CCATGG :c2: "5" :name: Bsp19I :blunt: "0" :c3: "0" TspEI: :len: "4" :c1: "-1" :c4: "0" :ncuts: "2" :pattern: AATT :c2: "4" :name: TspEI :blunt: "0" :c3: "0" NcoI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CCATGG :c2: "5" :name: NcoI :blunt: "0" :c3: "0" MvaI: :len: "5" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CCWGG :c2: "3" :name: MvaI :blunt: "0" :c3: "0" BthCI: :len: "5" :c1: "4" :c4: "0" :ncuts: "2" :pattern: gcngc :c2: "1" :name: BthCI :blunt: "0" :c3: "0" BshVI: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: ATCGAT :c2: "4" :name: BshVI :blunt: "0" :c3: "0" BsnI: :len: "4" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GGCC :c2: "2" :name: BsnI :blunt: "1" :c3: "0" Alw44I: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GTGCAC :c2: "5" :name: Alw44I :blunt: "0" :c3: "0" BstPI: :len: "7" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GGTNACC :c2: "6" :name: BstPI :blunt: "0" :c3: "0" PflFI: :len: "9" :c1: "4" :c4: "0" :ncuts: "2" :pattern: GACNNNGTC :c2: "5" :name: PflFI :blunt: "0" :c3: "0" MroNI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GCCGGC :c2: "5" :name: MroNI :blunt: "0" :c3: "0" HincII: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: GTYRAC :c2: "3" :name: HincII :blunt: "1" :c3: "0" BbvII: :len: "6" :c1: "8" :c4: "0" :ncuts: "2" :pattern: gaagac :c2: "12" :name: BbvII :blunt: "0" :c3: "0" BpuEI: :len: "6" :c1: "22" :c4: "0" :ncuts: "2" :pattern: CTTGAG :c2: "20" :name: BpuEI :blunt: "0" :c3: "0" BstV2I: :len: "6" :c1: "8" :c4: "0" :ncuts: "2" :pattern: GAAGAC :c2: "12" :name: BstV2I :blunt: "0" :c3: "0" PsrI: :len: "13" :c1: "-8" :c4: "20" :ncuts: "4" :pattern: GAACNNNNNNTAC :c2: "-13" :name: PsrI :blunt: "0" :c3: "25" CaiI: :len: "9" :c1: "6" :c4: "0" :ncuts: "2" :pattern: CAGNNNCTG :c2: "3" :name: CaiI :blunt: "0" :c3: 
"0" Eam1105I: :len: "11" :c1: "6" :c4: "0" :ncuts: "2" :pattern: GACNNNNNGTC :c2: "5" :name: Eam1105I :blunt: "0" :c3: "0" ApaLI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GTGCAC :c2: "5" :name: ApaLI :blunt: "0" :c3: "0" XmaJI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CCTAGG :c2: "5" :name: XmaJI :blunt: "0" :c3: "0" SexAI: :len: "7" :c1: "1" :c4: "0" :ncuts: "2" :pattern: ACCWGGT :c2: "6" :name: SexAI :blunt: "0" :c3: "0" RigI: :len: "8" :c1: "6" :c4: "0" :ncuts: "2" :pattern: GGCCGGCC :c2: "2" :name: RigI :blunt: "0" :c3: "0" AsuII: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: TTCGAA :c2: "4" :name: AsuII :blunt: "0" :c3: "0" BstKTI: :len: "4" :c1: "3" :c4: "0" :ncuts: "2" :pattern: GATC :c2: "1" :name: BstKTI :blunt: "0" :c3: "0" BstUI: :len: "4" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CGCG :c2: "2" :name: BstUI :blunt: "1" :c3: "0" BstBAI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: YACGTR :c2: "3" :name: BstBAI :blunt: "1" :c3: "0" EclXI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CGGCCG :c2: "5" :name: EclXI :blunt: "0" :c3: "0" BsmAI: :len: "5" :c1: "6" :c4: "0" :ncuts: "2" :pattern: GTCTC :c2: "10" :name: BsmAI :blunt: "0" :c3: "0" Sse8647I: :len: "7" :c1: "2" :c4: "0" :ncuts: "2" :pattern: aggwcct :c2: "5" :name: Sse8647I :blunt: "0" :c3: "0" SphI: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GCATGC :c2: "1" :name: SphI :blunt: "0" :c3: "0" HpyF10VI: :len: "11" :c1: "7" :c4: "0" :ncuts: "2" :pattern: GCNNNNNNNGC :c2: "4" :name: HpyF10VI :blunt: "0" :c3: "0" BbrPI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: CACGTG :c2: "3" :name: BbrPI :blunt: "1" :c3: "0" AlwI: :len: "5" :c1: "9" :c4: "0" :ncuts: "2" :pattern: GGATC :c2: "10" :name: AlwI :blunt: "0" :c3: "0" TatI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: WGTACW :c2: "5" :name: TatI :blunt: "0" :c3: "0" Hpy8I: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: GTNNAC :c2: "3" :name: Hpy8I :blunt: "1" :c3: "0" BsmFI: :len: "5" :c1: "15" :c4: "0" :ncuts: 
"2" :pattern: GGGAC :c2: "19" :name: BsmFI :blunt: "0" :c3: "0" BseJI: :len: "10" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GATNNNNATC :c2: "5" :name: BseJI :blunt: "1" :c3: "0" PspEI: :len: "7" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GGTNACC :c2: "6" :name: PspEI :blunt: "0" :c3: "0" MroXI: :len: "10" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GAANNNNTTC :c2: "5" :name: MroXI :blunt: "1" :c3: "0" KasI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GGCGCC :c2: "5" :name: KasI :blunt: "0" :c3: "0" HpaII: :len: "4" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CCGG :c2: "3" :name: HpaII :blunt: "0" :c3: "0" BstZI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CGGCCG :c2: "5" :name: BstZI :blunt: "0" :c3: "0" BstDEI: :len: "5" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CTNAG :c2: "4" :name: BstDEI :blunt: "0" :c3: "0" DseDI: :len: "12" :c1: "7" :c4: "0" :ncuts: "2" :pattern: GACNNNNNNGTC :c2: "5" :name: DseDI :blunt: "0" :c3: "0" CseI: :len: "5" :c1: "10" :c4: "0" :ncuts: "2" :pattern: GACGC :c2: "15" :name: CseI :blunt: "0" :c3: "0" HpaI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: GTTAAC :c2: "3" :name: HpaI :blunt: "1" :c3: "0" HaeIII: :len: "4" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GGCC :c2: "2" :name: HaeIII :blunt: "1" :c3: "0" CviKI-1: :len: "4" :c1: "2" :c4: "0" :ncuts: "2" :pattern: RGCY :c2: "2" :name: CviKI-1 :blunt: "1" :c3: "0" AciI: :len: "4" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CCGC :c2: "3" :name: AciI :blunt: "0" :c3: "0" XmiI: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GTMKAC :c2: "4" :name: XmiI :blunt: "0" :c3: "0" MluI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: ACGCGT :c2: "5" :name: MluI :blunt: "0" :c3: "0" EspI: :len: "7" :c1: "2" :c4: "0" :ncuts: "2" :pattern: gctnagc :c2: "5" :name: EspI :blunt: "0" :c3: "0" Bst98I: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CTTAAG :c2: "5" :name: Bst98I :blunt: "0" :c3: "0" AatII: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GACGTC :c2: "1" :name: AatII :blunt: "0" :c3: "0" 
TaqII: :len: "6" :c1: "17" :c4: "0" :ncuts: "2" :pattern: CACCCA :c2: "15" :name: TaqII :blunt: "0" :c3: "0" ScaI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: AGTACT :c2: "3" :name: ScaI :blunt: "1" :c3: "0" AflII: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CTTAAG :c2: "5" :name: AflII :blunt: "0" :c3: "0" BstHHI: :len: "4" :c1: "3" :c4: "0" :ncuts: "2" :pattern: GCGC :c2: "1" :name: BstHHI :blunt: "0" :c3: "0" FnuDII: :len: "4" :c1: "2" :c4: "0" :ncuts: "2" :pattern: cgcg :c2: "2" :name: FnuDII :blunt: "1" :c3: "0" BspTNI: :len: "6" :c1: "7" :c4: "0" :ncuts: "2" :pattern: GGTCTC :c2: "11" :name: BspTNI :blunt: "0" :c3: "0" XmaIII: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: cggccg :c2: "5" :name: XmaIII :blunt: "0" :c3: "0" PhoI: :len: "4" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GGCC :c2: "2" :name: PhoI :blunt: "1" :c3: "0" BbsI: :len: "6" :c1: "8" :c4: "0" :ncuts: "2" :pattern: GAAGAC :c2: "12" :name: BbsI :blunt: "0" :c3: "0" XmnI: :len: "10" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GAANNNNTTC :c2: "5" :name: XmnI :blunt: "1" :c3: "0" TsuI: :len: "5" :c1: "0" :c4: "0" :ncuts: "0" :pattern: gcgac :c2: "0" :name: TsuI :blunt: "0" :c3: "0" FspAI: :len: "8" :c1: "4" :c4: "0" :ncuts: "2" :pattern: RTGCGCAY :c2: "4" :name: FspAI :blunt: "1" :c3: "0" BstFNI: :len: "4" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CGCG :c2: "2" :name: BstFNI :blunt: "1" :c3: "0" BssMI: :len: "4" :c1: "-1" :c4: "0" :ncuts: "2" :pattern: GATC :c2: "4" :name: BssMI :blunt: "0" :c3: "0" BstC8I: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: GCNNGC :c2: "3" :name: BstC8I :blunt: "1" :c3: "0" BplI: :len: "11" :c1: "-9" :c4: "19" :ncuts: "4" :pattern: GAGNNNNNCTC :c2: "-14" :name: BplI :blunt: "0" :c3: "24" BlnI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CCTAGG :c2: "5" :name: BlnI :blunt: "0" :c3: "0" EcoNI: :len: "11" :c1: "5" :c4: "0" :ncuts: "2" :pattern: CCTNNNNNAGG :c2: "6" :name: EcoNI :blunt: "0" :c3: "0" Ecl136II: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" 
:pattern: GAGCTC :c2: "3" :name: Ecl136II :blunt: "1" :c3: "0" AcsI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: RAATTY :c2: "5" :name: AcsI :blunt: "0" :c3: "0" AspCNI: :len: "5" :c1: "0" :c4: "0" :ncuts: "0" :pattern: gccgc :c2: "0" :name: AspCNI :blunt: "0" :c3: "0" AatI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: AGGCCT :c2: "3" :name: AatI :blunt: "1" :c3: "0" EsaBC3I: :len: "4" :c1: "2" :c4: "0" :ncuts: "2" :pattern: tcga :c2: "2" :name: EsaBC3I :blunt: "1" :c3: "0" XbaI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: TCTAGA :c2: "5" :name: XbaI :blunt: "0" :c3: "0" TfiI: :len: "5" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GAWTC :c2: "4" :name: TfiI :blunt: "0" :c3: "0" StuI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: AGGCCT :c2: "3" :name: StuI :blunt: "1" :c3: "0" SmaI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: CCCGGG :c2: "3" :name: SmaI :blunt: "1" :c3: "0" Psp5II: :len: "7" :c1: "2" :c4: "0" :ncuts: "2" :pattern: RGGWCCY :c2: "5" :name: Psp5II :blunt: "0" :c3: "0" MboII: :len: "5" :c1: "13" :c4: "0" :ncuts: "2" :pattern: GAAGA :c2: "12" :name: MboII :blunt: "0" :c3: "0" MamI: :len: "10" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GATNNNNATC :c2: "5" :name: MamI :blunt: "1" :c3: "0" Bsp68I: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: TCGCGA :c2: "3" :name: Bsp68I :blunt: "1" :c3: "0" Acc16I: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: TGCGCA :c2: "3" :name: Acc16I :blunt: "1" :c3: "0" XspI: :len: "4" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CTAG :c2: "3" :name: XspI :blunt: "0" :c3: "0" BsiWI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CGTACG :c2: "5" :name: BsiWI :blunt: "0" :c3: "0" BseYI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CCCAGC :c2: "5" :name: BseYI :blunt: "0" :c3: "0" Eco88I: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CYCGRG :c2: "5" :name: Eco88I :blunt: "0" :c3: "0" Bsu15I: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: ATCGAT :c2: "4" :name: Bsu15I :blunt: "0" :c3: "0" AlwFI: 
:len: "13" :c1: "0" :c4: "0" :ncuts: "0" :pattern: gaaaynnnnnrtg :c2: "0" :name: AlwFI :blunt: "0" :c3: "0" SalI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GTCGAC :c2: "5" :name: SalI :blunt: "0" :c3: "0" RleAI: :len: "6" :c1: "18" :c4: "0" :ncuts: "2" :pattern: cccaca :c2: "15" :name: RleAI :blunt: "0" :c3: "0" PaeI: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GCATGC :c2: "1" :name: PaeI :blunt: "0" :c3: "0" Hsp92I: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GRCGYC :c2: "4" :name: Hsp92I :blunt: "0" :c3: "0" SwaI: :len: "8" :c1: "4" :c4: "0" :ncuts: "2" :pattern: ATTTAAAT :c2: "4" :name: SwaI :blunt: "1" :c3: "0" SspBI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: TGTACA :c2: "5" :name: SspBI :blunt: "0" :c3: "0" BspCNI: :len: "5" :c1: "14" :c4: "0" :ncuts: "2" :pattern: CTCAG :c2: "12" :name: BspCNI :blunt: "0" :c3: "0" ErhI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CCWWGG :c2: "5" :name: ErhI :blunt: "0" :c3: "0" FauI: :len: "5" :c1: "9" :c4: "0" :ncuts: "2" :pattern: CCCGC :c2: "11" :name: FauI :blunt: "0" :c3: "0" AceIII: :len: "6" :c1: "13" :c4: "0" :ncuts: "2" :pattern: cagctc :c2: "17" :name: AceIII :blunt: "0" :c3: "0" AsuNHI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GCTAGC :c2: "5" :name: AsuNHI :blunt: "0" :c3: "0" AccB1I: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GGYRCC :c2: "5" :name: AccB1I :blunt: "0" :c3: "0" AspI: :len: "9" :c1: "4" :c4: "0" :ncuts: "2" :pattern: GACNNNGTC :c2: "5" :name: AspI :blunt: "0" :c3: "0" HaeI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: wggccw :c2: "3" :name: HaeI :blunt: "1" :c3: "0" EcoICRI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: GAGCTC :c2: "3" :name: EcoICRI :blunt: "1" :c3: "0" BstACI: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GRCGYC :c2: "4" :name: BstACI :blunt: "0" :c3: "0" BspMII: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: tccgga :c2: "5" :name: BspMII :blunt: "0" :c3: "0" CdiI: :len: "5" :c1: "4" :c4: "0" :ncuts: "2" :pattern: 
catcg :c2: "4" :name: CdiI :blunt: "1" :c3: "0" UbaF11I: :len: "5" :c1: "0" :c4: "0" :ncuts: "0" :pattern: tcgta :c2: "0" :name: UbaF11I :blunt: "0" :c3: "0" SimI: :len: "5" :c1: "2" :c4: "0" :ncuts: "2" :pattern: gggtc :c2: "5" :name: SimI :blunt: "0" :c3: "0" PciI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: ACATGT :c2: "5" :name: PciI :blunt: "0" :c3: "0" MspCI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CTTAAG :c2: "5" :name: MspCI :blunt: "0" :c3: "0" HpyF3I: :len: "5" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CTNAG :c2: "4" :name: HpyF3I :blunt: "0" :c3: "0" AsuHPI: :len: "5" :c1: "13" :c4: "0" :ncuts: "2" :pattern: GGTGA :c2: "12" :name: AsuHPI :blunt: "0" :c3: "0" BmrFI: :len: "5" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CCNGG :c2: "3" :name: BmrFI :blunt: "0" :c3: "0" AdeI: :len: "9" :c1: "6" :c4: "0" :ncuts: "2" :pattern: CACNNNGTG :c2: "3" :name: AdeI :blunt: "0" :c3: "0" BsbI: :len: "6" :c1: "0" :c4: "0" :ncuts: "0" :pattern: caacac :c2: "0" :name: BsbI :blunt: "0" :c3: "0" AsuI: :len: "5" :c1: "1" :c4: "0" :ncuts: "2" :pattern: ggncc :c2: "4" :name: AsuI :blunt: "0" :c3: "0" Cfr42I: :len: "6" :c1: "4" :c4: "0" :ncuts: "2" :pattern: CCGCGG :c2: "2" :name: Cfr42I :blunt: "0" :c3: "0" NspBII: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: cmgckg :c2: "3" :name: NspBII :blunt: "1" :c3: "0" Mva1269I: :len: "6" :c1: "7" :c4: "0" :ncuts: "2" :pattern: GAATGC :c2: "5" :name: Mva1269I :blunt: "0" :c3: "0" BstMBI: :len: "4" :c1: "-1" :c4: "0" :ncuts: "2" :pattern: GATC :c2: "4" :name: BstMBI :blunt: "0" :c3: "0" SgsI: :len: "8" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GGCGCGCC :c2: "6" :name: SgsI :blunt: "0" :c3: "0" SetI: :len: "4" :c1: "4" :c4: "0" :ncuts: "2" :pattern: ASST :c2: "-1" :name: SetI :blunt: "0" :c3: "0" AvaI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CYCGRG :c2: "5" :name: AvaI :blunt: "0" :c3: "0" AlfI: :len: "12" :c1: "-11" :c4: "22" :ncuts: "4" :pattern: GCANNNNNNTGC :c2: "-13" :name: AlfI :blunt: "0" :c3: "24" AfiI: :len: 
"11" :c1: "7" :c4: "0" :ncuts: "2" :pattern: CCNNNNNNNGG :c2: "4" :name: AfiI :blunt: "0" :c3: "0" AvrII: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CCTAGG :c2: "5" :name: AvrII :blunt: "0" :c3: "0" SanDI: :len: "7" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GGGWCCC :c2: "5" :name: SanDI :blunt: "0" :c3: "0" PspN4I: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: GGNNCC :c2: "3" :name: PspN4I :blunt: "1" :c3: "0" Pfl1108I: :len: "6" :c1: "0" :c4: "0" :ncuts: "0" :pattern: tcgtag :c2: "0" :name: Pfl1108I :blunt: "0" :c3: "0" MssI: :len: "8" :c1: "4" :c4: "0" :ncuts: "2" :pattern: GTTTAAAC :c2: "4" :name: MssI :blunt: "1" :c3: "0" BsgI: :len: "6" :c1: "22" :c4: "0" :ncuts: "2" :pattern: GTGCAG :c2: "20" :name: BsgI :blunt: "0" :c3: "0" CfrI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: YGGCCR :c2: "5" :name: CfrI :blunt: "0" :c3: "0" Eco32I: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: GATATC :c2: "3" :name: Eco32I :blunt: "1" :c3: "0" BstH2I: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: RGCGCY :c2: "1" :name: BstH2I :blunt: "0" :c3: "0" BpvUI: :len: "6" :c1: "4" :c4: "0" :ncuts: "2" :pattern: CGATCG :c2: "2" :name: BpvUI :blunt: "0" :c3: "0" BtuMI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: TCGCGA :c2: "3" :name: BtuMI :blunt: "1" :c3: "0" SmuI: :len: "5" :c1: "9" :c4: "0" :ncuts: "2" :pattern: CCCGC :c2: "11" :name: SmuI :blunt: "0" :c3: "0" SgrAI: :len: "8" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CRCCGGYG :c2: "6" :name: SgrAI :blunt: "0" :c3: "0" MbiI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: CCGCTC :c2: "3" :name: MbiI :blunt: "1" :c3: "0" Hin1I: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GRCGYC :c2: "4" :name: Hin1I :blunt: "0" :c3: "0" FblI: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GTMKAC :c2: "4" :name: FblI :blunt: "0" :c3: "0" EgeI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: GGCGCC :c2: "3" :name: EgeI :blunt: "1" :c3: "0" Bst2BI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CACGAG :c2: "5" 
:name: Bst2BI :blunt: "0" :c3: "0" BauI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CACGAG :c2: "5" :name: BauI :blunt: "0" :c3: "0" XmaCI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CCCGGG :c2: "5" :name: XmaCI :blunt: "0" :c3: "0" UbaF9I: :len: "12" :c1: "0" :c4: "0" :ncuts: "0" :pattern: tacnnnnnrtgt :c2: "0" :name: UbaF9I :blunt: "0" :c3: "0" RgaI: :len: "8" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GCGATCGC :c2: "3" :name: RgaI :blunt: "0" :c3: "0" BstNI: :len: "5" :c1: "2" :c4: "0" :ncuts: "2" :pattern: CCWGG :c2: "3" :name: BstNI :blunt: "0" :c3: "0" BspPI: :len: "5" :c1: "9" :c4: "0" :ncuts: "2" :pattern: GGATC :c2: "10" :name: BspPI :blunt: "0" :c3: "0" BshTI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: ACCGGT :c2: "5" :name: BshTI :blunt: "0" :c3: "0" BslI: :len: "11" :c1: "7" :c4: "0" :ncuts: "2" :pattern: CCNNNNNNNGG :c2: "4" :name: BslI :blunt: "0" :c3: "0" CjuI: :len: "11" :c1: "0" :c4: "0" :ncuts: "0" :pattern: caynnnnnrtg :c2: "0" :name: CjuI :blunt: "0" :c3: "0" UbaPI: :len: "6" :c1: "0" :c4: "0" :ncuts: "0" :pattern: cgaacg :c2: "0" :name: UbaPI :blunt: "0" :c3: "0" MflI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: RGATCY :c2: "5" :name: MflI :blunt: "0" :c3: "0" Hin6I: :len: "4" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GCGC :c2: "3" :name: Hin6I :blunt: "0" :c3: "0" BseCI: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: ATCGAT :c2: "4" :name: BseCI :blunt: "0" :c3: "0" BdaI: :len: "12" :c1: "-11" :c4: "22" :ncuts: "4" :pattern: TGANNNNNNTCA :c2: "-13" :name: BdaI :blunt: "0" :c3: "24" Acc65I: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GGTACC :c2: "5" :name: Acc65I :blunt: "0" :c3: "0" PshBI: :len: "6" :c1: "2" :c4: "0" :ncuts: "2" :pattern: ATTAAT :c2: "4" :name: PshBI :blunt: "0" :c3: "0" BmtI: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GCTAGC :c2: "1" :name: BmtI :blunt: "0" :c3: "0" EarI: :len: "6" :c1: "7" :c4: "0" :ncuts: "2" :pattern: CTCTTC :c2: "10" :name: EarI :blunt: "0" :c3: "0" BstSNI: :len: "6" 
:c1: "3" :c4: "0" :ncuts: "2" :pattern: TACGTA :c2: "3" :name: BstSNI :blunt: "1" :c3: "0" AvaII: :len: "5" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GGWCC :c2: "4" :name: AvaII :blunt: "0" :c3: "0" AluI: :len: "4" :c1: "2" :c4: "0" :ncuts: "2" :pattern: AGCT :c2: "2" :name: AluI :blunt: "1" :c3: "0" Tth111I: :len: "9" :c1: "4" :c4: "0" :ncuts: "2" :pattern: GACNNNGTC :c2: "5" :name: Tth111I :blunt: "0" :c3: "0" Tsp4CI: :len: "5" :c1: "3" :c4: "0" :ncuts: "2" :pattern: acngt :c2: "2" :name: Tsp4CI :blunt: "0" :c3: "0" PsuI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: RGATCY :c2: "5" :name: PsuI :blunt: "0" :c3: "0" DpnII: :len: "4" :c1: "-1" :c4: "0" :ncuts: "2" :pattern: GATC :c2: "4" :name: DpnII :blunt: "0" :c3: "0" Cfr10I: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: RCCGGY :c2: "5" :name: Cfr10I :blunt: "0" :c3: "0" BoxI: :len: "10" :c1: "5" :c4: "0" :ncuts: "2" :pattern: GACNNNNGTC :c2: "5" :name: BoxI :blunt: "1" :c3: "0" BsaJI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CCNNGG :c2: "5" :name: BsaJI :blunt: "0" :c3: "0" TspMI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CCCGGG :c2: "5" :name: TspMI :blunt: "0" :c3: "0" SmiMI: :len: "10" :c1: "5" :c4: "0" :ncuts: "2" :pattern: CAYNNNNRTG :c2: "5" :name: SmiMI :blunt: "1" :c3: "0" PspCI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: CACGTG :c2: "3" :name: PspCI :blunt: "1" :c3: "0" Nli3877I: :len: "6" :c1: "5" :c4: "0" :ncuts: "2" :pattern: cycgrg :c2: "1" :name: Nli3877I :blunt: "0" :c3: "0" BstXI: :len: "12" :c1: "8" :c4: "0" :ncuts: "2" :pattern: CCANNNNNNTGG :c2: "4" :name: BstXI :blunt: "0" :c3: "0" BssAI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: RCCGGY :c2: "5" :name: BssAI :blunt: "0" :c3: "0" RseI: :len: "10" :c1: "5" :c4: "0" :ncuts: "2" :pattern: CAYNNNNRTG :c2: "5" :name: RseI :blunt: "1" :c3: "0" NgoMIV: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: GCCGGC :c2: "5" :name: NgoMIV :blunt: "0" :c3: "0" BpuMI: :len: "5" :c1: "2" :c4: "0" :ncuts: "2" :pattern: 
CCSGG :c2: "3" :name: BpuMI :blunt: "0" :c3: "0" AgeI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: ACCGGT :c2: "5" :name: AgeI :blunt: "0" :c3: "0" MlsI: :len: "6" :c1: "3" :c4: "0" :ncuts: "2" :pattern: TGGCCA :c2: "3" :name: MlsI :blunt: "1" :c3: "0" BssECI: :len: "6" :c1: "1" :c4: "0" :ncuts: "2" :pattern: CCNNGG :c2: "5" :name: BssECI :blunt: "0" :c3: "0" CciNI: :len: "8" :c1: "2" :c4: "0" :ncuts: "2" :pattern: GCGGCCGC :c2: "6" :name: CciNI :blunt: "0" :c3: "0" BseMI: :len: "6" :c1: "8" :c4: "0" :ncuts: "2" :pattern: GCAATG :c2: "6" :name: BseMI :blunt: "0" :c3: "0" EcoRII: :len: "5" :c1: "-1" :c4: "0" :ncuts: "2" :pattern: CCWGG :c2: "5" :name: EcoRII :blunt: "0" :c3: "0" bio-2.0.3/lib/bio/util/restriction_enzyme/cut_symbol.rb0000644000175000017500000000526214141516614022560 0ustar nileshnilesh# # bio/util/restriction_enzyme/cut_symbol.rb - Defines the symbol used to mark a cut in an enzyme sequence # # Author:: Trevor Wennblom # Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com) # License:: The Ruby License # module Bio require 'bio/util/restriction_enzyme' unless const_defined?(:RestrictionEnzyme) class RestrictionEnzyme # = Usage # # #require 'bio/util/restriction_enzyme/cut_symbol' # require 'cut_symbol' # include Bio::RestrictionEnzyme::CutSymbol # # cut_symbol # => "^" # set_cut_symbol('|') # => "|" # cut_symbol # => "|" # escaped_cut_symbol # => "\\|" # re_cut_symbol # => /\|/ # set_cut_symbol('^') # => "^" # "abc^de" =~ re_cut_symbol # => 3 # "abc^de" =~ re_cut_symbol_adjacent # => nil # "abc^^de" =~ re_cut_symbol_adjacent # => 3 # "a^bc^^de" =~ re_cut_symbol_adjacent # => 4 # "a^bc^de" =~ re_cut_symbol_adjacent # => nil # module CutSymbol # Set the token to be used as the cut symbol in a restriction enzyme sequece # # Starts as +^+ character # # --- # *Arguments* # * +glyph+: The single character to be used as the cut symbol in an enzyme sequence # *Returns*:: +glyph+ def set_cut_symbol(glyph) 
CutSymbol__.cut_symbol = glyph end # Get the token that's used as the cut symbol in a restriction enzyme sequence # # --- # *Arguments* # * _none_ # *Returns*:: +glyph+ def cut_symbol; CutSymbol__.cut_symbol; end # Get the token that's used as the cut symbol in a restriction enzyme sequence with # a backslash preceding it. # # --- # *Arguments* # * _none_ # *Returns*:: +\glyph+ def escaped_cut_symbol; CutSymbol__.escaped_cut_symbol; end # Used to check if multiple cut symbols are next to each other. # # --- # *Arguments* # * _none_ # *Returns*:: +Regexp+ def re_cut_symbol_adjacent %r"#{escaped_cut_symbol}{2}" end # A Regexp of the cut_symbol. # # --- # *Arguments* # * _none_ # *Returns*:: +Regexp+ def re_cut_symbol %r"#{escaped_cut_symbol}" end ######### #protected # NOTE this is a Module, can't hide CutSymbol__ ######### require 'singleton' # Class to keep state class CutSymbol__ include Singleton @cut_symbol = '^' def self.cut_symbol; @cut_symbol; end def self.cut_symbol=(glyph); raise ArgumentError if glyph.size != 1 @cut_symbol = glyph end def self.escaped_cut_symbol; "\\" + self.cut_symbol; end end end # CutSymbol end # RestrictionEnzyme end # Bio bio-2.0.3/lib/bio/util/restriction_enzyme/string_formatting.rb0000644000175000017500000000546714141516614024147 0ustar nileshnilesh# # bio/util/restriction_enzyme/string_formatting.rb - Useful functions for string manipulation # # Author:: Trevor Wennblom # Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com) # License:: The Ruby License # module Bio require 'bio/util/restriction_enzyme' unless const_defined?(:RestrictionEnzyme) class RestrictionEnzyme module StringFormatting include CutSymbol extend CutSymbol # Return the sequence with spacing for alignment. Does not add whitespace # around cut symbols.
  #
  # Example:
  #   pattern = 'n^ng^arraxt^n'
  #   add_spacing( pattern ) # => "n^n g^a r r a x t^n"
  #
  # ---
  # *Arguments*
  # * +seq+: sequence with cut symbols
  # * +cs+: (_optional_) Cut symbol along the string. The reason this is
  #   definable outside of CutSymbol is that this is a utility function used
  #   to form vertical and horizontal cuts such as:
  #
  #     a|t g c
  #      +---+
  #     t a c|g
  # *Returns*:: +String+ sequence with single character distance between bases
  def add_spacing( seq, cs = cut_symbol )
    str = ''
    flag = false
    seq.each_byte do |c|
      c = c.chr
      if c == cs
        str += c
        flag = false
      elsif flag
        str += ' ' + c
      else
        str += c
        flag = true
      end
    end
    str
  end

  # Remove extraneous nucleic acid wildcards ('n' padding) from the
  # left and right sides
  #
  # ---
  # *Arguments*
  # * +s+: sequence with extraneous 'n' padding
  # *Returns*:: +String+ sequence without 'n' padding on the sides
  def strip_padding( s )
    if s[0].chr == 'n'
      s =~ %r{(n+)(.+)}
      s = $2
    end
    if s[-1].chr == 'n'
      s =~ %r{(.+?)(n+)$}
      s = $1
    end
    s
  end

  # Remove extraneous nucleic acid wildcards ('n' padding) from the
  # left and right sides and remove cut symbols
  #
  # ---
  # *Arguments*
  # * +s+: sequence with extraneous 'n' padding and cut symbols
  # *Returns*:: +String+ sequence without 'n' padding on the sides or cut symbols
  def strip_cuts_and_padding( s )
    strip_padding( s.tr(cut_symbol, '') )
  end

  # Return the 'n' padding on the left side of the strand
  #
  # ---
  # *Arguments*
  # * +s+: sequence with extraneous 'n' padding on the left side of the strand
  # *Returns*:: +String+ the 'n' padding from the left side
  def left_padding( s )
    s =~ %r{^n+}
    ret = $&
    ret ? ret : '' # Don't pass nil values
  end

  # Return the 'n' padding on the right side of the strand
  #
  # ---
  # *Arguments*
  # * +s+: sequence with extraneous 'n' padding on the right side of the strand
  # *Returns*:: +String+ the 'n' padding from the right side
  def right_padding( s )
    s =~ %r{n+$}
    ret = $&
    ret ? ret : '' # Don't pass nil values
  end
end # StringFormatting
end # RestrictionEnzyme
end # Bio
bio-2.0.3/lib/bio/util/restriction_enzyme/dense_int_array.rb0000644000175000017500000001041614141516614023543 0ustar nileshnilesh
#
# bio/util/restriction_enzyme/dense_int_array.rb - Internal data storage for Bio::RestrictionEnzyme::Range::SequenceRange
#
# Copyright:: Copyright (C) 2011
#             Naohisa Goto
#             Tomoaki NISHIYAMA
# License:: The Ruby License
#

module Bio

  require 'bio/util/restriction_enzyme' unless const_defined?(:RestrictionEnzyme)

  class RestrictionEnzyme

    # A class to store integers, optimized for data containing many
    # contiguous runs of integers.
    #
    # Bio::RestrictionEnzyme internal use only.
    # Please do not create the instance outside Bio::RestrictionEnzyme.
    class DenseIntArray
      MutableRange = Struct.new(:first, :last)

      include Enumerable

      # Same usage as Array.[]
      def self.[](*args)
        a = self.new
        args.each do |elem|
          a.push elem
        end
        a
      end

      # creates a new object
      def initialize
        @data = []
      end

      # initialize copy
      def initialize_copy(other)
        super(other)
        @data = @data.collect { |elem| elem.dup }
      end

      # sets internal data object
      def internal_data=(a)
        #clear_cache
        @data = a
        self
      end
      protected :internal_data=

      # gets internal data object
      def internal_data
        @data
      end
      protected :internal_data

      # Same usage as Array#[]
      def [](*arg)
        #$stderr.puts "SortedIntArray#[]"
        to_a[*arg]
      end

      # Not implemented
      def []=(*arg)
        raise NotImplementedError, 'DenseIntArray#[]= is not implemented.'
      end

      # Same usage as Array#each
      def each
        @data.each do |elem|
          elem.first.upto(elem.last) { |num| yield num }
        end
        self
      end

      # Same usage as Array#reverse_each
      def reverse_each
        @data.reverse_each do |elem|
          elem.last.downto(elem.first) { |num| yield num }
        end
        self
      end

      # Same usage as Array#+, but accepts only instances of the same class.
      def +(other)
        unless other.is_a?(self.class) then
          raise TypeError, 'unsupported data type'
        end
        tmpdata = @data + other.internal_data
        tmpdata.sort! { |a,b| a.first <=> b.first }
        result = self.class.new
        return result if tmpdata.empty?
        newdata = result.internal_data
        newdata.push tmpdata[0].dup
        (1...(tmpdata.size)).each do |i|
          if (x = newdata[-1].last) >= tmpdata[i].first then
            newdata[-1].last = tmpdata[i].last if tmpdata[i].last > x
          else
            newdata.push tmpdata[i].dup
          end
        end
        result
      end

      # Same usage as Array#==
      def ==(other)
        if r = super(other) then
          r
        elsif other.is_a?(self.class) then
          other.internal_data == @data
        else
          false
        end
      end

      # Same usage as Array#concat
      def concat(ary)
        ary.each { |elem| self.<<(elem) }
        self
      end

      # Same usage as Array#push
      def push(*args)
        args.each do |elem|
          self.<<(elem)
        end
        self
      end

      # Same usage as Array#unshift
      def unshift(*arg)
        raise NotImplementedError, 'DenseIntArray#unshift is not implemented.'
      end

      # Same usage as Array#<<
      def <<(elem)
        if !@data.empty? and
            @data[-1].last + 1 == elem then
          @data[-1].last = elem
        else
          @data << MutableRange.new(elem, elem)
        end
        self
      end

      # Same usage as Array#include?
      def include?(elem)
        return false if @data.empty? or elem < self.first or self.last < elem
        @data.any? do |range|
          range.first <= elem && elem <= range.last
        end
      end

      # Same usage as Array#first
      def first
        elem = @data.first
        elem ? elem.first : nil
      end

      # Same usage as Array#last
      def last
        elem = @data.last
        elem ? elem.last : nil
      end

      # Same usage as Array#size
      def size
        sum = 0
        @data.each do |range|
          sum += (range.last - range.first + 1)
        end
        sum
      end
      alias length size

      # Same usage as Array#delete
      def delete(elem)
        raise NotImplementedError, 'DenseIntArray#delete is not implemented.'
      end

      # Does nothing
      def sort!(&block)
        # does nothing
        self
      end

      # Does nothing
      def uniq!
        # does nothing
        self
      end

    end #class DenseIntArray
  end #class RestrictionEnzyme
end #module Bio
bio-2.0.3/lib/bio/util/restriction_enzyme.rb0000644000175000017500000002012214141516614020370 0ustar nileshnilesh
#
# bio/util/restriction_enzyme.rb - Digests DNA based on restriction enzyme cut patterns
#
# Author:: Trevor Wennblom
# Copyright:: Copyright (c) 2005-2007 Midwinter Laboratories, LLC (http://midwinterlabs.com)
# License:: The Ruby License
#

module Bio

autoload :REBASE, 'bio/db/rebase' unless const_defined?(:REBASE)

# = Description
#
# Bio::RestrictionEnzyme allows you to fragment a DNA strand using one
# or more restriction enzymes. Bio::RestrictionEnzyme is aware that
# multiple enzymes may be competing for the same recognition site and
# returns the various possible fragmentation patterns that result in
# such circumstances.
#
# When using Bio::RestrictionEnzyme you may simply use the name of common
# enzymes to cut your sequence or you may construct your own unique enzymes
# to use.
#
# Visit the documentation for individual classes for more information.
#
# An examination of the unit tests will also reveal several interesting uses
# for the curious programmer.
#
# = Usage
#
# == Basic
#
# EcoRI cut pattern:
#   G|A A T T C
#    +-------+
#   C T T A A|G
#
# This can also be written as:
#   G^AATTC
#
# Note that to use the method +cut_with_enzyme+ from a Bio::Sequence object
# you currently must +require+ +bio/util/restriction_enzyme+ directly. If
# instead you're going to directly call Bio::RestrictionEnzyme::Analysis
# then only +bio+ needs to be +required+.
#
#   require 'bio'
#   require 'bio/util/restriction_enzyme'
#
#   seq = Bio::Sequence::NA.new('gaattc')
#   cuts = seq.cut_with_enzyme('EcoRI')
#   cuts.primary # => ["aattc", "g"]
#   cuts.complement # => ["cttaa", "g"]
#   cuts.inspect # => "[#, #]"
#
#   seq = Bio::Sequence::NA.new('gaattc')
#   cuts = seq.cut_with_enzyme('g^aattc')
#   cuts.primary # => ["aattc", "g"]
#   cuts.complement # => ["cttaa", "g"]
#
#   seq = Bio::Sequence::NA.new('gaattc')
#   cuts = seq.cut_with_enzyme('g^aattc', 'gaatt^c')
#   cuts.primary # => ["aattc", "c", "g", "gaatt"]
#   cuts.complement # => ["c", "cttaa", "g", "ttaag"]
#
#   seq = Bio::Sequence::NA.new('gaattcgaattc')
#   cuts = seq.cut_with_enzyme('EcoRI')
#   cuts.primary # => ["aattc", "aattcg", "g"]
#   cuts.complement # => ["cttaa", "g", "gcttaa"]
#
#   seq = Bio::Sequence::NA.new('gaattcgggaattc')
#   cuts = seq.cut_with_enzyme('EcoRI')
#   cuts.primary # => ["aattc", "aattcggg", "g"]
#   cuts.complement # => ["cttaa", "g", "gcccttaa"]
#
#   cuts[0].inspect # => "#"
#
#   cuts[0].primary # => "g    "
#   cuts[0].complement # => "cttaa"
#
#   cuts[1].primary # => "aattcggg    "
#   cuts[1].complement # => "    gcccttaa"
#
#   cuts[2].primary # => "aattc"
#   cuts[2].complement # => "    g"
#
# == Advanced
#
#   require 'bio'
#
#   enzyme_1 = Bio::RestrictionEnzyme.new('anna', [1,1], [3,3])
#   enzyme_2 = Bio::RestrictionEnzyme.new('gg', [1,1])
#   a = Bio::RestrictionEnzyme::Analysis.cut('agga', enzyme_1, enzyme_2)
#   a.primary # => ["a", "ag", "g", "ga"]
#   a.complement # => ["c", "ct", "t", "tc"]
#
#   a[0].primary # => "ag"
#   a[0].complement # => "tc"
#
#   a[1].primary # => "ga"
#   a[1].complement # => "ct"
#
#   a[2].primary # => "a"
#   a[2].complement # => "t"
#
#   a[3].primary # => "g"
#   a[3].complement # => "c"
#
# = Todo / under development
#
# * Circular DNA cutting
#

class RestrictionEnzyme
  #require 'bio/util/restriction_enzyme/cut_symbol'

  autoload :CutSymbol, 'bio/util/restriction_enzyme/cut_symbol'
  autoload :StringFormatting, 'bio/util/restriction_enzyme/string_formatting'
  autoload :SingleStrand, 'bio/util/restriction_enzyme/single_strand'
  autoload :SingleStrandComplement, 'bio/util/restriction_enzyme/single_strand_complement'
  autoload :DoubleStranded, 'bio/util/restriction_enzyme/double_stranded'
  autoload :Analysis, 'bio/util/restriction_enzyme/analysis'
  autoload :Range, 'bio/util/restriction_enzyme/range/sequence_range'

  autoload :SortedNumArray, 'bio/util/restriction_enzyme/sorted_num_array'
  autoload :DenseIntArray, 'bio/util/restriction_enzyme/dense_int_array'

  include CutSymbol
  extend CutSymbol

  # See Bio::RestrictionEnzyme::DoubleStranded.new for more information.
  #
  # ---
  # *Arguments*
  # * +users_enzyme_or_rebase_or_pattern+: One of three possible parameters: The name of an enzyme, a REBASE::EnzymeEntry object, or a nucleotide pattern with a cut mark.
  # * +cut_locations+: The cut locations in enzyme index notation.
  # *Returns*:: Bio::RestrictionEnzyme::DoubleStranded
  #--
  # Factory for DoubleStranded
  #++
  def self.new(users_enzyme_or_rebase_or_pattern, *cut_locations)
    DoubleStranded.new(users_enzyme_or_rebase_or_pattern, *cut_locations)
  end

  # REBASE enzyme data information
  #
  # Returns a Bio::REBASE object loaded with all of the enzyme data on file.
  #
  # ---
  # *Arguments*
  # * _none_
  # *Returns*:: Bio::REBASE
  def self.rebase
    enzymes_yaml_file = File.join(File.dirname(File.expand_path(__FILE__)), 'restriction_enzyme', 'enzymes.yaml')
    @@rebase_enzymes ||= Bio::REBASE.load_yaml(enzymes_yaml_file)
    @@rebase_enzymes
  end

  # Check if supplied name is the name of an available enzyme
  #
  # See Bio::REBASE.enzyme_name?
  #
  # ---
  # *Arguments*
  # * +name+: Enzyme name
  # *Returns*:: +true+ _or_ +false+
  def self.enzyme_name?( name )
    self.rebase.enzyme_name?(name)
  end

  # See Bio::RestrictionEnzyme::Analysis.cut
  def self.cut( sequence, enzymes )
    Bio::RestrictionEnzyme::Analysis.cut( sequence, enzymes )
  end

  # A Bio::RestrictionEnzyme::Fragment is a DNA fragment composed of fused primary and
  # complementary strands that would be found floating in solution after a full
  # sequence is digested by one or more RestrictionEnzymes.
  #
  # You will notice that either the primary or complement strand will be
  # padded with spaces to make them line up according to the original DNA
  # configuration before they were cut.
  #
  # Example:
  #
  # Fragment 1:
  #   primary =    "attaca"
  #   complement = "  atga"
  #
  # Fragment 2:
  #   primary =    "g  "
  #   complement = "cta"
  #
  # View these with the +primary+ and +complement+ methods.
  #
  # Bio::RestrictionEnzyme::Fragment is a simple +Struct+ object.
  #
  # Note: unrelated to Bio::RestrictionEnzyme::Range::SequenceRange::Fragment
  Fragment = Struct.new(:primary, :complement, :p_left, :p_right, :c_left, :c_right)

  # Bio::RestrictionEnzyme::Fragments inherits from +Array+.
  #
  # Bio::RestrictionEnzyme::Fragments is a container for Fragment objects. It adds the
  # methods +primary+ and +complement+, which return an +Array+ of all
  # respective strands from its Fragment members in alphabetically sorted
  # order. Note that it will
  # not return duplicate items and does not return the spacing/padding
  # that you would
  # find by accessing the members directly.
  #
  # Example:
  #
  #   primary = ['attaca', 'g']
  #   complement = ['atga', 'cta']
  #
  # Note: unrelated to Bio::RestrictionEnzyme::Range::SequenceRange::Fragments
  class Fragments < Array
    def primary; strip_and_sort(:primary); end
    def complement; strip_and_sort(:complement); end

    protected

    def strip_and_sort( sym_strand )
      self.map {|uf| uf.send( sym_strand ).tr(' ', '') }.sort
    end
  end
end # RestrictionEnzyme
end # Bio
bio-2.0.3/lib/bio/data/0000755000175000017500000000000014141516614014046 5ustar nileshnilesh
bio-2.0.3/lib/bio/data/codontable.rb0000644000175000017500000016772314141516614016525 0ustar nileshnilesh
#
# = bio/data/codontable.rb - Codon Table
#
# Copyright:: Copyright (C) 2001, 2004
#             Toshiaki Katayama
# License:: The Ruby License
#
#
# == Data source
#
# Data in this class is converted from the NCBI's genetic codes page.
#
# * (())
#
# === Examples
#
# Obtain codon table No.1 -- Standard (Eukaryote)
#
#   table = Bio::CodonTable[1]
#
# Obtain a copy of codon table No.1 to modify. In this example,
# reassign selenocysteine ('U') to the 'tga' codon.
#
#   table = Bio::CodonTable.copy(1)
#   table['tga'] = 'U'
#
# Create your own codon table from a Hash which contains
# pairs of codon and amino acid. You can also define the table name
# in the second argument.
#
#   hash = { 'ttt' => 'F', 'ttc' => 'F', ... }
#   table = Bio::CodonTable.new(hash, "my codon table")
#
# Obtain a translated amino acid by codon.
#
#   table = Bio::CodonTable[1]
#   table['ttt'] # => "F"
#
# Reverse translation of an amino acid into a list of relevant codons.
#
#   table = Bio::CodonTable[1]
#   table.revtrans("A") # => ["gcg", "gct", "gca", "gcc"]
#

module Bio

class CodonTable

  # Select a codon table by number. This method will return one of the
  # hard coded codon tables in this class as a Bio::CodonTable object.
  def self.[](i)
    hash = TABLES[i]
    raise "ERROR: Unknown codon table No.#{i}" unless hash
    if AMBIGUITY_CODON_TABLES != nil
      atable = AMBIGUITY_CODON_TABLES[i]
    else
      atable = nil
    end
    definition = DEFINITIONS[i]
    start = STARTS[i]
    stop = STOPS[i]
    self.new(hash, definition, start, stop, atable)
  end

  # Similar to Bio::CodonTable[num] but returns a copied codon table.
  # You can modify the codon table without influencing hard coded tables.
  def self.copy(i)
    ct = self[i]
    return Marshal.load(Marshal.dump(ct))
  end

  # Create your own codon table by giving a Hash table of codons and relevant
  # amino acids. You can also define the table's name as a second
  # argument.
  #
  # Two Arrays 'start' and 'stop' can be specified, which contain the lists of
  # start and stop codons used by the 'start_codon?' and 'stop_codon?' methods.
  def initialize(hash, definition = nil, start = [], stop = [], atable = nil)
    @table = hash
    if atable == nil
      @atable = gen_ambiguity_map(hash)
    else
      @atable = atable
    end
    @definition = definition
    @start = start
    @stop = stop.empty? ? generate_stop : stop
  end

  # Accessor methods for a Hash of the currently selected codon table.
  attr_accessor :table

  # Accessor methods for the name of the currently selected table.
  attr_accessor :definition

  # Accessor methods for an Array which contains a list of start or stop
  # codons respectively.
  attr_accessor :start, :stop

  # Compute the map from ambiguity nucleotide codes to amino acids.
  # An ambiguous codon is defined only when all of its decomposed codons
  # translate to the same amino acid / stop codon.
  def gen_ambiguity_map(hash)
    nucleotide_sets={
      'a' => ['a'],
      't' => ['t'],
      'g' => ['g'],
      'c' => ['c'],

      'y' => ['t','c'],
      'r' => ['a','g'],
      'w' => ['a','t'],
      's' => ['g','c'],
      'k' => ['t','g'],
      'm' => ['a','c'],

      'b' => ['t','g','c'],
      'd' => ['a','t','g'],
      'h' => ['a','t','c'],
      'v' => ['a','g','c'],

      'n' => ['a','t','g','c'],
    }
    atable=Hash.new
    nucleotide_sets.keys.each{|n1|
      nucleotide_sets.keys.each{|n2|
        nucleotide_sets.keys.each{|n3|
          a = Array.new
          nucleotide_sets[n1].each{|c1|
            nucleotide_sets[n2].each{|c2|
              nucleotide_sets[n3].each{|c3|
                a << hash["#{c1}#{c2}#{c3}"]
              }
            }
          }
          a.uniq!
          atable["#{n1}#{n2}#{n3}"] = a.to_a[0] if a.size== 1
        }
      }
    }
    atable
  end

  # Translate a codon into a relevant amino acid. This method is used for
  # translating a DNA sequence into amino acid sequence.
  def [](codon)
    @atable=gen_ambiguity_map(@table) if @atable == nil
    @atable[codon]
  end

  # Modify the codon table. Use with caution as it may break hard coded
  # tables. If you want to modify an existing table, use the copy
  # method instead of the [] method to generate the CodonTable object to be modified.
  #
  #   # This is OK.
  #   table = Bio::CodonTable.copy(1)
  #   table['tga'] = 'U'
  #
  #   # Not recommended as it overrides the hard coded table
  #   table = Bio::CodonTable[1]
  #   table['tga'] = 'U'
  #
  def []=(codon, aa)
    @table[codon] = aa
    @atable = nil
  end

  # Iterates on codon table hash.
  #
  #   table = Bio::CodonTable[1]
  #   table.each do |codon, aa|
  #     puts "#{codon} -- #{aa}"
  #   end
  #
  def each(&block)
    @table.each(&block)
  end

  # Reverse translation of an amino acid into a list of relevant codons.
  #
  #   table = Bio::CodonTable[1]
  #   table.revtrans("A") # => ["gcg", "gct", "gca", "gcc"]
  #
  def revtrans(aa)
    unless (defined? @reverse) && @reverse
      @reverse = {}
      @table.each do |k, v|
        @reverse[v] ||= []
        @reverse[v] << k
      end
    end
    @reverse[aa.upcase]
  end

  # Returns true if the codon is a start codon in the currently selected
  # codon table, otherwise false.
  def start_codon?(codon)
    @start.include?(codon.downcase)
  end

  # Returns true if the codon is a stop codon in the currently selected
  # codon table, otherwise false.
  def stop_codon?(codon)
    @stop.include?(codon.downcase)
  end

  def generate_stop
    list = []
    @table.each do |codon, aa|
      if aa == '*'
        list << codon
      end
    end
    return list
  end
  private :generate_stop

  DEFINITIONS = {

    1 => "Standard (Eukaryote)",
    2 => "Vertebrate Mitochondrial",
    3 => "Yeast Mitochondorial",
    4 => "Mold, Protozoan, Coelenterate Mitochondrial and Mycoplasma/Spiroplasma",
    5 => "Invertebrate Mitochondrial",
    6 => "Ciliate Macronuclear and Dasycladacean",
    9 => "Echinoderm Mitochondrial",
    10 => "Euplotid Nuclear",
    11 => "Bacteria",
    12 => "Alternative Yeast Nuclear",
    13 => "Ascidian Mitochondrial",
    14 => "Flatworm Mitochondrial",
    15 => "Blepharisma Macronuclear",
    16 => "Chlorophycean Mitochondrial",
    21 => "Trematode Mitochondrial",
    22 => "Scenedesmus obliquus mitochondrial",
    23 => "Thraustochytrium Mitochondrial",

  }

  STARTS = {
    1 => %w(ttg ctg atg gtg), # gtg added (cf.
NCBI #SG1 document) 2 => %w(att atc ata atg gtg), 3 => %w(ata atg), 4 => %w(tta ttg ctg att atc ata atg gtg), 5 => %w(ttg att atc ata atg gtg), 6 => %w(atg), 9 => %w(atg gtg), 10 => %w(atg), 11 => %w(ttg ctg att atc ata atg gtg), 12 => %w(ctg atg), 13 => %w(atg), 14 => %w(atg), 15 => %w(atg), 16 => %w(atg), 21 => %w(atg gtg), 22 => %w(atg), 23 => %w(att atg gtg), } STOPS = { 1 => %w(taa tag tga), 2 => %w(taa tag aga agg), 3 => %w(taa tag), 4 => %w(taa tag), 5 => %w(taa tag), 6 => %w(tga), 9 => %w(taa tag), 10 => %w(taa tag), 11 => %w(taa tag tga), 12 => %w(taa tag tga), 13 => %w(taa tag), 14 => %w(tag), 15 => %w(taa tga), 16 => %w(taa tga), 21 => %w(taa tag), 22 => %w(tca taa tga), 23 => %w(tta taa tag tga), } TABLES = { # codon table 1 1 => { 'ttt' => 'F', 'tct' => 'S', 'tat' => 'Y', 'tgt' => 'C', 'ttc' => 'F', 'tcc' => 'S', 'tac' => 'Y', 'tgc' => 'C', 'tta' => 'L', 'tca' => 'S', 'taa' => '*', 'tga' => '*', 'ttg' => 'L', 'tcg' => 'S', 'tag' => '*', 'tgg' => 'W', 'ctt' => 'L', 'cct' => 'P', 'cat' => 'H', 'cgt' => 'R', 'ctc' => 'L', 'ccc' => 'P', 'cac' => 'H', 'cgc' => 'R', 'cta' => 'L', 'cca' => 'P', 'caa' => 'Q', 'cga' => 'R', 'ctg' => 'L', 'ccg' => 'P', 'cag' => 'Q', 'cgg' => 'R', 'att' => 'I', 'act' => 'T', 'aat' => 'N', 'agt' => 'S', 'atc' => 'I', 'acc' => 'T', 'aac' => 'N', 'agc' => 'S', 'ata' => 'I', 'aca' => 'T', 'aaa' => 'K', 'aga' => 'R', 'atg' => 'M', 'acg' => 'T', 'aag' => 'K', 'agg' => 'R', 'gtt' => 'V', 'gct' => 'A', 'gat' => 'D', 'ggt' => 'G', 'gtc' => 'V', 'gcc' => 'A', 'gac' => 'D', 'ggc' => 'G', 'gta' => 'V', 'gca' => 'A', 'gaa' => 'E', 'gga' => 'G', 'gtg' => 'V', 'gcg' => 'A', 'gag' => 'E', 'ggg' => 'G', }, # codon table 2 2 => { 'ttt' => 'F', 'tct' => 'S', 'tat' => 'Y', 'tgt' => 'C', 'ttc' => 'F', 'tcc' => 'S', 'tac' => 'Y', 'tgc' => 'C', 'tta' => 'L', 'tca' => 'S', 'taa' => '*', 'tga' => 'W', 'ttg' => 'L', 'tcg' => 'S', 'tag' => '*', 'tgg' => 'W', 'ctt' => 'L', 'cct' => 'P', 'cat' => 'H', 'cgt' => 'R', 'ctc' => 'L', 'ccc' => 'P', 'cac' => 'H', 
'cgc' => 'R', 'cta' => 'L', 'cca' => 'P', 'caa' => 'Q', 'cga' => 'R', 'ctg' => 'L', 'ccg' => 'P', 'cag' => 'Q', 'cgg' => 'R', 'att' => 'I', 'act' => 'T', 'aat' => 'N', 'agt' => 'S', 'atc' => 'I', 'acc' => 'T', 'aac' => 'N', 'agc' => 'S', 'ata' => 'M', 'aca' => 'T', 'aaa' => 'K', 'aga' => '*', 'atg' => 'M', 'acg' => 'T', 'aag' => 'K', 'agg' => '*', 'gtt' => 'V', 'gct' => 'A', 'gat' => 'D', 'ggt' => 'G', 'gtc' => 'V', 'gcc' => 'A', 'gac' => 'D', 'ggc' => 'G', 'gta' => 'V', 'gca' => 'A', 'gaa' => 'E', 'gga' => 'G', 'gtg' => 'V', 'gcg' => 'A', 'gag' => 'E', 'ggg' => 'G', }, # codon table 3 3 => { 'ttt' => 'F', 'tct' => 'S', 'tat' => 'Y', 'tgt' => 'C', 'ttc' => 'F', 'tcc' => 'S', 'tac' => 'Y', 'tgc' => 'C', 'tta' => 'L', 'tca' => 'S', 'taa' => '*', 'tga' => 'W', 'ttg' => 'L', 'tcg' => 'S', 'tag' => '*', 'tgg' => 'W', 'ctt' => 'T', 'cct' => 'P', 'cat' => 'H', 'cgt' => 'R', 'ctc' => 'T', 'ccc' => 'P', 'cac' => 'H', 'cgc' => 'R', 'cta' => 'T', 'cca' => 'P', 'caa' => 'Q', 'cga' => 'R', 'ctg' => 'T', 'ccg' => 'P', 'cag' => 'Q', 'cgg' => 'R', 'att' => 'I', 'act' => 'T', 'aat' => 'N', 'agt' => 'S', 'atc' => 'I', 'acc' => 'T', 'aac' => 'N', 'agc' => 'S', 'ata' => 'M', 'aca' => 'T', 'aaa' => 'K', 'aga' => 'R', 'atg' => 'M', 'acg' => 'T', 'aag' => 'K', 'agg' => 'R', 'gtt' => 'V', 'gct' => 'A', 'gat' => 'D', 'ggt' => 'G', 'gtc' => 'V', 'gcc' => 'A', 'gac' => 'D', 'ggc' => 'G', 'gta' => 'V', 'gca' => 'A', 'gaa' => 'E', 'gga' => 'G', 'gtg' => 'V', 'gcg' => 'A', 'gag' => 'E', 'ggg' => 'G', }, # codon table 4 4 => { 'ttt' => 'F', 'tct' => 'S', 'tat' => 'Y', 'tgt' => 'C', 'ttc' => 'F', 'tcc' => 'S', 'tac' => 'Y', 'tgc' => 'C', 'tta' => 'L', 'tca' => 'S', 'taa' => '*', 'tga' => 'W', 'ttg' => 'L', 'tcg' => 'S', 'tag' => '*', 'tgg' => 'W', 'ctt' => 'L', 'cct' => 'P', 'cat' => 'H', 'cgt' => 'R', 'ctc' => 'L', 'ccc' => 'P', 'cac' => 'H', 'cgc' => 'R', 'cta' => 'L', 'cca' => 'P', 'caa' => 'Q', 'cga' => 'R', 'ctg' => 'L', 'ccg' => 'P', 'cag' => 'Q', 'cgg' => 'R', 'att' => 'I', 'act' => 'T', 
'aat' => 'N', 'agt' => 'S', 'atc' => 'I', 'acc' => 'T', 'aac' => 'N', 'agc' => 'S', 'ata' => 'I', 'aca' => 'T', 'aaa' => 'K', 'aga' => 'R', 'atg' => 'M', 'acg' => 'T', 'aag' => 'K', 'agg' => 'R', 'gtt' => 'V', 'gct' => 'A', 'gat' => 'D', 'ggt' => 'G', 'gtc' => 'V', 'gcc' => 'A', 'gac' => 'D', 'ggc' => 'G', 'gta' => 'V', 'gca' => 'A', 'gaa' => 'E', 'gga' => 'G', 'gtg' => 'V', 'gcg' => 'A', 'gag' => 'E', 'ggg' => 'G', }, # codon table 5 5 => { 'ttt' => 'F', 'tct' => 'S', 'tat' => 'Y', 'tgt' => 'C', 'ttc' => 'F', 'tcc' => 'S', 'tac' => 'Y', 'tgc' => 'C', 'tta' => 'L', 'tca' => 'S', 'taa' => '*', 'tga' => 'W', 'ttg' => 'L', 'tcg' => 'S', 'tag' => '*', 'tgg' => 'W', 'ctt' => 'L', 'cct' => 'P', 'cat' => 'H', 'cgt' => 'R', 'ctc' => 'L', 'ccc' => 'P', 'cac' => 'H', 'cgc' => 'R', 'cta' => 'L', 'cca' => 'P', 'caa' => 'Q', 'cga' => 'R', 'ctg' => 'L', 'ccg' => 'P', 'cag' => 'Q', 'cgg' => 'R', 'att' => 'I', 'act' => 'T', 'aat' => 'N', 'agt' => 'S', 'atc' => 'I', 'acc' => 'T', 'aac' => 'N', 'agc' => 'S', 'ata' => 'M', 'aca' => 'T', 'aaa' => 'K', 'aga' => 'S', 'atg' => 'M', 'acg' => 'T', 'aag' => 'K', 'agg' => 'S', 'gtt' => 'V', 'gct' => 'A', 'gat' => 'D', 'ggt' => 'G', 'gtc' => 'V', 'gcc' => 'A', 'gac' => 'D', 'ggc' => 'G', 'gta' => 'V', 'gca' => 'A', 'gaa' => 'E', 'gga' => 'G', 'gtg' => 'V', 'gcg' => 'A', 'gag' => 'E', 'ggg' => 'G', }, # codon table 6 6 => { 'ttt' => 'F', 'tct' => 'S', 'tat' => 'Y', 'tgt' => 'C', 'ttc' => 'F', 'tcc' => 'S', 'tac' => 'Y', 'tgc' => 'C', 'tta' => 'L', 'tca' => 'S', 'taa' => 'Q', 'tga' => '*', 'ttg' => 'L', 'tcg' => 'S', 'tag' => 'Q', 'tgg' => 'W', 'ctt' => 'L', 'cct' => 'P', 'cat' => 'H', 'cgt' => 'R', 'ctc' => 'L', 'ccc' => 'P', 'cac' => 'H', 'cgc' => 'R', 'cta' => 'L', 'cca' => 'P', 'caa' => 'Q', 'cga' => 'R', 'ctg' => 'L', 'ccg' => 'P', 'cag' => 'Q', 'cgg' => 'R', 'att' => 'I', 'act' => 'T', 'aat' => 'N', 'agt' => 'S', 'atc' => 'I', 'acc' => 'T', 'aac' => 'N', 'agc' => 'S', 'ata' => 'I', 'aca' => 'T', 'aaa' => 'K', 'aga' => 'R', 'atg' => 'M', 
'acg' => 'T', 'aag' => 'K', 'agg' => 'R', 'gtt' => 'V', 'gct' => 'A', 'gat' => 'D', 'ggt' => 'G', 'gtc' => 'V', 'gcc' => 'A', 'gac' => 'D', 'ggc' => 'G', 'gta' => 'V', 'gca' => 'A', 'gaa' => 'E', 'gga' => 'G', 'gtg' => 'V', 'gcg' => 'A', 'gag' => 'E', 'ggg' => 'G', }, # codon table 9 9 => { 'ttt' => 'F', 'tct' => 'S', 'tat' => 'Y', 'tgt' => 'C', 'ttc' => 'F', 'tcc' => 'S', 'tac' => 'Y', 'tgc' => 'C', 'tta' => 'L', 'tca' => 'S', 'taa' => '*', 'tga' => 'W', 'ttg' => 'L', 'tcg' => 'S', 'tag' => '*', 'tgg' => 'W', 'ctt' => 'L', 'cct' => 'P', 'cat' => 'H', 'cgt' => 'R', 'ctc' => 'L', 'ccc' => 'P', 'cac' => 'H', 'cgc' => 'R', 'cta' => 'L', 'cca' => 'P', 'caa' => 'Q', 'cga' => 'R', 'ctg' => 'L', 'ccg' => 'P', 'cag' => 'Q', 'cgg' => 'R', 'att' => 'I', 'act' => 'T', 'aat' => 'N', 'agt' => 'S', 'atc' => 'I', 'acc' => 'T', 'aac' => 'N', 'agc' => 'S', 'ata' => 'I', 'aca' => 'T', 'aaa' => 'N', 'aga' => 'S', 'atg' => 'M', 'acg' => 'T', 'aag' => 'K', 'agg' => 'S', 'gtt' => 'V', 'gct' => 'A', 'gat' => 'D', 'ggt' => 'G', 'gtc' => 'V', 'gcc' => 'A', 'gac' => 'D', 'ggc' => 'G', 'gta' => 'V', 'gca' => 'A', 'gaa' => 'E', 'gga' => 'G', 'gtg' => 'V', 'gcg' => 'A', 'gag' => 'E', 'ggg' => 'G', }, # codon table 10 10 => { 'ttt' => 'F', 'tct' => 'S', 'tat' => 'Y', 'tgt' => 'C', 'ttc' => 'F', 'tcc' => 'S', 'tac' => 'Y', 'tgc' => 'C', 'tta' => 'L', 'tca' => 'S', 'taa' => '*', 'tga' => 'C', 'ttg' => 'L', 'tcg' => 'S', 'tag' => '*', 'tgg' => 'W', 'ctt' => 'L', 'cct' => 'P', 'cat' => 'H', 'cgt' => 'R', 'ctc' => 'L', 'ccc' => 'P', 'cac' => 'H', 'cgc' => 'R', 'cta' => 'L', 'cca' => 'P', 'caa' => 'Q', 'cga' => 'R', 'ctg' => 'L', 'ccg' => 'P', 'cag' => 'Q', 'cgg' => 'R', 'att' => 'I', 'act' => 'T', 'aat' => 'N', 'agt' => 'S', 'atc' => 'I', 'acc' => 'T', 'aac' => 'N', 'agc' => 'S', 'ata' => 'I', 'aca' => 'T', 'aaa' => 'K', 'aga' => 'R', 'atg' => 'M', 'acg' => 'T', 'aag' => 'K', 'agg' => 'R', 'gtt' => 'V', 'gct' => 'A', 'gat' => 'D', 'ggt' => 'G', 'gtc' => 'V', 'gcc' => 'A', 'gac' => 'D', 'ggc' => 'G', 
'gta' => 'V', 'gca' => 'A', 'gaa' => 'E', 'gga' => 'G', 'gtg' => 'V', 'gcg' => 'A', 'gag' => 'E', 'ggg' => 'G', }, # codon table 11 11 => { 'ttt' => 'F', 'tct' => 'S', 'tat' => 'Y', 'tgt' => 'C', 'ttc' => 'F', 'tcc' => 'S', 'tac' => 'Y', 'tgc' => 'C', 'tta' => 'L', 'tca' => 'S', 'taa' => '*', 'tga' => '*', 'ttg' => 'L', 'tcg' => 'S', 'tag' => '*', 'tgg' => 'W', 'ctt' => 'L', 'cct' => 'P', 'cat' => 'H', 'cgt' => 'R', 'ctc' => 'L', 'ccc' => 'P', 'cac' => 'H', 'cgc' => 'R', 'cta' => 'L', 'cca' => 'P', 'caa' => 'Q', 'cga' => 'R', 'ctg' => 'L', 'ccg' => 'P', 'cag' => 'Q', 'cgg' => 'R', 'att' => 'I', 'act' => 'T', 'aat' => 'N', 'agt' => 'S', 'atc' => 'I', 'acc' => 'T', 'aac' => 'N', 'agc' => 'S', 'ata' => 'I', 'aca' => 'T', 'aaa' => 'K', 'aga' => 'R', 'atg' => 'M', 'acg' => 'T', 'aag' => 'K', 'agg' => 'R', 'gtt' => 'V', 'gct' => 'A', 'gat' => 'D', 'ggt' => 'G', 'gtc' => 'V', 'gcc' => 'A', 'gac' => 'D', 'ggc' => 'G', 'gta' => 'V', 'gca' => 'A', 'gaa' => 'E', 'gga' => 'G', 'gtg' => 'V', 'gcg' => 'A', 'gag' => 'E', 'ggg' => 'G', }, # codon table 12 12 => { 'ttt' => 'F', 'tct' => 'S', 'tat' => 'Y', 'tgt' => 'C', 'ttc' => 'F', 'tcc' => 'S', 'tac' => 'Y', 'tgc' => 'C', 'tta' => 'L', 'tca' => 'S', 'taa' => '*', 'tga' => '*', 'ttg' => 'L', 'tcg' => 'S', 'tag' => '*', 'tgg' => 'W', 'ctt' => 'L', 'cct' => 'P', 'cat' => 'H', 'cgt' => 'R', 'ctc' => 'L', 'ccc' => 'P', 'cac' => 'H', 'cgc' => 'R', 'cta' => 'L', 'cca' => 'P', 'caa' => 'Q', 'cga' => 'R', 'ctg' => 'S', 'ccg' => 'P', 'cag' => 'Q', 'cgg' => 'R', 'att' => 'I', 'act' => 'T', 'aat' => 'N', 'agt' => 'S', 'atc' => 'I', 'acc' => 'T', 'aac' => 'N', 'agc' => 'S', 'ata' => 'I', 'aca' => 'T', 'aaa' => 'K', 'aga' => 'R', 'atg' => 'M', 'acg' => 'T', 'aag' => 'K', 'agg' => 'R', 'gtt' => 'V', 'gct' => 'A', 'gat' => 'D', 'ggt' => 'G', 'gtc' => 'V', 'gcc' => 'A', 'gac' => 'D', 'ggc' => 'G', 'gta' => 'V', 'gca' => 'A', 'gaa' => 'E', 'gga' => 'G', 'gtg' => 'V', 'gcg' => 'A', 'gag' => 'E', 'ggg' => 'G', }, # codon table 13 13 => { 'ttt' => 
'F', 'tct' => 'S', 'tat' => 'Y', 'tgt' => 'C', 'ttc' => 'F', 'tcc' => 'S', 'tac' => 'Y', 'tgc' => 'C', 'tta' => 'L', 'tca' => 'S', 'taa' => '*', 'tga' => 'W', 'ttg' => 'L', 'tcg' => 'S', 'tag' => '*', 'tgg' => 'W', 'ctt' => 'L', 'cct' => 'P', 'cat' => 'H', 'cgt' => 'R', 'ctc' => 'L', 'ccc' => 'P', 'cac' => 'H', 'cgc' => 'R', 'cta' => 'L', 'cca' => 'P', 'caa' => 'Q', 'cga' => 'R', 'ctg' => 'L', 'ccg' => 'P', 'cag' => 'Q', 'cgg' => 'R', 'att' => 'I', 'act' => 'T', 'aat' => 'N', 'agt' => 'S', 'atc' => 'I', 'acc' => 'T', 'aac' => 'N', 'agc' => 'S', 'ata' => 'M', 'aca' => 'T', 'aaa' => 'K', 'aga' => 'G', 'atg' => 'M', 'acg' => 'T', 'aag' => 'K', 'agg' => 'G', 'gtt' => 'V', 'gct' => 'A', 'gat' => 'D', 'ggt' => 'G', 'gtc' => 'V', 'gcc' => 'A', 'gac' => 'D', 'ggc' => 'G', 'gta' => 'V', 'gca' => 'A', 'gaa' => 'E', 'gga' => 'G', 'gtg' => 'V', 'gcg' => 'A', 'gag' => 'E', 'ggg' => 'G', }, # codon table 14 14 => { 'ttt' => 'F', 'tct' => 'S', 'tat' => 'Y', 'tgt' => 'C', 'ttc' => 'F', 'tcc' => 'S', 'tac' => 'Y', 'tgc' => 'C', 'tta' => 'L', 'tca' => 'S', 'taa' => 'Y', 'tga' => 'W', 'ttg' => 'L', 'tcg' => 'S', 'tag' => '*', 'tgg' => 'W', 'ctt' => 'L', 'cct' => 'P', 'cat' => 'H', 'cgt' => 'R', 'ctc' => 'L', 'ccc' => 'P', 'cac' => 'H', 'cgc' => 'R', 'cta' => 'L', 'cca' => 'P', 'caa' => 'Q', 'cga' => 'R', 'ctg' => 'L', 'ccg' => 'P', 'cag' => 'Q', 'cgg' => 'R', 'att' => 'I', 'act' => 'T', 'aat' => 'N', 'agt' => 'S', 'atc' => 'I', 'acc' => 'T', 'aac' => 'N', 'agc' => 'S', 'ata' => 'I', 'aca' => 'T', 'aaa' => 'N', 'aga' => 'S', 'atg' => 'M', 'acg' => 'T', 'aag' => 'K', 'agg' => 'S', 'gtt' => 'V', 'gct' => 'A', 'gat' => 'D', 'ggt' => 'G', 'gtc' => 'V', 'gcc' => 'A', 'gac' => 'D', 'ggc' => 'G', 'gta' => 'V', 'gca' => 'A', 'gaa' => 'E', 'gga' => 'G', 'gtg' => 'V', 'gcg' => 'A', 'gag' => 'E', 'ggg' => 'G', }, # codon table 15 15 => { 'ttt' => 'F', 'tct' => 'S', 'tat' => 'Y', 'tgt' => 'C', 'ttc' => 'F', 'tcc' => 'S', 'tac' => 'Y', 'tgc' => 'C', 'tta' => 'L', 'tca' => 'S', 'taa' => '*', 'tga' 
=> '*', 'ttg' => 'L', 'tcg' => 'S', 'tag' => 'Q', 'tgg' => 'W', 'ctt' => 'L', 'cct' => 'P', 'cat' => 'H', 'cgt' => 'R', 'ctc' => 'L', 'ccc' => 'P', 'cac' => 'H', 'cgc' => 'R', 'cta' => 'L', 'cca' => 'P', 'caa' => 'Q', 'cga' => 'R', 'ctg' => 'L', 'ccg' => 'P', 'cag' => 'Q', 'cgg' => 'R', 'att' => 'I', 'act' => 'T', 'aat' => 'N', 'agt' => 'S', 'atc' => 'I', 'acc' => 'T', 'aac' => 'N', 'agc' => 'S', 'ata' => 'I', 'aca' => 'T', 'aaa' => 'K', 'aga' => 'R', 'atg' => 'M', 'acg' => 'T', 'aag' => 'K', 'agg' => 'R', 'gtt' => 'V', 'gct' => 'A', 'gat' => 'D', 'ggt' => 'G', 'gtc' => 'V', 'gcc' => 'A', 'gac' => 'D', 'ggc' => 'G', 'gta' => 'V', 'gca' => 'A', 'gaa' => 'E', 'gga' => 'G', 'gtg' => 'V', 'gcg' => 'A', 'gag' => 'E', 'ggg' => 'G', }, # codon table 16 16 => { 'ttt' => 'F', 'tct' => 'S', 'tat' => 'Y', 'tgt' => 'C', 'ttc' => 'F', 'tcc' => 'S', 'tac' => 'Y', 'tgc' => 'C', 'tta' => 'L', 'tca' => 'S', 'taa' => '*', 'tga' => '*', 'ttg' => 'L', 'tcg' => 'S', 'tag' => 'L', 'tgg' => 'W', 'ctt' => 'L', 'cct' => 'P', 'cat' => 'H', 'cgt' => 'R', 'ctc' => 'L', 'ccc' => 'P', 'cac' => 'H', 'cgc' => 'R', 'cta' => 'L', 'cca' => 'P', 'caa' => 'Q', 'cga' => 'R', 'ctg' => 'L', 'ccg' => 'P', 'cag' => 'Q', 'cgg' => 'R', 'att' => 'I', 'act' => 'T', 'aat' => 'N', 'agt' => 'S', 'atc' => 'I', 'acc' => 'T', 'aac' => 'N', 'agc' => 'S', 'ata' => 'I', 'aca' => 'T', 'aaa' => 'K', 'aga' => 'R', 'atg' => 'M', 'acg' => 'T', 'aag' => 'K', 'agg' => 'R', 'gtt' => 'V', 'gct' => 'A', 'gat' => 'D', 'ggt' => 'G', 'gtc' => 'V', 'gcc' => 'A', 'gac' => 'D', 'ggc' => 'G', 'gta' => 'V', 'gca' => 'A', 'gaa' => 'E', 'gga' => 'G', 'gtg' => 'V', 'gcg' => 'A', 'gag' => 'E', 'ggg' => 'G', }, # codon table 21 21 => { 'ttt' => 'F', 'tct' => 'S', 'tat' => 'Y', 'tgt' => 'C', 'ttc' => 'F', 'tcc' => 'S', 'tac' => 'Y', 'tgc' => 'C', 'tta' => 'L', 'tca' => 'S', 'taa' => '*', 'tga' => 'W', 'ttg' => 'L', 'tcg' => 'S', 'tag' => '*', 'tgg' => 'W', 'ctt' => 'L', 'cct' => 'P', 'cat' => 'H', 'cgt' => 'R', 'ctc' => 'L', 'ccc' => 'P', 
'cac' => 'H', 'cgc' => 'R', 'cta' => 'L', 'cca' => 'P', 'caa' => 'Q', 'cga' => 'R', 'ctg' => 'L', 'ccg' => 'P', 'cag' => 'Q', 'cgg' => 'R', 'att' => 'I', 'act' => 'T', 'aat' => 'N', 'agt' => 'S', 'atc' => 'I', 'acc' => 'T', 'aac' => 'N', 'agc' => 'S', 'ata' => 'M', 'aca' => 'T', 'aaa' => 'N', 'aga' => 'S', 'atg' => 'M', 'acg' => 'T', 'aag' => 'K', 'agg' => 'S', 'gtt' => 'V', 'gct' => 'A', 'gat' => 'D', 'ggt' => 'G', 'gtc' => 'V', 'gcc' => 'A', 'gac' => 'D', 'ggc' => 'G', 'gta' => 'V', 'gca' => 'A', 'gaa' => 'E', 'gga' => 'G', 'gtg' => 'V', 'gcg' => 'A', 'gag' => 'E', 'ggg' => 'G', }, # codon table 22 22 => { 'ttt' => 'F', 'tct' => 'S', 'tat' => 'Y', 'tgt' => 'C', 'ttc' => 'F', 'tcc' => 'S', 'tac' => 'Y', 'tgc' => 'C', 'tta' => 'L', 'tca' => '*', 'taa' => '*', 'tga' => '*', 'ttg' => 'L', 'tcg' => 'S', 'tag' => 'L', 'tgg' => 'W', 'ctt' => 'L', 'cct' => 'P', 'cat' => 'H', 'cgt' => 'R', 'ctc' => 'L', 'ccc' => 'P', 'cac' => 'H', 'cgc' => 'R', 'cta' => 'L', 'cca' => 'P', 'caa' => 'Q', 'cga' => 'R', 'ctg' => 'L', 'ccg' => 'P', 'cag' => 'Q', 'cgg' => 'R', 'att' => 'I', 'act' => 'T', 'aat' => 'N', 'agt' => 'S', 'atc' => 'I', 'acc' => 'T', 'aac' => 'N', 'agc' => 'S', 'ata' => 'I', 'aca' => 'T', 'aaa' => 'K', 'aga' => 'R', 'atg' => 'M', 'acg' => 'T', 'aag' => 'K', 'agg' => 'R', 'gtt' => 'V', 'gct' => 'A', 'gat' => 'D', 'ggt' => 'G', 'gtc' => 'V', 'gcc' => 'A', 'gac' => 'D', 'ggc' => 'G', 'gta' => 'V', 'gca' => 'A', 'gaa' => 'E', 'gga' => 'G', 'gtg' => 'V', 'gcg' => 'A', 'gag' => 'E', 'ggg' => 'G', }, # codon table 23 23 => { 'ttt' => 'F', 'tct' => 'S', 'tat' => 'Y', 'tgt' => 'C', 'ttc' => 'F', 'tcc' => 'S', 'tac' => 'Y', 'tgc' => 'C', 'tta' => '*', 'tca' => 'S', 'taa' => '*', 'tga' => '*', 'ttg' => 'L', 'tcg' => 'S', 'tag' => '*', 'tgg' => 'W', 'ctt' => 'L', 'cct' => 'P', 'cat' => 'H', 'cgt' => 'R', 'ctc' => 'L', 'ccc' => 'P', 'cac' => 'H', 'cgc' => 'R', 'cta' => 'L', 'cca' => 'P', 'caa' => 'Q', 'cga' => 'R', 'ctg' => 'L', 'ccg' => 'P', 'cag' => 'Q', 'cgg' => 'R', 'att' => 
      'I', 'act' => 'T', 'aat' => 'N', 'agt' => 'S',
      'atc' => 'I', 'acc' => 'T', 'aac' => 'N', 'agc' => 'S',
      'ata' => 'I', 'aca' => 'T', 'aaa' => 'K', 'aga' => 'R',
      'atg' => 'M', 'acg' => 'T', 'aag' => 'K', 'agg' => 'R',
      'gtt' => 'V', 'gct' => 'A', 'gat' => 'D', 'ggt' => 'G',
      'gtc' => 'V', 'gcc' => 'A', 'gac' => 'D', 'ggc' => 'G',
      'gta' => 'V', 'gca' => 'A', 'gaa' => 'E', 'gga' => 'G',
      'gtg' => 'V', 'gcg' => 'A', 'gag' => 'E', 'ggg' => 'G',
    },

  }

end # CodonTable

end # module Bio

module Bio
class CodonTable
  AMBIGUITY_CODON_TABLES = {
    1 => {"aaa"=>"K", "aat"=>"N", "aag"=>"K", "aac"=>"N", "aay"=>"N", "aar"=>"K", "ata"=>"I", "att"=>"I", "atg"=>"M", "atc"=>"I", "aty"=>"I", "atw"=>"I", "atm"=>"I", "ath"=>"I", "aga"=>"R", "agt"=>"S", "agg"=>"R", "agc"=>"S", "agy"=>"S", "agr"=>"R", "aca"=>"T", "act"=>"T", "acg"=>"T", "acc"=>"T", "acy"=>"T", "acr"=>"T", "acw"=>"T", "acs"=>"T", "ack"=>"T", "acm"=>"T", "acb"=>"T", "acd"=>"T", "ach"=>"T", "acv"=>"T", "acn"=>"T", "taa"=>"*", "tat"=>"Y", "tag"=>"*", "tac"=>"Y", "tay"=>"Y", "tar"=>"*", "tta"=>"L", "ttt"=>"F", "ttg"=>"L", "ttc"=>"F", "tty"=>"F", "ttr"=>"L", "tga"=>"*", "tgt"=>"C", "tgg"=>"W", "tgc"=>"C", "tgy"=>"C", "tca"=>"S", "tct"=>"S", "tcg"=>"S", "tcc"=>"S", "tcy"=>"S", "tcr"=>"S", "tcw"=>"S", "tcs"=>"S", "tck"=>"S", "tcm"=>"S", "tcb"=>"S", "tcd"=>"S", "tch"=>"S", "tcv"=>"S", "tcn"=>"S", "tra"=>"*", "gaa"=>"E", "gat"=>"D", "gag"=>"E", "gac"=>"D", "gay"=>"D", "gar"=>"E", "gta"=>"V", "gtt"=>"V", "gtg"=>"V", "gtc"=>"V", "gty"=>"V", "gtr"=>"V", "gtw"=>"V", "gts"=>"V", "gtk"=>"V", "gtm"=>"V", "gtb"=>"V", "gtd"=>"V", "gth"=>"V", "gtv"=>"V", "gtn"=>"V", "gga"=>"G", "ggt"=>"G", "ggg"=>"G", "ggc"=>"G", "ggy"=>"G", "ggr"=>"G", "ggw"=>"G", "ggs"=>"G", "ggk"=>"G", "ggm"=>"G", "ggb"=>"G", "ggd"=>"G", "ggh"=>"G", "ggv"=>"G", "ggn"=>"G", "gca"=>"A", "gct"=>"A", "gcg"=>"A", "gcc"=>"A", "gcy"=>"A", "gcr"=>"A", "gcw"=>"A", "gcs"=>"A", "gck"=>"A", "gcm"=>"A", "gcb"=>"A", "gcd"=>"A", "gch"=>"A", "gcv"=>"A", "gcn"=>"A", "caa"=>"Q", "cat"=>"H",
"cag"=>"Q", "cac"=>"H", "cay"=>"H", "car"=>"Q", "cta"=>"L", "ctt"=>"L", "ctg"=>"L", "ctc"=>"L", "cty"=>"L", "ctr"=>"L", "ctw"=>"L", "cts"=>"L", "ctk"=>"L", "ctm"=>"L", "ctb"=>"L", "ctd"=>"L", "cth"=>"L", "ctv"=>"L", "ctn"=>"L", "cga"=>"R", "cgt"=>"R", "cgg"=>"R", "cgc"=>"R", "cgy"=>"R", "cgr"=>"R", "cgw"=>"R", "cgs"=>"R", "cgk"=>"R", "cgm"=>"R", "cgb"=>"R", "cgd"=>"R", "cgh"=>"R", "cgv"=>"R", "cgn"=>"R", "cca"=>"P", "cct"=>"P", "ccg"=>"P", "ccc"=>"P", "ccy"=>"P", "ccr"=>"P", "ccw"=>"P", "ccs"=>"P", "cck"=>"P", "ccm"=>"P", "ccb"=>"P", "ccd"=>"P", "cch"=>"P", "ccv"=>"P", "ccn"=>"P", "yta"=>"L", "ytg"=>"L", "ytr"=>"L", "mga"=>"R", "mgg"=>"R", "mgr"=>"R"}, 2 => {"aaa"=>"K", "aat"=>"N", "aag"=>"K", "aac"=>"N", "aay"=>"N", "aar"=>"K", "ata"=>"M", "att"=>"I", "atg"=>"M", "atc"=>"I", "aty"=>"I", "atr"=>"M", "aga"=>"*", "agt"=>"S", "agg"=>"*", "agc"=>"S", "agy"=>"S", "agr"=>"*", "aca"=>"T", "act"=>"T", "acg"=>"T", "acc"=>"T", "acy"=>"T", "acr"=>"T", "acw"=>"T", "acs"=>"T", "ack"=>"T", "acm"=>"T", "acb"=>"T", "acd"=>"T", "ach"=>"T", "acv"=>"T", "acn"=>"T", "taa"=>"*", "tat"=>"Y", "tag"=>"*", "tac"=>"Y", "tay"=>"Y", "tar"=>"*", "tta"=>"L", "ttt"=>"F", "ttg"=>"L", "ttc"=>"F", "tty"=>"F", "ttr"=>"L", "tga"=>"W", "tgt"=>"C", "tgg"=>"W", "tgc"=>"C", "tgy"=>"C", "tgr"=>"W", "tca"=>"S", "tct"=>"S", "tcg"=>"S", "tcc"=>"S", "tcy"=>"S", "tcr"=>"S", "tcw"=>"S", "tcs"=>"S", "tck"=>"S", "tcm"=>"S", "tcb"=>"S", "tcd"=>"S", "tch"=>"S", "tcv"=>"S", "tcn"=>"S", "gaa"=>"E", "gat"=>"D", "gag"=>"E", "gac"=>"D", "gay"=>"D", "gar"=>"E", "gta"=>"V", "gtt"=>"V", "gtg"=>"V", "gtc"=>"V", "gty"=>"V", "gtr"=>"V", "gtw"=>"V", "gts"=>"V", "gtk"=>"V", "gtm"=>"V", "gtb"=>"V", "gtd"=>"V", "gth"=>"V", "gtv"=>"V", "gtn"=>"V", "gga"=>"G", "ggt"=>"G", "ggg"=>"G", "ggc"=>"G", "ggy"=>"G", "ggr"=>"G", "ggw"=>"G", "ggs"=>"G", "ggk"=>"G", "ggm"=>"G", "ggb"=>"G", "ggd"=>"G", "ggh"=>"G", "ggv"=>"G", "ggn"=>"G", "gca"=>"A", "gct"=>"A", "gcg"=>"A", "gcc"=>"A", "gcy"=>"A", "gcr"=>"A", "gcw"=>"A", "gcs"=>"A", "gck"=>"A", 
"gcm"=>"A", "gcb"=>"A", "gcd"=>"A", "gch"=>"A", "gcv"=>"A", "gcn"=>"A", "caa"=>"Q", "cat"=>"H", "cag"=>"Q", "cac"=>"H", "cay"=>"H", "car"=>"Q", "cta"=>"L", "ctt"=>"L", "ctg"=>"L", "ctc"=>"L", "cty"=>"L", "ctr"=>"L", "ctw"=>"L", "cts"=>"L", "ctk"=>"L", "ctm"=>"L", "ctb"=>"L", "ctd"=>"L", "cth"=>"L", "ctv"=>"L", "ctn"=>"L", "cga"=>"R", "cgt"=>"R", "cgg"=>"R", "cgc"=>"R", "cgy"=>"R", "cgr"=>"R", "cgw"=>"R", "cgs"=>"R", "cgk"=>"R", "cgm"=>"R", "cgb"=>"R", "cgd"=>"R", "cgh"=>"R", "cgv"=>"R", "cgn"=>"R", "cca"=>"P", "cct"=>"P", "ccg"=>"P", "ccc"=>"P", "ccy"=>"P", "ccr"=>"P", "ccw"=>"P", "ccs"=>"P", "cck"=>"P", "ccm"=>"P", "ccb"=>"P", "ccd"=>"P", "cch"=>"P", "ccv"=>"P", "ccn"=>"P", "yta"=>"L", "ytg"=>"L", "ytr"=>"L"}, 3 => {"aaa"=>"K", "aat"=>"N", "aag"=>"K", "aac"=>"N", "aay"=>"N", "aar"=>"K", "ata"=>"M", "att"=>"I", "atg"=>"M", "atc"=>"I", "aty"=>"I", "atr"=>"M", "aga"=>"R", "agt"=>"S", "agg"=>"R", "agc"=>"S", "agy"=>"S", "agr"=>"R", "aca"=>"T", "act"=>"T", "acg"=>"T", "acc"=>"T", "acy"=>"T", "acr"=>"T", "acw"=>"T", "acs"=>"T", "ack"=>"T", "acm"=>"T", "acb"=>"T", "acd"=>"T", "ach"=>"T", "acv"=>"T", "acn"=>"T", "taa"=>"*", "tat"=>"Y", "tag"=>"*", "tac"=>"Y", "tay"=>"Y", "tar"=>"*", "tta"=>"L", "ttt"=>"F", "ttg"=>"L", "ttc"=>"F", "tty"=>"F", "ttr"=>"L", "tga"=>"W", "tgt"=>"C", "tgg"=>"W", "tgc"=>"C", "tgy"=>"C", "tgr"=>"W", "tca"=>"S", "tct"=>"S", "tcg"=>"S", "tcc"=>"S", "tcy"=>"S", "tcr"=>"S", "tcw"=>"S", "tcs"=>"S", "tck"=>"S", "tcm"=>"S", "tcb"=>"S", "tcd"=>"S", "tch"=>"S", "tcv"=>"S", "tcn"=>"S", "gaa"=>"E", "gat"=>"D", "gag"=>"E", "gac"=>"D", "gay"=>"D", "gar"=>"E", "gta"=>"V", "gtt"=>"V", "gtg"=>"V", "gtc"=>"V", "gty"=>"V", "gtr"=>"V", "gtw"=>"V", "gts"=>"V", "gtk"=>"V", "gtm"=>"V", "gtb"=>"V", "gtd"=>"V", "gth"=>"V", "gtv"=>"V", "gtn"=>"V", "gga"=>"G", "ggt"=>"G", "ggg"=>"G", "ggc"=>"G", "ggy"=>"G", "ggr"=>"G", "ggw"=>"G", "ggs"=>"G", "ggk"=>"G", "ggm"=>"G", "ggb"=>"G", "ggd"=>"G", "ggh"=>"G", "ggv"=>"G", "ggn"=>"G", "gca"=>"A", "gct"=>"A", "gcg"=>"A", "gcc"=>"A", 
"gcy"=>"A", "gcr"=>"A", "gcw"=>"A", "gcs"=>"A", "gck"=>"A", "gcm"=>"A", "gcb"=>"A", "gcd"=>"A", "gch"=>"A", "gcv"=>"A", "gcn"=>"A", "caa"=>"Q", "cat"=>"H", "cag"=>"Q", "cac"=>"H", "cay"=>"H", "car"=>"Q", "cta"=>"T", "ctt"=>"T", "ctg"=>"T", "ctc"=>"T", "cty"=>"T", "ctr"=>"T", "ctw"=>"T", "cts"=>"T", "ctk"=>"T", "ctm"=>"T", "ctb"=>"T", "ctd"=>"T", "cth"=>"T", "ctv"=>"T", "ctn"=>"T", "cga"=>"R", "cgt"=>"R", "cgg"=>"R", "cgc"=>"R", "cgy"=>"R", "cgr"=>"R", "cgw"=>"R", "cgs"=>"R", "cgk"=>"R", "cgm"=>"R", "cgb"=>"R", "cgd"=>"R", "cgh"=>"R", "cgv"=>"R", "cgn"=>"R", "cca"=>"P", "cct"=>"P", "ccg"=>"P", "ccc"=>"P", "ccy"=>"P", "ccr"=>"P", "ccw"=>"P", "ccs"=>"P", "cck"=>"P", "ccm"=>"P", "ccb"=>"P", "ccd"=>"P", "cch"=>"P", "ccv"=>"P", "ccn"=>"P", "mga"=>"R", "mgg"=>"R", "mgr"=>"R"}, 4 => {"aaa"=>"K", "aat"=>"N", "aag"=>"K", "aac"=>"N", "aay"=>"N", "aar"=>"K", "ata"=>"I", "att"=>"I", "atg"=>"M", "atc"=>"I", "aty"=>"I", "atw"=>"I", "atm"=>"I", "ath"=>"I", "aga"=>"R", "agt"=>"S", "agg"=>"R", "agc"=>"S", "agy"=>"S", "agr"=>"R", "aca"=>"T", "act"=>"T", "acg"=>"T", "acc"=>"T", "acy"=>"T", "acr"=>"T", "acw"=>"T", "acs"=>"T", "ack"=>"T", "acm"=>"T", "acb"=>"T", "acd"=>"T", "ach"=>"T", "acv"=>"T", "acn"=>"T", "taa"=>"*", "tat"=>"Y", "tag"=>"*", "tac"=>"Y", "tay"=>"Y", "tar"=>"*", "tta"=>"L", "ttt"=>"F", "ttg"=>"L", "ttc"=>"F", "tty"=>"F", "ttr"=>"L", "tga"=>"W", "tgt"=>"C", "tgg"=>"W", "tgc"=>"C", "tgy"=>"C", "tgr"=>"W", "tca"=>"S", "tct"=>"S", "tcg"=>"S", "tcc"=>"S", "tcy"=>"S", "tcr"=>"S", "tcw"=>"S", "tcs"=>"S", "tck"=>"S", "tcm"=>"S", "tcb"=>"S", "tcd"=>"S", "tch"=>"S", "tcv"=>"S", "tcn"=>"S", "gaa"=>"E", "gat"=>"D", "gag"=>"E", "gac"=>"D", "gay"=>"D", "gar"=>"E", "gta"=>"V", "gtt"=>"V", "gtg"=>"V", "gtc"=>"V", "gty"=>"V", "gtr"=>"V", "gtw"=>"V", "gts"=>"V", "gtk"=>"V", "gtm"=>"V", "gtb"=>"V", "gtd"=>"V", "gth"=>"V", "gtv"=>"V", "gtn"=>"V", "gga"=>"G", "ggt"=>"G", "ggg"=>"G", "ggc"=>"G", "ggy"=>"G", "ggr"=>"G", "ggw"=>"G", "ggs"=>"G", "ggk"=>"G", "ggm"=>"G", "ggb"=>"G", "ggd"=>"G", 
"ggh"=>"G", "ggv"=>"G", "ggn"=>"G", "gca"=>"A", "gct"=>"A", "gcg"=>"A", "gcc"=>"A", "gcy"=>"A", "gcr"=>"A", "gcw"=>"A", "gcs"=>"A", "gck"=>"A", "gcm"=>"A", "gcb"=>"A", "gcd"=>"A", "gch"=>"A", "gcv"=>"A", "gcn"=>"A", "caa"=>"Q", "cat"=>"H", "cag"=>"Q", "cac"=>"H", "cay"=>"H", "car"=>"Q", "cta"=>"L", "ctt"=>"L", "ctg"=>"L", "ctc"=>"L", "cty"=>"L", "ctr"=>"L", "ctw"=>"L", "cts"=>"L", "ctk"=>"L", "ctm"=>"L", "ctb"=>"L", "ctd"=>"L", "cth"=>"L", "ctv"=>"L", "ctn"=>"L", "cga"=>"R", "cgt"=>"R", "cgg"=>"R", "cgc"=>"R", "cgy"=>"R", "cgr"=>"R", "cgw"=>"R", "cgs"=>"R", "cgk"=>"R", "cgm"=>"R", "cgb"=>"R", "cgd"=>"R", "cgh"=>"R", "cgv"=>"R", "cgn"=>"R", "cca"=>"P", "cct"=>"P", "ccg"=>"P", "ccc"=>"P", "ccy"=>"P", "ccr"=>"P", "ccw"=>"P", "ccs"=>"P", "cck"=>"P", "ccm"=>"P", "ccb"=>"P", "ccd"=>"P", "cch"=>"P", "ccv"=>"P", "ccn"=>"P", "yta"=>"L", "ytg"=>"L", "ytr"=>"L", "mga"=>"R", "mgg"=>"R", "mgr"=>"R"}, 5 => {"aaa"=>"K", "aat"=>"N", "aag"=>"K", "aac"=>"N", "aay"=>"N", "aar"=>"K", "ata"=>"M", "att"=>"I", "atg"=>"M", "atc"=>"I", "aty"=>"I", "atr"=>"M", "aga"=>"S", "agt"=>"S", "agg"=>"S", "agc"=>"S", "agy"=>"S", "agr"=>"S", "agw"=>"S", "ags"=>"S", "agk"=>"S", "agm"=>"S", "agb"=>"S", "agd"=>"S", "agh"=>"S", "agv"=>"S", "agn"=>"S", "aca"=>"T", "act"=>"T", "acg"=>"T", "acc"=>"T", "acy"=>"T", "acr"=>"T", "acw"=>"T", "acs"=>"T", "ack"=>"T", "acm"=>"T", "acb"=>"T", "acd"=>"T", "ach"=>"T", "acv"=>"T", "acn"=>"T", "taa"=>"*", "tat"=>"Y", "tag"=>"*", "tac"=>"Y", "tay"=>"Y", "tar"=>"*", "tta"=>"L", "ttt"=>"F", "ttg"=>"L", "ttc"=>"F", "tty"=>"F", "ttr"=>"L", "tga"=>"W", "tgt"=>"C", "tgg"=>"W", "tgc"=>"C", "tgy"=>"C", "tgr"=>"W", "tca"=>"S", "tct"=>"S", "tcg"=>"S", "tcc"=>"S", "tcy"=>"S", "tcr"=>"S", "tcw"=>"S", "tcs"=>"S", "tck"=>"S", "tcm"=>"S", "tcb"=>"S", "tcd"=>"S", "tch"=>"S", "tcv"=>"S", "tcn"=>"S", "gaa"=>"E", "gat"=>"D", "gag"=>"E", "gac"=>"D", "gay"=>"D", "gar"=>"E", "gta"=>"V", "gtt"=>"V", "gtg"=>"V", "gtc"=>"V", "gty"=>"V", "gtr"=>"V", "gtw"=>"V", "gts"=>"V", "gtk"=>"V", "gtm"=>"V", 
"gtb"=>"V", "gtd"=>"V", "gth"=>"V", "gtv"=>"V", "gtn"=>"V", "gga"=>"G", "ggt"=>"G", "ggg"=>"G", "ggc"=>"G", "ggy"=>"G", "ggr"=>"G", "ggw"=>"G", "ggs"=>"G", "ggk"=>"G", "ggm"=>"G", "ggb"=>"G", "ggd"=>"G", "ggh"=>"G", "ggv"=>"G", "ggn"=>"G", "gca"=>"A", "gct"=>"A", "gcg"=>"A", "gcc"=>"A", "gcy"=>"A", "gcr"=>"A", "gcw"=>"A", "gcs"=>"A", "gck"=>"A", "gcm"=>"A", "gcb"=>"A", "gcd"=>"A", "gch"=>"A", "gcv"=>"A", "gcn"=>"A", "caa"=>"Q", "cat"=>"H", "cag"=>"Q", "cac"=>"H", "cay"=>"H", "car"=>"Q", "cta"=>"L", "ctt"=>"L", "ctg"=>"L", "ctc"=>"L", "cty"=>"L", "ctr"=>"L", "ctw"=>"L", "cts"=>"L", "ctk"=>"L", "ctm"=>"L", "ctb"=>"L", "ctd"=>"L", "cth"=>"L", "ctv"=>"L", "ctn"=>"L", "cga"=>"R", "cgt"=>"R", "cgg"=>"R", "cgc"=>"R", "cgy"=>"R", "cgr"=>"R", "cgw"=>"R", "cgs"=>"R", "cgk"=>"R", "cgm"=>"R", "cgb"=>"R", "cgd"=>"R", "cgh"=>"R", "cgv"=>"R", "cgn"=>"R", "cca"=>"P", "cct"=>"P", "ccg"=>"P", "ccc"=>"P", "ccy"=>"P", "ccr"=>"P", "ccw"=>"P", "ccs"=>"P", "cck"=>"P", "ccm"=>"P", "ccb"=>"P", "ccd"=>"P", "cch"=>"P", "ccv"=>"P", "ccn"=>"P", "yta"=>"L", "ytg"=>"L", "ytr"=>"L"}, 6 => {"aaa"=>"K", "aat"=>"N", "aag"=>"K", "aac"=>"N", "aay"=>"N", "aar"=>"K", "ata"=>"I", "att"=>"I", "atg"=>"M", "atc"=>"I", "aty"=>"I", "atw"=>"I", "atm"=>"I", "ath"=>"I", "aga"=>"R", "agt"=>"S", "agg"=>"R", "agc"=>"S", "agy"=>"S", "agr"=>"R", "aca"=>"T", "act"=>"T", "acg"=>"T", "acc"=>"T", "acy"=>"T", "acr"=>"T", "acw"=>"T", "acs"=>"T", "ack"=>"T", "acm"=>"T", "acb"=>"T", "acd"=>"T", "ach"=>"T", "acv"=>"T", "acn"=>"T", "taa"=>"Q", "tat"=>"Y", "tag"=>"Q", "tac"=>"Y", "tay"=>"Y", "tar"=>"Q", "tta"=>"L", "ttt"=>"F", "ttg"=>"L", "ttc"=>"F", "tty"=>"F", "ttr"=>"L", "tga"=>"*", "tgt"=>"C", "tgg"=>"W", "tgc"=>"C", "tgy"=>"C", "tca"=>"S", "tct"=>"S", "tcg"=>"S", "tcc"=>"S", "tcy"=>"S", "tcr"=>"S", "tcw"=>"S", "tcs"=>"S", "tck"=>"S", "tcm"=>"S", "tcb"=>"S", "tcd"=>"S", "tch"=>"S", "tcv"=>"S", "tcn"=>"S", "gaa"=>"E", "gat"=>"D", "gag"=>"E", "gac"=>"D", "gay"=>"D", "gar"=>"E", "gta"=>"V", "gtt"=>"V", "gtg"=>"V", "gtc"=>"V", 
"gty"=>"V", "gtr"=>"V", "gtw"=>"V", "gts"=>"V", "gtk"=>"V", "gtm"=>"V", "gtb"=>"V", "gtd"=>"V", "gth"=>"V", "gtv"=>"V", "gtn"=>"V", "gga"=>"G", "ggt"=>"G", "ggg"=>"G", "ggc"=>"G", "ggy"=>"G", "ggr"=>"G", "ggw"=>"G", "ggs"=>"G", "ggk"=>"G", "ggm"=>"G", "ggb"=>"G", "ggd"=>"G", "ggh"=>"G", "ggv"=>"G", "ggn"=>"G", "gca"=>"A", "gct"=>"A", "gcg"=>"A", "gcc"=>"A", "gcy"=>"A", "gcr"=>"A", "gcw"=>"A", "gcs"=>"A", "gck"=>"A", "gcm"=>"A", "gcb"=>"A", "gcd"=>"A", "gch"=>"A", "gcv"=>"A", "gcn"=>"A", "caa"=>"Q", "cat"=>"H", "cag"=>"Q", "cac"=>"H", "cay"=>"H", "car"=>"Q", "cta"=>"L", "ctt"=>"L", "ctg"=>"L", "ctc"=>"L", "cty"=>"L", "ctr"=>"L", "ctw"=>"L", "cts"=>"L", "ctk"=>"L", "ctm"=>"L", "ctb"=>"L", "ctd"=>"L", "cth"=>"L", "ctv"=>"L", "ctn"=>"L", "cga"=>"R", "cgt"=>"R", "cgg"=>"R", "cgc"=>"R", "cgy"=>"R", "cgr"=>"R", "cgw"=>"R", "cgs"=>"R", "cgk"=>"R", "cgm"=>"R", "cgb"=>"R", "cgd"=>"R", "cgh"=>"R", "cgv"=>"R", "cgn"=>"R", "cca"=>"P", "cct"=>"P", "ccg"=>"P", "ccc"=>"P", "ccy"=>"P", "ccr"=>"P", "ccw"=>"P", "ccs"=>"P", "cck"=>"P", "ccm"=>"P", "ccb"=>"P", "ccd"=>"P", "cch"=>"P", "ccv"=>"P", "ccn"=>"P", "yaa"=>"Q", "yag"=>"Q", "yar"=>"Q", "yta"=>"L", "ytg"=>"L", "ytr"=>"L", "mga"=>"R", "mgg"=>"R", "mgr"=>"R"}, 9 => {"aaa"=>"N", "aat"=>"N", "aag"=>"K", "aac"=>"N", "aay"=>"N", "aaw"=>"N", "aam"=>"N", "aah"=>"N", "ata"=>"I", "att"=>"I", "atg"=>"M", "atc"=>"I", "aty"=>"I", "atw"=>"I", "atm"=>"I", "ath"=>"I", "aga"=>"S", "agt"=>"S", "agg"=>"S", "agc"=>"S", "agy"=>"S", "agr"=>"S", "agw"=>"S", "ags"=>"S", "agk"=>"S", "agm"=>"S", "agb"=>"S", "agd"=>"S", "agh"=>"S", "agv"=>"S", "agn"=>"S", "aca"=>"T", "act"=>"T", "acg"=>"T", "acc"=>"T", "acy"=>"T", "acr"=>"T", "acw"=>"T", "acs"=>"T", "ack"=>"T", "acm"=>"T", "acb"=>"T", "acd"=>"T", "ach"=>"T", "acv"=>"T", "acn"=>"T", "taa"=>"*", "tat"=>"Y", "tag"=>"*", "tac"=>"Y", "tay"=>"Y", "tar"=>"*", "tta"=>"L", "ttt"=>"F", "ttg"=>"L", "ttc"=>"F", "tty"=>"F", "ttr"=>"L", "tga"=>"W", "tgt"=>"C", "tgg"=>"W", "tgc"=>"C", "tgy"=>"C", "tgr"=>"W", "tca"=>"S", 
"tct"=>"S", "tcg"=>"S", "tcc"=>"S", "tcy"=>"S", "tcr"=>"S", "tcw"=>"S", "tcs"=>"S", "tck"=>"S", "tcm"=>"S", "tcb"=>"S", "tcd"=>"S", "tch"=>"S", "tcv"=>"S", "tcn"=>"S", "gaa"=>"E", "gat"=>"D", "gag"=>"E", "gac"=>"D", "gay"=>"D", "gar"=>"E", "gta"=>"V", "gtt"=>"V", "gtg"=>"V", "gtc"=>"V", "gty"=>"V", "gtr"=>"V", "gtw"=>"V", "gts"=>"V", "gtk"=>"V", "gtm"=>"V", "gtb"=>"V", "gtd"=>"V", "gth"=>"V", "gtv"=>"V", "gtn"=>"V", "gga"=>"G", "ggt"=>"G", "ggg"=>"G", "ggc"=>"G", "ggy"=>"G", "ggr"=>"G", "ggw"=>"G", "ggs"=>"G", "ggk"=>"G", "ggm"=>"G", "ggb"=>"G", "ggd"=>"G", "ggh"=>"G", "ggv"=>"G", "ggn"=>"G", "gca"=>"A", "gct"=>"A", "gcg"=>"A", "gcc"=>"A", "gcy"=>"A", "gcr"=>"A", "gcw"=>"A", "gcs"=>"A", "gck"=>"A", "gcm"=>"A", "gcb"=>"A", "gcd"=>"A", "gch"=>"A", "gcv"=>"A", "gcn"=>"A", "caa"=>"Q", "cat"=>"H", "cag"=>"Q", "cac"=>"H", "cay"=>"H", "car"=>"Q", "cta"=>"L", "ctt"=>"L", "ctg"=>"L", "ctc"=>"L", "cty"=>"L", "ctr"=>"L", "ctw"=>"L", "cts"=>"L", "ctk"=>"L", "ctm"=>"L", "ctb"=>"L", "ctd"=>"L", "cth"=>"L", "ctv"=>"L", "ctn"=>"L", "cga"=>"R", "cgt"=>"R", "cgg"=>"R", "cgc"=>"R", "cgy"=>"R", "cgr"=>"R", "cgw"=>"R", "cgs"=>"R", "cgk"=>"R", "cgm"=>"R", "cgb"=>"R", "cgd"=>"R", "cgh"=>"R", "cgv"=>"R", "cgn"=>"R", "cca"=>"P", "cct"=>"P", "ccg"=>"P", "ccc"=>"P", "ccy"=>"P", "ccr"=>"P", "ccw"=>"P", "ccs"=>"P", "cck"=>"P", "ccm"=>"P", "ccb"=>"P", "ccd"=>"P", "cch"=>"P", "ccv"=>"P", "ccn"=>"P", "yta"=>"L", "ytg"=>"L", "ytr"=>"L"}, 10 => {"aaa"=>"K", "aat"=>"N", "aag"=>"K", "aac"=>"N", "aay"=>"N", "aar"=>"K", "ata"=>"I", "att"=>"I", "atg"=>"M", "atc"=>"I", "aty"=>"I", "atw"=>"I", "atm"=>"I", "ath"=>"I", "aga"=>"R", "agt"=>"S", "agg"=>"R", "agc"=>"S", "agy"=>"S", "agr"=>"R", "aca"=>"T", "act"=>"T", "acg"=>"T", "acc"=>"T", "acy"=>"T", "acr"=>"T", "acw"=>"T", "acs"=>"T", "ack"=>"T", "acm"=>"T", "acb"=>"T", "acd"=>"T", "ach"=>"T", "acv"=>"T", "acn"=>"T", "taa"=>"*", "tat"=>"Y", "tag"=>"*", "tac"=>"Y", "tay"=>"Y", "tar"=>"*", "tta"=>"L", "ttt"=>"F", "ttg"=>"L", "ttc"=>"F", "tty"=>"F", "ttr"=>"L", 
"tga"=>"C", "tgt"=>"C", "tgg"=>"W", "tgc"=>"C", "tgy"=>"C", "tgw"=>"C", "tgm"=>"C", "tgh"=>"C", "tca"=>"S", "tct"=>"S", "tcg"=>"S", "tcc"=>"S", "tcy"=>"S", "tcr"=>"S", "tcw"=>"S", "tcs"=>"S", "tck"=>"S", "tcm"=>"S", "tcb"=>"S", "tcd"=>"S", "tch"=>"S", "tcv"=>"S", "tcn"=>"S", "gaa"=>"E", "gat"=>"D", "gag"=>"E", "gac"=>"D", "gay"=>"D", "gar"=>"E", "gta"=>"V", "gtt"=>"V", "gtg"=>"V", "gtc"=>"V", "gty"=>"V", "gtr"=>"V", "gtw"=>"V", "gts"=>"V", "gtk"=>"V", "gtm"=>"V", "gtb"=>"V", "gtd"=>"V", "gth"=>"V", "gtv"=>"V", "gtn"=>"V", "gga"=>"G", "ggt"=>"G", "ggg"=>"G", "ggc"=>"G", "ggy"=>"G", "ggr"=>"G", "ggw"=>"G", "ggs"=>"G", "ggk"=>"G", "ggm"=>"G", "ggb"=>"G", "ggd"=>"G", "ggh"=>"G", "ggv"=>"G", "ggn"=>"G", "gca"=>"A", "gct"=>"A", "gcg"=>"A", "gcc"=>"A", "gcy"=>"A", "gcr"=>"A", "gcw"=>"A", "gcs"=>"A", "gck"=>"A", "gcm"=>"A", "gcb"=>"A", "gcd"=>"A", "gch"=>"A", "gcv"=>"A", "gcn"=>"A", "caa"=>"Q", "cat"=>"H", "cag"=>"Q", "cac"=>"H", "cay"=>"H", "car"=>"Q", "cta"=>"L", "ctt"=>"L", "ctg"=>"L", "ctc"=>"L", "cty"=>"L", "ctr"=>"L", "ctw"=>"L", "cts"=>"L", "ctk"=>"L", "ctm"=>"L", "ctb"=>"L", "ctd"=>"L", "cth"=>"L", "ctv"=>"L", "ctn"=>"L", "cga"=>"R", "cgt"=>"R", "cgg"=>"R", "cgc"=>"R", "cgy"=>"R", "cgr"=>"R", "cgw"=>"R", "cgs"=>"R", "cgk"=>"R", "cgm"=>"R", "cgb"=>"R", "cgd"=>"R", "cgh"=>"R", "cgv"=>"R", "cgn"=>"R", "cca"=>"P", "cct"=>"P", "ccg"=>"P", "ccc"=>"P", "ccy"=>"P", "ccr"=>"P", "ccw"=>"P", "ccs"=>"P", "cck"=>"P", "ccm"=>"P", "ccb"=>"P", "ccd"=>"P", "cch"=>"P", "ccv"=>"P", "ccn"=>"P", "yta"=>"L", "ytg"=>"L", "ytr"=>"L", "mga"=>"R", "mgg"=>"R", "mgr"=>"R"}, 11 => {"aaa"=>"K", "aat"=>"N", "aag"=>"K", "aac"=>"N", "aay"=>"N", "aar"=>"K", "ata"=>"I", "att"=>"I", "atg"=>"M", "atc"=>"I", "aty"=>"I", "atw"=>"I", "atm"=>"I", "ath"=>"I", "aga"=>"R", "agt"=>"S", "agg"=>"R", "agc"=>"S", "agy"=>"S", "agr"=>"R", "aca"=>"T", "act"=>"T", "acg"=>"T", "acc"=>"T", "acy"=>"T", "acr"=>"T", "acw"=>"T", "acs"=>"T", "ack"=>"T", "acm"=>"T", "acb"=>"T", "acd"=>"T", "ach"=>"T", "acv"=>"T", "acn"=>"T", 
"taa"=>"*", "tat"=>"Y", "tag"=>"*", "tac"=>"Y", "tay"=>"Y", "tar"=>"*", "tta"=>"L", "ttt"=>"F", "ttg"=>"L", "ttc"=>"F", "tty"=>"F", "ttr"=>"L", "tga"=>"*", "tgt"=>"C", "tgg"=>"W", "tgc"=>"C", "tgy"=>"C", "tca"=>"S", "tct"=>"S", "tcg"=>"S", "tcc"=>"S", "tcy"=>"S", "tcr"=>"S", "tcw"=>"S", "tcs"=>"S", "tck"=>"S", "tcm"=>"S", "tcb"=>"S", "tcd"=>"S", "tch"=>"S", "tcv"=>"S", "tcn"=>"S", "tra"=>"*", "gaa"=>"E", "gat"=>"D", "gag"=>"E", "gac"=>"D", "gay"=>"D", "gar"=>"E", "gta"=>"V", "gtt"=>"V", "gtg"=>"V", "gtc"=>"V", "gty"=>"V", "gtr"=>"V", "gtw"=>"V", "gts"=>"V", "gtk"=>"V", "gtm"=>"V", "gtb"=>"V", "gtd"=>"V", "gth"=>"V", "gtv"=>"V", "gtn"=>"V", "gga"=>"G", "ggt"=>"G", "ggg"=>"G", "ggc"=>"G", "ggy"=>"G", "ggr"=>"G", "ggw"=>"G", "ggs"=>"G", "ggk"=>"G", "ggm"=>"G", "ggb"=>"G", "ggd"=>"G", "ggh"=>"G", "ggv"=>"G", "ggn"=>"G", "gca"=>"A", "gct"=>"A", "gcg"=>"A", "gcc"=>"A", "gcy"=>"A", "gcr"=>"A", "gcw"=>"A", "gcs"=>"A", "gck"=>"A", "gcm"=>"A", "gcb"=>"A", "gcd"=>"A", "gch"=>"A", "gcv"=>"A", "gcn"=>"A", "caa"=>"Q", "cat"=>"H", "cag"=>"Q", "cac"=>"H", "cay"=>"H", "car"=>"Q", "cta"=>"L", "ctt"=>"L", "ctg"=>"L", "ctc"=>"L", "cty"=>"L", "ctr"=>"L", "ctw"=>"L", "cts"=>"L", "ctk"=>"L", "ctm"=>"L", "ctb"=>"L", "ctd"=>"L", "cth"=>"L", "ctv"=>"L", "ctn"=>"L", "cga"=>"R", "cgt"=>"R", "cgg"=>"R", "cgc"=>"R", "cgy"=>"R", "cgr"=>"R", "cgw"=>"R", "cgs"=>"R", "cgk"=>"R", "cgm"=>"R", "cgb"=>"R", "cgd"=>"R", "cgh"=>"R", "cgv"=>"R", "cgn"=>"R", "cca"=>"P", "cct"=>"P", "ccg"=>"P", "ccc"=>"P", "ccy"=>"P", "ccr"=>"P", "ccw"=>"P", "ccs"=>"P", "cck"=>"P", "ccm"=>"P", "ccb"=>"P", "ccd"=>"P", "cch"=>"P", "ccv"=>"P", "ccn"=>"P", "yta"=>"L", "ytg"=>"L", "ytr"=>"L", "mga"=>"R", "mgg"=>"R", "mgr"=>"R"}, 12 => {"aaa"=>"K", "aat"=>"N", "aag"=>"K", "aac"=>"N", "aay"=>"N", "aar"=>"K", "ata"=>"I", "att"=>"I", "atg"=>"M", "atc"=>"I", "aty"=>"I", "atw"=>"I", "atm"=>"I", "ath"=>"I", "aga"=>"R", "agt"=>"S", "agg"=>"R", "agc"=>"S", "agy"=>"S", "agr"=>"R", "aca"=>"T", "act"=>"T", "acg"=>"T", "acc"=>"T", "acy"=>"T", 
"acr"=>"T", "acw"=>"T", "acs"=>"T", "ack"=>"T", "acm"=>"T", "acb"=>"T", "acd"=>"T", "ach"=>"T", "acv"=>"T", "acn"=>"T", "taa"=>"*", "tat"=>"Y", "tag"=>"*", "tac"=>"Y", "tay"=>"Y", "tar"=>"*", "tta"=>"L", "ttt"=>"F", "ttg"=>"L", "ttc"=>"F", "tty"=>"F", "ttr"=>"L", "tga"=>"*", "tgt"=>"C", "tgg"=>"W", "tgc"=>"C", "tgy"=>"C", "tca"=>"S", "tct"=>"S", "tcg"=>"S", "tcc"=>"S", "tcy"=>"S", "tcr"=>"S", "tcw"=>"S", "tcs"=>"S", "tck"=>"S", "tcm"=>"S", "tcb"=>"S", "tcd"=>"S", "tch"=>"S", "tcv"=>"S", "tcn"=>"S", "tra"=>"*", "gaa"=>"E", "gat"=>"D", "gag"=>"E", "gac"=>"D", "gay"=>"D", "gar"=>"E", "gta"=>"V", "gtt"=>"V", "gtg"=>"V", "gtc"=>"V", "gty"=>"V", "gtr"=>"V", "gtw"=>"V", "gts"=>"V", "gtk"=>"V", "gtm"=>"V", "gtb"=>"V", "gtd"=>"V", "gth"=>"V", "gtv"=>"V", "gtn"=>"V", "gga"=>"G", "ggt"=>"G", "ggg"=>"G", "ggc"=>"G", "ggy"=>"G", "ggr"=>"G", "ggw"=>"G", "ggs"=>"G", "ggk"=>"G", "ggm"=>"G", "ggb"=>"G", "ggd"=>"G", "ggh"=>"G", "ggv"=>"G", "ggn"=>"G", "gca"=>"A", "gct"=>"A", "gcg"=>"A", "gcc"=>"A", "gcy"=>"A", "gcr"=>"A", "gcw"=>"A", "gcs"=>"A", "gck"=>"A", "gcm"=>"A", "gcb"=>"A", "gcd"=>"A", "gch"=>"A", "gcv"=>"A", "gcn"=>"A", "caa"=>"Q", "cat"=>"H", "cag"=>"Q", "cac"=>"H", "cay"=>"H", "car"=>"Q", "cta"=>"L", "ctt"=>"L", "ctg"=>"S", "ctc"=>"L", "cty"=>"L", "ctw"=>"L", "ctm"=>"L", "cth"=>"L", "cga"=>"R", "cgt"=>"R", "cgg"=>"R", "cgc"=>"R", "cgy"=>"R", "cgr"=>"R", "cgw"=>"R", "cgs"=>"R", "cgk"=>"R", "cgm"=>"R", "cgb"=>"R", "cgd"=>"R", "cgh"=>"R", "cgv"=>"R", "cgn"=>"R", "cca"=>"P", "cct"=>"P", "ccg"=>"P", "ccc"=>"P", "ccy"=>"P", "ccr"=>"P", "ccw"=>"P", "ccs"=>"P", "cck"=>"P", "ccm"=>"P", "ccb"=>"P", "ccd"=>"P", "cch"=>"P", "ccv"=>"P", "ccn"=>"P", "yta"=>"L", "mga"=>"R", "mgg"=>"R", "mgr"=>"R"}, 13 => {"aaa"=>"K", "aat"=>"N", "aag"=>"K", "aac"=>"N", "aay"=>"N", "aar"=>"K", "ata"=>"M", "att"=>"I", "atg"=>"M", "atc"=>"I", "aty"=>"I", "atr"=>"M", "aga"=>"G", "agt"=>"S", "agg"=>"G", "agc"=>"S", "agy"=>"S", "agr"=>"G", "aca"=>"T", "act"=>"T", "acg"=>"T", "acc"=>"T", "acy"=>"T", "acr"=>"T", 
"acw"=>"T", "acs"=>"T", "ack"=>"T", "acm"=>"T", "acb"=>"T", "acd"=>"T", "ach"=>"T", "acv"=>"T", "acn"=>"T", "taa"=>"*", "tat"=>"Y", "tag"=>"*", "tac"=>"Y", "tay"=>"Y", "tar"=>"*", "tta"=>"L", "ttt"=>"F", "ttg"=>"L", "ttc"=>"F", "tty"=>"F", "ttr"=>"L", "tga"=>"W", "tgt"=>"C", "tgg"=>"W", "tgc"=>"C", "tgy"=>"C", "tgr"=>"W", "tca"=>"S", "tct"=>"S", "tcg"=>"S", "tcc"=>"S", "tcy"=>"S", "tcr"=>"S", "tcw"=>"S", "tcs"=>"S", "tck"=>"S", "tcm"=>"S", "tcb"=>"S", "tcd"=>"S", "tch"=>"S", "tcv"=>"S", "tcn"=>"S", "gaa"=>"E", "gat"=>"D", "gag"=>"E", "gac"=>"D", "gay"=>"D", "gar"=>"E", "gta"=>"V", "gtt"=>"V", "gtg"=>"V", "gtc"=>"V", "gty"=>"V", "gtr"=>"V", "gtw"=>"V", "gts"=>"V", "gtk"=>"V", "gtm"=>"V", "gtb"=>"V", "gtd"=>"V", "gth"=>"V", "gtv"=>"V", "gtn"=>"V", "gga"=>"G", "ggt"=>"G", "ggg"=>"G", "ggc"=>"G", "ggy"=>"G", "ggr"=>"G", "ggw"=>"G", "ggs"=>"G", "ggk"=>"G", "ggm"=>"G", "ggb"=>"G", "ggd"=>"G", "ggh"=>"G", "ggv"=>"G", "ggn"=>"G", "gca"=>"A", "gct"=>"A", "gcg"=>"A", "gcc"=>"A", "gcy"=>"A", "gcr"=>"A", "gcw"=>"A", "gcs"=>"A", "gck"=>"A", "gcm"=>"A", "gcb"=>"A", "gcd"=>"A", "gch"=>"A", "gcv"=>"A", "gcn"=>"A", "caa"=>"Q", "cat"=>"H", "cag"=>"Q", "cac"=>"H", "cay"=>"H", "car"=>"Q", "cta"=>"L", "ctt"=>"L", "ctg"=>"L", "ctc"=>"L", "cty"=>"L", "ctr"=>"L", "ctw"=>"L", "cts"=>"L", "ctk"=>"L", "ctm"=>"L", "ctb"=>"L", "ctd"=>"L", "cth"=>"L", "ctv"=>"L", "ctn"=>"L", "cga"=>"R", "cgt"=>"R", "cgg"=>"R", "cgc"=>"R", "cgy"=>"R", "cgr"=>"R", "cgw"=>"R", "cgs"=>"R", "cgk"=>"R", "cgm"=>"R", "cgb"=>"R", "cgd"=>"R", "cgh"=>"R", "cgv"=>"R", "cgn"=>"R", "cca"=>"P", "cct"=>"P", "ccg"=>"P", "ccc"=>"P", "ccy"=>"P", "ccr"=>"P", "ccw"=>"P", "ccs"=>"P", "cck"=>"P", "ccm"=>"P", "ccb"=>"P", "ccd"=>"P", "cch"=>"P", "ccv"=>"P", "ccn"=>"P", "yta"=>"L", "ytg"=>"L", "ytr"=>"L", "rga"=>"G", "rgg"=>"G", "rgr"=>"G"}, 14 => {"aaa"=>"N", "aat"=>"N", "aag"=>"K", "aac"=>"N", "aay"=>"N", "aaw"=>"N", "aam"=>"N", "aah"=>"N", "ata"=>"I", "att"=>"I", "atg"=>"M", "atc"=>"I", "aty"=>"I", "atw"=>"I", "atm"=>"I", "ath"=>"I", 
"aga"=>"S", "agt"=>"S", "agg"=>"S", "agc"=>"S", "agy"=>"S", "agr"=>"S", "agw"=>"S", "ags"=>"S", "agk"=>"S", "agm"=>"S", "agb"=>"S", "agd"=>"S", "agh"=>"S", "agv"=>"S", "agn"=>"S", "aca"=>"T", "act"=>"T", "acg"=>"T", "acc"=>"T", "acy"=>"T", "acr"=>"T", "acw"=>"T", "acs"=>"T", "ack"=>"T", "acm"=>"T", "acb"=>"T", "acd"=>"T", "ach"=>"T", "acv"=>"T", "acn"=>"T", "taa"=>"Y", "tat"=>"Y", "tag"=>"*", "tac"=>"Y", "tay"=>"Y", "taw"=>"Y", "tam"=>"Y", "tah"=>"Y", "tta"=>"L", "ttt"=>"F", "ttg"=>"L", "ttc"=>"F", "tty"=>"F", "ttr"=>"L", "tga"=>"W", "tgt"=>"C", "tgg"=>"W", "tgc"=>"C", "tgy"=>"C", "tgr"=>"W", "tca"=>"S", "tct"=>"S", "tcg"=>"S", "tcc"=>"S", "tcy"=>"S", "tcr"=>"S", "tcw"=>"S", "tcs"=>"S", "tck"=>"S", "tcm"=>"S", "tcb"=>"S", "tcd"=>"S", "tch"=>"S", "tcv"=>"S", "tcn"=>"S", "gaa"=>"E", "gat"=>"D", "gag"=>"E", "gac"=>"D", "gay"=>"D", "gar"=>"E", "gta"=>"V", "gtt"=>"V", "gtg"=>"V", "gtc"=>"V", "gty"=>"V", "gtr"=>"V", "gtw"=>"V", "gts"=>"V", "gtk"=>"V", "gtm"=>"V", "gtb"=>"V", "gtd"=>"V", "gth"=>"V", "gtv"=>"V", "gtn"=>"V", "gga"=>"G", "ggt"=>"G", "ggg"=>"G", "ggc"=>"G", "ggy"=>"G", "ggr"=>"G", "ggw"=>"G", "ggs"=>"G", "ggk"=>"G", "ggm"=>"G", "ggb"=>"G", "ggd"=>"G", "ggh"=>"G", "ggv"=>"G", "ggn"=>"G", "gca"=>"A", "gct"=>"A", "gcg"=>"A", "gcc"=>"A", "gcy"=>"A", "gcr"=>"A", "gcw"=>"A", "gcs"=>"A", "gck"=>"A", "gcm"=>"A", "gcb"=>"A", "gcd"=>"A", "gch"=>"A", "gcv"=>"A", "gcn"=>"A", "caa"=>"Q", "cat"=>"H", "cag"=>"Q", "cac"=>"H", "cay"=>"H", "car"=>"Q", "cta"=>"L", "ctt"=>"L", "ctg"=>"L", "ctc"=>"L", "cty"=>"L", "ctr"=>"L", "ctw"=>"L", "cts"=>"L", "ctk"=>"L", "ctm"=>"L", "ctb"=>"L", "ctd"=>"L", "cth"=>"L", "ctv"=>"L", "ctn"=>"L", "cga"=>"R", "cgt"=>"R", "cgg"=>"R", "cgc"=>"R", "cgy"=>"R", "cgr"=>"R", "cgw"=>"R", "cgs"=>"R", "cgk"=>"R", "cgm"=>"R", "cgb"=>"R", "cgd"=>"R", "cgh"=>"R", "cgv"=>"R", "cgn"=>"R", "cca"=>"P", "cct"=>"P", "ccg"=>"P", "ccc"=>"P", "ccy"=>"P", "ccr"=>"P", "ccw"=>"P", "ccs"=>"P", "cck"=>"P", "ccm"=>"P", "ccb"=>"P", "ccd"=>"P", "cch"=>"P", "ccv"=>"P", 
"ccn"=>"P", "yta"=>"L", "ytg"=>"L", "ytr"=>"L"}, 15 => {"aaa"=>"K", "aat"=>"N", "aag"=>"K", "aac"=>"N", "aay"=>"N", "aar"=>"K", "ata"=>"I", "att"=>"I", "atg"=>"M", "atc"=>"I", "aty"=>"I", "atw"=>"I", "atm"=>"I", "ath"=>"I", "aga"=>"R", "agt"=>"S", "agg"=>"R", "agc"=>"S", "agy"=>"S", "agr"=>"R", "aca"=>"T", "act"=>"T", "acg"=>"T", "acc"=>"T", "acy"=>"T", "acr"=>"T", "acw"=>"T", "acs"=>"T", "ack"=>"T", "acm"=>"T", "acb"=>"T", "acd"=>"T", "ach"=>"T", "acv"=>"T", "acn"=>"T", "taa"=>"*", "tat"=>"Y", "tag"=>"Q", "tac"=>"Y", "tay"=>"Y", "tta"=>"L", "ttt"=>"F", "ttg"=>"L", "ttc"=>"F", "tty"=>"F", "ttr"=>"L", "tga"=>"*", "tgt"=>"C", "tgg"=>"W", "tgc"=>"C", "tgy"=>"C", "tca"=>"S", "tct"=>"S", "tcg"=>"S", "tcc"=>"S", "tcy"=>"S", "tcr"=>"S", "tcw"=>"S", "tcs"=>"S", "tck"=>"S", "tcm"=>"S", "tcb"=>"S", "tcd"=>"S", "tch"=>"S", "tcv"=>"S", "tcn"=>"S", "tra"=>"*", "gaa"=>"E", "gat"=>"D", "gag"=>"E", "gac"=>"D", "gay"=>"D", "gar"=>"E", "gta"=>"V", "gtt"=>"V", "gtg"=>"V", "gtc"=>"V", "gty"=>"V", "gtr"=>"V", "gtw"=>"V", "gts"=>"V", "gtk"=>"V", "gtm"=>"V", "gtb"=>"V", "gtd"=>"V", "gth"=>"V", "gtv"=>"V", "gtn"=>"V", "gga"=>"G", "ggt"=>"G", "ggg"=>"G", "ggc"=>"G", "ggy"=>"G", "ggr"=>"G", "ggw"=>"G", "ggs"=>"G", "ggk"=>"G", "ggm"=>"G", "ggb"=>"G", "ggd"=>"G", "ggh"=>"G", "ggv"=>"G", "ggn"=>"G", "gca"=>"A", "gct"=>"A", "gcg"=>"A", "gcc"=>"A", "gcy"=>"A", "gcr"=>"A", "gcw"=>"A", "gcs"=>"A", "gck"=>"A", "gcm"=>"A", "gcb"=>"A", "gcd"=>"A", "gch"=>"A", "gcv"=>"A", "gcn"=>"A", "caa"=>"Q", "cat"=>"H", "cag"=>"Q", "cac"=>"H", "cay"=>"H", "car"=>"Q", "cta"=>"L", "ctt"=>"L", "ctg"=>"L", "ctc"=>"L", "cty"=>"L", "ctr"=>"L", "ctw"=>"L", "cts"=>"L", "ctk"=>"L", "ctm"=>"L", "ctb"=>"L", "ctd"=>"L", "cth"=>"L", "ctv"=>"L", "ctn"=>"L", "cga"=>"R", "cgt"=>"R", "cgg"=>"R", "cgc"=>"R", "cgy"=>"R", "cgr"=>"R", "cgw"=>"R", "cgs"=>"R", "cgk"=>"R", "cgm"=>"R", "cgb"=>"R", "cgd"=>"R", "cgh"=>"R", "cgv"=>"R", "cgn"=>"R", "cca"=>"P", "cct"=>"P", "ccg"=>"P", "ccc"=>"P", "ccy"=>"P", "ccr"=>"P", "ccw"=>"P", "ccs"=>"P", 
"cck"=>"P", "ccm"=>"P", "ccb"=>"P", "ccd"=>"P", "cch"=>"P", "ccv"=>"P", "ccn"=>"P", "yag"=>"Q", "yta"=>"L", "ytg"=>"L", "ytr"=>"L", "mga"=>"R", "mgg"=>"R", "mgr"=>"R"}, 16 => {"aaa"=>"K", "aat"=>"N", "aag"=>"K", "aac"=>"N", "aay"=>"N", "aar"=>"K", "ata"=>"I", "att"=>"I", "atg"=>"M", "atc"=>"I", "aty"=>"I", "atw"=>"I", "atm"=>"I", "ath"=>"I", "aga"=>"R", "agt"=>"S", "agg"=>"R", "agc"=>"S", "agy"=>"S", "agr"=>"R", "aca"=>"T", "act"=>"T", "acg"=>"T", "acc"=>"T", "acy"=>"T", "acr"=>"T", "acw"=>"T", "acs"=>"T", "ack"=>"T", "acm"=>"T", "acb"=>"T", "acd"=>"T", "ach"=>"T", "acv"=>"T", "acn"=>"T", "taa"=>"*", "tat"=>"Y", "tag"=>"L", "tac"=>"Y", "tay"=>"Y", "tta"=>"L", "ttt"=>"F", "ttg"=>"L", "ttc"=>"F", "tty"=>"F", "ttr"=>"L", "tga"=>"*", "tgt"=>"C", "tgg"=>"W", "tgc"=>"C", "tgy"=>"C", "tca"=>"S", "tct"=>"S", "tcg"=>"S", "tcc"=>"S", "tcy"=>"S", "tcr"=>"S", "tcw"=>"S", "tcs"=>"S", "tck"=>"S", "tcm"=>"S", "tcb"=>"S", "tcd"=>"S", "tch"=>"S", "tcv"=>"S", "tcn"=>"S", "tra"=>"*", "twg"=>"L", "gaa"=>"E", "gat"=>"D", "gag"=>"E", "gac"=>"D", "gay"=>"D", "gar"=>"E", "gta"=>"V", "gtt"=>"V", "gtg"=>"V", "gtc"=>"V", "gty"=>"V", "gtr"=>"V", "gtw"=>"V", "gts"=>"V", "gtk"=>"V", "gtm"=>"V", "gtb"=>"V", "gtd"=>"V", "gth"=>"V", "gtv"=>"V", "gtn"=>"V", "gga"=>"G", "ggt"=>"G", "ggg"=>"G", "ggc"=>"G", "ggy"=>"G", "ggr"=>"G", "ggw"=>"G", "ggs"=>"G", "ggk"=>"G", "ggm"=>"G", "ggb"=>"G", "ggd"=>"G", "ggh"=>"G", "ggv"=>"G", "ggn"=>"G", "gca"=>"A", "gct"=>"A", "gcg"=>"A", "gcc"=>"A", "gcy"=>"A", "gcr"=>"A", "gcw"=>"A", "gcs"=>"A", "gck"=>"A", "gcm"=>"A", "gcb"=>"A", "gcd"=>"A", "gch"=>"A", "gcv"=>"A", "gcn"=>"A", "caa"=>"Q", "cat"=>"H", "cag"=>"Q", "cac"=>"H", "cay"=>"H", "car"=>"Q", "cta"=>"L", "ctt"=>"L", "ctg"=>"L", "ctc"=>"L", "cty"=>"L", "ctr"=>"L", "ctw"=>"L", "cts"=>"L", "ctk"=>"L", "ctm"=>"L", "ctb"=>"L", "ctd"=>"L", "cth"=>"L", "ctv"=>"L", "ctn"=>"L", "cga"=>"R", "cgt"=>"R", "cgg"=>"R", "cgc"=>"R", "cgy"=>"R", "cgr"=>"R", "cgw"=>"R", "cgs"=>"R", "cgk"=>"R", "cgm"=>"R", "cgb"=>"R", "cgd"=>"R", 
"cgh"=>"R", "cgv"=>"R", "cgn"=>"R", "cca"=>"P", "cct"=>"P", "ccg"=>"P", "ccc"=>"P", "ccy"=>"P", "ccr"=>"P", "ccw"=>"P", "ccs"=>"P", "cck"=>"P", "ccm"=>"P", "ccb"=>"P", "ccd"=>"P", "cch"=>"P", "ccv"=>"P", "ccn"=>"P", "yta"=>"L", "ytg"=>"L", "ytr"=>"L", "mga"=>"R", "mgg"=>"R", "mgr"=>"R"}, 21 => {"aaa"=>"N", "aat"=>"N", "aag"=>"K", "aac"=>"N", "aay"=>"N", "aaw"=>"N", "aam"=>"N", "aah"=>"N", "ata"=>"M", "att"=>"I", "atg"=>"M", "atc"=>"I", "aty"=>"I", "atr"=>"M", "aga"=>"S", "agt"=>"S", "agg"=>"S", "agc"=>"S", "agy"=>"S", "agr"=>"S", "agw"=>"S", "ags"=>"S", "agk"=>"S", "agm"=>"S", "agb"=>"S", "agd"=>"S", "agh"=>"S", "agv"=>"S", "agn"=>"S", "aca"=>"T", "act"=>"T", "acg"=>"T", "acc"=>"T", "acy"=>"T", "acr"=>"T", "acw"=>"T", "acs"=>"T", "ack"=>"T", "acm"=>"T", "acb"=>"T", "acd"=>"T", "ach"=>"T", "acv"=>"T", "acn"=>"T", "taa"=>"*", "tat"=>"Y", "tag"=>"*", "tac"=>"Y", "tay"=>"Y", "tar"=>"*", "tta"=>"L", "ttt"=>"F", "ttg"=>"L", "ttc"=>"F", "tty"=>"F", "ttr"=>"L", "tga"=>"W", "tgt"=>"C", "tgg"=>"W", "tgc"=>"C", "tgy"=>"C", "tgr"=>"W", "tca"=>"S", "tct"=>"S", "tcg"=>"S", "tcc"=>"S", "tcy"=>"S", "tcr"=>"S", "tcw"=>"S", "tcs"=>"S", "tck"=>"S", "tcm"=>"S", "tcb"=>"S", "tcd"=>"S", "tch"=>"S", "tcv"=>"S", "tcn"=>"S", "gaa"=>"E", "gat"=>"D", "gag"=>"E", "gac"=>"D", "gay"=>"D", "gar"=>"E", "gta"=>"V", "gtt"=>"V", "gtg"=>"V", "gtc"=>"V", "gty"=>"V", "gtr"=>"V", "gtw"=>"V", "gts"=>"V", "gtk"=>"V", "gtm"=>"V", "gtb"=>"V", "gtd"=>"V", "gth"=>"V", "gtv"=>"V", "gtn"=>"V", "gga"=>"G", "ggt"=>"G", "ggg"=>"G", "ggc"=>"G", "ggy"=>"G", "ggr"=>"G", "ggw"=>"G", "ggs"=>"G", "ggk"=>"G", "ggm"=>"G", "ggb"=>"G", "ggd"=>"G", "ggh"=>"G", "ggv"=>"G", "ggn"=>"G", "gca"=>"A", "gct"=>"A", "gcg"=>"A", "gcc"=>"A", "gcy"=>"A", "gcr"=>"A", "gcw"=>"A", "gcs"=>"A", "gck"=>"A", "gcm"=>"A", "gcb"=>"A", "gcd"=>"A", "gch"=>"A", "gcv"=>"A", "gcn"=>"A", "caa"=>"Q", "cat"=>"H", "cag"=>"Q", "cac"=>"H", "cay"=>"H", "car"=>"Q", "cta"=>"L", "ctt"=>"L", "ctg"=>"L", "ctc"=>"L", "cty"=>"L", "ctr"=>"L", "ctw"=>"L", "cts"=>"L", 
"ctk"=>"L", "ctm"=>"L", "ctb"=>"L", "ctd"=>"L", "cth"=>"L", "ctv"=>"L", "ctn"=>"L", "cga"=>"R", "cgt"=>"R", "cgg"=>"R", "cgc"=>"R", "cgy"=>"R", "cgr"=>"R", "cgw"=>"R", "cgs"=>"R", "cgk"=>"R", "cgm"=>"R", "cgb"=>"R", "cgd"=>"R", "cgh"=>"R", "cgv"=>"R", "cgn"=>"R", "cca"=>"P", "cct"=>"P", "ccg"=>"P", "ccc"=>"P", "ccy"=>"P", "ccr"=>"P", "ccw"=>"P", "ccs"=>"P", "cck"=>"P", "ccm"=>"P", "ccb"=>"P", "ccd"=>"P", "cch"=>"P", "ccv"=>"P", "ccn"=>"P", "yta"=>"L", "ytg"=>"L", "ytr"=>"L"}, 22 => {"aaa"=>"K", "aat"=>"N", "aag"=>"K", "aac"=>"N", "aay"=>"N", "aar"=>"K", "ata"=>"I", "att"=>"I", "atg"=>"M", "atc"=>"I", "aty"=>"I", "atw"=>"I", "atm"=>"I", "ath"=>"I", "aga"=>"R", "agt"=>"S", "agg"=>"R", "agc"=>"S", "agy"=>"S", "agr"=>"R", "aca"=>"T", "act"=>"T", "acg"=>"T", "acc"=>"T", "acy"=>"T", "acr"=>"T", "acw"=>"T", "acs"=>"T", "ack"=>"T", "acm"=>"T", "acb"=>"T", "acd"=>"T", "ach"=>"T", "acv"=>"T", "acn"=>"T", "taa"=>"*", "tat"=>"Y", "tag"=>"L", "tac"=>"Y", "tay"=>"Y", "tta"=>"L", "ttt"=>"F", "ttg"=>"L", "ttc"=>"F", "tty"=>"F", "ttr"=>"L", "tga"=>"*", "tgt"=>"C", "tgg"=>"W", "tgc"=>"C", "tgy"=>"C", "tca"=>"*", "tct"=>"S", "tcg"=>"S", "tcc"=>"S", "tcy"=>"S", "tcs"=>"S", "tck"=>"S", "tcb"=>"S", "tra"=>"*", "twg"=>"L", "tsa"=>"*", "tma"=>"*", "tva"=>"*", "gaa"=>"E", "gat"=>"D", "gag"=>"E", "gac"=>"D", "gay"=>"D", "gar"=>"E", "gta"=>"V", "gtt"=>"V", "gtg"=>"V", "gtc"=>"V", "gty"=>"V", "gtr"=>"V", "gtw"=>"V", "gts"=>"V", "gtk"=>"V", "gtm"=>"V", "gtb"=>"V", "gtd"=>"V", "gth"=>"V", "gtv"=>"V", "gtn"=>"V", "gga"=>"G", "ggt"=>"G", "ggg"=>"G", "ggc"=>"G", "ggy"=>"G", "ggr"=>"G", "ggw"=>"G", "ggs"=>"G", "ggk"=>"G", "ggm"=>"G", "ggb"=>"G", "ggd"=>"G", "ggh"=>"G", "ggv"=>"G", "ggn"=>"G", "gca"=>"A", "gct"=>"A", "gcg"=>"A", "gcc"=>"A", "gcy"=>"A", "gcr"=>"A", "gcw"=>"A", "gcs"=>"A", "gck"=>"A", "gcm"=>"A", "gcb"=>"A", "gcd"=>"A", "gch"=>"A", "gcv"=>"A", "gcn"=>"A", "caa"=>"Q", "cat"=>"H", "cag"=>"Q", "cac"=>"H", "cay"=>"H", "car"=>"Q", "cta"=>"L", "ctt"=>"L", "ctg"=>"L", "ctc"=>"L", "cty"=>"L", 
"ctr"=>"L", "ctw"=>"L", "cts"=>"L", "ctk"=>"L", "ctm"=>"L", "ctb"=>"L", "ctd"=>"L", "cth"=>"L", "ctv"=>"L", "ctn"=>"L", "cga"=>"R", "cgt"=>"R", "cgg"=>"R", "cgc"=>"R", "cgy"=>"R", "cgr"=>"R", "cgw"=>"R", "cgs"=>"R", "cgk"=>"R", "cgm"=>"R", "cgb"=>"R", "cgd"=>"R", "cgh"=>"R", "cgv"=>"R", "cgn"=>"R", "cca"=>"P", "cct"=>"P", "ccg"=>"P", "ccc"=>"P", "ccy"=>"P", "ccr"=>"P", "ccw"=>"P", "ccs"=>"P", "cck"=>"P", "ccm"=>"P", "ccb"=>"P", "ccd"=>"P", "cch"=>"P", "ccv"=>"P", "ccn"=>"P", "yta"=>"L", "ytg"=>"L", "ytr"=>"L", "mga"=>"R", "mgg"=>"R", "mgr"=>"R"}, 23 => {"aaa"=>"K", "aat"=>"N", "aag"=>"K", "aac"=>"N", "aay"=>"N", "aar"=>"K", "ata"=>"I", "att"=>"I", "atg"=>"M", "atc"=>"I", "aty"=>"I", "atw"=>"I", "atm"=>"I", "ath"=>"I", "aga"=>"R", "agt"=>"S", "agg"=>"R", "agc"=>"S", "agy"=>"S", "agr"=>"R", "aca"=>"T", "act"=>"T", "acg"=>"T", "acc"=>"T", "acy"=>"T", "acr"=>"T", "acw"=>"T", "acs"=>"T", "ack"=>"T", "acm"=>"T", "acb"=>"T", "acd"=>"T", "ach"=>"T", "acv"=>"T", "acn"=>"T", "taa"=>"*", "tat"=>"Y", "tag"=>"*", "tac"=>"Y", "tay"=>"Y", "tar"=>"*", "tta"=>"*", "ttt"=>"F", "ttg"=>"L", "ttc"=>"F", "tty"=>"F", "tga"=>"*", "tgt"=>"C", "tgg"=>"W", "tgc"=>"C", "tgy"=>"C", "tca"=>"S", "tct"=>"S", "tcg"=>"S", "tcc"=>"S", "tcy"=>"S", "tcr"=>"S", "tcw"=>"S", "tcs"=>"S", "tck"=>"S", "tcm"=>"S", "tcb"=>"S", "tcd"=>"S", "tch"=>"S", "tcv"=>"S", "tcn"=>"S", "tra"=>"*", "twa"=>"*", "tka"=>"*", "tda"=>"*", "gaa"=>"E", "gat"=>"D", "gag"=>"E", "gac"=>"D", "gay"=>"D", "gar"=>"E", "gta"=>"V", "gtt"=>"V", "gtg"=>"V", "gtc"=>"V", "gty"=>"V", "gtr"=>"V", "gtw"=>"V", "gts"=>"V", "gtk"=>"V", "gtm"=>"V", "gtb"=>"V", "gtd"=>"V", "gth"=>"V", "gtv"=>"V", "gtn"=>"V", "gga"=>"G", "ggt"=>"G", "ggg"=>"G", "ggc"=>"G", "ggy"=>"G", "ggr"=>"G", "ggw"=>"G", "ggs"=>"G", "ggk"=>"G", "ggm"=>"G", "ggb"=>"G", "ggd"=>"G", "ggh"=>"G", "ggv"=>"G", "ggn"=>"G", "gca"=>"A", "gct"=>"A", "gcg"=>"A", "gcc"=>"A", "gcy"=>"A", "gcr"=>"A", "gcw"=>"A", "gcs"=>"A", "gck"=>"A", "gcm"=>"A", "gcb"=>"A", "gcd"=>"A", "gch"=>"A", "gcv"=>"A", 
"gcn"=>"A", "caa"=>"Q", "cat"=>"H", "cag"=>"Q", "cac"=>"H", "cay"=>"H", "car"=>"Q", "cta"=>"L", "ctt"=>"L", "ctg"=>"L", "ctc"=>"L", "cty"=>"L", "ctr"=>"L", "ctw"=>"L", "cts"=>"L", "ctk"=>"L", "ctm"=>"L", "ctb"=>"L", "ctd"=>"L", "cth"=>"L", "ctv"=>"L", "ctn"=>"L", "cga"=>"R", "cgt"=>"R", "cgg"=>"R", "cgc"=>"R", "cgy"=>"R", "cgr"=>"R", "cgw"=>"R", "cgs"=>"R", "cgk"=>"R", "cgm"=>"R", "cgb"=>"R", "cgd"=>"R", "cgh"=>"R", "cgv"=>"R", "cgn"=>"R", "cca"=>"P", "cct"=>"P", "ccg"=>"P", "ccc"=>"P", "ccy"=>"P", "ccr"=>"P", "ccw"=>"P", "ccs"=>"P", "cck"=>"P", "ccm"=>"P", "ccb"=>"P", "ccd"=>"P", "cch"=>"P", "ccv"=>"P", "ccn"=>"P", "ytg"=>"L", "mga"=>"R", "mgg"=>"R", "mgr"=>"R"}, } end #CodonTable end #Bio bio-2.0.3/lib/bio/data/na.rb0000644000175000017500000000717414141516614015002 0ustar nileshnilesh# # = bio/data/na.rb - Nucleic Acids # # Copyright:: Copyright (C) 2001, 2005 # Toshiaki Katayama # License:: The Ruby License # # $Id:$ # # == Synopsis # # Bio::NucleicAcid class contains data related to nucleic acids. # # == Usage # # Examples: # # require 'bio' # # puts "### na = Bio::NucleicAcid.new" # na = Bio::NucleicAcid.new # # puts "# na.to_re('yrwskmbdhvnatgc')" # p na.to_re('yrwskmbdhvnatgc') # # puts "# Bio::NucleicAcid.to_re('yrwskmbdhvnatgc')" # p Bio::NucleicAcid.to_re('yrwskmbdhvnatgc') # # puts "# na.weight('A')" # p na.weight('A') # # puts "# Bio::NucleicAcid.weight('A')" # p Bio::NucleicAcid.weight('A') # # puts "# na.weight('atgc')" # p na.weight('atgc') # # puts "# Bio::NucleicAcid.weight('atgc')" # p Bio::NucleicAcid.weight('atgc') # module Bio class NucleicAcid module Data # IUPAC code # * Faisst and Meyer (Nucleic Acids Res. 
20:3-26, 1992) # * http://www.ncbi.nlm.nih.gov/collab/FT/ NAMES = { 'y' => '[tc]', 'r' => '[ag]', 'w' => '[at]', 's' => '[gc]', 'k' => '[tg]', 'm' => '[ac]', 'b' => '[tgc]', 'd' => '[atg]', 'h' => '[atc]', 'v' => '[agc]', 'n' => '[atgc]', 'a' => 'a', 't' => 't', 'g' => 'g', 'c' => 'c', 'u' => 'u', 'A' => 'Adenine', 'T' => 'Thymine', 'G' => 'Guanine', 'C' => 'Cytosine', 'U' => 'Uracil', 'Y' => 'pYrimidine', 'R' => 'puRine', 'W' => 'Weak', 'S' => 'Strong', 'K' => 'Keto', 'M' => 'aroMatic', 'B' => 'not A', 'D' => 'not C', 'H' => 'not G', 'V' => 'not T', } WEIGHT = { # Calculated by BioPerl's Bio::Tools::SeqStats.pm :-) 'a' => 135.15, 't' => 126.13, 'g' => 151.15, 'c' => 111.12, 'u' => 112.10, :adenine => 135.15, :thymine => 126.13, :guanine => 151.15, :cytosine => 111.12, :uracil => 112.10, :deoxyribose_phosphate => 196.11, :ribose_phosphate => 212.11, :hydrogen => 1.00794, :water => 18.015, } def weight(x = nil, rna = nil) if x if x.length > 1 if rna phosphate = WEIGHT[:ribose_phosphate] else phosphate = WEIGHT[:deoxyribose_phosphate] end hydrogen = WEIGHT[:hydrogen] water = WEIGHT[:water] total = 0.0 x.each_byte do |byte| base = byte.chr.downcase if WEIGHT[base] total += WEIGHT[base] + phosphate - hydrogen * 2 else raise "Error: invalid nucleic acid '#{base}'" end end total -= water * (x.length - 1) else WEIGHT[x.to_s.downcase] end else WEIGHT end end def [](x) NAMES[x] end # backward compatibility def names NAMES end alias na names def name(x) NAMES[x.to_s.upcase] end def to_re(seq, rna = false) replace = { 'y' => '[tcy]', 'r' => '[agr]', 'w' => '[atw]', 's' => '[gcs]', 'k' => '[tgk]', 'm' => '[acm]', 'b' => '[tgcyskb]', 'd' => '[atgrwkd]', 'h' => '[atcwmyh]', 'v' => '[agcmrsv]', 'n' => '[atgcyrwskmbdhvn]' } replace.default = '.' 
str = seq.to_s.downcase str.gsub!(/[^atgcu]/) { |na| replace[na] } if rna str.tr!("t", "u") end Regexp.new(str) end end # as instance methods include Data # as class methods extend Data end end # module Bio bio-2.0.3/lib/bio/data/aa.rb0000644000175000017500000001114114141516614014752 0ustar nileshnilesh# # = bio/data/aa.rb - Amino Acids # # Copyright:: Copyright (C) 2001, 2005 # Toshiaki Katayama # License:: The Ruby License # # $Id:$ # module Bio class AminoAcid module Data # IUPAC code # * http://www.iupac.org/ # * http://www.chem.qmw.ac.uk/iubmb/newsletter/1999/item3.html # * http://www.ebi.ac.uk/RESID/faq.html NAMES = { 'A' => 'Ala', 'C' => 'Cys', 'D' => 'Asp', 'E' => 'Glu', 'F' => 'Phe', 'G' => 'Gly', 'H' => 'His', 'I' => 'Ile', 'K' => 'Lys', 'L' => 'Leu', 'M' => 'Met', 'N' => 'Asn', 'P' => 'Pro', 'Q' => 'Gln', 'R' => 'Arg', 'S' => 'Ser', 'T' => 'Thr', 'V' => 'Val', 'W' => 'Trp', 'Y' => 'Tyr', 'B' => 'Asx', # D/N 'Z' => 'Glx', # E/Q 'J' => 'Xle', # I/L 'U' => 'Sec', # 'uga' (stop) 'O' => 'Pyl', # 'uag' (stop) 'X' => 'Xaa', # (unknown) 'Ala' => 'alanine', 'Cys' => 'cysteine', 'Asp' => 'aspartic acid', 'Glu' => 'glutamic acid', 'Phe' => 'phenylalanine', 'Gly' => 'glycine', 'His' => 'histidine', 'Ile' => 'isoleucine', 'Lys' => 'lysine', 'Leu' => 'leucine', 'Met' => 'methionine', 'Asn' => 'asparagine', 'Pro' => 'proline', 'Gln' => 'glutamine', 'Arg' => 'arginine', 'Ser' => 'serine', 'Thr' => 'threonine', 'Val' => 'valine', 'Trp' => 'tryptophan', 'Tyr' => 'tyrosine', 'Asx' => 'asparagine/aspartic acid [DN]', 'Glx' => 'glutamine/glutamic acid [EQ]', 'Xle' => 'isoleucine/leucine [IL]', 'Sec' => 'selenocysteine', 'Pyl' => 'pyrrolysine', 'Xaa' => 'unknown [A-Z]', } # AAindex FASG760101 - Molecular weight (Fasman, 1976) # Fasman, G.D., ed. 
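=begin
A minimal sketch of what Bio::NucleicAcid#to_re (bio/data/na.rb) appears to do,
written as a stand-alone helper so that it runs without BioRuby. The names
IUPAC_CLASSES and iupac_to_regexp are hypothetical and used only for
illustration; the character classes are copied from the +replace+ hash in to_re.

```ruby
# Hypothetical stand-alone table: each IUPAC ambiguity code expands to a
# character class that also contains every narrower ambiguity code.
IUPAC_CLASSES = {
  'y' => '[tcy]', 'r' => '[agr]', 'w' => '[atw]', 's' => '[gcs]',
  'k' => '[tgk]', 'm' => '[acm]',
  'b' => '[tgcyskb]', 'd' => '[atgrwkd]',
  'h' => '[atcwmyh]', 'v' => '[agcmrsv]',
  'n' => '[atgcyrwskmbdhvn]'
}.freeze

# Builds a Regexp from a nucleotide pattern. Characters that are neither
# plain bases nor known ambiguity codes become '.', as in the original.
def iupac_to_regexp(seq, rna = false)
  str = seq.to_s.downcase.gsub(/[^atgcu]/) { |ch| IUPAC_CLASSES.fetch(ch, '.') }
  # For RNA every 't' becomes 'u', including those inside character classes.
  str = str.tr('t', 'u') if rna
  Regexp.new(str)
end
```

With this sketch, iupac_to_regexp('gay') matches both "gat" and "gac", and
passing true as the second argument turns the class into [ucy].
=end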
# Handbook of Biochemistry and Molecular Biology", 3rd ed., # Proteins - Volume 1, CRC Press, Cleveland (1976) WEIGHT = { 'A' => 89.09, 'C' => 121.15, # 121.16 according to the Wikipedia 'D' => 133.10, 'E' => 147.13, 'F' => 165.19, 'G' => 75.07, 'H' => 155.16, 'I' => 131.17, 'K' => 146.19, 'L' => 131.17, 'M' => 149.21, 'N' => 132.12, 'P' => 115.13, 'Q' => 146.15, 'R' => 174.20, 'S' => 105.09, 'T' => 119.12, 'U' => 168.06, 'V' => 117.15, 'W' => 204.23, 'Y' => 181.19, } def weight(x = nil) return WEIGHT unless x if x.length > 1 total = 0.0 x.each_byte do |byte| aa = byte.chr.upcase if WEIGHT[aa] total += WEIGHT[aa] else raise "Error: invalid amino acid '#{aa}'" end end total -= NucleicAcid.weight[:water] * (x.length - 1) else WEIGHT[x] end end def [](x) NAMES[x] end # backward compatibility def names NAMES end alias aa names def name(x) str = NAMES[x] if str and str.length == 3 NAMES[str] else str end end def to_1(x) case x.to_s.length when 1 x when 3 three2one(x) else name2one(x) end end alias one to_1 def to_3(x) case x.to_s.length when 1 one2three(x) when 3 x else name2three(x) end end alias three to_3 def one2three(x) if x and x.length != 1 raise ArgumentError else NAMES[x] end end def three2one(x) if x and x.length != 3 raise ArgumentError else reverse[x] end end def one2name(x) if x and x.length != 1 raise ArgumentError else three2name(NAMES[x]) end end def name2one(x) str = reverse[x.to_s.downcase] if str and str.length == 3 three2one(str) else str end end def three2name(x) if x and x.length != 3 raise ArgumentError else NAMES[x] end end def name2three(x) reverse[x.downcase] end def to_re(seq) replace = { 'B' => '[DNB]', 'Z' => '[EQZ]', 'J' => '[ILJ]', 'X' => '[ACDEFGHIKLMNPQRSTVWYUOX]', } replace.default = '.' 
str = seq.to_s.upcase str.gsub!(/[^ACDEFGHIKLMNPQRSTVWYUO]/) { |aa| replace[aa] } Regexp.new(str) end private def reverse @reverse ||= NAMES.invert end end # as instance methods include Data # as class methods extend Data end end # module Bio bio-2.0.3/lib/bio/tree/0000755000175000017500000000000014141516614014074 5ustar nileshnileshbio-2.0.3/lib/bio/tree/output.rb0000644000175000017500000001732114141516614015765 0ustar nileshnilesh# # = bio/tree/output.rb - Phylogenetic tree formatter # # Copyright:: Copyright (C) 2004-2006 # Naohisa Goto # License:: The Ruby License # # # == Description # # This file contains formatter of Newick, NHX and Phylip distance matrix. # # == References # # * http://evolution.genetics.washington.edu/phylip/newick_doc.html # * http://www.phylosoft.org/forester/NHX.html # module Bio class Tree #--- # newick output #+++ # default options DEFAULT_OPTIONS = { :indent => ' ' } def __get_option(key, options) if (r = options[key]) != nil then r elsif @options && (r = @options[key]) != nil then r else DEFAULT_OPTIONS[key] end end private :__get_option # formats Newick label (unquoted_label or quoted_label) def __to_newick_format_label(str, options) if __get_option(:parser, options) == :naive then return str.to_s end str = str.to_s if /([\(\)\,\:\[\]\_\'\x00-\x1f\x7f])/ =~ str then # quoted_label return "\'" + str.gsub(/\'/, "\'\'") + "\'" end # unquoted_label return str.gsub(/ /, '_') end private :__to_newick_format_label # formats leaf def __to_newick_format_leaf(node, edge, options) label = __to_newick_format_label(get_node_name(node), options) dist = get_edge_distance_string(edge) bs = get_node_bootstrap_string(node) if __get_option(:branch_length_style, options) == :disabled dist = nil end case __get_option(:bootstrap_style, options) when :disabled label + (dist ? ":#{dist}" : '') when :molphy label + (dist ? ":#{dist}" : '') + (bs ? "[#{bs}]" : '') when :traditional label + (bs ? bs : '') + (dist ? 
":#{dist}" : '') else # default: same as molphy style label + (dist ? ":#{dist}" : '') + (bs ? "[#{bs}]" : '') end end private :__to_newick_format_leaf # formats leaf for NHX def __to_newick_format_leaf_NHX(node, edge, options) label = __to_newick_format_label(get_node_name(node), options) dist = get_edge_distance_string(edge) bs = get_node_bootstrap_string(node) if __get_option(:branch_length_style, options) == :disabled dist = nil end nhx = {} # bootstrap nhx[:B] = bs if bs and !(bs.empty?) # EC number nhx[:E] = node.ec_number if node.instance_eval { defined?(@ec_number) && self.ec_number } # scientific name nhx[:S] = node.scientific_name if node.instance_eval { defined?(@scientific_name) && self.scientific_name } # taxonomy id nhx[:T] = node.taxonomy_id if node.instance_eval { defined?(@taxonomy_id) && self.taxonomy_id } # :D (gene duplication or speciation) if node.instance_eval { defined?(@events) && !(self.events.empty?) } then if node.events.include?(:gene_duplication) nhx[:D] = 'Y' elsif node.events.include?(:speciation) nhx[:D] = 'N' end end # log likelihood nhx[:L] = edge.log_likelihood if edge.instance_eval { defined?(@log_likelihood) && self.log_likelihood } # width nhx[:W] = edge.width if edge.instance_eval { defined?(@width) && self.width } # merges other parameters flag = node.instance_eval { defined? @nhx_parameters } nhx.merge!(node.nhx_parameters) if flag flag = edge.instance_eval { defined? @nhx_parameters } nhx.merge!(edge.nhx_parameters) if flag nhx_string = nhx.keys.sort{ |a,b| a.to_s <=> b.to_s }.collect do |key| "#{key.to_s}=#{nhx[key].to_s}" end.join(':') nhx_string = "[&&NHX:" + nhx_string + "]" unless nhx_string.empty? label + (dist ? 
":#{dist}" : '') + nhx_string end private :__to_newick_format_leaf_NHX # def __to_newick(parents, source, depth, format_leaf, options, &block) result = [] if indent_string = __get_option(:indent, options) then indent0 = indent_string * depth indent = indent_string * (depth + 1) newline = "\n" else indent0 = indent = newline = '' end out_edges = self.out_edges(source) if block_given? then out_edges.sort! { |edge1, edge2| yield(edge1[1], edge2[1]) } else out_edges.sort! do |edge1, edge2| o1 = edge1[1].order_number o2 = edge2[1].order_number if o1 and o2 then o1 <=> o2 else edge1[1].name.to_s <=> edge2[1].name.to_s end end end out_edges.each do |src, tgt, edge| if parents.include?(tgt) then ;; elsif self.out_degree(tgt) == 1 then result << indent + __send__(format_leaf, tgt, edge, options) else result << __to_newick([ src ].concat(parents), tgt, depth + 1, format_leaf, options) + __send__(format_leaf, tgt, edge, options) end end indent0 + "(" + newline + result.join(',' + newline) + (result.size > 0 ? newline : '') + indent0 + ')' end private :__to_newick # Returns a newick formatted string. # If block is given, the order of the node is sorted # (as the same manner as Enumerable#sort). # # Available options: # :indent:: # indent string; set false to disable (default: ' ') # :bootstrap_style:: # :disabled disables bootstrap representations. # :traditional for traditional style. # :molphy for Molphy style (default). def output_newick(options = {}, &block) #:yields: node1, node2 root = @root root ||= self.nodes.first return '();' unless root __to_newick([], root, 0, :__to_newick_format_leaf, options, &block) + __to_newick_format_leaf(root, Edge.new, options) + ";\n" end alias newick output_newick # Returns a NHX (New Hampshire eXtended) formatted string. # If block is given, the order of the node is sorted # (as the same manner as Enumerable#sort). 
    #
    # Available options:
    # :indent::
    #   indent string; set false to disable (default: ' ')
    #
    def output_nhx(options = {}, &block) #:yields: node1, node2
      root = @root
      root ||= self.nodes.first
      return '();' unless root
      __to_newick([], root, 0,
                  :__to_newick_format_leaf_NHX, options, &block) +
        __to_newick_format_leaf_NHX(root, Edge.new, options) +
        ";\n"
    end

    # Returns formatted text (or something) of the tree.
    # Currently supported formats are :newick, :nhx and
    # :phylip_distance_matrix.
    def output(format, *arg, &block)
      case format
      when :newick
        output_newick(*arg, &block)
      when :nhx
        output_nhx(*arg, &block)
      when :phylip_distance_matrix
        output_phylip_distance_matrix(*arg, &block)
      else
        raise 'Unknown format'
      end
    end

    #---
    # Perhaps this method does not belong in this file?
    #+++

    # Generates phylip-style distance matrix as a string.
    # If nodes is not given, all leaves in the tree are used.
    # If the names of some of the given (or default) nodes
    # are not defined or are empty, the names are automatically generated.
    def output_phylip_distance_matrix(nodes = nil, options = {})
      nodes = self.leaves unless nodes
      names = nodes.collect do |x|
        y = get_node_name(x)
        y = sprintf("%x", x.__id__.abs) if y.empty?
        y
      end
      m = self.distance_matrix(nodes)
      Bio::Phylip::DistanceMatrix.generate(m, names, options)
    end

  end #class Tree

end #module Bio

#
# = bio/appl/gcg/msf.rb - GCG multiple sequence alignment (.msf) parser class
#
# Copyright:: Copyright (C) 2003, 2006
#             Naohisa Goto
# License::   The Ruby License
#
# $Id:$
#
# = About Bio::GCG::Msf
#
# Please refer to the documentation of Bio::GCG::Msf.
#

#---
# (depends on autoload)
#require 'bio/appl/gcg/seq'
#+++

module Bio
  module GCG

    # The msf is a multiple sequence alignment format developed by Wisconsin.
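=begin
The label rule used by Bio::Tree#__to_newick_format_label in
bio/tree/output.rb can be sketched as follows. This is an illustration under
stated assumptions, not the library code: newick_label is a hypothetical
stand-alone name, and the :parser => :naive shortcut of the original is left
out.

```ruby
# Hypothetical helper: formats a node name as a Newick label.
# Names containing Newick metacharacters, underscores, single quotes or
# control characters are single-quoted (inner quotes doubled); otherwise
# blanks are written as underscores.
def newick_label(str)
  str = str.to_s
  if /[(),:\[\]_'\x00-\x1f\x7f]/ =~ str
    "'" + str.gsub("'", "''") + "'"   # quoted_label
  else
    str.gsub(' ', '_')                # unquoted_label
  end
end
```

An underscore forces quoting because an unquoted underscore would be read
back as a blank.
=end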
# Bio::GCG::Msf is a msf format parser. class Msf #< DB # delimiter used by Bio::FlatFile DELIMITER = RS = nil # Creates a new Msf object. def initialize(str) str = str.sub(/\A[\r\n]+/, '') preamble, @data = str.split(/^\/\/$/, 2) preamble.sub!(/\A\!\![A-Z]+\_MULTIPLE\_ALIGNMENT.*/, '') @heading = $& # '!!NA_MULTIPLE_ALIGNMENT 1.0' or like this preamble.sub!(/.*\.\.\s*$/m, '') @description = $&.to_s.sub(/^.*\.\.\s*$/, '').to_s d = $&.to_s if m = /^(?:(.+)\s+)?MSF\:\s+(\d+)\s+Type\:\s+(\w)\s+(.+)\s+(Comp)?Check\:\s+(\d+)/.match(d) then @entry_id = m[1].to_s.strip @length = (m[2] ? m[2].to_i : nil) @seq_type = m[3] @date = m[4].to_s.strip @checksum = (m[6] ? m[6].to_i : nil) end @seq_info = [] preamble.each_line do |x| if /Name\: / =~ x then s = {} x.scan(/(\S+)\: +(\S*)/) { |y| s[$1] = $2 } @seq_info << s end end @description.sub!(/\A(\r\n|\r|\n)/, '') @align = nil end # description attr_reader :description # ID of the alignment attr_reader :entry_id # alignment length attr_reader :length # sequence type ("N" for DNA/RNA or "P" for protein) attr_reader :seq_type # date attr_reader :date # checksum attr_reader :checksum # heading # ('!!NA_MULTIPLE_ALIGNMENT 1.0' or whatever like this) attr_reader :heading #--- ## data (internally used, will be obsoleted) #attr_reader :data # ## seq. info. 
(internally used, will be obsoleted) #attr_reader :seq_info #+++ # symbol comparison table def symbol_comparison_table unless defined?(@symbol_comparison_table) /Symbol comparison table\: +(\S+)/ =~ @description @symbol_comparison_table = $1 end @symbol_comparison_table end # gap weight def gap_weight unless defined?(@gap_weight) /GapWeight\: +(\S+)/ =~ @description @gap_weight = $1 end @gap_weight end # gap length weight def gap_length_weight unless defined?(@gap_length_weight) /GapLengthWeight\: +(\S+)/ =~ @description @gap_length_weight = $1 end @gap_length_weight end # CompCheck field def compcheck unless defined?(@compcheck) if /CompCheck\: +(\d+)/ =~ @description then @compcheck = $1.to_i else @compcheck = nil end end @compcheck end # parsing def do_parse return if @align a = @data.split(/\r?\n\r?\n/) @seq_data = Array.new(@seq_info.size) @seq_data.collect! { |x| Array.new } a.each do |x| next if x.strip.empty? b = x.sub(/\A[\r\n]+/, '').split(/[\r\n]+/) nw = 0 if b.size > @seq_info.size then if /^ +/ =~ b.shift.to_s nw = $&.to_s.length end end if nw > 0 then b.each_with_index { |y, i| y[0, nw] = ''; @seq_data[i] << y } else b.each_with_index { |y, i| @seq_data[i] << y.strip.split(/ +/, 2)[1].to_s } end end case seq_type when 'P', 'p' k = Bio::Sequence::AA when 'N', 'n' k = Bio::Sequence::NA else k = Bio::Sequence::Generic end @seq_data.collect! do |x| y = x.join('') y.gsub!(/[\s\d]+/, '') k.new(y) end aln = Bio::Alignment.new @seq_data.each_with_index do |x, i| aln.store(@seq_info[i]['Name'], x) end @align = aln end private :do_parse # returns Bio::Alignment object. 
def alignment do_parse @align end # gets seq data (used internally) (will be obsoleted) def seq_data do_parse @seq_data end # validates checksum def validate_checksum do_parse valid = true total = 0 @seq_data.each_with_index do |x, i| sum = Bio::GCG::Seq.calc_checksum(x) if sum != @seq_info[i]['Check'].to_i valid = false break end total += sum end return false unless valid if @checksum != 0 # "Check:" field of BioPerl is always 0 valid = ((total % 10000) == @checksum) end valid end end #class Msf end #module GCG end # module Bio bio-2.0.3/lib/bio/appl/gcg/seq.rb0000644000175000017500000001264414141516614015755 0ustar nileshnilesh# # = bio/appl/gcg/seq.rb - GCG sequence file format class (.seq/.pep file) # # Copyright:: Copyright (C) 2003, 2006 # Naohisa Goto # License:: The Ruby License # # $Id: seq.rb,v 1.3 2007/04/05 23:35:39 trevor Exp $ # # = About Bio::GCG::Seq # # Please refer document of Bio::GCG::Seq. # module Bio module GCG # # = Bio::GCG::Seq # # This is GCG sequence file format (.seq or .pep) parser class. # # = References # # * Information about GCG Wisconsin Package(R) # http://www.accelrys.com/products/gcg_wisconsin_package . # * EMBOSS sequence formats # http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Themes/SequenceFormats.html # * BioPerl document # http://docs.bioperl.org/releases/bioperl-1.2.3/Bio/SeqIO/gcg.html class Seq #< DB # delimiter used by Bio::FlatFile DELIMITER = RS = nil # Creates new instance of this class. # str must be a GCG seq formatted string. def initialize(str) @heading = str[/.*/] # '!!NA_SEQUENCE 1.0' or like this str = str.sub(/.*/, '') str.sub!(/.*\.\.$/m, '') @definition = $&.to_s.sub(/^.*\.\.$/, '').to_s desc = $&.to_s if m = /(.+)\s+Length\:\s+(\d+)\s+(.+)\s+Type\:\s+(\w)\s+Check\:\s+(\d+)/.match(desc) then @entry_id = m[1].to_s.strip @length = (m[2] ? m[2].to_i : nil) @date = m[3].to_s.strip @seq_type = m[4] @checksum = (m[5] ? m[5].to_i : nil) end @data = str @seq = nil @definition.strip! end # ID field. 
      attr_reader :entry_id

      # Description field.
      attr_reader :definition

      # "Length:" field.
      # Note that sometimes this might differ from real sequence length.
      attr_reader :length

      # Date field of this entry.
      attr_reader :date

      # "Type:" field, which indicates sequence type.
      # "N" means nucleic acid sequence, "P" means protein sequence.
      attr_reader :seq_type

      # "Check:" field, which indicates checksum of current sequence.
      attr_reader :checksum

      # heading
      # ('!!NA_SEQUENCE 1.0' or whatever like this)
      attr_reader :heading

      #---
      ## data (internally used, will be obsoleted)
      #attr_reader :data
      #+++

      # Sequence data.
      # The class of the sequence is Bio::Sequence::NA, Bio::Sequence::AA
      # or Bio::Sequence::Generic, according to the sequence type.
      def seq
        unless @seq then
          case @seq_type
          when 'N', 'n'
            k = Bio::Sequence::NA
          when 'P', 'p'
            k = Bio::Sequence::AA
          else
            k = Bio::Sequence
          end
          @seq = k.new(@data.tr('^-a-zA-Z.~', ''))
        end
        @seq
      end

      # If you know the sequence is AA, use this method.
      # Returns a Bio::Sequence::AA object.
      #
      # If you call naseq for protein sequence,
      # or aaseq for nucleic sequence, RuntimeError will be raised.
      def aaseq
        if seq.is_a?(Bio::Sequence::AA) then
          @seq
        else
          raise 'seq_type != \'P\''
        end
      end

      # If you know the sequence is NA, use this method.
      # Returns a Bio::Sequence::NA object.
      #
      # If you call naseq for protein sequence,
      # or aaseq for nucleic sequence, RuntimeError will be raised.
      def naseq
        if seq.is_a?(Bio::Sequence::NA) then
          @seq
        else
          raise 'seq_type != \'N\''
        end
      end

      # Validates checksum.
      # If validation succeeds, returns true.
      # Otherwise, returns false.
      def validate_checksum
        checksum == self.class.calc_checksum(seq)
      end

      #---
      # class methods
      #+++

      # Calculates checksum from given string.
      def self.calc_checksum(str)
        # Reference: Bio::SeqIO::gcg of BioPerl-1.2.3
        idx = 0
        sum = 0
        str.upcase.tr('^A-Z.~', '').each_byte do |c|
          idx += 1
          sum += idx * c
          idx = 0 if idx >= 57
        end
        (sum % 10000)
      end

      # Creates a new GCG sequence format text.
      # Parameters can be omitted.
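=begin
The checksum computed by Bio::GCG::Seq.calc_checksum can be reproduced without
BioRuby. The following is a minimal sketch under that assumption; gcg_checksum
is a hypothetical stand-alone name, and String#delete is used here in place of
the tr call in the original.

```ruby
# Hypothetical re-implementation of the GCG checksum.
# Each retained byte is multiplied by a position counter that cycles
# through 1..57; the running sum is reduced modulo 10000.
def gcg_checksum(str)
  idx = 0
  sum = 0
  # Keep only letters, '.' and '~'; digits and whitespace are dropped.
  str.upcase.delete('^A-Z.~').each_byte do |c|
    idx += 1
    sum += idx * c
    idx = 0 if idx >= 57
  end
  sum % 10000
end
```

For the input "ACGT" the weighted sum is 65*1 + 67*2 + 71*3 + 84*4 = 748, and
because position numbers and blanks are ignored, "a c\n1" gives the same value
as "AC".
=end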
      #
      # Examples:
      #  Bio::GCG::Seq.to_gcg(:definition=>'H.sapiens DNA',
      #                       :seq_type=>'N', :entry_id=>'gi-1234567',
      #                       :seq=>seq, :date=>date)
      #
      def self.to_gcg(hash)
        seq = hash[:seq]
        if seq.is_a?(Bio::Sequence::NA) then
          seq_type = 'N'
        elsif seq.is_a?(Bio::Sequence::AA) then
          seq_type = 'P'
        else
          seq_type = (hash[:seq_type] or 'P')
        end
        if seq_type == 'N' then
          head = '!!NA_SEQUENCE 1.0'
        else
          head = '!!AA_SEQUENCE 1.0'
        end
        date = (hash[:date] or Time.now.strftime('%B %d, %Y %H:%M'))
        entry_id = hash[:entry_id].to_s.strip
        len = seq.length
        checksum = self.calc_checksum(seq)
        definition = hash[:definition].to_s.strip
        seq = seq.upcase.gsub(/.{1,50}/, "\\0\n")
        seq.gsub!(/.{10}/, "\\0 ")
        w = len.to_s.size + 1
        i = 1
        seq.gsub!(/^/) { |x| s = sprintf("\n%*d ", w, i); i += 50; s }

        [ head, "\n", definition, "\n\n",
          "#{entry_id} Length: #{len} #{date} " \
          "Type: #{seq_type} Check: #{checksum} ..\n",
          seq, "\n"
        ].join('')
      end

    end #class Seq

  end #module GCG
end #module Bio

#
# = bio/appl/sim4/report.rb - sim4 result parser
#
# Copyright:: Copyright (C) 2004 GOTO Naohisa
# License::   The Ruby License
#
# $Id:$
#
# The sim4 report parser classes.
#
# == References
#
# * Florea, L., et al., A Computer program for aligning a cDNA sequence
#   with a genomic DNA sequence, Genome Research, 8, 967--974, 1998.
#   http://www.genome.org/cgi/content/abstract/8/9/967
#

module Bio
  class Sim4

    # Bio::Sim4::Report is the sim4 report parser class.
    # Its object may contain some Bio::Sim4::Report::Hit objects.
    class Report #< DB
      #--
      # format: A=0, A=3, or A=4
      #++

      # Delimiter of each entry. Bio::FlatFile uses it.
      # In Bio::Sim4::Report, it is nil (1 entry 1 file).
      DELIMITER = RS = nil # 1 entry 1 file

      # Creates new Bio::Sim4::Report object from String.
      # You can use Bio::FlatFile to read a file.
# Currently, format A=0, A=3, and A=4 are supported. # (A=1, A=2, A=5 are NOT supported yet.) # # Note that 'seq1' in sim4 result is always regarded as 'query', # and 'seq2' is always regarded as 'subject'(target, hit). # # Note that first 'seq1' informations are used for # Bio::Sim4::Report#query_id, #query_def, #query_len, and #seq1 methods. def initialize(text) @hits = [] @all_hits = [] overrun = '' text.each_line("\n\nseq1 = ") do |str| str = str.sub(/\A\s+/, '') str.sub!(/\n(^seq1 \= .*)/m, "\n") # remove trailing hits for sure tmp = $1.to_s hit = Hit.new(overrun + str) overrun = tmp unless hit.instance_eval { @data.empty? } then @hits << hit end @all_hits << hit end @seq1 = @all_hits[0].seq1 end # Returns hits of the entry. # Unlike Bio::Sim4::Report#all_hits, it returns # hits which have alignments. # Returns an Array of Bio::Sim4::Report::Hit objects. attr_reader :hits # Returns all hits of the entry. # Unlike Bio::Sim4::Report#hits, it returns # results of all trials of pairwise alignment. # This would be a Bio::Sim4 specific method. # Returns an Array of Bio::Sim4::Report::Hit objects. attr_reader :all_hits # Returns sequence informations of 'seq1'. # Returns a Bio::Sim4::Report::SeqDesc object. # This would be a Bio::Sim4 specific method. attr_reader :seq1 # Bio::Sim4::Report::SeqDesc stores sequence information of # query or subject of sim4 report. class SeqDesc #-- # description/definitions of a sequence #++ # Creates a new object. # It is designed to be called internally from Bio::Sim4::Report object. # Users shall not use it directly. def initialize(seqid, seqdef, len, filename) @entry_id = seqid @definition = seqdef @len = len @filename = filename end # identifier of the sequence attr_reader :entry_id # definition of the sequence attr_reader :definition # sequence length of the sequence attr_reader :len # filename of the sequence attr_reader :filename # Parses part of sim4 result text and creates new SeqDesc object. 
# It is designed to be called internally from Bio::Sim4::Report object. # Users shall not use it directly. def self.parse(str, str2 = nil) /^seq[12] \= (.*)(?: \((.*)\))?\,\s*(\d+)\s*bp\s*$/ =~ str seqid = $2 filename = $1 len = $3.to_i if str2 then seqdef = str2.sub(/^\>\s*/, '') seqid =seqdef.split(/\s+/, 2)[0] unless seqid else seqdef = (seqid or filename) seqid = filename unless seqid end self.new(seqid, seqdef, len, filename) end end #class SeqDesc # Sequence segment pair of the sim4 result. # Similar to Bio::Blast::Report::HSP but lacks many methods. # For mRNA-genome mapping programs, # unlike other homology search programs, # the class is used not only for exons but also for introns. # (Note that intron data would not be available according to run-time # options of the program.) class SegmentPair #-- # segment pair (like Bio::BLAST::*::Report::HSP) #++ # Creates a new SegmentPair object. # It is designed to be called internally from # Bio::Sim4::Report::Hit object. # Users shall not use it directly. def initialize(seq1, seq2, midline = nil, percent_identity = nil, direction = nil) @seq1 = seq1 @seq2 = seq2 @midline = midline @percent_identity = percent_identity @direction = direction end # Returns segment informations of 'seq1'. # Returns a Bio::Sim4::Report::Segment object. # These would be Bio::Sim4 specific methods. attr_reader :seq1 # Returns segment informations of 'seq2'. # Returns a Bio::Sim4::Report::Segment object. # These would be Bio::Sim4 specific methods. attr_reader :seq2 # Returns the "midline" of the segment pair. # Returns nil if no alignment data are available. attr_reader :midline # Returns percent identity of the segment pair. attr_reader :percent_identity # Returns directions of mapping. # Maybe one of "->", "<-", "==" or "" or nil. # This would be a Bio::Sim4 specific method. attr_reader :direction # Parses part of sim4 result text and creates a new SegmentPair object. 
        # It is designed to be called internally from
        # Bio::Sim4::Report::Hit class.
        # Users shall not use it directly.
        def self.parse(str, aln)
          /^(\d+)\-(\d+)\s*\((\d+)\-(\d+)\)\s*([\d\.]+)\%\s*([\-\<\>\=]*)/ =~ str
          self.new(Segment.new($1, $2, aln[0]),
                   Segment.new($3, $4, aln[2]),
                   aln[1], $5, $6)
        end

        # Parses part of sim4 result text and creates a new SegmentPair
        # object when seq1 is an intron.
        # It is designed to be called internally from
        # Bio::Sim4::Report::Hit class.
        # Users shall not use it directly.
        def self.seq1_intron(prev_e, e, aln)
          self.new(Segment.new(prev_e.seq1.to+1, e.seq1.from-1, aln[0]),
                   Segment.new(nil, nil, aln[2]),
                   aln[1])
        end

        # Parses part of sim4 result text and creates a new SegmentPair
        # object when seq2 is an intron.
        # It is designed to be called internally from
        # Bio::Sim4::Report::Hit class.
        # Users shall not use it directly.
        def self.seq2_intron(prev_e, e, aln)
          self.new(Segment.new(nil, nil, aln[0]),
                   Segment.new(prev_e.seq2.to+1, e.seq2.from-1, aln[2]),
                   aln[1])
        end

        # Parses part of sim4 result text and creates a new SegmentPair
        # object for regions which cannot be aligned correctly.
        # It is designed to be called internally from
        # Bio::Sim4::Report::Hit class.
        # Users shall not use it directly.
        def self.both_intron(prev_e, e, aln)
          self.new(Segment.new(prev_e.seq1.to+1, e.seq1.from-1, aln[0]),
                   Segment.new(prev_e.seq2.to+1, e.seq2.from-1, aln[2]),
                   aln[1])
        end

        #--
        # Bio::BLAST::*::Report::Hsp compatible methods
        # Methods already defined: midline, percent_identity
        #++

        # start position of the query (the first position is 1)
        def query_from; @seq1.from; end

        # end position of the query (including its position)
        def query_to; @seq1.to; end

        # query sequence (with gaps) of the alignment of the segment pair.
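=begin
The regular expression in Bio::Sim4::Report::SegmentPair.parse can be tried in
isolation. The sketch below is an assumption-laden illustration: EXON_LINE and
parse_exon_line are hypothetical names, and the sample input line is made up to
resemble a sim4 exon summary line rather than taken from real output.

```ruby
# Same pattern as in SegmentPair.parse, without the redundant escapes:
# "from-to (from-to) identity% direction"
EXON_LINE = /^(\d+)-(\d+)\s*\((\d+)-(\d+)\)\s*([\d.]+)%\s*([\-<>=]*)/

# Hypothetical helper: returns the parsed fields as a Hash, or nil when
# the line does not look like an exon summary line.
def parse_exon_line(line)
  m = EXON_LINE.match(line)
  return nil unless m
  {
    query_from: m[1].to_i, query_to: m[2].to_i,   # seq1 coordinates
    hit_from:   m[3].to_i, hit_to:   m[4].to_i,   # seq2 coordinates
    percent_identity: m[5],                       # kept as a String, as in the original
    direction: m[6]                               # "->", "<-", "==" or ""
  }
end

exon = parse_exon_line('1-47  (1001-1047)   100% ->')
```

Segment#initialize in the original converts the positions with to_i, which is
why the sketch does the same for the four coordinates.
=end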
def qseq; @seq1.seq; end # start position of the hit(target) (the first position is 1) def hit_from; @seq2.from; end # end position of the hit(target) (including its position) def hit_to; @seq2.to; end # hit(target) sequence (with gaps) of the alignment # of the segment pair. def hseq; @seq2.seq; end # Returns alignment length of the segment pair. # Returns nil if no alignment data are available. def align_len (@midline and @seq1.seq and @seq2.seq) ? @midline.length : nil end end #class SegmentPair # Segment informations of a segment pair. class Segment #-- # the segment of a sequence #++ # Creates a new Segment object. # It is designed to be called internally from # Bio::Sim4::Report::SegmentPair class. # Users shall not use it directly. def initialize(pos_st, pos_ed, seq = nil) @from = pos_st.to_i @to = pos_ed.to_i @seq = seq end # start position of the segment (the first position is 1) attr_reader :from # end position of the segment (including its position) attr_reader :to # sequence (with gaps) of the segment attr_reader :seq end #class Segment # Hit object of the sim4 result. # Similar to Bio::Blast::Report::Hit but lacks many methods. class Hit # Parses part of sim4 result text and creates a new Hit object. # It is designed to be called internally from Bio::Sim4::Report class. # Users shall not use it directly. def initialize(str) @data = str.split(/\n(?:\r?\n)+/) parse_seqdesc end # Parses sequence descriptions. def parse_seqdesc # seq1: query, seq2: target(hit) a0 = @data.shift.split(/\r?\n/) if @data[0].to_s =~ /^\>/ then a1 = @data.shift.split(/\r?\n/) else a1 = [] end @seq1 = SeqDesc.parse(a0[0], a1[0]) @seq2 = SeqDesc.parse(a0[1], a1[1]) if @data[0].to_s.sub!(/\A\(complement\)\s*$/, '') then @complement = true @data.shift if @data[0].strip.empty? else @complement = nil end end private :parse_seqdesc # Returns sequence informations of 'seq1'. # Returns a Bio::Sim4::Report::SeqDesc object. # This would be Bio::Sim4 specific method. 
attr_reader :seq1 # Returns sequence informations of 'seq2'. # Returns a Bio::Sim4::Report::SeqDesc object. # This would be Bio::Sim4 specific method. attr_reader :seq2 # Returns true if the hit reports '-'(complemental) strand # search result. # Otherwise, return false or nil. # This would be a Bio::Sim4 specific method. def complement? @complement end # Parses segment pair. def parse_segmentpairs aln = (self.align ? self.align.dup : []) exo = [] #exons itr = [] #introns sgp = [] #segmentpairs prev_e = nil return unless @data[0] @data[0].split(/\r?\n/).each do |str| ai = (prev_e ? aln.shift : nil) a = (aln.shift or []) e = SegmentPair.parse(str, a) exo << e if ai then # intron data in alignment if ai[1].strip.empty? then i = SegmentPair.both_intron(prev_e, e, ai) elsif ai[2].strip.empty? then i = SegmentPair.seq1_intron(prev_e, e, ai) else i = SegmentPair.seq2_intron(prev_e, e, ai) end itr << i sgp << i end sgp << e prev_e = e end @exons = exo @introns = itr @segmentpairs = sgp end private :parse_segmentpairs # Parses alignment. def parse_align s1 = []; ml = []; s2 = [] blocks = [] blocks.push [ s1, ml, s2 ] dat = @data[1..-1] return unless dat dat.each do |str| a = str.split(/\r?\n/) ruler = a.shift # First line, for example, # " 50 . : . : . : . : . :" # When the number is 0, forced to be a separated block if /^\s*(\d+)/ =~ ruler and $1.to_i == 0 and !ml.empty? then s1 = []; ml = []; s2 = [] blocks.push [ s1, ml, s2 ] end # For example, # " 190 GAGTCATGCATGATACAA CTTATATATGTACTTAGCGGCA" # " ||||||||||||||||||<<<...<<<-||-|||||||||||||||||||" # " 400 GAGTCATGCATGATACAACTT...AGCGCT ATATATGTACTTAGCGGCA" if /^(\s*\d+\s)(.+)$/ =~ a[0] then range = ($1.length)..($1.length + $2.chomp.length - 1) a.collect! { |x| x[range] } s1 << a.shift ml << a.shift s2 << a.shift end end #each alx_all = [] blocks.each do |ary| s1, ml, s2 = ary alx = ml.join('').split(/([\<\>]+\.+[\<\>]+)/) seq1 = s1.join(''); seq2 = s2.join('') i = 0 alx.collect! 
do |x| len = x.length y = [ seq1[i, len], x, seq2[i, len] ] i += len y end # adds virtual intron information if necessary alx_all.push([ '', '', '' ]) unless alx_all.empty? alx_all.concat alx end @align = alx_all end private :parse_align # Returns exons of the hit. # Each exon is a Bio::Sim4::Report::SegmentPair object. def exons unless defined?(@exons); parse_segmentpairs; end @exons end # Returns segment pairs (exons and introns) of the hit. # Each segment pair is a Bio::Sim4::Report::SegmentPair object. # Returns an array of Bio::Sim4::Report::SegmentPair objects. # (Note that intron data is not always available # according to run-time options of the program.) def segmentpairs unless defined?(@segmentpairs); parse_segmentpairs; end @segmentpairs end # Returns introns of the hit. # Some of them would contain untranscribed regions. # Returns an array of Bio::Sim4::Report::SegmentPair objects. # (Note that intron data is not always available # according to run-time options of the program.) def introns unless defined?(@introns); parse_segmentpairs; end @introns end # Returns alignments. # Returns an Array of arrays. # Each array contains sequence of seq1, midline, sequence of seq2, # respectively. # This would be a Bio::Sim4 specific method. def align unless defined?(@align); parse_align; end @align end #-- # Bio::BLAST::*::Report::Hit compatible methods #++ # Length of the query sequence. # Same as Bio::Sim4::Report#query_len. def query_len; seq1.len; end # Identifier of the query sequence. # Same as Bio::Sim4::Report#query_id. def query_id; seq1.entry_id; end # Definition of the query sequence # Same as Bio::Sim4::Report#query_def. 
def query_def; seq1.definition; end # length of the hit(target) sequence def target_len; seq2.len; end # Identifier of the hit(target) sequence def target_id; seq2.entry_id; end # Definition of the hit(target) sequence def target_def; seq2.definition; end alias hit_id target_id alias len target_len alias definition target_def alias hsps exons # Iterates over each exon of the hit. # Yields a Bio::Sim4::Report::SegmentPair object. def each(&x) #:yields: segmentpair exons.each(&x) end end #class Hit #-- #Bio::BLAST::*::Report compatible methods #++ # Returns number of hits. # Same as hits.size. def num_hits; @hits.size; end # Iterates over each hits of the sim4 result. # Same as hits.each. # Yields a Bio::Sim4::Report::Hit object. def each_hit(&x) #:yields: hit @hits.each(&x) end alias each each_hit # Returns the definition of query sequence. # The value will be filename or (first word of) sequence definition # according to sim4 run-time options. def query_def; @seq1.definition; end # Returns the identifier of query sequence. # The value will be filename or (first word of) sequence definition # according to sim4 run-time options. def query_id; @seq1.entry_id; end # Returns the length of query sequence. def query_len; @seq1.len; end end #class Report end #class Sim4 end #module Bio =begin = Bio::Sim4::Report = References * (()) Florea, L., et al., A Computer program for aligning a cDNA sequence with a genomic DNA sequence, Genome Research, 8, 967--974, 1998. =end bio-2.0.3/lib/bio/appl/sim4.rb0000644000175000017500000000645714141516614015306 0ustar nileshnilesh# # = bio/appl/sim4.rb - sim4 wrapper class # # Copyright:: Copyright (C) 2004 GOTO Naohisa # License:: The Ruby License # # $Id: sim4.rb,v 1.10 2007/04/05 23:35:39 trevor Exp $ # # The sim4 execution wrapper class. # # == References # # * Florea, L., et al., A Computer program for aligning a cDNA sequence # with a genomic DNA sequence, Genome Research, 8, 967--974, 1998. 
# http://www.genome.org/cgi/content/abstract/8/9/967 # require 'tempfile' require 'bio/command' module Bio # The sim4 execution wrapper class. class Sim4 autoload :Report, 'bio/appl/sim4/report' # Creates a new sim4 execution wrapper object. # [+program+] Program name. Usually 'sim4' in UNIX. # [+database+] Default file name of database('seq2'). # [+opt+] Options (array of strings). def initialize(program = 'sim4', database = nil, opt = []) @program = program @options = opt @database = database #seq2 @command = nil @output = nil @report = nil end # default file name of database('seq2') attr_accessor :database # name of the program (usually 'sim4' in UNIX) attr_reader :program # options attr_accessor :options # option is deprecated. Instead, please use options. def option warn "option is deprecated. Please use options." options end # last command-line strings executed by the object attr_reader :command #--- # last messages of program reported to the STDERR #attr_reader :log #+++ #log is deprecated (no replacement) and returns empty string. def log warn "log is deprecated (no replacement) and returns empty string." '' end # last result text (String) attr_reader :output # last result. Returns a Bio::Sim4::Report object. attr_reader :report # Executes the sim4 program. # seq1 shall be a Bio::Sequence object. # Returns a Bio::Sim4::Report object. def query(seq1) tf = Tempfile.open('sim4') tf.print seq1.to_fasta('seq1', 70) tf.close(false) r = exec_local(tf.path) tf.close(true) r end # Executes the sim4 program. # Performs mRNA-genome alignment between given sequences. # seq1 and seq2 should be Bio::Sequence objects. # Returns a Bio::Sim4::Report object. def query_pairwise(seq1, seq2) tf = Tempfile.open('sim4') tf.print seq1.to_fasta('seq1', 70) tf.close(false) tf2 = Tempfile.open('seq2') tf2.print seq2.to_fasta('seq2', 70) tf2.close(false) r = exec_local(tf.path, tf2.path) tf.close(true) tf2.close(true) r end # Executes the sim4 program.
# Performs mRNA-genome alignment between sequences in given files. # filename1 and filename2 should be file name strings. # If filename2 is not specified, self.database is used. def exec_local(filename1, filename2 = nil) @command = [ @program, filename1, (filename2 or @database), *@options ] @output = nil @report = nil Bio::Command.call_command(@command) do |io| io.close_write @output = io.read @report = Bio::Sim4::Report.new(@output) end @report end alias exec exec_local end #class Sim4 end #module Bio bio-2.0.3/lib/bio/appl/spidey/0000755000175000017500000000000014141516614015366 5ustar nileshnileshbio-2.0.3/lib/bio/appl/spidey/report.rb0000644000175000017500000004603014141516614017231 0ustar nileshnilesh# # = bio/appl/spidey/report.rb - SPIDEY result parser # # Copyright:: Copyright (C) 2004 GOTO Naohisa # License:: The Ruby License # # $Id: report.rb,v 1.10 2007/04/05 23:35:40 trevor Exp $ # # NCBI Spidey result parser. # Currently, output of the default (-p 0 option) or the -p 1 option is supported. # # == Notes # # The mRNA sequence is regarded as a query, and # the genomic sequence is regarded as a target (subject, hit). # # == References # # * Wheelan, S.J., et al., Spidey: a tool for mRNA-to-genomic alignments, # Genome Research, 11, 1952--1957, 2001. # http://www.genome.org/cgi/content/abstract/11/11/1952 # * http://www.ncbi.nlm.nih.gov/spidey/ # require 'bio' module Bio class Spidey # Spidey report parser class. # Please see bio/appl/spidey/report.rb for details. # # Its object may contain some Bio::Spidey::Report::Hit objects. class Report #< DB #-- # File format: -p 0 (default) or -p 1 options #++ # Delimiter of each entry. Bio::FlatFile uses it. DELIMITER = RS = "\n--SPIDEY " # (Integer) excess read size included in DELIMITER. DELIMITER_OVERRUN = 9 # "--SPIDEY ".length # Creates a new Bio::Spidey::Report object from String. # You can use Bio::FlatFile to read a file.
def initialize(str) str = str.sub(/\A\s+/, '') str.sub!(/\n(^\-\-SPIDEY .*)/m, '') # remove trailing entries for sure @entry_overrun = $1 data = str.split(/\r?\n(?:\r?\n)+/) d0 = data.shift.to_s.split(/\r?\n/) @hit = Hit.new(data, d0) @all_hits = [ @hit ] if d0.empty? or /\ANo alignment found\.\s*\z/ =~ d0[-1] then @hits = [] else @hits = [ @hit ] end end # piece of next entry. Bio::FlatFile uses it. attr_reader :entry_overrun # Returns an Array of Bio::Spidey::Report::Hit objects. # Because current version of SPIDEY supports only 1 genomic sequences, # the number of hits is 1 or 0. attr_reader :hits # Returns an Array of Bio::Spidey::Report::Hit objects. # Unlike Bio::Spidey::Report#hits, the method returns # results of all trials of pairwise alignment. # This would be a Bio::Spidey specific method. attr_reader :all_hits # SeqDesc stores sequence information of query or subject. class SeqDesc #-- # description/definitions of a sequence #++ # Creates a new SeqDesc object. # It is designed to be called from Bio::Spidey::Report::* classes. # Users shall not call it directly. def initialize(seqid, seqdef, len) @entry_id = seqid @definition = seqdef @len = len end # Identifier of the sequence. attr_reader :entry_id # Definition of the sequence. attr_reader :definition # Length of the sequence. attr_reader :len # Parses piece of Spidey result text and creates a new SeqDesc object. # It is designed to be called from Bio::Spidey::Report::* classes. # Users shall not call it directly. def self.parse(str) /^(Genomic|mRNA)\:\s*(([^\s]*) (.+))\, (\d+) bp\s*$/ =~ str.to_s seqid = $3 seqdef = $2 len = ($5 ? $5.to_i : nil) self.new(seqid, seqdef, len) end end #class SeqDesc # Sequence segment pair of Spidey result. # Similar to Bio::Blast::Report::Hsp but lacks many methods. # For mRNA-genome mapping programs, unlike other homology search # programs, the class is used not only for exons but also for introns. 
# (Note that intron data would not be available according to run-time # options of the program.) class SegmentPair #-- # segment pair (like Bio::BLAST::*::Report::Hsp) #++ # Creates a new SegmentPair object. # It is designed to be called from Bio::Spidey::Report::* classes. # Users shall not call it directly. def initialize(genomic, mrna, midline, aaseqline, percent_identity, mismatches, gaps, splice_site, align_len) @genomic = genomic @mrna = mrna @midline = midline @aaseqline = aaseqline @percent_identity = percent_identity @mismatches = mismatches @gaps = gaps @splice_site = splice_site @align_len = align_len end # Returns segment information of the 'Genomic'. # Returns a Bio::Spidey::Report::Segment object. # This would be a Bio::Spidey specific method. attr_reader :genomic # Returns segment information of the 'mRNA'. # Returns a Bio::Spidey::Report::Segment object. # This would be a Bio::Spidey specific method. attr_reader :mrna # Returns the middle line of the alignment of the segment pair. # Returns nil if no alignment data are available. attr_reader :midline # Returns amino acid sequence in alignment. # Returns String, because white spaces are also important. # Returns nil if no alignment data are available. attr_reader :aaseqline # Returns percent identity of the segment pair. attr_reader :percent_identity # Returns mismatches. attr_reader :mismatches alias mismatch_count mismatches # Returns gaps. attr_reader :gaps # Returns splice site information. # Returns a hash which contains :d and :a for keys and # 0, 1, or nil for values. # This would be a Bio::Spidey specific method. attr_reader :splice_site # Returns alignment length of the segment pair. # Returns nil if no alignment data are available. attr_reader :align_len # Creates a new SegmentPair object when the segment pair is an intron. # It is designed to be called internally from # Bio::Spidey::Report::* classes. # Users shall not call it directly.
def self.new_intron(from, to, strand, aln) genomic = Segment.new(from, to, strand, aln[0]) mrna = Segment.new(nil, nil, nil, aln[2]) midline = aln[1] aaseqline = aln[3] self.new(genomic, mrna, midline, aaseqline, nil, nil, nil, nil, nil) end # Parses a piece of Spidey result text and creates a new # SegmentPair object. # It is designed to be called internally from # Bio::Spidey::Report::* classes. # Users shall not call it directly. def self.parse(str, strand, complement, aln) /\AExon\s*\d+(\(\-\))?\:\s*(\d+)\-(\d+)\s*\(gen\)\s+(\d+)\-(\d+)\s*\(mRNA\)\s+id\s*([\d\.]+)\s*\%\s+mismatches\s+(\d+)\s+gaps\s+(\d+)\s+splice site\s*\(d +a\)\s*\:\s*(\d+)\s+(\d+)/ =~ str if strand == 'minus' then genomic = Segment.new($3, $2, strand, aln[0]) else genomic = Segment.new($2, $3, 'plus', aln[0]) end if complement then mrna = Segment.new($4, $5, 'minus', aln[2]) else mrna = Segment.new($4, $5, 'plus', aln[2]) end percent_identity = $6 mismatches = ($7 ? $7.to_i : nil) gaps = ($8 ? $8.to_i : nil) splice_site = { :d => ($9 ? $9.to_i : nil), :a => ($10 ? $10.to_i : nil) } midline = aln[1] aaseqline = aln[3] self.new(genomic, mrna, midline, aaseqline, percent_identity, mismatches, gaps, splice_site, (midline ? midline.length : nil)) end #-- # Bio::BLAST::*::Report::Hsp compatible methods # Methods already defined: midline, percent_identity, # gaps, align_len, mismatch_count #++ # Returns start position of the mRNA (query) (the first position is 1). def query_from; @mrna.from; end # Returns end position (including its position) of the mRNA (query). def query_to; @mrna.to; end # Returns the sequence (with gaps) of the mRNA (query). def qseq; @mrna.seq; end # Returns strand information of the mRNA (query). # Returns 'plus', 'minus', or nil. def query_strand; @mrna.strand; end # Returns start position of the genomic (target, hit) # (the first position is 1). def hit_from; @genomic.from; end # Returns end position (including its position) of the # genomic (target, hit). 
def hit_to; @genomic.to; end # Returns the sequence (with gaps) of the genomic (target, hit). def hseq; @genomic.seq; end # Returns strand information of the genomic (target, hit). # Returns 'plus', 'minus', or nil. def hit_strand; @genomic.strand; end end #class SegmentPair # Segment informations of a segment pair. class Segment # Creates a new Segment object. # It is designed to be called internally from # Bio::Spidey::Report::* classes. # Users shall not call it directly. def initialize(pos_st, pos_ed, strand = nil, seq = nil) @from = pos_st ? pos_st.to_i : nil @to = pos_ed ? pos_ed.to_i : nil @strand = strand @seq = seq end # start position attr_reader :from # end position attr_reader :to # strand information attr_reader :strand # sequence data attr_reader :seq end #class Segment # Hit object of Spidey result. # Similar to Bio::Blast::Report::Hit but lacks many methods. class Hit # Creates a new Hit object. # It is designed to be called internally from # Bio::Spidey::Report::* classes. # Users shall not call it directly. def initialize(data, d0) @data = data @d0 = d0 end # Fetches fields. def field_fetch(t, ary) reg = Regexp.new(/^#{Regexp.escape(t)}\:\s*(.+)\s*$/) if ary.find { |x| reg =~ x } $1.strip else nil end end private :field_fetch # Parses information about strand. def parse_strand x = field_fetch('Strand', @d0) if x =~ /^(.+)Reverse +complement\s*$/ then @strand = $1.strip @complement = true else @strand = x @complement = nil end end private :parse_strand # Returns strand information of the hit. # Returns 'plus', 'minus', or nil. # This would be a Bio::Spidey specific method. def strand unless defined?(@strand); parse_strand; end @strand end # Returns true if the result reports 'Reverse complement'. # Otherwise, return false or nil. # This would be a Bio::Spidey specific method. def complement? unless defined?(@complement); parse_strand; end @complement end # Returns number of exons in the hit. 
def number_of_exons unless defined?(@number_of_exons) @number_of_exons = field_fetch('Number of exons', @d0).to_i end @number_of_exons end # Returns number of splice sites of the hit. def number_of_splice_sites unless defined?(@number_of_splice_sites) @number_of_splice_sites = field_fetch('Number of splice sites', @d0).to_i end @number_of_splice_sites end # Returns overall percent identity of the hit. def percent_identity unless defined?(@percent_identity) x = field_fetch('overall percent identity', @d0) @percent_identity = (/([\d\.]+)\s*\%/ =~ x.to_s) ? $1 : nil end @percent_identity end # Returns missing mRNA ends of the hit. def missing_mrna_ends unless defined?(@missing_mrna_ends) @missing_mrna_ends = field_fetch('Missing mRNA ends', @d0) end @missing_mrna_ends end # Returns sequence informations of the 'Genomic'. # Returns a Bio::Spidey::Report::SeqDesc object. # This would be a Bio::Spidey specific method. def genomic unless defined?(@genomic) @genomic = SeqDesc.parse(@d0.find { |x| /^Genomic\:/ =~ x }) end @genomic end # Returns sequence informations of the mRNA. # Returns a Bio::Spidey::Report::SeqDesc object. # This would be a Bio::Spidey specific method. def mrna unless defined?(@mrna) @mrna = SeqDesc.parse(@d0.find { |x| /^mRNA\:/ =~ x }) end @mrna end # Parses segment pairs. def parse_segmentpairs aln = self.align.dup ex = [] itr = [] segpairs = [] cflag = self.complement? 
strand = self.strand if strand == 'minus' then d_to = 1; d_from = -1 else d_to = -1; d_from = 1 end @d0.each do |x| #p x if x =~ /^Exon\s*\d+(\(.*\))?\:/ then if a = aln.shift then y = SegmentPair.parse(x, strand, cflag, a[1]) ex << y if a[0][0].to_s.length > 0 then to = y.genomic.from + d_to i0 = SegmentPair.new_intron(nil, to, strand, a[0]) itr << i0 segpairs << i0 end segpairs << y if a[2][0].to_s.length > 0 then from = y.genomic.to + d_from i2 = SegmentPair.new_intron(from, nil, strand, a[2]) itr << i2 segpairs << i2 end else y = SegmentPair.parse(x, strand, cflag, []) ex << y segpairs << y end end end @exons = ex @introns = itr @segmentpairs = segpairs end private :parse_segmentpairs # Returns exons of the hit. # Returns an array of Bio::Spidey::Report::SegmentPair objects. def exons unless defined?(@exons); parse_segmentpairs; end @exons end # Returns introns of the hit. # Some of them would contain untranscribed regions. # Returns an array of Bio::Spidey::Report::SegmentPair objects. # (Note that intron data is not always available # according to run-time options of the program.) def introns unless defined?(@introns); parse_segmentpairs; end @introns end # Returns segment pairs (exons and introns) of the hit. # Each segment pair is a Bio::Spidey::Report::SegmentPair object. # Returns an array of Bio::Spidey::Report::SegmentPair objects. # (Note that intron data is not always available # according to run-time options of the program.) def segmentpairs unless defined?(@segmentpairs); parse_segmentpairs; end @segmentpairs end # Returns alignments. # Returns an Array of arrays. # This would be a Bio::Spidey specific method. def align unless defined?(@align); parse_align; end @align end # Parses alignment lines.
def parse_align_lines(data) misc = [ [], [], [], [] ] data.each do |x| a = x.split(/\r?\n/) if g = a.shift then misc[0] << g (1..3).each do |i| if y = a.shift then if y.length < g.length y << ' ' * (g.length - y.length) end misc[i] << y else misc[i] << ' ' * g.length end end end end misc.collect! { |x| x.join('') } left = [] if /\A +/ =~ misc[2] then len = $&.size left = misc.collect { |x| x[0, len] } misc.each { |x| x[0, len] = '' } end right = [] if / +\z/ =~ misc[2] then len = $&.size right = misc.collect { |x| x[(-len)..-1] } misc.each { |x| x[(-len)..-1] = '' } end body = misc [ left, body, right ] end private :parse_align_lines # Parses alignments. def parse_align r = [] data = @data while !data.empty? a = [] while x = data.shift and !(x =~ /^(Genomic|Exon\s*\d+)\:/) a.push x end r.push parse_align_lines(a) unless a.empty? end @align = r end private :parse_align #-- # Bio::BLAST::*::Report::Hit compatible methods #++ # Length of the mRNA (query) sequence. # Same as Bio::Spidey::Report#query_len. def query_len; mrna.len; end # Identifier of the mRNA (query). # Same as Bio::Spidey::Report#query_id. def query_id; mrna.entry_id; end # Definition of the mRNA (query). # Same as Bio::Spidey::Report#query_def. def query_def; mrna.definition; end # The genomic (target) sequence length. def target_len; genomic.len; end # Identifier of the genomic (target) sequence. def target_id; genomic.entry_id; end # Definition of the genomic (target) sequence. def target_def; genomic.definition; end alias hit_id target_id alias len target_len alias definition target_def alias hsps exons # Iterates over each exon of the hit. # Yields Bio::Spidey::Report::SegmentPair object. def each(&x) #:yields: segmentpair exons.each(&x) end end #class Hit # Returns sequence informationsof the mRNA. # Returns a Bio::Spidey::Report::SeqDesc object. # This would be a Bio::Spidey specific method. def mrna; @hit.mrna; end #-- #Bio::BLAST::*::Report compatible methods #++ # Returns number of hits. 
# Same as hits.size. def num_hits; @hits.size; end # Iterates over each hits. # Same as hits.each. # Yields a Bio::Spidey::Report::Hit object. def each_hit(&x) #:yields: hit @hits.each(&x) end alias each each_hit # Returns definition of the mRNA (query) sequence. def query_def; @hit.mrna.definition; end # Returns identifier of the mRNA (query) sequence. def query_id; @hit.mrna.entry_id; end # Returns the length of the mRNA (query) sequence. def query_len; @hit.mrna.len; end end #class Report end #class Spidey end #module Bio bio-2.0.3/lib/bio/appl/fasta.rb0000644000175000017500000001353014141516614015516 0ustar nileshnilesh# # = bio/appl/fasta.rb - FASTA wrapper # # Copyright:: Copyright (C) 2001, 2002 Toshiaki Katayama # License:: The Ruby License # # $Id:$ # require 'net/http' require 'uri' require 'bio/command' require 'shellwords' module Bio class Fasta autoload :Report, 'bio/appl/fasta/format10' #autoload :?????, 'bio/appl/fasta/format6' # Returns a FASTA factory object (Bio::Fasta). def initialize(program, db, opt = [], server = 'local') @format = 10 @program = program @db = db @server = server @ktup = nil @matrix = nil @output = '' begin a = opt.to_ary rescue NameError #NoMethodError # backward compatibility a = Shellwords.shellwords(opt) end @options = [ '-Q', '-H', '-m', @format.to_s, *a ] # need -a ? end attr_accessor :program, :db, :options, :server, :ktup, :matrix # Returns a String containing fasta execution output in as is format. attr_reader :output def option # backward compatibility Bio::Command.make_command_line(@options) end def option=(str) # backward compatibility @options = Shellwords.shellwords(str) end # Accessors for the -m option. def format=(num) @format = num.to_i if i = @options.index('-m') then @options[i+1, 1] = @format.to_s else @options << '-m' << @format.to_s end end attr_reader :format # OBSOLETE. Does nothing and shows warning messages. 
# # Historically, selecting parser to use ('format6' or 'format10' were # expected, but only 'format10' was available as a working parser). # def self.parser(parser) warn 'Bio::Fasta.parser is obsoleted and will soon be removed.' end # Returns a FASTA factory object (Bio::Fasta) to run FASTA search on # local computer. def self.local(program, db, option = '') self.new(program, db, option, 'local') end # Returns a FASTA factory object (Bio::Fasta) to execute FASTA search on # remote server. # # For the develpper, you can add server 'hoge' by adding # exec_hoge(query) method. # def self.remote(program, db, option = '', server = 'genomenet') self.new(program, db, option, server) end # Execute FASTA search and returns Report object (Bio::Fasta::Report). def query(query) return self.send("exec_#{@server}", query.to_s) end private def parse_result(data) Report.new(data) end def exec_local(query) cmd = [ @program, *@options ] cmd.concat([ '@', @db ]) cmd.push(@ktup) if @ktup report = nil @output = Bio::Command.query_command(cmd, query) report = parse_result(@output) return report end # == Available databases for Fasta.remote(@program, @db, option, 'genomenet') # # See http://fasta.genome.jp/ideas/ideas.html#fasta for more details. 
# # ----------+-------+--------------------------------------------------- # @program | query | @db (supported in GenomeNet) # ----------+-------+--------------------------------------------------- # fasta | AA | nr-aa, genes, vgenes.pep, swissprot, swissprot-upd, # | | pir, prf, pdbstr # +-------+--------------------------------------------------- # | NA | nr-nt, genbank-nonst, gbnonst-upd, dbest, dbgss, # | | htgs, dbsts, embl-nonst, embnonst-upd, epd, # | | genes-nt, genome, vgenes.nuc # ----------+-------+--------------------------------------------------- # tfasta | AA | nr-nt, genbank-nonst, gbnonst-upd, dbest, dbgss, # | | htgs, dbsts, embl-nonst, embnonst-upd, # | | genes-nt, genome, vgenes.nuc # ----------+-------+--------------------------------------------------- # def exec_genomenet(query) host = "fasta.genome.jp" #path = "/sit-bin/nph-fasta" path = "/sit-bin/fasta" # 2005.08.12 form = { 'style' => 'raw', 'prog' => @program, 'dbname' => @db, 'sequence' => query, 'other_param' => Bio::Command.make_command_line_unix(@options), 'ktup_value' => @ktup, 'matrix' => @matrix, } form.keys.each do |k| form.delete(k) unless form[k] end report = nil begin http = Bio::Command.new_http(host) http.open_timeout = 3000 http.read_timeout = 6000 result = Bio::Command.http_post_form(http, path, form) # workaround 2006.8.1 - fixed for new batch queuing system case result.code when "302" result_location = result.header['location'] result_uri = URI.parse(result_location) result_path = result_uri.path done = false until done result = http.get(result_path) if result.body[/Your job ID is/] sleep 15 else done = true end end end @output = result.body.to_s # workaround 2005.08.12 re = %r{Show all result}i # " if path = @output[re, 1] result = http.get(path) @output = result.body txt = @output.to_s.split(/\/)[1] raise 'cannot understand response' unless txt txt.sub!(/\<\/pre\>.*\z/m, '') txt.sub!(/.*^((T?FASTA|SSEARCH) (searches|compares))/m, '\1') txt.sub!(/^\
.*\n/, '') txt.sub!(/^\]*\>([^\<\>]+)/ db = $1.freeze desc = $2.strip.freeze databases[key].push db dbdescs[key][db] = desc end end # mine-aa and mine-nt should be removed [ 'prot', 'nucl' ].each do |mol| ary = databases[mol] || [] hash = dbdescs[mol] || {} [ 'mine-aa', 'mine-nt' ].each do |k| ary.delete(k) hash.delete(k) end databases[mol] = ary.freeze dbdescs[mol] = hash end [ databases, dbdescs ].each do |h| prot = h['prot'] nucl = h['nucl'] h.delete('prot') h.delete('nucl') h['blastp'] = prot h['blastx'] = prot h['blastn'] = nucl h['tblastn'] = nucl h['tblastx'] = nucl end @databases = databases @database_descriptions = dbdescs @parse_databases = true true end private :_parse_databases end #module Information extend Information private # executes BLAST and returns result as a string def exec_genomenet(query) host = Host #host = "blast.genome.jp" #path = "/sit-bin/nph-blast" #path = "/sit-bin/blast" #2005.08.12 path = "/tools-bin/blast" #2012.01.12 options = make_command_line_options opt = Bio::Blast::NCBIOptions.new(options) program = opt.delete('-p') db = opt.delete('-d') # When database name starts with mine-aa or mine-nt, # space-separated list of KEGG organism codes can be given. # For example, "mine-aa eco bsu hsa". 
if /\A(mine-(aa|nt))\s+/ =~ db.to_s then db = $1 myspecies = {} myspecies["myspecies-#{$2}"] = $' end matrix = opt.delete('-M') || 'blosum62' filter = opt.delete('-F') || 'T' opt_v = opt.delete('-v') || 500 # default value for GenomeNet opt_b = opt.delete('-b') || 250 # default value for GenomeNet # format, not for form parameters, but included in option string opt_m = opt.get('-m') || '7' # default of BioRuby GenomeNet factory opt.set('-m', opt_m) optstr = Bio::Command.make_command_line_unix(opt.options) form = { 'style' => 'raw', 'prog' => program, 'dbname' => db, 'sequence' => query, 'other_param' => optstr, 'matrix' => matrix, 'filter' => filter, 'V_value' => opt_v, 'B_value' => opt_b, 'alignment_view' => 0, } form.merge!(myspecies) if myspecies form.keys.each do |k| form.delete(k) unless form[k] end begin http = Bio::Command.new_https(host) http.open_timeout = 300 http.read_timeout = 600 result = Bio::Command.http_post_form(http, path, form) @output = result.body # workaround 2008.8.13 if result.code == '302' then newuri = URI.parse(result['location']) newpath = newuri.path result = http.get(newpath) @output = result.body # waiting for BLAST finished while /Your job ID is/ =~ @output and /Your result will be displayed here\.?\/i =~ @output if /This page will be reloaded automatically in\s*((\d+)\s*min\.)?\s*((\d+)\s*sec\.)?/ =~ @output then reloadtime = $2.to_i * 60 + $4.to_i reloadtime = 300 if reloadtime > 300 reloadtime = 1 if reloadtime < 1 else reloadtime = 5 end if $VERBOSE then $stderr.puts "waiting #{reloadtime} sec to reload #{newuri.to_s}" end sleep(reloadtime) result = http.get(newpath) @output = result.body end end # workaround 2005.08.12 + 2011.01.27 + 2011.7.22 if /\Show all result\<\/A\>/i =~ @output.to_s then all_prefix = $1 all_path = $2 all_prefix = "https://#{Host}" if all_prefix.to_s.empty? 
all_uri = all_prefix + all_path @output = Bio::Command.read_uri(all_uri) case all_path when /\.txt\z/ ; # don't touch the data else txt = @output.to_s.split(/\/)[1] raise 'cannot understand response' unless txt txt.sub!(/\<\/pre\>.*\z/m, '') txt.sub!(/.*^ \-{20,}\s*/m, '') @output = txt end else raise 'cannot understand response' end end # for -m 0 (NCBI BLAST default) output, html tags are removed. if opt_m.to_i == 0 then #@output_bak = @output txt = @output.sub!(/^\