ruby-feedparser-0.9.2/
ruby-feedparser-0.9.2/LICENSE

Ruby-Feedparser is copyrighted free software by Lucas Nussbaum and others. You can redistribute it and/or modify it under either the terms of the GPL (see COPYING file), or the conditions below:

  1. You may make and give away verbatim copies of the source form of the software without restriction, provided that you duplicate all of the original copyright notices and associated disclaimers.

  2. You may modify your copy of the software in any way, provided that you do at least ONE of the following:

       a) place your modifications in the Public Domain or otherwise make them Freely Available, such as by posting said modifications to Usenet or an equivalent medium, or by allowing the author to include your modifications in the software.

       b) use the modified software only within your corporation or organization.

       c) rename any non-standard executables so the names do not conflict with standard executables, which must also be provided.

       d) make other distribution arrangements with the author.

  3. You may distribute the software in object code or executable form, provided that you do at least ONE of the following:

       a) distribute the executables and library files of the software, together with instructions (in the manual page or equivalent) on where to get the original distribution.

       b) accompany the distribution with the machine-readable source of the software.

       c) give non-standard executables non-standard names, with instructions on where to get the original software distribution.

       d) make other distribution arrangements with the author.

  4. You may modify and include the part of the software into any other software (possibly commercial). But some files in the distribution are not written by the author, so that they are not under this terms.
     They are gc.c(partly), utils.c(partly), regex.[ch], st.[ch] and some files under the ./missing directory. See each file for the copying condition.

  5. The scripts and library files supplied as input to or produced as output from the software do not automatically fall under the copyright of the software, but belong to whomever generated them, and may be sold commercially, and may be aggregated with this software.

  6. THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

ruby-feedparser-0.9.2/COPYING

GNU GENERAL PUBLIC LICENSE
Version 2, June 1991

Copyright (C) 1989, 1991 Free Software Foundation, Inc.
59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

Preamble

The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Library General Public License instead.) You can apply it to your programs, too.

When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things.

To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. 
The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. 
b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. 
You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. 
If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. 
If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. 
The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. 
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. Copyright (C) This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA Also add information on how to contact you by electronic and paper mail. 
If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright (C) year name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker. , 1 April 1989 Ty Coon, President of Vice This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Library General Public License instead of this License. 
ruby-feedparser-0.9.2/test/
ruby-feedparser-0.9.2/test/text_output/
ruby-feedparser-0.9.2/test/tc_sgml_parser.rb
# encoding: UTF-8
require 'test/unit'
require 'mocha/setup'
require 'feedparser/sgml-parser'

class SGMLParserTest < Test::Unit::TestCase
  def test_numerical_charref
    parser = FeedParser::SGMLParser.new
    parser.expects(:unknown_charref).with('215')
    parser.handle_charref('215')
  end

  def test_non_numerical_charref
    parser = FeedParser::SGMLParser.new
    parser.expects(:handle_data).with('amp')
    parser.handle_charref('amp')
  end
end
ruby-feedparser-0.9.2/test/textwrapped_output/
ruby-feedparser-0.9.2/test/tc_feed_parse.rb
# encoding: UTF-8
$:.unshift File.join(File.dirname(__FILE__), '..', 'lib')

require 'test/unit'
require 'feedparser'

# This class includes some basic tests of the parser. More detailed test is
# made by tc_parser.rb
class FeedParserTest < Test::Unit::TestCase
  # From http://my.netscape.com/publish/formats/rss-spec-0.91.html
  def test_parse_rss091_1
    ch = FeedParser::Feed::new <<-EOF
en News and commentary from the cross-platform scripting community. http://www.scripting.com/ Scripting News http://www.scripting.com/ Scripting News http://www.scripting.com/gifs/tinyScriptingNews.gif
    EOF
    assert_equal('Scripting News', ch.title)
    assert_equal('http://www.scripting.com/', ch.link)
    assert_equal('News and commentary from the cross-platform scripting community.', ch.description)
    assert_equal([], ch.items)
  end

  def test_parse_rss091_complete
    ch = FeedParser::Feed::new <<-EOF
Copyright 1997-1999 UserLand Software, Inc.
Thu, 08 Jul 1999 07:00:00 GMT Thu, 08 Jul 1999 16:20:26 GMT http://my.userland.com/stories/storyReader$11 News and commentary from the cross-platform scripting community. http://www.scripting.com/ Scripting News http://www.scripting.com/ Scripting News http://www.scripting.com/gifs/tinyScriptingNews.gif 40 78 What is this used for? dave@userland.com (Dave Winer) dave@userland.com (Dave Winer) en-us 67891011 Sunday (PICS-1.1 "http://www.rsac.org/ratingsv01.html" l gen true comment "RSACi North America Server" for "http://www.rsac.org" on "1996.04.16T08:15-0500" r (n 0 s 0 v 0 l 0)) stuff http://bar This is an article about some stuff second item's title http://link2 aa bb cc dd ee ff Search Now! Enter your search <terms> find http://my.site.com/search.cgi
    EOF
    assert_equal('Scripting News', ch.title)
    assert_equal('http://www.scripting.com/', ch.link)
    assert_equal('News and commentary from the cross-platform scripting community.', ch.description)
    assert_equal(2, ch.items.length)
    assert_equal('http://bar', ch.items[0].link)
    assert_equal('

This is an article about some stuff

', ch.items[0].content)
    assert_equal('stuff', ch.items[0].title)
    assert_equal('http://link2', ch.items[1].link)
    assert_equal("

aa bb cc\n dd ee ff

", ch.items[1].content) assert_equal('second item\'s title', ch.items[1].title) end def test_enclosures ch = FeedParser::Feed::new <<-EOF EOF # the third one should be removed because an enclosure should have an url, or it's useless. assert_equal([["url1", "1", "type1"], ["url2", nil, "type2"], ["url1", "1", nil]], ch.items[0].enclosures) end def test_recode_utf8 assert_equal 'UTF-8', FeedParser.recode("áéíóú").encoding.name end def test_recode_iso88519 assert_equal 'UTF-8', FeedParser.recode("áéíóú".encode('iso-8859-1')).encoding.name end def test_recode_utf8_mixed_with_ASCIIBIT recoded = FeedParser.recode("áé\x8Díóú") assert_equal'UTF-8', recoded.encoding.name assert_equal 'áéíóú', recoded end def test_recode_unicode_char assert_equal "1280×1024", FeedParser.recode("1280×1024") end def test_almost_valid_iso88591 input = "Codifica\xE7\xE3o \x96 quase v\xE1lida" assert_equal "Codificação quase válida", FeedParser.recode(input) end def test_feed_origin feed = FeedParser::Feed.new(nil, 'http://foo.com/feed') assert_equal "http://foo.com", feed.origin end def test_item_origin feed = FeedParser::Feed.new(nil, 'http://foo.com/feed') item = FeedParser::FeedItem.new(nil, feed) item.link = '/foo/bar' assert_equal 'http://foo.com/foo/bar', item.link end def test_item_origin_no_link item = FeedParser::FeedItem.new(nil, nil) assert_nil item.link end def test_item_no_feed item = FeedParser::FeedItem.new(nil, nil) item.link = '/foo/bar' assert_equal '/foo/bar', item.link end end ruby-feedparser-0.9.2/test/source/0000755000175000017500000000000012207254570017316 5ustar terceiroterceiroruby-feedparser-0.9.2/test/tc_textwrappedoutput.rb0000755000175000017500000000304312207206023022652 0ustar terceiroterceiro#!/usr/bin/ruby -w $:.unshift File.join(File.dirname(__FILE__), '..', 'lib') require 'test/unit' require 'feedparser' class TextWrappedOutputTest < Test::Unit::TestCase if File::directory?('test/source') SRCDIR = 'test/source' DSTDIR = 'test/textwrapped_output' elsif 
File::directory?('source')
    SRCDIR = 'source'
    DSTDIR = 'textwrapped_output'
  else
    raise 'source directory not found.'
  end

  # Define one test per XML fixture; each compares the wrapped text
  # rendering against the stored .output file.
  Dir.foreach(SRCDIR) do |f|
    next if f !~ /\.xml$/
    testname = 'test_' + File.basename(f).gsub(/\W/, '_')
    define_method(testname) do
      str = File::read(SRCDIR + '/' + f)
      chan = FeedParser::Feed::new(str)
      chanstr = chan.to_text(false, 72) # localtime set to false
      if File::exist?(DSTDIR + '/' + f.gsub(/\.xml$/, '.output'))
        output = File::read(DSTDIR + '/' + f.gsub(/\.xml$/, '.output'))
        if output != chanstr
          File::open(DSTDIR + '/' + f.gsub(/\.xml$/, '.output.new'), "w") do |fd|
            fd.print(chanstr)
          end
          assert(
            false,
            [
              "Test failed for #{f}.",
              " Check: diff -u #{DSTDIR + '/' + f.gsub(/\.xml$/, '.output')}{,.new}",
              " Commit: mv -f #{DSTDIR + '/' + f.gsub(/\.xml$/, '.output')}{.new,}",
            ].join("\n")
          )
        end
      else
        File::open(DSTDIR + '/' + f.gsub(/\.xml$/, '.output'), "w") do |fd|
          fd.print(chanstr)
        end
        assert(false, "Missing #{DSTDIR + '/' + f.gsub(/\.xml$/, '.output')}. Writing it, but check manually!")
      end
    end
  end
end
ruby-feedparser-0.9.2/test/tc_textoutput.rb
#!/usr/bin/ruby -w

$:.unshift File.join(File.dirname(__FILE__), '..', 'lib')

require 'test/unit'
require 'feedparser'

class TextOutputTest < Test::Unit::TestCase
  if File::directory?('test/source')
    SRCDIR = 'test/source'
    DSTDIR = 'test/text_output'
  elsif File::directory?('source')
    SRCDIR = 'source'
    DSTDIR = 'text_output'
  else
    raise 'source directory not found.'
  end

  # Define one test per XML fixture; each compares the plain text
  # rendering against the stored .output file.
  Dir.foreach(SRCDIR) do |f|
    next if f !~ /\.xml$/
    testname = 'test_' + File.basename(f).gsub(/\W/, '_')
    define_method(testname) do
      str = File::read(SRCDIR + '/' + f)
      chan = FeedParser::Feed::new(str)
      chanstr = chan.to_text(false) # localtime set to false
      if File::exist?(DSTDIR + '/' + f.gsub(/\.xml$/, '.output'))
        output = File::read(DSTDIR + '/' + f.gsub(/\.xml$/, '.output'))
        if output != chanstr
          File::open(DSTDIR + '/' + f.gsub(/\.xml$/, '.output.new'), "w") do |fd|
            fd.print(chanstr)
          end
          assert(
            false,
            [
              "Test failed for #{f}.",
              " Check: diff -u #{DSTDIR + '/' + f.gsub(/\.xml$/, '.output')}{,.new}",
              " Commit: mv -f #{DSTDIR + '/' + f.gsub(/\.xml$/, '.output')}{.new,}",
            ].join("\n")
          )
        end
      else
        File::open(DSTDIR + '/' + f.gsub(/\.xml$/, '.output'), "w") do |fd|
          fd.print(chanstr)
        end
        assert(false, "Missing #{DSTDIR + '/' + f.gsub(/\.xml$/, '.output')}. Writing it, but check manually!")
      end
    end
  end
end
ruby-feedparser-0.9.2/test/parser_output/
ruby-feedparser-0.9.2/test/tc_htmloutput.rb
#!/usr/bin/ruby -w

$:.unshift File.join(File.dirname(__FILE__), '..', 'lib')
$:.unshift File.join(File.dirname(__FILE__), '..', 'test')
$:.unshift File.join(File.dirname(__FILE__), 'lib')
$:.unshift File.join(File.dirname(__FILE__), 'test')

require 'test/unit'
require 'feedparser'
require 'feedparser/html-output'

class HTMLOutputTest < Test::Unit::TestCase
  if File::directory?('test/source')
    SRCDIR = 'test/source'
    DSTDIR = 'test/html_output'
  elsif File::directory?('source')
    SRCDIR = 'source'
    DSTDIR = 'html_output'
  else
    raise 'source directory not found.'
  end

  # Define one test per XML fixture; each compares the HTML rendering
  # against the stored .output file.
  Dir.foreach(SRCDIR) do |f|
    next if f !~ /\.xml$/
    testname = 'test_' + File.basename(f).gsub(/\W/, '_')
    define_method(testname) do
      str = File::read(SRCDIR + '/' + f)
      chan = FeedParser::Feed::new(str)
      chanstr = chan.to_html(false)
      if File::exist?(DSTDIR + '/' + f.gsub(/\.xml$/, '.output'))
        output = File::read(DSTDIR + '/' + f.gsub(/\.xml$/, '.output'))
        if output != chanstr
          File::open(DSTDIR + '/' + f.gsub(/\.xml$/, '.output.new'), "w") do |fd|
            fd.print(chanstr)
          end
          assert(
            false,
            [
              "Test failed for #{f}.",
              " Check: diff -u #{DSTDIR + '/' + f.gsub(/\.xml$/, '.output')}{,.new}",
              " Commit: mv -f #{DSTDIR + '/' + f.gsub(/\.xml$/, '.output')}{.new,}",
            ].join("\n")
          )
        end
      else
        File::open(DSTDIR + '/' + f.gsub(/\.xml$/, '.output'), "w") do |fd|
          fd.print(chanstr)
        end
        assert(false, "Missing #{DSTDIR + '/' + f.gsub(/\.xml$/, '.output')}. Writing it, but check manually!")
      end
    end
  end
end
ruby-feedparser-0.9.2/test/ts_feedparser.rb
#!/usr/bin/ruby -w

$:.unshift File.join(File.dirname(__FILE__), '..', 'lib')
$:.unshift File.join(File.dirname(__FILE__), '..', 'test')
$:.unshift File.join(File.dirname(__FILE__), 'lib')
$:.unshift File.join(File.dirname(__FILE__), 'test')

require 'tc_feed_parse'
require 'tc_htmloutput'
require 'tc_parser'
require 'tc_textoutput'
require 'tc_textwrappedoutput'
ruby-feedparser-0.9.2/test/tc_html2text_parser.rb
# encoding: UTF-8
require 'test/unit'
require 'feedparser/feedparser'

class Html2TextParserTest < Test::Unit::TestCase
  def test_next_img_index
    parser = FeedParser::HTML2TextParser.new
    assert_equal 'A', parser.next_img_index
    assert_equal 'B', parser.next_img_index
  end

  def test_numerical_entity
    parser = FeedParser::HTML2TextParser.new
    parser.feed('1280×1024')
    parser.close
    assert_equal "1280×1024", parser.savedata
  end

  def test_numerical_entity_large_known
    parser = FeedParser::HTML2TextParser.new
    parser.feed('→')
    parser.close
    assert_equal "→", parser.savedata
  end

  def test_numerical_entity_large
    parser = FeedParser::HTML2TextParser.new
    parser.feed('✐')
    parser.close
    assert_equal "✐", parser.savedata
  end

  def test_non_numerical_entity
    parser = FeedParser::HTML2TextParser.new
    parser.feed('HTML&CO')
    parser.close
    assert_equal "HTML&CO", parser.savedata
  end
end
ruby-feedparser-0.9.2/test/html_output/
ruby-feedparser-0.9.2/test/tc_parser.rb
#!/usr/bin/ruby -w

$:.unshift File.join(File.dirname(__FILE__), '..', 'lib')

require 'test/unit'
require 'feedparser'

class ParserTest < Test::Unit::TestCase
  if File::directory?('test/source')
    SRCDIR = 'test/source'
    DSTDIR = 'test/parser_output'
  elsif File::directory?('source')
    SRCDIR = 'source'
    DSTDIR = 'parser_output'
  else
    raise 'source directory not found.'
  end

  # Define one test per XML fixture; each compares the parser dump
  # against the stored .output file.
  Dir.foreach(SRCDIR) do |f|
    next if f !~ /\.xml$/
    testname = 'test_' + File.basename(f).gsub(/\W/, '_')
    define_method(testname) do
      str = File::read(SRCDIR + '/' + f)
      chan = FeedParser::Feed::new(str)
      chanstr = chan.to_s(false)
      if File::exist?(DSTDIR + '/' + f.gsub(/\.xml$/, '.output'))
        output = File::read(DSTDIR + '/' + f.gsub(/\.xml$/, '.output'))
        if output != chanstr
          File::open(DSTDIR + '/' + f.gsub(/\.xml$/, '.output.new'), "w") do |fd|
            fd.print(chanstr)
          end
          assert(
            false,
            [
              "Test failed for #{f}.",
              " Check: diff -u #{DSTDIR + '/' + f.gsub(/\.xml$/, '.output')}{,.new}",
              " Commit: mv -f #{DSTDIR + '/' + f.gsub(/\.xml$/, '.output')}{.new,}",
            ].join("\n")
          )
        end
      else
        File::open(DSTDIR + '/' + f.gsub(/\.xml$/, '.output'), "w") do |fd|
          fd.print(chanstr)
        end
        assert(false, "Missing #{DSTDIR + '/' + f.gsub(/\.xml$/, '.output')}.
Writing it, but check manually!") end end end end ruby-feedparser-0.9.2/ChangeLog0000644000175000017500000000461712202272501016606 0ustar terceiroterceiroRuby-Feedparser 0.7 (27/07/2009) ================================ * Handled several creators per feed item * Fix bug with urls into tag attributes * Better item categories support * Reworked text output formatting * Ignore ­, as some blog software (dotclear2) misuse it. Ruby-Feedparser 0.6 (23/07/2008) ================================ * Moved to_human_readable from class Fixnum to class Integer. * Correctly parse http://www.tbray.org/ongoing/ongoing.atom. Thanks to Janico Greifenberg for reporting this. * String#html2text now takes an additional wrapto parameter, allowing to wrap the text to a specified number of chars. Thanks to Maxime Petazzoni for the patch. Ruby-Feedparser 0.5 (26/10/2007) ================================ * Fixed a bug with items with both non-escaped and escaped HTML. Reported, then patch provided by Gregory Hartman . * In Atom feeds, use the date provided in , and use it in preference to the one in if both are available. Closes gna bug #8987. * "require 'feedparser'" now requires 'feedparser/text-output'. Fixes a bug reported by Sebastian Probst Eide. * Make checks for HTML tags case-insensitive. Broke Dilbert feeds!! Reported by Michal Čihař. Closes gna bug #10199. Ruby-Feedparser 0.4 (01/05/2007) ================================ * Fixed a problem with html entities in the items' titles. * Date was not fetched for blogspot's atom feeds. Patch from Jason Ling . * Tests are now timezone-friendly. (closes GNA bug #8145). * Much nicer text output. Ruby-Feedparser 0.3 (01/12/2006) ================================ * Much nicer HTML output * Fixed a problem with some feeds with broken enclosures (without url) * Now automatically fixes non-absolute or * Fixed small parser bugs * Now displays enclosures in the text and html outputs. 
Ready for podcasting :-) * Now escape title, creator, subject and category internally. This minor fix avoids & stuff in the titles, for example. Ruby-Feedparser 0.2 (05/06/2006) ================================ * Fixed a problem when parsing some ATOM feeds with without type attribute. (Thanks Michal Cihar !) * FeedParser::Feed and FeedParser::FeedItem now have an xml attribute to get the related REXML::Element. * support in RSS. Ruby-Feedparser 0.1 (24/11/2005) ================================ * first public release. ruby-feedparser-0.9.2/README0000644000175000017500000000070412202272501015705 0ustar terceiroterceiro Ruby-Feedparser ----------------- by Lucas Nussbaum Currently, all the information is provided on http://home.gna.org/ruby-feedparser/ If you need to ask questions, feel free to ask them on the ruby-feedparser-devel@gna.org mailing list. Ruby-Feedparser is released under the Ruby license (see the LICENSE file), which is compatible with the GNU GPL (see the COPYING file) via an explicit dual-licensing clause. ruby-feedparser-0.9.2/setup.rb0000755000175000017500000010652212202272501016522 0ustar terceiroterceiro#!/usr/bin/ruby # # setup.rb # # Copyright (c) 2000-2005 Minero Aoki # # This program is free software. # You can distribute/modify this program under the terms of # the GNU LGPL, Lesser General Public License version 2.1. # unless Enumerable.method_defined?(:map) # Ruby 1.4.6 module Enumerable alias map collect end end unless File.respond_to?(:read) # Ruby 1.6 def File.read(fname) open(fname) {|f| return f.read } end end unless Errno.const_defined?(:ENOTEMPTY) # Windows? module Errno class ENOTEMPTY # We do not raise this exception, implementation is not needed. end end end def File.binread(fname) open(fname, 'rb') {|f| return f.read } end # for corrupted Windows' stat(2) def File.dir?(path) File.directory?((path[-1,1] == '/') ? 
path : path + '/') end class ConfigTable include Enumerable def initialize(rbconfig) @rbconfig = rbconfig @items = [] @table = {} # options @install_prefix = nil @config_opt = nil @verbose = true @no_harm = false end attr_accessor :install_prefix attr_accessor :config_opt attr_writer :verbose def verbose? @verbose end attr_writer :no_harm def no_harm? @no_harm end def [](key) lookup(key).resolve(self) end def []=(key, val) lookup(key).set val end def names @items.map {|i| i.name } end def each(&block) @items.each(&block) end def key?(name) @table.key?(name) end def lookup(name) @table[name] or setup_rb_error "no such config item: #{name}" end def add(item) @items.push item @table[item.name] = item end def remove(name) item = lookup(name) @items.delete_if {|i| i.name == name } @table.delete_if {|name, i| i.name == name } item end def load_script(path, inst = nil) if File.file?(path) MetaConfigEnvironment.new(self, inst).instance_eval File.read(path), path end end def savefile '.config' end def load_savefile begin File.foreach(savefile()) do |line| k, v = *line.split(/=/, 2) self[k] = v.strip end rescue Errno::ENOENT setup_rb_error $!.message + "\n#{File.basename($0)} config first" end end def save @items.each {|i| i.value } File.open(savefile(), 'w') {|f| @items.each do |i| f.printf "%s=%s\n", i.name, i.value if i.value? and i.value end } end def load_standard_entries standard_entries(@rbconfig).each do |ent| add ent end end def standard_entries(rbconfig) c = rbconfig rubypath = File.join(c['bindir'], c['ruby_install_name'] + c['EXEEXT']) major = c['MAJOR'].to_i minor = c['MINOR'].to_i teeny = c['TEENY'].to_i version = "#{major}.#{minor}" # ruby ver. >= 1.4.4? 
newpath_p = ((major >= 2) or ((major == 1) and ((minor >= 5) or ((minor == 4) and (teeny >= 4))))) if c['rubylibdir'] # V > 1.6.3 libruby = "#{c['prefix']}/lib/ruby" librubyver = c['rubylibdir'] librubyverarch = c['archdir'] siteruby = c['sitedir'] siterubyver = c['sitelibdir'] siterubyverarch = c['sitearchdir'] elsif newpath_p # 1.4.4 <= V <= 1.6.3 libruby = "#{c['prefix']}/lib/ruby" librubyver = "#{c['prefix']}/lib/ruby/#{version}" librubyverarch = "#{c['prefix']}/lib/ruby/#{version}/#{c['arch']}" siteruby = c['sitedir'] siterubyver = "$siteruby/#{version}" siterubyverarch = "$siterubyver/#{c['arch']}" else # V < 1.4.4 libruby = "#{c['prefix']}/lib/ruby" librubyver = "#{c['prefix']}/lib/ruby/#{version}" librubyverarch = "#{c['prefix']}/lib/ruby/#{version}/#{c['arch']}" siteruby = "#{c['prefix']}/lib/ruby/#{version}/site_ruby" siterubyver = siteruby siterubyverarch = "$siterubyver/#{c['arch']}" end parameterize = lambda {|path| path.sub(/\A#{Regexp.quote(c['prefix'])}/, '$prefix') } if arg = c['configure_args'].split.detect {|arg| /--with-make-prog=/ =~ arg } makeprog = arg.sub(/'/, '').split(/=/, 2)[1] else makeprog = 'make' end [ ExecItem.new('installdirs', 'std/site/home', 'std: install under libruby; site: install under site_ruby; home: install under $HOME')\ {|val, table| case val when 'std' table['rbdir'] = '$librubyver' table['sodir'] = '$librubyverarch' when 'site' table['rbdir'] = '$siterubyver' table['sodir'] = '$siterubyverarch' when 'home' setup_rb_error '$HOME was not set' unless ENV['HOME'] table['prefix'] = ENV['HOME'] table['rbdir'] = '$libdir/ruby' table['sodir'] = '$libdir/ruby' end }, PathItem.new('prefix', 'path', c['prefix'], 'path prefix of target environment'), PathItem.new('bindir', 'path', parameterize.call(c['bindir']), 'the directory for commands'), PathItem.new('libdir', 'path', parameterize.call(c['libdir']), 'the directory for libraries'), PathItem.new('datadir', 'path', parameterize.call(c['datadir']), 'the directory for shared 
data'), PathItem.new('mandir', 'path', parameterize.call(c['mandir']), 'the directory for man pages'), PathItem.new('sysconfdir', 'path', parameterize.call(c['sysconfdir']), 'the directory for system configuration files'), PathItem.new('localstatedir', 'path', parameterize.call(c['localstatedir']), 'the directory for local state data'), PathItem.new('libruby', 'path', libruby, 'the directory for ruby libraries'), PathItem.new('librubyver', 'path', librubyver, 'the directory for standard ruby libraries'), PathItem.new('librubyverarch', 'path', librubyverarch, 'the directory for standard ruby extensions'), PathItem.new('siteruby', 'path', siteruby, 'the directory for version-independent aux ruby libraries'), PathItem.new('siterubyver', 'path', siterubyver, 'the directory for aux ruby libraries'), PathItem.new('siterubyverarch', 'path', siterubyverarch, 'the directory for aux ruby binaries'), PathItem.new('rbdir', 'path', '$siterubyver', 'the directory for ruby scripts'), PathItem.new('sodir', 'path', '$siterubyverarch', 'the directory for ruby extensions'), PathItem.new('rubypath', 'path', rubypath, 'the path to set in the #! line'), ProgramItem.new('rubyprog', 'name', rubypath, 'the ruby program used for installation'), ProgramItem.new('makeprog', 'name', makeprog, 'the make program to compile ruby extensions'), SelectItem.new('shebang', 'all/ruby/never', 'ruby', 'shebang line (#!)
editing mode'), BoolItem.new('without-ext', 'yes/no', 'no', 'does not compile/install ruby extensions') ] end private :standard_entries def load_multipackage_entries multipackage_entries().each do |ent| add ent end end def multipackage_entries [ PackageSelectionItem.new('with', 'name,name...', '', 'ALL', 'package names that you want to install'), PackageSelectionItem.new('without', 'name,name...', '', 'NONE', 'package names that you do not want to install') ] end private :multipackage_entries ALIASES = { 'std-ruby' => 'librubyver', 'stdruby' => 'librubyver', 'rubylibdir' => 'librubyver', 'archdir' => 'librubyverarch', 'site-ruby-common' => 'siteruby', # For backward compatibility 'site-ruby' => 'siterubyver', # For backward compatibility 'bin-dir' => 'bindir', 'rb-dir' => 'rbdir', 'so-dir' => 'sodir', 'data-dir' => 'datadir', 'ruby-path' => 'rubypath', 'ruby-prog' => 'rubyprog', 'ruby' => 'rubyprog', 'make-prog' => 'makeprog', 'make' => 'makeprog' } def fixup ALIASES.each do |ali, name| @table[ali] = @table[name] end @items.freeze @table.freeze @options_re = /\A--(#{@table.keys.join('|')})(?:=(.*))?\z/ end def parse_opt(opt) m = @options_re.match(opt) or setup_rb_error "config: unknown option #{opt}" m.to_a[1,2] end def dllext @rbconfig['DLEXT'] end def value_config?(name) lookup(name).value? end class Item def initialize(name, template, default, desc) @name = name.freeze @template = template @value = default @default = default @description = desc end attr_reader :name attr_reader :description attr_accessor :default alias help_default default def help_opt "--#{@name}=#{@template}" end def value?
true end def value @value end def resolve(table) @value.gsub(%r<\$([^/]+)>) { table[$1] } end def set(val) @value = check(val) end private def check(val) setup_rb_error "config: --#{name} requires argument" unless val val end end class BoolItem < Item def config_type 'bool' end def help_opt "--#{@name}" end private def check(val) return 'yes' unless val case val when /\Ay(es)?\z/i, /\At(rue)?\z/i then 'yes' when /\An(o)?\z/i, /\Af(alse)?\z/i then 'no' else setup_rb_error "config: --#{@name} accepts only yes/no for argument" end end end class PathItem < Item def config_type 'path' end private def check(path) setup_rb_error "config: --#{@name} requires argument" unless path path[0,1] == '$' ? path : File.expand_path(path) end end class ProgramItem < Item def config_type 'program' end end class SelectItem < Item def initialize(name, selection, default, desc) super @ok = selection.split('/') end def config_type 'select' end private def check(val) unless @ok.include?(val.strip) setup_rb_error "config: use --#{@name}=#{@template} (#{val})" end val.strip end end class ExecItem < Item def initialize(name, selection, desc, &block) super name, selection, nil, desc @ok = selection.split('/') @action = block end def config_type 'exec' end def value?
false end def resolve(table) setup_rb_error "$#{name()} wrongly used as option value" end undef set def evaluate(val, table) v = val.strip.downcase unless @ok.include?(v) setup_rb_error "invalid option --#{@name}=#{val} (use #{@template})" end @action.call v, table end end class PackageSelectionItem < Item def initialize(name, template, default, help_default, desc) super name, template, default, desc @help_default = help_default end attr_reader :help_default def config_type 'package' end private def check(val) unless File.dir?("packages/#{val}") setup_rb_error "config: no such package: #{val}" end val end end class MetaConfigEnvironment def initialize(config, installer) @config = config @installer = installer end def config_names @config.names end def config?(name) @config.key?(name) end def bool_config?(name) @config.lookup(name).config_type == 'bool' end def path_config?(name) @config.lookup(name).config_type == 'path' end def value_config?(name) @config.lookup(name).config_type != 'exec' end def add_config(item) @config.add item end def add_bool_config(name, default, desc) @config.add BoolItem.new(name, 'yes/no', default ? 'yes' : 'no', desc) end def add_path_config(name, default, desc) @config.add PathItem.new(name, 'path', default, desc) end def set_config_default(name, default) @config.lookup(name).default = default end def remove_config(name) @config.remove(name) end # For only multipackage def packages raise '[setup.rb fatal] multi-package metaconfig API packages() called for single-package; contact application package vendor' unless @installer @installer.packages end # For only multipackage def declare_packages(list) raise '[setup.rb fatal] multi-package metaconfig API declare_packages() called for single-package; contact application package vendor' unless @installer @installer.packages = list end end end # class ConfigTable # This module requires: #verbose?, #no_harm? 
module FileOperations def mkdir_p(dirname, prefix = nil) dirname = prefix + File.expand_path(dirname) if prefix $stderr.puts "mkdir -p #{dirname}" if verbose? return if no_harm? # Does not check '/', it's too abnormal. dirs = File.expand_path(dirname).split(%r<(?=/)>) if /\A[a-z]:\z/i =~ dirs[0] disk = dirs.shift dirs[0] = disk + dirs[0] end dirs.each_index do |idx| path = dirs[0..idx].join('') Dir.mkdir path unless File.dir?(path) end end def rm_f(path) $stderr.puts "rm -f #{path}" if verbose? return if no_harm? force_remove_file path end def rm_rf(path) $stderr.puts "rm -rf #{path}" if verbose? return if no_harm? remove_tree path end def remove_tree(path) if File.symlink?(path) remove_file path elsif File.dir?(path) remove_tree0 path else force_remove_file path end end def remove_tree0(path) Dir.foreach(path) do |ent| next if ent == '.' next if ent == '..' entpath = "#{path}/#{ent}" if File.symlink?(entpath) remove_file entpath elsif File.dir?(entpath) remove_tree0 entpath else force_remove_file entpath end end begin Dir.rmdir path rescue Errno::ENOTEMPTY # directory may not be empty end end def move_file(src, dest) force_remove_file dest begin File.rename src, dest rescue File.open(dest, 'wb') {|f| f.write File.binread(src) } File.chmod File.stat(src).mode, dest File.unlink src end end def force_remove_file(path) begin remove_file path rescue end end def remove_file(path) File.chmod 0777, path File.unlink path end def install(from, dest, mode, prefix = nil) $stderr.puts "install #{from} #{dest}" if verbose? return if no_harm? realdest = prefix ? 
prefix + File.expand_path(dest) : dest realdest = File.join(realdest, File.basename(from)) if File.dir?(realdest) str = File.binread(from) if diff?(str, realdest) verbose_off { rm_f realdest if File.exist?(realdest) } File.open(realdest, 'wb') {|f| f.write str } File.chmod mode, realdest File.open("#{objdir_root()}/InstalledFiles", 'a') {|f| if prefix f.puts realdest.sub(prefix, '') else f.puts realdest end } end end def diff?(new_content, path) return true unless File.exist?(path) new_content != File.binread(path) end def command(*args) $stderr.puts args.join(' ') if verbose? system(*args) or raise RuntimeError, "system(#{args.map{|a| a.inspect }.join(' ')}) failed" end def ruby(*args) command config('rubyprog'), *args end def make(task = nil) command(*[config('makeprog'), task].compact) end def extdir?(dir) File.exist?("#{dir}/MANIFEST") or File.exist?("#{dir}/extconf.rb") end def files_of(dir) Dir.open(dir) {|d| return d.select {|ent| File.file?("#{dir}/#{ent}") } } end DIR_REJECT = %w( . .. CVS SCCS RCS CVS.adm .svn ) def directories_of(dir) Dir.open(dir) {|d| return d.select {|ent| File.dir?("#{dir}/#{ent}") } - DIR_REJECT } end end # This module requires: #srcdir_root, #objdir_root, #relpath module HookScriptAPI def get_config(key) @config[key] end alias config get_config # obsolete: use metaconfig to change configuration def set_config(key, val) @config[key] = val end # # srcdir/objdir (works only in the package directory) # def curr_srcdir "#{srcdir_root()}/#{relpath()}" end def curr_objdir "#{objdir_root()}/#{relpath()}" end def srcfile(path) "#{curr_srcdir()}/#{path}" end def srcexist?(path) File.exist?(srcfile(path)) end def srcdirectory?(path) File.dir?(srcfile(path)) end def srcfile?(path) File.file?(srcfile(path)) end def srcentries(path = '.') Dir.open("#{curr_srcdir()}/#{path}") {|d| return d.to_a - %w(. ..) 
} end def srcfiles(path = '.') srcentries(path).select {|fname| File.file?(File.join(curr_srcdir(), path, fname)) } end def srcdirectories(path = '.') srcentries(path).select {|fname| File.dir?(File.join(curr_srcdir(), path, fname)) } end end class ToplevelInstaller Version = '3.4.1' Copyright = 'Copyright (c) 2000-2005 Minero Aoki' TASKS = [ [ 'all', 'do config, setup, then install' ], [ 'config', 'saves your configurations' ], [ 'show', 'shows current configuration' ], [ 'setup', 'compiles ruby extensions and others' ], [ 'install', 'installs files' ], [ 'test', 'run all tests in test/' ], [ 'clean', "does `make clean' for each extension" ], [ 'distclean',"does `make distclean' for each extension" ] ] def ToplevelInstaller.invoke config = ConfigTable.new(load_rbconfig()) config.load_standard_entries config.load_multipackage_entries if multipackage? config.fixup klass = (multipackage?() ? ToplevelInstallerMulti : ToplevelInstaller) klass.new(File.dirname($0), config).invoke end def ToplevelInstaller.multipackage?
File.dir?(File.dirname($0) + '/packages') end def ToplevelInstaller.load_rbconfig if arg = ARGV.detect {|arg| /\A--rbconfig=/ =~ arg } ARGV.delete(arg) load File.expand_path(arg.split(/=/, 2)[1]) $".push 'rbconfig.rb' else require 'rbconfig' end ::RbConfig::CONFIG end def initialize(ardir_root, config) @ardir = File.expand_path(ardir_root) @config = config # cache @valid_task_re = nil end def config(key) @config[key] end def inspect "#<#{self.class} #{__id__()}>" end def invoke run_metaconfigs case task = parsearg_global() when nil, 'all' parsearg_config init_installers exec_config exec_setup exec_install else case task when 'config', 'test' ; when 'clean', 'distclean' @config.load_savefile if File.exist?(@config.savefile) else @config.load_savefile end __send__ "parsearg_#{task}" init_installers __send__ "exec_#{task}" end end def run_metaconfigs @config.load_script "#{@ardir}/metaconfig" end def init_installers @installer = Installer.new(@config, @ardir, File.expand_path('.')) end # # Hook Script API bases # def srcdir_root @ardir end def objdir_root '.' end def relpath '.' end # # Option Parsing # def parsearg_global while arg = ARGV.shift case arg when /\A\w+\z/ setup_rb_error "invalid task: #{arg}" unless valid_task?(arg) return arg when '-q', '--quiet' @config.verbose = false when '--verbose' @config.verbose = true when '--help' print_usage $stdout exit 0 when '--version' puts "#{File.basename($0)} version #{Version}" exit 0 when '--copyright' puts Copyright exit 0 else setup_rb_error "unknown global option '#{arg}'" end end nil end def valid_task?(t) valid_task_re() =~ t end def valid_task_re @valid_task_re ||= /\A(?:#{TASKS.map {|task,desc| task }.join('|')})\z/ end def parsearg_no_options unless ARGV.empty?
task = caller(0).first.slice(%r<`parsearg_(\w+)'>, 1) setup_rb_error "#{task}: unknown options: #{ARGV.join(' ')}" end end alias parsearg_show parsearg_no_options alias parsearg_setup parsearg_no_options alias parsearg_test parsearg_no_options alias parsearg_clean parsearg_no_options alias parsearg_distclean parsearg_no_options def parsearg_config evalopt = [] set = [] @config.config_opt = [] while i = ARGV.shift if /\A--?\z/ =~ i @config.config_opt = ARGV.dup break end name, value = *@config.parse_opt(i) if @config.value_config?(name) @config[name] = value else evalopt.push [name, value] end set.push name end evalopt.each do |name, value| @config.lookup(name).evaluate value, @config end # Check if configuration is valid set.each do |n| @config[n] if @config.value_config?(n) end end def parsearg_install @config.no_harm = false @config.install_prefix = '' while a = ARGV.shift case a when '--no-harm' @config.no_harm = true when /\A--prefix=/ path = a.split(/=/, 2)[1] path = File.expand_path(path) unless path[0,1] == '/' @config.install_prefix = path else setup_rb_error "install: unknown option #{a}" end end end def print_usage(out) out.puts 'Typical Installation Procedure:' out.puts " $ ruby #{File.basename $0} config" out.puts " $ ruby #{File.basename $0} setup" out.puts " # ruby #{File.basename $0} install (may require root privilege)" out.puts out.puts 'Detailed Usage:' out.puts " ruby #{File.basename $0} " out.puts " ruby #{File.basename $0} [] []" fmt = " %-24s %s\n" out.puts out.puts 'Global options:' out.printf fmt, '-q,--quiet', 'suppress message outputs' out.printf fmt, ' --verbose', 'output messages verbosely' out.printf fmt, ' --help', 'print this message' out.printf fmt, ' --version', 'print version and quit' out.printf fmt, ' --copyright', 'print copyright and quit' out.puts out.puts 'Tasks:' TASKS.each do |name, desc| out.printf fmt, name, desc end fmt = " %-24s %s [%s]\n" out.puts out.puts 'Options for CONFIG or ALL:' @config.each do |item| out.printf 
fmt, item.help_opt, item.description, item.help_default end out.printf fmt, '--rbconfig=path', 'rbconfig.rb to load',"running ruby's" out.puts out.puts 'Options for INSTALL:' out.printf fmt, '--no-harm', 'only display what to do if given', 'off' out.printf fmt, '--prefix=path', 'install path prefix', '' out.puts end # # Task Handlers # def exec_config @installer.exec_config @config.save # must be final end def exec_setup @installer.exec_setup end def exec_install @installer.exec_install end def exec_test @installer.exec_test end def exec_show @config.each do |i| printf "%-20s %s\n", i.name, i.value if i.value? end end def exec_clean @installer.exec_clean end def exec_distclean @installer.exec_distclean end end # class ToplevelInstaller class ToplevelInstallerMulti < ToplevelInstaller include FileOperations def initialize(ardir_root, config) super @packages = directories_of("#{@ardir}/packages") raise 'no package exists' if @packages.empty? @root_installer = Installer.new(@config, @ardir, File.expand_path('.')) end def run_metaconfigs @config.load_script "#{@ardir}/metaconfig", self @packages.each do |name| @config.load_script "#{@ardir}/packages/#{name}/metaconfig" end end attr_reader :packages def packages=(list) raise 'package list is empty' if list.empty? list.each do |name| raise "directory packages/#{name} does not exist"\ unless File.dir?("#{@ardir}/packages/#{name}") end @packages = list end def init_installers @installers = {} @packages.each do |pack| @installers[pack] = Installer.new(@config, "#{@ardir}/packages/#{pack}", "packages/#{pack}") end with = extract_selection(config('with')) without = extract_selection(config('without')) @selected = @installers.keys.select {|name| (with.empty? 
or with.include?(name)) \ and not without.include?(name) } end def extract_selection(list) a = list.split(/,/) a.each do |name| setup_rb_error "no such package: #{name}" unless @installers.key?(name) end a end def print_usage(f) super f.puts 'Included packages:' f.puts ' ' + @packages.sort.join(' ') f.puts end # # Task Handlers # def exec_config run_hook 'pre-config' each_selected_installers {|inst| inst.exec_config } run_hook 'post-config' @config.save # must be final end def exec_setup run_hook 'pre-setup' each_selected_installers {|inst| inst.exec_setup } run_hook 'post-setup' end def exec_install run_hook 'pre-install' each_selected_installers {|inst| inst.exec_install } run_hook 'post-install' end def exec_test run_hook 'pre-test' each_selected_installers {|inst| inst.exec_test } run_hook 'post-test' end def exec_clean rm_f @config.savefile run_hook 'pre-clean' each_selected_installers {|inst| inst.exec_clean } run_hook 'post-clean' end def exec_distclean rm_f @config.savefile run_hook 'pre-distclean' each_selected_installers {|inst| inst.exec_distclean } run_hook 'post-distclean' end # # lib # def each_selected_installers Dir.mkdir 'packages' unless File.dir?('packages') @selected.each do |pack| $stderr.puts "Processing the package `#{pack}' ..." if verbose? Dir.mkdir "packages/#{pack}" unless File.dir?("packages/#{pack}") Dir.chdir "packages/#{pack}" yield @installers[pack] Dir.chdir '../..' end end def run_hook(id) @root_installer.run_hook id end # module FileOperations requires this def verbose? @config.verbose? end # module FileOperations requires this def no_harm? @config.no_harm? end end # class ToplevelInstallerMulti class Installer FILETYPES = %w( bin lib ext data conf man ) include FileOperations include HookScriptAPI def initialize(config, srcroot, objroot) @config = config @srcdir = File.expand_path(srcroot) @objdir = File.expand_path(objroot) @currdir = '.'
end def inspect "#<#{self.class} #{File.basename(@srcdir)}>" end def noop(rel) end # # Hook Script API base methods # def srcdir_root @srcdir end def objdir_root @objdir end def relpath @currdir end # # Config Access # # module FileOperations requires this def verbose? @config.verbose? end # module FileOperations requires this def no_harm? @config.no_harm? end def verbose_off begin save, @config.verbose = @config.verbose?, false yield ensure @config.verbose = save end end # # TASK config # def exec_config exec_task_traverse 'config' end alias config_dir_bin noop alias config_dir_lib noop def config_dir_ext(rel) extconf if extdir?(curr_srcdir()) end alias config_dir_data noop alias config_dir_conf noop alias config_dir_man noop def extconf ruby "#{curr_srcdir()}/extconf.rb", *@config.config_opt end # # TASK setup # def exec_setup exec_task_traverse 'setup' end def setup_dir_bin(rel) files_of(curr_srcdir()).each do |fname| update_shebang_line "#{curr_srcdir()}/#{fname}" end end alias setup_dir_lib noop def setup_dir_ext(rel) make if extdir?(curr_srcdir()) end alias setup_dir_data noop alias setup_dir_conf noop alias setup_dir_man noop def update_shebang_line(path) return if no_harm? return if config('shebang') == 'never' old = Shebang.load(path) if old $stderr.puts "warning: #{path}: Shebang line includes too many args. It is not portable and your program may not work." if old.args.size > 1 new = new_shebang(old) return if new.to_s == old.to_s else return unless config('shebang') == 'all' new = Shebang.new(config('rubypath')) end $stderr.puts "updating shebang: #{File.basename(path)}" if verbose? 
open_atomic_writer(path) {|output| File.open(path, 'rb') {|f| f.gets if old # discard output.puts new.to_s output.print f.read } } end def new_shebang(old) if /\Aruby/ =~ File.basename(old.cmd) Shebang.new(config('rubypath'), old.args) elsif File.basename(old.cmd) == 'env' and old.args.first == 'ruby' Shebang.new(config('rubypath'), old.args[1..-1]) else return old unless config('shebang') == 'all' Shebang.new(config('rubypath')) end end def open_atomic_writer(path, &block) tmpfile = File.basename(path) + '.tmp' begin File.open(tmpfile, 'wb', &block) File.rename tmpfile, File.basename(path) ensure File.unlink tmpfile if File.exist?(tmpfile) end end class Shebang def Shebang.load(path) line = nil File.open(path) {|f| line = f.gets } return nil unless /\A#!/ =~ line parse(line) end def Shebang.parse(line) cmd, *args = *line.strip.sub(/\A\#!/, '').split(' ') new(cmd, args) end def initialize(cmd, args = []) @cmd = cmd @args = args end attr_reader :cmd attr_reader :args def to_s "#! #{@cmd}" + (@args.empty? ? 
'' : " #{@args.join(' ')}") end end # # TASK install # def exec_install rm_f 'InstalledFiles' exec_task_traverse 'install' end def install_dir_bin(rel) install_files targetfiles(), "#{config('bindir')}/#{rel}", 0755 end def install_dir_lib(rel) install_files libfiles(), "#{config('rbdir')}/#{rel}", 0644 end def install_dir_ext(rel) return unless extdir?(curr_srcdir()) install_files rubyextentions('.'), "#{config('sodir')}/#{File.dirname(rel)}", 0555 end def install_dir_data(rel) install_files targetfiles(), "#{config('datadir')}/#{rel}", 0644 end def install_dir_conf(rel) # FIXME: should not remove current config files # (rename previous file to .old/.org) install_files targetfiles(), "#{config('sysconfdir')}/#{rel}", 0644 end def install_dir_man(rel) install_files targetfiles(), "#{config('mandir')}/#{rel}", 0644 end def install_files(list, dest, mode) mkdir_p dest, @config.install_prefix list.each do |fname| install fname, dest, mode, @config.install_prefix end end def libfiles glob_reject(%w(*.y *.output), targetfiles()) end def rubyextentions(dir) ents = glob_select("*.#{@config.dllext}", targetfiles()) if ents.empty? 
setup_rb_error "no ruby extension exists: 'ruby #{$0} setup' first" end ents end def targetfiles mapdir(existfiles() - hookfiles()) end def mapdir(ents) ents.map {|ent| if File.exist?(ent) then ent # objdir else "#{curr_srcdir()}/#{ent}" # srcdir end } end # picked up many entries from cvs-1.11.1/src/ignore.c JUNK_FILES = %w( core RCSLOG tags TAGS .make.state .nse_depinfo #* .#* cvslog.* ,* .del-* *.olb *~ *.old *.bak *.BAK *.orig *.rej _$* *$ *.org *.in .* ) def existfiles glob_reject(JUNK_FILES, (files_of(curr_srcdir()) | files_of('.'))) end def hookfiles %w( pre-%s post-%s pre-%s.rb post-%s.rb ).map {|fmt| %w( config setup install clean ).map {|t| sprintf(fmt, t) } }.flatten end def glob_select(pat, ents) re = globs2re([pat]) ents.select {|ent| re =~ ent } end def glob_reject(pats, ents) re = globs2re(pats) ents.reject {|ent| re =~ ent } end GLOB2REGEX = { '.' => '\.', '$' => '\$', '#' => '\#', '*' => '.*' } def globs2re(pats) /\A(?:#{ pats.map {|pat| pat.gsub(/[\.\$\#\*]/) {|ch| GLOB2REGEX[ch] } }.join('|') })\z/ end # # TASK test # TESTDIR = 'test' def exec_test unless File.directory?('test') $stderr.puts 'no test in this package' if verbose? return end $stderr.puts 'Running tests...' if verbose? begin require 'test/unit' rescue LoadError setup_rb_error 'test/unit cannot be loaded. You need Ruby 1.8 or later to invoke this task.'
end runner = Test::Unit::AutoRunner.new(true) runner.to_run << TESTDIR runner.run end # # TASK clean # def exec_clean exec_task_traverse 'clean' rm_f @config.savefile rm_f 'InstalledFiles' end alias clean_dir_bin noop alias clean_dir_lib noop alias clean_dir_data noop alias clean_dir_conf noop alias clean_dir_man noop def clean_dir_ext(rel) return unless extdir?(curr_srcdir()) make 'clean' if File.file?('Makefile') end # # TASK distclean # def exec_distclean exec_task_traverse 'distclean' rm_f @config.savefile rm_f 'InstalledFiles' end alias distclean_dir_bin noop alias distclean_dir_lib noop def distclean_dir_ext(rel) return unless extdir?(curr_srcdir()) make 'distclean' if File.file?('Makefile') end alias distclean_dir_data noop alias distclean_dir_conf noop alias distclean_dir_man noop # # Traversing # def exec_task_traverse(task) run_hook "pre-#{task}" FILETYPES.each do |type| if type == 'ext' and config('without-ext') == 'yes' $stderr.puts 'skipping ext/* by user option' if verbose? next end traverse task, type, "#{task}_dir_#{type}" end run_hook "post-#{task}" end def traverse(task, rel, mid) dive_into(rel) { run_hook "pre-#{task}" __send__ mid, rel.sub(%r[\A.*?(?:/|\z)], '') directories_of(curr_srcdir()).each do |d| traverse task, "#{rel}/#{d}", mid end run_hook "post-#{task}" } end def dive_into(rel) return unless File.dir?("#{@srcdir}/#{rel}") dir = File.basename(rel) Dir.mkdir dir unless File.dir?(dir) prevdir = Dir.pwd Dir.chdir dir $stderr.puts '---> ' + rel if verbose? @currdir = rel yield Dir.chdir prevdir $stderr.puts '<--- ' + rel if verbose? 
@currdir = File.dirname(rel) end def run_hook(id) path = [ "#{curr_srcdir()}/#{id}", "#{curr_srcdir()}/#{id}.rb" ].detect {|cand| File.file?(cand) } return unless path begin instance_eval File.read(path), path, 1 rescue raise if $DEBUG setup_rb_error "hook #{path} failed:\n" + $!.message end end end # class Installer class SetupError < StandardError; end def setup_rb_error(msg) raise SetupError, msg end if $0 == __FILE__ begin ToplevelInstaller.invoke rescue SetupError raise if $DEBUG $stderr.puts $!.message $stderr.puts "Try 'ruby #{$0} --help' for detailed usage." exit 1 end end ruby-feedparser-0.9.2/tools/0000755000175000017500000000000012207254570016177 5ustar terceiroterceiroruby-feedparser-0.9.2/tools/doctoweb.bash0000755000175000017500000000127712202272501020643 0ustar terceiroterceiro#!/bin/bash if [ -z $CVSDIR ]; then CVSDIR=$HOME/dev/ruby-feedparser/website fi TARGET=$CVSDIR/rdoc echo "Copying rdoc documentation to $TARGET." if [ ! -d $TARGET ]; then echo "$TARGET doesn't exist, exiting." exit 1 fi rsync -a rdoc/ $TARGET/ echo "###########################################################" echo "CVS status :" cd $TARGET svn st echo "CVS Adding files." while [ $(svn st | grep "^? " | wc -l) -gt 0 ]; do svn add $(svn st | grep "^? " | awk '{print $2}') done echo "###########################################################" echo "CVS status after adding missing files:" svn st echo "Commit changes now with" echo "# (cd $TARGET && svn commit -m \"rdoc update\")" exit 0 ruby-feedparser-0.9.2/Rakefile0000644000175000017500000000373212207254556016515 0ustar terceiroterceirorequire 'rake/testtask' require 'rdoc/task' require 'rubygems/package_task' require 'rake' require 'find' # Globals PKG_NAME = 'ruby-feedparser' PKG_VERSION = '0.9.2' PKG_FILES = [ 'ChangeLog', 'README', 'COPYING', 'LICENSE', 'setup.rb', 'Rakefile'] Find.find('lib/', 'test/', 'tools/') do |f| if FileTest.directory?(f) and f =~ /\.svn/ Find.prune else PKG_FILES << f end end PKG_FILES.reject! 
{ |f| f =~ /^test\/(source|.*_output)\// } task :default => [:test] Rake::TestTask.new do |t| t.libs << "test" t.test_files = FileList['test/tc_*.rb'] end Rake::RDocTask.new do |rd| f = [] Find.find('lib/') do |file| if FileTest.directory?(file) and file =~ /\.svn/ Find.prune else f << file if not FileTest.directory?(file) end end f.delete('lib/feedparser.rb') # hack to document the Feedparser module properly f.unshift('lib/feedparser.rb') rd.rdoc_files.include(f) rd.options << '--all' rd.options << '--diagram' rd.options << '--fileboxes' rd.options << '--inline-source' rd.options << '--line-numbers' rd.rdoc_dir = 'rdoc' end task :doctoweb => [:rdoc] do |t| # copies the rdoc to the CVS repository for ruby-feedparser website # repository is in $CVSDIR (default: ~/dev/ruby-feedparser/website, as set in tools/doctoweb.bash) sh "tools/doctoweb.bash" end Rake::PackageTask.new(PKG_NAME, PKG_VERSION) do |p| p.need_tar = true p.need_zip = true p.package_files = PKG_FILES end # "Gem" part of the Rakefile begin spec = Gem::Specification.new do |s| s.platform = Gem::Platform::RUBY s.summary = "Ruby library to parse ATOM and RSS feeds" s.name = PKG_NAME s.version = PKG_VERSION s.requirements << 'none' s.require_path = 'lib' s.autorequire = 'feedparser' s.files = PKG_FILES s.description = "Ruby library to parse ATOM and RSS feeds" s.authors = ['Lucas Nussbaum'] s.add_runtime_dependency 'magic' end Gem::PackageTask.new(spec) do |pkg| pkg.need_zip = true pkg.need_tar = true end rescue LoadError puts "Will not generate gem." 
end ruby-feedparser-0.9.2/lib/0000755000175000017500000000000012207254570015605 5ustar terceiroterceiroruby-feedparser-0.9.2/lib/feedparser/0000755000175000017500000000000012207254570017725 5ustar terceiroterceiroruby-feedparser-0.9.2/lib/feedparser/html2text-parser.rb0000644000175000017500000002206712202521625023477 0ustar terceiroterceirorequire 'feedparser/sgml-parser' module FeedParser # this class provides a simple SGML parser that removes HTML tags class HTML2TextParser < SGMLParser attr_reader :savedata def initialize(verbose = false) @savedata = '' @pre = false @href = nil @links = [] @curlink = [] @imgs = [] @img_index = 'A' super(verbose) end def next_img_index idx = @img_index @img_index = @img_index.next idx end def handle_data(data) # let's remove all CR if not @pre data.gsub!(/\n/, ' ') data.gsub!(/( )+/, ' ') end data = FeedParser.recode(data) @savedata << data.encode(Encoding::UTF_8) end def unknown_starttag(tag, attrs) case tag when 'p', 'h4' @savedata << "\n\n" when 'h1' @savedata << "\n\n " when 'h2' @savedata << "\n\n " when 'h3' @savedata << "\n\n " when 'br' @savedata << "\n" when 'ul' @savedata << "\n" when 'li' @savedata << "\n - " when 'b' @savedata << '*' when 'strong' @savedata << '*' when 'em' @savedata << '*' when 'u' @savedata << '_' when 'i' @savedata << '/' when 'pre' @savedata << "\n\n" @pre = true when 'a' # find href in args @href = nil attrs.each do |a| if a[0] == 'href' @href = a[1] end end if @href @href.gsub!(/^("|'|)(.*)("|')$/,'\2') @curlink = @links.find_index(@href) if @curlink.nil? @links << @href @curlink = @links.length else @curlink += 1 end end when 'img' # find src in args src = nil attrs.each do |a| if a[0] == 'src' src = a[1] end end if src src.gsub!(/^("|'|)(.*)("|')$/,'\2') i = @imgs.index { |e| e[1] == src } if i.nil? 
idx = next_img_index @imgs << [ idx, src ] else idx = @imgs[i][0] end @savedata << "[#{idx}]" end else # puts "unknown tag: #{tag}" end end def close super if @links.length > 0 @savedata << "\n\n" @links.each_index do |i| @savedata << "[#{i+1}] #{@links[i]}\n" end end if @imgs.length > 0 @savedata << "\n\n" @imgs.each do |i| @savedata << "[#{i[0]}] #{i[1]}\n" end end end def unknown_endtag(tag) case tag when 'ul' @savedata << "\n" when 'b' @savedata << '*' when 'strong' @savedata << '*' when 'em' @savedata << '*' when 'u' @savedata << '_' when 'i' @savedata << '/' when 'pre' @savedata << "\n\n" @pre = false when 'a' if @href @savedata << "[#{@curlink}]" @href = nil end end end def unknown_charref(ref) handle_data([ref.to_i].pack('U*')) end def HTML2TextParser.entities return HTML_ENTITIES end HTML_ENTITIES = { "quot" => 34, "amp" => 38, "lt" => 60, "gt" => 62, "apos" => 39, "nbsp" => 160, "iexcl" => 161, "cent" => 162, "pound" => 163, "curren" => 164, "yen" => 165, "brvbar" => 166, "sect" => 167, "uml" => 168, "copy" => 169, "ordf" => 170, "laquo" => 171, "not" => 172, "shy" => 173, "reg" => 174, "macr" => 175, "deg" => 176, "plusmn" => 177, "sup2" => 178, "sup3" => 179, "acute" => 180, "micro" => 181, "para" => 182, "middot" => 183, "cedil" => 184, "sup1" => 185, "ordm" => 186, "raquo" => 187, "frac14" => 188, "frac12" => 189, "frac34" => 190, "iquest" => 191, "Agrave" => 192, "Aacute" => 193, "Acirc" => 194, "Atilde" => 195, "Auml" => 196, "Aring" => 197, "AElig" => 198, "Ccedil" => 199, "Egrave" => 200, "Eacute" => 201, "Ecirc" => 202, "Euml" => 203, "Igrave" => 204, "Iacute" => 205, "Icirc" => 206, "Iuml" => 207, "ETH" => 208, "Ntilde" => 209, "Ograve" => 210, "Oacute" => 211, "Ocirc" => 212, "Otilde" => 213, "Ouml" => 214, "times" => 215, "Oslash" => 216, "Ugrave" => 217, "Uacute" => 218, "Ucirc" => 219, "Uuml" => 220, "Yacute" => 221, "THORN" => 222, "szlig" => 223, "agrave" => 224, "aacute" => 225, "acirc" => 226, "atilde" => 227, "auml" => 228, "aring" => 
229, "aelig" => 230, "ccedil" => 231, "egrave" => 232, "eacute" => 233, "ecirc" => 234, "euml" => 235, "igrave" => 236, "iacute" => 237, "icirc" => 238, "iuml" => 239, "eth" => 240, "ntilde" => 241, "ograve" => 242, "oacute" => 243, "ocirc" => 244, "otilde" => 245, "ouml" => 246, "divide" => 247, "oslash" => 248, "ugrave" => 249, "uacute" => 250, "ucirc" => 251, "uuml" => 252, "yacute" => 253, "thorn" => 254, "yuml" => 255, "fnof" => 402, "Alpha" => 913, "Beta" => 914, "Gamma" => 915, "Delta" => 916, "Epsilon" => 917, "Zeta" => 918, "Eta" => 919, "Theta" => 920, "Iota" => 921, "Kappa" => 922, "Lambda" => 923, "Mu" => 924, "Nu" => 925, "Xi" => 926, "Omicron" => 927, "Pi" => 928, "Rho" => 929, "Sigma" => 931, "Tau" => 932, "Upsilon" => 933, "Phi" => 934, "Chi" => 935, "Psi" => 936, "Omega" => 937, "alpha" => 945, "beta" => 946, "gamma" => 947, "delta" => 948, "epsilon" => 949, "zeta" => 950, "eta" => 951, "theta" => 952, "iota" => 953, "kappa" => 954, "lambda" => 955, "mu" => 956, "nu" => 957, "xi" => 958, "omicron" => 959, "pi" => 960, "rho" => 961, "sigmaf" => 962, "sigma" => 963, "tau" => 964, "upsilon" => 965, "phi" => 966, "chi" => 967, "psi" => 968, "omega" => 969, "thetasym" => 977, "upsih" => 978, "piv" => 982, "bull" => 8226, "hellip" => 8230, "prime" => 8242, "Prime" => 8243, "oline" => 8254, "frasl" => 8260, "weierp" => 8472, "image" => 8465, "real" => 8476, "trade" => 8482, "alefsym" => 8501, "larr" => 8592, "uarr" => 8593, "rarr" => 8594, "darr" => 8595, "harr" => 8596, "crarr" => 8629, "lArr" => 8656, "uArr" => 8657, "rArr" => 8658, "dArr" => 8659, "hArr" => 8660, "forall" => 8704, "part" => 8706, "exist" => 8707, "empty" => 8709, "nabla" => 8711, "isin" => 8712, "notin" => 8713, "ni" => 8715, "prod" => 8719, "sum" => 8721, "minus" => 8722, "lowast" => 8727, "radic" => 8730, "prop" => 8733, "infin" => 8734, "ang" => 8736, "and" => 8743, "or" => 8744, "cap" => 8745, "cup" => 8746, "int" => 8747, "there4" => 8756, "sim" => 8764, "cong" => 8773, "asymp" => 
8776, "ne" => 8800, "equiv" => 8801, "le" => 8804, "ge" => 8805, "sub" => 8834, "sup" => 8835, "nsub" => 8836, "sube" => 8838, "supe" => 8839, "oplus" => 8853, "otimes" => 8855, "perp" => 8869, "sdot" => 8901, "lceil" => 8968, "rceil" => 8969, "lfloor" => 8970, "rfloor" => 8971, "lang" => 9001, "rang" => 9002, "loz" => 9674, "spades" => 9824, "clubs" => 9827, "hearts" => 9829, "diams" => 9830, "OElig" => 338, "oelig" => 339, "Scaron" => 352, "scaron" => 353, "Yuml" => 376, "circ" => 710, "tilde" => 732, "ensp" => 8194, "emsp" => 8195, "thinsp" => 8201, "zwnj" => 8204, "zwj" => 8205, "lrm" => 8206, "rlm" => 8207, "ndash" => 8211, "mdash" => 8212, "lsquo" => 8216, "rsquo" => 8217, "sbquo" => 8218, "ldquo" => 8220, "rdquo" => 8221, "bdquo" => 8222, "dagger" => 8224, "Dagger" => 8225, "permil" => 8240, "lsaquo" => 8249, "rsaquo" => 8250, "euro" => 8364 } def unknown_entityref(ref) # hack to avoid considering ­, as it is misused by some blog software (dotclear2) # see http://www.cs.tut.fi/~jkorpela/shy.html if ref == 'shy' handle_data('') elsif HTML_ENTITIES.has_key?(ref) handle_data([HTML_ENTITIES[ref]].pack('U*')) else handle_data(ref) end end end end ruby-feedparser-0.9.2/lib/feedparser/filesizes.rb0000644000175000017500000000053012202272501022232 0ustar terceiroterceiroclass Integer def to_human_readable n = self if n < 1024 return "#{n} B" elsif n >= 1024 and n < 1024*1024 return "%.1f KB" % (n.to_f / 1024) elsif n >= 1024*1024 and n < 1024*1024*1024 return "%.1f MB" % (n.to_f / (1024*1024)) else return "%.1f GB" % (n.to_f / (1024*1024*1024)) end end end ruby-feedparser-0.9.2/lib/feedparser/text-output.rb0000644000175000017500000000530412207247375022603 0ustar terceiroterceirorequire 'feedparser/html2text-parser' require 'feedparser/filesizes' class String # Convert an HTML text to plain text def html2text(wrapto = false) text = self.clone # parse HTML p = FeedParser::HTML2TextParser::new(true) p.feed(text) p.close text = p.savedata # remove leading and trailing 
whitespace text.gsub!(/\A\s*/m, '') text.gsub!(/\s*\Z/m, '') # remove whitespace around \n text.gsub!(/ *\n/m, "\n") text.gsub!(/\n */m, "\n") # and collapse duplicated \n text.gsub!(/\n\n+/m, "\n\n") # and remove duplicated whitespace text.gsub!(/[ \t]+/, ' ') # finally, wrap the text if requested return wrap_text(text, wrapto) if wrapto text end def wrap_text(text, wrapto = 72) text.gsub(/(.{1,#{wrapto}})( +|$)\n?/, "\\1\\2\n") end end module FeedParser class Feed def to_text(localtime = true, wrapto = false) s = '' s += "Type: #{@type}\n" s += "Encoding: #{@encoding}\n" s += "Title: #{@title}\n" s += "Link: #{@link}\n" if @description s += "Description: #{@description.html2text}\n" else s += "Description:\n" end s += "Creator: #{@creator}\n" s += "\n" @items.each do |i| s += '*' * 40 + "\n" s += i.to_text(localtime, wrapto) end s end end class FeedItem def to_text(localtime = true, wrapto = false, header = true) s = "" if header s += "Item: " s += @title if @title s += "\n<#{link}>" if link if @date if localtime s += "\nDate: #{@date.to_s}" else s += "\nDate: #{@date.getutc.to_s}" end end s += "\n" else s += "<#{link}>\n\n" if link end s += "#{@content.html2text(wrapto).chomp}\n" if @content if @enclosures and @enclosures.length > 0 s += "\nFiles:" @enclosures.each do |e| s += "\n #{e[0]} (#{e[1].to_i.to_human_readable}, #{e[2]})" end end if not header s += "-- " end s += "\nFeed: " s += @feed.title if @feed.title s += "\n<#{@feed.link}>" if @feed.link if not header s += "\nItem: " s += @title if @title s += "\n<#{link}>" if link if @date if localtime s += "\nDate: #{@date.to_s}" else s += "\nDate: #{@date.getutc.to_s}" end end end s += "\nAuthor: #{creator}" if creator s += "\nSubject: #{@subject}" if @subject s += "\nFiled under: #{@categories.join(', ')}" unless @categories.empty? 
s += "\n" # final newline, for compat with history s end end end ruby-feedparser-0.9.2/lib/feedparser/rexml_patch.rb0000644000175000017500000000107112202272501022544 0ustar terceiroterceirorequire 'feedparser/textconverters' # Patch for REXML # Very ugly patch to make REXML error-proof. # The problem is REXML uses IConv, which isn't error-proof at all. # With those changes, it uses unpack/pack with some error handling module REXML module Encoding def decode(str) return str.toUTF8(@encoding) end def encode(str) return str end def encoding=(enc) return if defined? @encoding and enc == @encoding @encoding = enc || 'utf-8' end end class Element def children @children end end end ruby-feedparser-0.9.2/lib/feedparser/sgml-parser.rb0000644000175000017500000001746212202525137022513 0ustar terceiroterceiro# A parser for SGML, using the derived class as static DTD. # from http://raa.ruby-lang.org/project/html-parser module FeedParser class SGMLParser # Regular expressions used for parsing: Interesting = /[&<]/ Incomplete = Regexp.compile('&([a-zA-Z][a-zA-Z0-9]*|#[0-9]*)?|' + '<([a-zA-Z][^<>]*|/([a-zA-Z][^<>]*)?|' + '![^<>]*)?') Entityref = /&([a-zA-Z][-.a-zA-Z0-9]*);/ Charref = /&#([0-9]+);/ Starttagopen = /<[>a-zA-Z]/ Endtagopen = /<\/[<>a-zA-Z]/ Endbracket = /[<>]/ Special = /]*>/ Commentopen = /\n" s += i.to_html(localtime) end s += "\n\n" s end end class FeedItem def to_html_with_headers(localtime = true) s = <<-EOF EOF s += to_html(localtime) s += "\n\n" s end def to_html(localtime = true) s = <<-EOF
EOF r = "" r += "\n" if @feed.link if @feed.title r += "#{@feed.title.escape_html}\n" elsif @feed.link r += "#{@feed.link.escape_html}\n" else r += "Unnamed feed\n" end r += "\n" if @feed.link headline = "\n" s += (headline % ["Feed:", r]) r = "" r += "" if link if @title r += "#{@title.escape_html}\n" elsif link r += "#{link.escape_html}\n" end r += "\n" if link s += (headline % ["Item:", r]) s += "
%s%s
\n" s += "\n" if @content and @content !~ /\A\s* 0 s += <<-EOF
EOF s += '' s += "\n" @enclosures.each do |e| s += "\n" end s += "
Files:
   #{e[0].split('/')[-1]} (#{e[1].to_i.to_human_readable}, #{e[2]})
\n" end s += "\n
\n" s += '' + "\n" l = '' + "\n" if @date if localtime s += l % [ 'Date:', @date.to_s ] else s += l % [ 'Date:', @date.getutc.to_s ] end end s += l % [ 'Author:', creator.escape_html ] if creator s += l % [ 'Subject:', @subject.escape_html ] if @subject s += l % [ 'Filed under:', @categories.join(', ').escape_html ] unless @categories.empty? s += "
%s  %s
\n" s end end end ruby-feedparser-0.9.2/lib/feedparser/textconverters.rb0000644000175000017500000000624512202522114023343 0ustar terceiroterceiro# for URI::regexp
require 'uri'
require 'feedparser/html2text-parser'

# This class provides various converters
class String
  # Is this text HTML? Search for tags. Used by String#text2html
  def html?
    return (self =~ /<p>/i) || (self =~ /<\/p>/i) || (self =~ /<br>/i) || (self =~ //i) || (self =~ /<\/a>/i) || (self =~ //i)
  end

  # Returns true if the text contains escaped HTML (with HTML entities). Used by String#text2html
  def escaped_html?
    return (self =~ /&lt;img src=/i) || (self =~ /&lt;a href=/i) || (self =~ /&lt;br(\/| \/|)&gt;/i) || (self =~ /&lt;p&gt;/i)
  end

  def escape_html
    r = self.gsub('&', '&amp;')
    r = r.gsub('<', '&lt;')
    r = r.gsub('>', '&gt;')
    r
  end

  MY_ENTITIES = {}
  FeedParser::HTML2TextParser::entities.each do |k, v|
    MY_ENTITIES["&#{k};"] = [v].pack('U*')
    MY_ENTITIES["&##{v};"] = [v].pack('U*')
  end

  # Un-escape HTML in the text. Used by String#text2html
  def unescape_html
    r = self
    MY_ENTITIES.each do |k, v|
      r = r.gsub(k, v)
    end
    r
  end

  # Convert text to HTML
  def text2html(feed)
    text = self.clone
    realhtml = text.html?
    eschtml = text.escaped_html?
    # fix for RSS feeds with both real and escaped html (crazy!):
    # we take the first one
    if (realhtml && eschtml)
      if (realhtml < eschtml)
        eschtml = nil
      else
        realhtml = nil
      end
    end
    if realhtml
      # do nothing
    elsif eschtml
      text = text.unescape_html
    else
      # paragraphs
      text.gsub!(/\A\s*(.*)\Z/m, '<p>\1</p>')
      text.gsub!(/\s*\n(\s*\n)+\s*/, "</p>\n<p>")
      # uris
      text.gsub!(/([^'"])(#{URI::regexp(['http','ftp','https'])})/, '\1<a href="\2">\2</a>')
    end
    # Handle broken hrefs in <a> and <img>
    if feed and feed.link
      text.gsub!(/(\s(src|href)=['"])([^'"]*)(['"])/) do |m|
        begin
          first, url, last = $1, $3, $4
          if (url =~ /^\s*\w+:\/\//) or (url =~ /^\s*\w+:\w/)
            m
          elsif url =~ /^\//
            (first + feed.link.split(/\//)[0..2].join('/') + url + last)
          else
            t = feed.link.split(/\//)
            if t.length == 3 # http://toto with no trailing /
              (first + feed.link + '/' + url + last)
            else
              if feed.link =~ /\/$/
                (first + feed.link + url + last)
              else
                (first + t[0...-1].join('/') + '/' + url + last)
              end
            end
          end
        rescue
          m
        end
      end
    end
    text
  end

  # Remove white space around the text
  def rmWhiteSpace!
    return self.gsub!(/\A\s*/m, '').gsub!(/\s*\Z/m,'')
  end

  # Convert a text in inputenc to a text in UTF8
  # must take care of wrong input locales
  def toUTF8(inputenc)
    if inputenc.downcase != 'utf-8'
      # it is said it is not UTF-8. Ensure it is REALLY not UTF-8
      begin
        if self.unpack('U*').pack('U*') == self
          return self
        end
      rescue
        # do nothing
      end
      begin
        return self.unpack('C*').pack('U*')
      rescue
        return self # failsafe solution, but a dirty one :-)
      end
    else
      return self
    end
  end
end
ruby-feedparser-0.9.2/lib/feedparser.rb0000644000175000017500000000146312202272501020243 0ustar terceiroterceiro# =Ruby-feedparser - ATOM/RSS feed parser for Ruby
# License:: Ruby's license (see the LICENSE file) or GNU GPL, at your option.
# Website::http://home.gna.org/ruby-feedparser/
#
# ==Introduction
#
# Ruby-Feedparser is an RSS and Atom parser for Ruby.
# Ruby-feedparser is :
# * based on REXML
# * built for robustness : most feeds are not valid, a parser can't ignore that
# * fully unit-tested
# * easy to use (it can output text or HTML easily)
#
# ==Example
#  require 'net/http'
#  require 'feedparser'
#  require 'uri'
#  s = Net::HTTP::get URI::parse('http://rss.slashdot.org/Slashdot/slashdot')
#  f = FeedParser::Feed::new(s)
#  f.title # => "Slashdot"
#  f.items.each { |i| puts i.title }
#  [...]
#  require 'feedparser/html-output'
#  f.items.each { |i| puts i.to_html }
#
require 'feedparser/feedparser'
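# The enclosure sizes printed by FeedItem#to_text and FeedItem#to_html go
# through Integer#to_human_readable (lib/feedparser/filesizes.rb). The sketch
# below restates that 1024-based rounding as a standalone method so it can be
# tried without loading the library. The name human_readable_size is a
# hypothetical helper chosen for this illustration only; it is not part of
# ruby-feedparser.

```ruby
# Hypothetical standalone helper mirroring the thresholds of
# Integer#to_human_readable; not part of ruby-feedparser itself.
def human_readable_size(n)
  kb = 1024
  mb = kb * 1024
  gb = mb * 1024
  if n < kb
    "#{n} B"                          # below 1 KB: plain byte count
  elsif n < mb
    format('%.1f KB', n.to_f / kb)    # one decimal place, as in the library
  elsif n < gb
    format('%.1f MB', n.to_f / mb)
  else
    format('%.1f GB', n.to_f / gb)    # everything from 1 GB upwards
  end
end

[512, 1536, 5 * 1024 * 1024, 3 * 1024**3].each do |size|
  puts "#{size} -> #{human_readable_size(size)}"
end
```

# Sizes of 1 GB and above all fall into the last branch, so there is no
# separate TB unit: a 2 TB enclosure would be reported as 2048.0 GB.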