buftok-0.2.0/0000755000076400007640000000000012323137232012035 5ustar pravipravi
buftok-0.2.0/LICENSE.md0000644000076400007640000000470612323137232013450 0ustar pravipravi
Ruby is copyrighted free software by Yukihiro Matsumoto.
You can redistribute it and/or modify it under either the terms of the
2-clause BSDL (see the file BSDL), or the conditions below:

  1. You may make and give away verbatim copies of the source form of the
     software without restriction, provided that you duplicate all of the
     original copyright notices and associated disclaimers.

  2. You may modify your copy of the software in any way, provided that
     you do at least ONE of the following:

       a) place your modifications in the Public Domain or otherwise
          make them Freely Available, such as by posting said
          modifications to Usenet or an equivalent medium, or by allowing
          the author to include your modifications in the software.

       b) use the modified software only within your corporation or
          organization.

       c) give non-standard binaries non-standard names, with
          instructions on where to get the original software distribution.

       d) make other distribution arrangements with the author.

  3. You may distribute the software in object code or binary form,
     provided that you do at least ONE of the following:

       a) distribute the binaries and library files of the software,
          together with instructions (in the manual page or equivalent)
          on where to get the original distribution.

       b) accompany the distribution with the machine-readable source of
          the software.

       c) give non-standard binaries non-standard names, with
          instructions on where to get the original software distribution.

       d) make other distribution arrangements with the author.

  4. You may modify and include the part of the software into any other
     software (possibly commercial).  But some files in the distribution
     are not written by the author, so that they are not under these terms.

     For the list of those files and their copying conditions, see the
     file LEGAL.

  5. The scripts and library files supplied as input to or produced as
     output from the software do not automatically fall under the
     copyright of the software, but belong to whomever generated them,
     and may be sold commercially, and may be aggregated with this
     software.

  6. THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR
     IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
     WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
     PURPOSE.

buftok-0.2.0/CONTRIBUTING.md0000644000076400007640000000372312323137232014273 0ustar pravipravi
## Contributing
In the spirit of [free software][free-sw], **everyone** is encouraged to help
improve this project. Here are some ways *you* can contribute:

[free-sw]: http://www.fsf.org/licensing/essays/free-sw.html

* Use alpha, beta, and pre-release versions.
* Report bugs.
* Suggest new features.
* Write or edit documentation.
* Write specifications.
* Write code (**no patch is too small**: fix typos, add comments, clean up
  inconsistent whitespace).
* Refactor code.
* Fix [issues][].
* Review patches.

[issues]: https://github.com/sferik/buftok/issues

## Submitting an Issue
We use the [GitHub issue tracker][issues] to track bugs and features. Before
submitting a bug report or feature request, check to make sure it hasn't
already been submitted. When submitting a bug report, please include a [Gist][]
that includes a stack trace and any details that may be necessary to reproduce
the bug, including your gem version, Ruby version, and operating system.
Ideally, a bug report should include a pull request with failing specs.

[gist]: https://gist.github.com/

## Submitting a Pull Request
1. [Fork the repository.][fork]
2. [Create a topic branch.][branch]
3. Add specs for your unimplemented feature or bug fix.
4. Run `bundle exec rake spec`. If your specs pass, return to step 3.
5. Implement your feature or bug fix.
6. Run `bundle exec rake spec`. If your specs fail, return to step 5.
7. Run `open coverage/index.html`. If your changes are not completely covered
   by your tests, return to step 3.
8. Run `RUBYOPT=W2 bundle exec rake spec 2>&1 | grep buftok`. If your changes
   produce any warnings, return to step 5.
9. Add documentation for your feature or bug fix.
10. Run `bundle exec rake yard`. If your changes are not 100% documented, go
    back to step 9.
11. Commit and push your changes.
12. [Submit a pull request.][pr]

[fork]: http://help.github.com/fork-a-repo/
[branch]: http://learn.github.com/p/branching.html
[pr]: http://help.github.com/send-pull-requests/

buftok-0.2.0/README.md0000644000076400007640000000367612323137232013330 0ustar pravipravi
# BufferedTokenizer

[![Gem Version](https://badge.fury.io/rb/buftok.png)][gem]
[![Build Status](https://travis-ci.org/sferik/buftok.png?branch=master)][travis]
[![Dependency Status](https://gemnasium.com/sferik/buftok.png?travis)][gemnasium]
[![Code Climate](https://codeclimate.com/github/sferik/buftok.png)][codeclimate]

[gem]: https://rubygems.org/gems/buftok
[travis]: https://travis-ci.org/sferik/buftok
[gemnasium]: https://gemnasium.com/sferik/buftok
[codeclimate]: https://codeclimate.com/github/sferik/buftok

###### Statefully split input data by a specifiable token

BufferedTokenizer takes a delimiter upon instantiation, or acts line-based by
default. It allows input to be spoon-fed from some outside source which
receives arbitrary length datagrams which may-or-may-not contain the token by
which entities are delimited. In this respect it's ideally paired with
something like [EventMachine][].

[EventMachine]: http://rubyeventmachine.com/

## Supported Ruby Versions
This library aims to support and is [tested against][travis] the following Ruby
implementations:

* Ruby 1.8.7
* Ruby 1.9.2
* Ruby 1.9.3
* Ruby 2.0.0

If something doesn't work on one of these interpreters, it's a bug.

This library may inadvertently work (or seem to work) on other Ruby
implementations, however support will only be provided for the versions listed
above.

If you would like this library to support another Ruby version, you may
volunteer to be a maintainer. Being a maintainer entails making sure all tests
run and pass on that implementation. When something breaks on your
implementation, you will be responsible for providing patches in a timely
fashion. If critical issues for a particular implementation exist at the time
of a major release, support for that Ruby version may be dropped.

## Copyright
Copyright (c) 2006-2013 Tony Arcieri, Martin Emde, Erik Michaels-Ober.
Distributed under the [Ruby license][license].

[license]: http://www.ruby-lang.org/en/LICENSE.txt

buftok-0.2.0/test/0000755000076400007640000000000012323137232013014 5ustar pravipravi
buftok-0.2.0/test/test_buftok.rb0000644000076400007640000000162112323137232015672 0ustar pravipravi
require 'test/unit'
require 'buftok'

class TestBuftok < Test::Unit::TestCase
  def test_buftok
    tokenizer = BufferedTokenizer.new
    assert_equal %w[foo], tokenizer.extract("foo\nbar".freeze)
    assert_equal %w[barbaz qux], tokenizer.extract("baz\nqux\nquu".freeze)
    assert_equal 'quu', tokenizer.flush
    assert_equal '', tokenizer.flush
  end

  def test_delimiter
    tokenizer = BufferedTokenizer.new('<>')
    assert_equal ['', "foo\n"], tokenizer.extract("<>foo\n<>".freeze)
    assert_equal %w[bar], tokenizer.extract('bar<>baz'.freeze)
    assert_equal 'baz', tokenizer.flush
  end

  def test_split_delimiter
    tokenizer = BufferedTokenizer.new('<>'.freeze)
    assert_equal [], tokenizer.extract('foo<'.freeze)
    assert_equal %w[foo], tokenizer.extract('>bar<'.freeze)
    assert_equal %w[bar<baz qux], tokenizer.extract('baz<>qux<>'.freeze)
    assert_equal '', tokenizer.flush
  end
end

buftok-0.2.0/Gemfile0000644000076400007640000000007612323137232013333 0ustar pravipravi
source 'https://rubygems.org'

gem 'rake'
gem 'rdoc'

gemspec

buftok-0.2.0/metadata.yml0000644000076400007640000000315412323137232014343 0ustar pravipravi
--- !ruby/object:Gem::Specification
name: buftok
version: !ruby/object:Gem::Version
  version: 0.2.0
  prerelease:
platform: ruby
authors:
- Tony Arcieri
- Martin Emde
- Erik Michaels-Ober
autorequire:
bindir: bin
cert_chain: []
date: 2013-11-22 00:00:00.000000000 Z
dependencies:
- !ruby/object:Gem::Dependency
  name: bundler
  requirement: !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ~>
      - !ruby/object:Gem::Version
        version: '1.0'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ~>
      - !ruby/object:Gem::Version
        version: '1.0'
description: BufferedTokenizer extracts token delimited entities from a sequence of arbitrary inputs
email: sferik@gmail.com
executables: []
extensions: []
extra_rdoc_files: []
files:
- CONTRIBUTING.md
- Gemfile
- LICENSE.md
- README.md
- Rakefile
- buftok.gemspec
- lib/buftok.rb
- test/test_buftok.rb
homepage: https://github.com/sferik/buftok
licenses:
- MIT
post_install_message:
rdoc_options: []
require_paths:
- lib
required_ruby_version: !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ! '>='
    - !ruby/object:Gem::Version
      version: '0'
required_rubygems_version: !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ! '>='
    - !ruby/object:Gem::Version
      version: 1.3.5
requirements: []
rubyforge_project:
rubygems_version: 1.8.23
signing_key:
specification_version: 3
summary: BufferedTokenizer extracts token delimited entities from a sequence of arbitrary inputs
test_files:
- test/test_buftok.rb
has_rdoc:

buftok-0.2.0/lib/0000755000076400007640000000000012323137232012603 5ustar pravipravi
buftok-0.2.0/lib/buftok.rb0000644000076400007640000000420712323137232014425 0ustar pravipravi
# BufferedTokenizer takes a delimiter upon instantiation, or acts line-based
# by default.  It allows input to be spoon-fed from some outside source which
# receives arbitrary length datagrams which may-or-may-not contain the token
# by which entities are delimited.  In this respect it's ideally paired with
# something like EventMachine (http://rubyeventmachine.com/).
class BufferedTokenizer
  # New BufferedTokenizers will operate on lines delimited by a delimiter,
  # which is by default the global input delimiter $/ ("\n").
  #
  # The input buffer is stored as an array.  This is by far the most efficient
  # approach given language constraints (in C a linked list would be a more
  # appropriate data structure).  Segments of input data are stored in a list
  # which is only joined when a token is reached, substantially reducing the
  # number of objects required for the operation.
  def initialize(delimiter = $/)
    @delimiter = delimiter
    @input = []
    @tail = ''
    @trim = @delimiter.length - 1
  end

  # Extract takes an arbitrary string of input data and returns an array of
  # tokenized entities, provided there were any available to extract.  This
  # makes for easy processing of datagrams using a pattern like:
  #
  #   tokenizer.extract(data).map { |entity| Decode(entity) }.each do ...
  #
  # Using -1 makes split return "" if the token is at the end of
  # the string, meaning the last element is the start of the next chunk.
  def extract(data)
    # A multi-character delimiter may be split across two chunks, so move the
    # last (delimiter length - 1) characters of the tail back onto the data.
    if @trim > 0
      tail_end = @tail.slice!(-@trim, @trim) # returns nil if string is too short
      data = tail_end + data if tail_end
    end

    @input << @tail
    entities = data.split(@delimiter, -1)
    # split returns [] for empty data; fall back to '' so @tail is never nil
    @tail = entities.shift || ''

    unless entities.empty?
      @input << @tail
      entities.unshift @input.join
      @input.clear
      @tail = entities.pop
    end

    entities
  end

  # Flush the contents of the input buffer, i.e. return the input buffer even though
  # a token has not yet been encountered
  def flush
    @input << @tail
    buffer = @input.join
    @input.clear
    @tail = "" # @tail.clear is slightly faster, but not supported on 1.8.7
    buffer
  end
end

buftok-0.2.0/buftok.gemspec0000644000076400007640000000150412323137232014674 0ustar pravipravi
Gem::Specification.new do |spec|
  spec.add_development_dependency 'bundler', '~> 1.0'
  spec.authors = ["Tony Arcieri", "Martin Emde", "Erik Michaels-Ober"]
  spec.description = %q{BufferedTokenizer extracts token delimited entities from a sequence of arbitrary inputs}
  spec.email = "sferik@gmail.com"
  spec.files = %w(CONTRIBUTING.md Gemfile LICENSE.md README.md Rakefile buftok.gemspec)
  spec.files += Dir.glob("lib/**/*.rb")
  spec.files += Dir.glob("test/**/*.rb")
  spec.test_files = spec.files.grep(%r{^test/})
  spec.homepage = "https://github.com/sferik/buftok"
  spec.licenses = ['MIT']
  spec.name = "buftok"
  spec.require_paths = ["lib"]
  spec.required_rubygems_version = '>= 1.3.5'
  spec.summary = spec.description
  spec.version = "0.2.0"
end

buftok-0.2.0/Rakefile0000644000076400007640000000314612323137232013506 0ustar pravipravi
require 'bundler'
require 'rdoc/task'
require 'rake/testtask'

task :default => :test

Bundler::GemHelper.install_tasks

RDoc::Task.new do |task|
  task.rdoc_dir = 'doc'
  task.title = 'BufferedTokenizer'
  task.rdoc_files.include('lib/**/*.rb')
end

Rake::TestTask.new :test do |t|
  t.libs << 'lib'
  t.test_files = FileList['test/**/*.rb']
end

desc "Benchmark the current implementation"
task :bench do
  require 'benchmark'
  require File.expand_path('lib/buftok', File.dirname(__FILE__))

  n = 50000
  delimiter = "\n\n"

  frequency1 = 1000
  puts "generating #{n} strings, with #{delimiter.inspect} every #{frequency1} strings..."
  data1 = (0...n).map do |i|
    (((i % frequency1 == 1) ? "\n" : "") +
      ("s" * i) +
      ((i % frequency1 == 0) ? "\n" : "")).freeze
  end

  frequency2 = 10
  puts "generating #{n} strings, with #{delimiter.inspect} every #{frequency2} strings..."
  data2 = (0...n).map do |i|
    (((i % frequency2 == 1) ? "\n" : "") +
      ("s" * i) +
      ((i % frequency2 == 0) ? "\n" : "")).freeze
  end

  Benchmark.bmbm do |x|
    x.report("1 char, freq: #{frequency1}") do
      bt1 = BufferedTokenizer.new
      n.times { |i| bt1.extract(data1[i]) }
    end

    x.report("2 char, freq: #{frequency1}") do
      bt2 = BufferedTokenizer.new(delimiter)
      n.times { |i| bt2.extract(data1[i]) }
    end

    x.report("1 char, freq: #{frequency2}") do
      bt3 = BufferedTokenizer.new
      n.times { |i| bt3.extract(data2[i]) }
    end

    x.report("2 char, freq: #{frequency2}") do
      bt4 = BufferedTokenizer.new(delimiter)
      n.times { |i| bt4.extract(data2[i]) }
    end
  end
end
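
The extract/flush contract exercised in test/test_buftok.rb can be illustrated without installing the gem. The sketch below is an assumption-laden stand-in, not the code in lib/buftok.rb: `SketchTokenizer` is a hypothetical name, and it keeps a single String buffer rather than the array of segments the library uses, trading the library's allocation savings for brevity. It should return the same entities for the inputs shown, including a delimiter that arrives split across two chunks.

```ruby
# Hypothetical stand-in for illustration only; NOT the gem's BufferedTokenizer.
# It keeps one String buffer instead of the array of segments in lib/buftok.rb.
class SketchTokenizer
  def initialize(delimiter = "\n")
    @delimiter = delimiter
    @buffer = ''
  end

  # Append data and return every complete entity seen so far.
  def extract(data)
    @buffer += data # += builds a new String, so frozen input is safe
    # A limit of -1 keeps a trailing "" when the data ends with the delimiter.
    entities = @buffer.split(@delimiter, -1)
    # The last element is the unfinished start of the next entity
    # (split returns [] for an empty buffer, hence the fallback).
    @buffer = entities.pop || ''
    entities
  end

  # Return whatever is buffered, even though no delimiter has been seen yet.
  def flush
    buffer = @buffer
    @buffer = ''
    buffer
  end
end

tokenizer = SketchTokenizer.new('<>')
tokenizer.extract('foo<')       # => []
tokenizer.extract('>bar<')      # => ["foo"]
tokenizer.extract('baz<>qux<>') # => ["bar<baz", "qux"]
tokenizer.flush                 # => ""
```

Because the whole remainder is re-split on every call, this sketch needs no special handling for a delimiter split across chunks; the library instead moves only the last `@delimiter.length - 1` characters of the tail back onto the incoming data, which avoids re-scanning the buffered input.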