gitlab-sidekiq-fetcher-0.8.0/Gemfile.lock

GEM
  remote: https://rubygems.org/
  specs:
    coderay (1.1.2)
    connection_pool (2.2.3)
    diff-lcs (1.3)
    docile (1.3.1)
    json (2.1.0)
    method_source (0.9.0)
    pry (0.11.3)
      coderay (~> 1.1.0)
      method_source (~> 0.9.0)
    rack (2.2.3)
    redis (4.2.1)
    rspec (3.8.0)
      rspec-core (~> 3.8.0)
      rspec-expectations (~> 3.8.0)
      rspec-mocks (~> 3.8.0)
    rspec-core (3.8.0)
      rspec-support (~> 3.8.0)
    rspec-expectations (3.8.1)
      diff-lcs (>= 1.2.0, < 2.0)
      rspec-support (~> 3.8.0)
    rspec-mocks (3.8.0)
      diff-lcs (>= 1.2.0, < 2.0)
      rspec-support (~> 3.8.0)
    rspec-support (3.8.0)
    sidekiq (6.1.0)
      connection_pool (>= 2.2.2)
      rack (~> 2.0)
      redis (>= 4.2.0)
    simplecov (0.16.1)
      docile (~> 1.1)
      json (>= 1.8, < 3)
      simplecov-html (~> 0.10.0)
    simplecov-html (0.10.2)

PLATFORMS
  ruby

DEPENDENCIES
  pry
  rspec (~> 3)
  sidekiq (~> 6.1)
  simplecov

BUNDLED WITH
   1.17.2

gitlab-sidekiq-fetcher-0.8.0/RELEASE-GITLAB.md

gitlab-sidekiq-fetcher
======================

# How to publish a new release?

1. Dev-commit cycle
2. Update the version in the gemspec file, commit and tag
3. Build the gem: `gem build gitlab-sidekiq-fetcher.gemspec`
4. Upload the gem: `gem push gitlab-sidekiq-fetcher-X.X.X.gem`

gitlab-sidekiq-fetcher-0.8.0/.rspec

--require spec_helper

gitlab-sidekiq-fetcher-0.8.0/README.md

gitlab-sidekiq-fetcher
======================

`gitlab-sidekiq-fetcher` is an extension to Sidekiq that adds support for reliable fetches from Redis. It's based on https://github.com/TEA-ebook/sidekiq-reliable-fetch.

**IMPORTANT NOTE:** Since version `0.7.0` this gem works only with `sidekiq >= 6.1` (which introduced breaking changes to the Fetch API). Please use version `~> 0.5` if you use an older version of `sidekiq`.

**UPGRADE NOTE:** If upgrading from 0.7.0, strongly consider fully deploying 0.7.1 before moving to 0.8.0; 0.7.1 fixes a bug in the queue name validation that will be hit if Sidekiq nodes running 0.7.0 see working queues named by 0.8.0. See https://gitlab.com/gitlab-org/sidekiq-reliable-fetch/-/merge_requests/22

There are two strategies implemented: [Reliable fetch](http://redis.io/commands/rpoplpush#pattern-reliable-queue), which uses the `rpoplpush` command, and semi-reliable fetch, which uses regular `brpop` and `lpush` to pick the job and put it on the working queue.

The main benefit of the "Reliable" strategy is that `rpoplpush` is atomic, eliminating a race condition in which jobs can be lost. However, it comes at a cost: `rpoplpush` can't watch multiple lists at the same time, so we need to iterate over the entire queue list, which significantly increases pressure on Redis when there are more than a few queues.

The "semi-reliable" strategy is still much more reliable than the default Sidekiq fetcher, and, compared to the reliable fetch strategy, it does not significantly increase pressure on Redis.
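To illustrate the difference, here is a minimal sketch of the two Redis access patterns (the queue names and the `worker1` identity are illustrative placeholders — the real working queue suffix is `hostname:pid:nonce`, and the real implementations live in `lib/sidekiq/reliable_fetch.rb` and `lib/sidekiq/semi_reliable_fetch.rb`):

```ruby
# Reliable fetch: atomically move a job from a queue to a working queue.
# rpoplpush reads from exactly one source list, so every configured queue
# has to be polled in turn.
job = Sidekiq.redis do |conn|
  conn.rpoplpush('queue:default', 'working:queue:default:worker1')
end

# Semi-reliable fetch: block on all queues at once (with a 2 second timeout),
# then stash the job on the working queue with a second, non-atomic command.
# A job can be lost in the small window between the two calls.
queue, job = Sidekiq.redis { |conn| conn.brpop('queue:default', 'queue:low', 2) }
Sidekiq.redis { |conn| conn.lpush("working:#{queue}:worker1", job) } if job
```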
### Interruption handling

Sidekiq expects every job to report success or failure. In the failure case, Sidekiq stores a `retry_count` counter in the job and keeps re-running it until the counter reaches the maximum allowed value. When a job never gets the chance to finish its work (and so report success or failure), for example because it was killed forcibly or was requeued after receiving a TERM signal, the standard retry mechanism never comes into play and the job would be retried indefinitely. This is why the Reliable Fetcher maintains a special counter, `interrupted_count`, which is used to limit the number of such retries. In both cases, the Reliable Fetcher increments `interrupted_count` and rejects the job from running again once the counter exceeds `max_retries_after_interruption` (default: 3).

Such a job is put into the `interrupted` queue. This queue mostly behaves like the Sidekiq Dead queue: it only stores a limited number of jobs for a limited time. As with the Dead queue, the limits are configurable via the `interrupted_max_jobs` (default: 10_000) and `interrupted_timeout_in_seconds` (default: 3 months) Sidekiq option keys.

You can also disable the special handling of interrupted jobs by setting `max_retries_after_interruption` to `-1`. In this case, interrupted jobs will be retried without any limits from the Reliable Fetcher and they won't be put into the interrupted queue.
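For example, all of these limits can be tuned where the fetcher is set up (a sketch — the values shown are the defaults described above, not recommendations):

```ruby
Sidekiq.configure_server do |config|
  config.options[:max_retries_after_interruption] = 3  # -1 disables the limit
  config.options[:interrupted_max_jobs] = 10_000
  config.options[:interrupted_timeout_in_seconds] = 90 * 24 * 60 * 60  # 3 months

  Sidekiq::ReliableFetch.setup_reliable_fetch!(config)
end
```

The fetcher consults the worker class first, so an individual (hypothetical) worker can also override the global retry limit with `sidekiq_options max_retries_after_interruption: 5`.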
## Installation

Add the following to your `Gemfile`:

```ruby
gem 'gitlab-sidekiq-fetcher', require: 'sidekiq-reliable-fetch'
```

## Configuration

Enable reliable fetches by calling this gem from your Sidekiq configuration:

```ruby
Sidekiq.configure_server do |config|
  Sidekiq::ReliableFetch.setup_reliable_fetch!(config)

  # …
end
```

There is an additional parameter `config.options[:semi_reliable_fetch]` you can use to switch between the two strategies:

```ruby
Sidekiq.configure_server do |config|
  config.options[:semi_reliable_fetch] = true # Default value is false

  Sidekiq::ReliableFetch.setup_reliable_fetch!(config)
end
```

## License

LGPL-3.0, see the LICENSE file.

gitlab-sidekiq-fetcher-0.8.0/tests/README.md

# How to run reliability tests

```
cd tests/reliability
bundle exec ruby reliability_test.rb
```

You can adjust some parameters of the test in `config.rb`.

JOB_FETCHER can be set to one of these values: `semi`, `reliable`, `basic`.

You need a Redis server running on the default port `6379`. To use another port, define the `REDIS_URL` environment variable with the port you need (example: `REDIS_URL="redis://localhost:9999"`).

## How it works

This tool spawns the configured number of Sidekiq workers; when roughly half of the queued jobs have been processed, it kills all the workers with `kill -9` and then spawns new workers until all the jobs are processed.

To track progress, the tests use plain Redis keys/counters (see the inspection snippet at the end of this README).

# How to run interruption tests

```
cd tests/interruption

# Verify "KILL" signal
bundle exec ruby test_kill_signal.rb

# Verify "TERM" signal
bundle exec ruby test_term_signal.rb
```

It requires Redis to be running on port 6379.

## How it works

It spawns Sidekiq workers, then creates a job that kills its own process after a moment. The reliable fetcher brings the job back, and the test verifies that the job is run no more than the allowed number of times.
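Both suites track their progress through these Redis keys. A minimal inspection snippet (key names taken from `tests/interruption/worker.rb` and `tests/reliability/config.rb`):

```ruby
require 'sidekiq'

Sidekiq.redis do |redis|
  # Incremented by RetryTestWorker on every (re)run in the interruption tests
  puts "times_has_been_run: #{redis.get('times_has_been_run').to_i}"

  # One jid per completed job is pushed to this list by the reliability test
  puts "finished jobs: #{redis.llen('reliable-fetcher-finished-jids')}"
end
```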
gitlab-sidekiq-fetcher-0.8.0/tests/interruption/worker.rb

# frozen_string_literal: true

class RetryTestWorker
  include Sidekiq::Worker

  def perform(signal = 'KILL', wait_seconds = 1)
    Sidekiq.redis do |redis|
      redis.incr('times_has_been_run')
    end

    Process.kill(signal, Process.pid)

    sleep wait_seconds
  end
end

gitlab-sidekiq-fetcher-0.8.0/tests/interruption/test_term_signal.rb

# frozen_string_literal: true

require 'sidekiq'
require_relative 'config'
require_relative '../support/utils'

EXPECTED_NUM_TIMES_BEEN_RUN = 3
NUM_WORKERS = EXPECTED_NUM_TIMES_BEEN_RUN + 1

Sidekiq.redis(&:flushdb)

pids = spawn_workers(NUM_WORKERS)

RetryTestWorker.perform_async('TERM', 60)

sleep 300

Sidekiq.redis do |redis|
  times_has_been_run = redis.get('times_has_been_run').to_i
  assert 'The job has been run', times_has_been_run, EXPECTED_NUM_TIMES_BEEN_RUN
end

assert 'Found interruption exhausted jobs', Sidekiq::InterruptedSet.new.size, 1

stop_workers(pids)

gitlab-sidekiq-fetcher-0.8.0/tests/interruption/test_kill_signal.rb

# frozen_string_literal: true

require 'sidekiq'
require_relative 'config'
require_relative '../support/utils'

EXPECTED_NUM_TIMES_BEEN_RUN = 3
NUM_WORKERS = EXPECTED_NUM_TIMES_BEEN_RUN + 1

Sidekiq.redis(&:flushdb)

pids = spawn_workers(NUM_WORKERS)

RetryTestWorker.perform_async

sleep 300

Sidekiq.redis do |redis|
  times_has_been_run = redis.get('times_has_been_run').to_i
  assert 'The job has been run', times_has_been_run, EXPECTED_NUM_TIMES_BEEN_RUN
end

assert 'Found interruption exhausted jobs', Sidekiq::InterruptedSet.new.size, 1

stop_workers(pids)

gitlab-sidekiq-fetcher-0.8.0/tests/interruption/config.rb

# frozen_string_literal: true

require_relative '../../lib/sidekiq-reliable-fetch'
require_relative 'worker'

TEST_CLEANUP_INTERVAL = 20
TEST_LEASE_INTERVAL = 5

Sidekiq.configure_server do |config|
  config.options[:semi_reliable_fetch] = true

  # We need to override these parameters to not wait too long.
  # The default values are good for production use only.
  # These will be ignored for :basic.
  config.options[:cleanup_interval] = TEST_CLEANUP_INTERVAL
  config.options[:lease_interval] = TEST_LEASE_INTERVAL

  Sidekiq::ReliableFetch.setup_reliable_fetch!(config)
end

gitlab-sidekiq-fetcher-0.8.0/tests/support/utils.rb

def assert(text, actual, expected)
  if actual == expected
    puts "#{text}: #{actual} (Success)"
  else
    puts "#{text}: #{actual} (Failed). Expected: #{expected}"
    exit 1
  end
end

def spawn_workers(number)
  pids = []

  number.times do
    pids << spawn('sidekiq -q default -q high -q low -r ./config.rb')
  end

  pids
end

# Stop Sidekiq workers
def stop_workers(pids)
  pids.each do |pid|
    Process.kill('KILL', pid)
    Process.wait pid
  end
end

gitlab-sidekiq-fetcher-0.8.0/tests/reliability/worker.rb

# frozen_string_literal: true

class ReliabilityTestWorker
  include Sidekiq::Worker

  def perform
    # To mimic a long-running job and to increase the probability of losing the job
    sleep 1

    Sidekiq.redis do |redis|
      redis.lpush(REDIS_FINISHED_LIST, jid)
    end
  end
end

gitlab-sidekiq-fetcher-0.8.0/tests/reliability/reliability_test.rb

# frozen_string_literal: true

require 'sidekiq'
require 'sidekiq/util'
require 'sidekiq/cli'
require_relative 'config'

def spawn_workers_and_stop_them_on_a_half_way
  pids = spawn_workers

  wait_until do |queue_size|
    queue_size < NUMBER_OF_JOBS / 2
  end

  first_half_pids, second_half_pids = split_array(pids)

  puts 'Killing half of the workers...'
  signal_to_workers('KILL', first_half_pids)

  puts 'Stopping another half of the workers...'
  signal_to_workers('TERM', second_half_pids)
end

def spawn_workers_and_let_them_finish
  puts 'Spawn workers and let them finish...'

  pids = spawn_workers

  wait_until do |queue_size|
    queue_size.zero?
  end

  if %i[semi reliable].include? JOB_FETCHER
    puts 'Waiting for the cleanup process that will requeue dead jobs...'
    sleep WAIT_CLEANUP
  end

  signal_to_workers('TERM', pids)
end

def wait_until
  loop do
    sleep 3

    queue_size = current_queue_size
    puts "Jobs in the queue: #{queue_size}"

    break if yield(queue_size)
  end
end

def signal_to_workers(signal, pids)
  pids.each { |pid| Process.kill(signal, pid) }
  pids.each { |pid| Process.wait(pid) }
end

def spawn_workers
  pids = []

  NUMBER_OF_WORKERS.times do
    pids << spawn('sidekiq -q default -q low -q high -r ./config.rb')
  end

  pids
end

def current_queue_size
  Sidekiq.redis { |c| c.llen('queue:default') }
end

def duplicates
  Sidekiq.redis { |c| c.llen(REDIS_FINISHED_LIST) }
end

# Splits an array into two halves
def split_array(arr)
  first_arr = arr.take(arr.size / 2)
  second_arr = arr - first_arr

  [first_arr, second_arr]
end

##########################################################

puts '########################################'
puts "Mode: #{JOB_FETCHER}"
puts '########################################'

Sidekiq.redis(&:flushdb)

jobs = []

NUMBER_OF_JOBS.times do
  jobs << ReliabilityTestWorker.perform_async
end

puts "Queued #{NUMBER_OF_JOBS} jobs"

spawn_workers_and_stop_them_on_a_half_way
spawn_workers_and_let_them_finish

jobs_lost = 0

Sidekiq.redis do |redis|
  jobs.each do |job|
    next if redis.lrem(REDIS_FINISHED_LIST, 1, job) == 1

    jobs_lost += 1
  end
end

puts "Remaining unprocessed: #{jobs_lost}"
puts "Duplicates found: #{duplicates}"

if jobs_lost.zero? && duplicates.zero?
  exit 0
else
  exit 1
end
gitlab-sidekiq-fetcher-0.8.0/tests/reliability/config.rb

# frozen_string_literal: true

require_relative '../../lib/sidekiq-reliable-fetch'
require_relative 'worker'

REDIS_FINISHED_LIST = 'reliable-fetcher-finished-jids'

NUMBER_OF_WORKERS = ENV['NUMBER_OF_WORKERS'] || 10
NUMBER_OF_JOBS = ENV['NUMBER_OF_JOBS'] || 1000
JOB_FETCHER = (ENV['JOB_FETCHER'] || :semi).to_sym # :basic, :semi, :reliable

TEST_CLEANUP_INTERVAL = 20
TEST_LEASE_INTERVAL = 5
WAIT_CLEANUP = TEST_CLEANUP_INTERVAL +
               TEST_LEASE_INTERVAL +
               Sidekiq::ReliableFetch::HEARTBEAT_LIFESPAN

Sidekiq.configure_server do |config|
  if %i[semi reliable].include?(JOB_FETCHER)
    config.options[:semi_reliable_fetch] = (JOB_FETCHER == :semi)

    # We need to override these parameters to not wait too long.
    # The default values are good for production use only.
    # These will be ignored for :basic.
    config.options[:cleanup_interval] = TEST_CLEANUP_INTERVAL
    config.options[:lease_interval] = TEST_LEASE_INTERVAL

    Sidekiq::ReliableFetch.setup_reliable_fetch!(config)
  end
end

gitlab-sidekiq-fetcher-0.8.0/spec/reliable_fetch_spec.rb

require 'spec_helper'
require 'fetch_shared_examples'
require 'sidekiq/base_reliable_fetch'
require 'sidekiq/reliable_fetch'

describe Sidekiq::ReliableFetch do
  include_examples 'a Sidekiq fetcher'
end

gitlab-sidekiq-fetcher-0.8.0/spec/semi_reliable_fetch_spec.rb

require 'spec_helper'
require 'fetch_shared_examples'
require 'sidekiq/base_reliable_fetch'
require 'sidekiq/semi_reliable_fetch'

describe Sidekiq::SemiReliableFetch do
  include_examples 'a Sidekiq fetcher'
end

gitlab-sidekiq-fetcher-0.8.0/spec/fetch_shared_examples.rb

shared_examples 'a Sidekiq fetcher' do
  let(:queues) { ['assigned'] }

  before { Sidekiq.redis(&:flushdb) }

  describe '#retrieve_work' do
    let(:job) { Sidekiq.dump_json(class: 'Bob', args: [1, 2, 'foo']) }
    let(:fetcher) { described_class.new(queues: queues) }

    it 'does not clean up orphaned jobs more than once per cleanup interval' do
      Sidekiq.redis = Sidekiq::RedisConnection.create(url: REDIS_URL, size: 10)

      expect(fetcher).to receive(:clean_working_queues!).once

      threads = 10.times.map do
        Thread.new do
          fetcher.retrieve_work
        end
      end

      threads.map(&:join)
    end

    it 'retrieves by order when strict ordering is enabled' do
      fetcher = described_class.new(strict: true, queues: ['first', 'second'])

      Sidekiq.redis do |conn|
        conn.rpush('queue:first', ['msg3', 'msg2', 'msg1'])
        conn.rpush('queue:second', 'msg4')
      end

      jobs = (1..4).map { fetcher.retrieve_work.job }

      expect(jobs).to eq ['msg1', 'msg2', 'msg3', 'msg4']
    end

    it 'does not starve any queue when queues are not strictly ordered' do
      fetcher = described_class.new(queues: ['first', 'second'])

      Sidekiq.redis do |conn|
        conn.rpush('queue:first', (1..200).map { |i| "msg#{i}" })
        conn.rpush('queue:second', 'this_job_should_not_stuck')
      end

      jobs = (1..100).map { fetcher.retrieve_work.job }

      expect(jobs).to include 'this_job_should_not_stuck'
    end

    shared_examples "basic queue handling" do |queue|
      let(:fetcher) { described_class.new(queues: [queue]) }

      it 'retrieves the job and puts it to working queue' do
        Sidekiq.redis { |conn| conn.rpush("queue:#{queue}", job) }

        uow = fetcher.retrieve_work
        expect(working_queue_size(queue)).to eq 1
        expect(uow.queue_name).to eq queue
        expect(uow.job).to eq job
        expect(Sidekiq::Queue.new(queue).size).to eq 0
      end

      it 'does not retrieve a job from foreign queue' do
        Sidekiq.redis { |conn| conn.rpush("queue:#{queue}:not", job) }
        expect(fetcher.retrieve_work).to be_nil

        Sidekiq.redis { |conn| conn.rpush("queue:not_#{queue}", job) }
        expect(fetcher.retrieve_work).to be_nil

        Sidekiq.redis { |conn| conn.rpush("queue:random_name", job) }
        expect(fetcher.retrieve_work).to be_nil
      end

      it 'requeues jobs from legacy dead working queue with incremented interrupted_count' do
        Sidekiq.redis do |conn|
          conn.rpush(legacy_other_process_working_queue_name(queue), job)
        end

        expected_job = Sidekiq.load_json(job)
        expected_job['interrupted_count'] = 1
        expected_job = Sidekiq.dump_json(expected_job)

        uow = fetcher.retrieve_work

        expect(uow).to_not be_nil
        expect(uow.job).to eq expected_job

        Sidekiq.redis do |conn|
          expect(conn.llen(legacy_other_process_working_queue_name(queue))).to eq 0
        end
      end

      it 'ignores working queue keys in unknown formats' do
        # Add a spurious non-numeric character segment at the end; this
        # simulates any other incorrect format in general
        malformed_key = "#{other_process_working_queue_name(queue)}:X"
        Sidekiq.redis do |conn|
          conn.rpush(malformed_key, job)
        end

        uow = fetcher.retrieve_work

        Sidekiq.redis do |conn|
          expect(conn.llen(malformed_key)).to eq 1
        end
      end

      it 'requeues jobs from dead working queue with incremented interrupted_count' do
        Sidekiq.redis do |conn|
          conn.rpush(other_process_working_queue_name(queue), job)
        end

        expected_job = Sidekiq.load_json(job)
        expected_job['interrupted_count'] = 1
        expected_job = Sidekiq.dump_json(expected_job)

        uow = fetcher.retrieve_work

        expect(uow).to_not be_nil
        expect(uow.job).to eq expected_job

        Sidekiq.redis do |conn|
          expect(conn.llen(other_process_working_queue_name(queue))).to eq 0
        end
      end

      it 'does not requeue jobs from live working queue' do
        working_queue = live_other_process_working_queue_name(queue)

        Sidekiq.redis do |conn|
          conn.rpush(working_queue, job)
        end

        uow = fetcher.retrieve_work

        expect(uow).to be_nil

        Sidekiq.redis do |conn|
          expect(conn.llen(working_queue)).to eq 1
        end
      end
    end

    context 'with various queues' do
      %w[assigned namespace:assigned namespace:deeper:assigned].each do |queue|
        it_behaves_like "basic queue handling", queue
      end
    end

    context 'with short cleanup interval' do
      let(:short_interval) { 1 }
      let(:fetcher) do
        described_class.new(queues: queues,
                            lease_interval: short_interval,
                            cleanup_interval: short_interval)
      end

      it 'requeues when there is no heartbeat' do
        Sidekiq.redis { |conn| conn.rpush('queue:assigned', job) }

        # Use of retrieve_work twice with a sleep ensures we have exercised the
        # `identity` method to create the working queue key name and that it
        # matches the patterns used in the cleanup
        uow = fetcher.retrieve_work
        sleep(short_interval + 1)
        uow = fetcher.retrieve_work

        # Will only receive a UnitOfWork if the job was detected as failed and requeued
        expect(uow).to_not be_nil
      end
    end
  end
end

def working_queue_size(queue_name)
  Sidekiq.redis do |c|
    c.llen(Sidekiq::BaseReliableFetch.working_queue_name("queue:#{queue_name}"))
  end
end

def legacy_other_process_working_queue_name(queue)
  "#{Sidekiq::BaseReliableFetch::WORKING_QUEUE_PREFIX}:queue:#{queue}:#{Socket.gethostname}:#{::Process.pid + 1}"
end

def other_process_working_queue_name(queue)
  "#{Sidekiq::BaseReliableFetch::WORKING_QUEUE_PREFIX}:queue:#{queue}:#{Socket.gethostname}:#{::Process.pid + 1}:#{::SecureRandom.hex(6)}"
end
def live_other_process_working_queue_name(queue)
  pid = ::Process.pid + 1
  hostname = Socket.gethostname
  nonce = SecureRandom.hex(6)

  Sidekiq.redis do |conn|
    conn.set(Sidekiq::BaseReliableFetch.heartbeat_key("#{hostname}-#{pid}-#{nonce}"), 1)
  end

  "#{Sidekiq::BaseReliableFetch::WORKING_QUEUE_PREFIX}:queue:#{queue}:#{hostname}:#{pid}:#{nonce}"
end

gitlab-sidekiq-fetcher-0.8.0/spec/spec_helper.rb

require 'sidekiq'
require 'sidekiq/util'
require 'sidekiq/api'
require 'pry'
require 'simplecov'

SimpleCov.start

REDIS_URL = ENV['REDIS_URL'] || 'redis://localhost:6379/10'

Sidekiq.configure_client do |config|
  config.redis = { url: REDIS_URL }
end

Sidekiq.logger.level = Logger::ERROR

# This file was generated by the `rspec --init` command. Conventionally, all
# specs live under a `spec` directory, which RSpec adds to the `$LOAD_PATH`.
# The generated `.rspec` file contains `--require spec_helper` which will cause
# this file to always be loaded, without a need to explicitly require it in any
# files.
#
# Given that it is always loaded, you are encouraged to keep this file as
# light-weight as possible. Requiring heavyweight dependencies from this file
# will add to the boot time of your test suite on EVERY test run, even for an
# individual file that may not need all of that loaded. Instead, consider making
# a separate helper file that requires the additional dependencies and performs
# the additional setup, and require it from the spec files that actually need
# it.
#
# See http://rubydoc.info/gems/rspec-core/RSpec/Core/Configuration
RSpec.configure do |config|
  # rspec-expectations config goes here. You can use an alternate
  # assertion/expectation library such as wrong or the stdlib/minitest
  # assertions if you prefer.
  config.expect_with :rspec do |expectations|
    # This option will default to `true` in RSpec 4. It makes the `description`
    # and `failure_message` of custom matchers include text for helper methods
    # defined using `chain`, e.g.:
    #     be_bigger_than(2).and_smaller_than(4).description
    #     # => "be bigger than 2 and smaller than 4"
    # ...rather than:
    #     # => "be bigger than 2"
    expectations.include_chain_clauses_in_custom_matcher_descriptions = true
  end

  # rspec-mocks config goes here. You can use an alternate test double
  # library (such as bogus or mocha) by changing the `mock_with` option here.
  config.mock_with :rspec do |mocks|
    # Prevents you from mocking or stubbing a method that does not exist on
    # a real object. This is generally recommended, and will default to
    # `true` in RSpec 4.
    mocks.verify_partial_doubles = true
  end

  # This option will default to `:apply_to_host_groups` in RSpec 4 (and will
  # have no way to turn it off -- the option exists only for backwards
  # compatibility in RSpec 3). It causes shared context metadata to be
  # inherited by the metadata hash of host groups and examples, rather than
  # triggering implicit auto-inclusion in groups with matching metadata.
  config.shared_context_metadata_behavior = :apply_to_host_groups

# The settings below are suggested to provide a good initial experience
# with RSpec, but feel free to customize to your heart's content.
=begin
  # This allows you to limit a spec run to individual examples or groups
  # you care about by tagging them with `:focus` metadata. When nothing
  # is tagged with `:focus`, all examples get run. RSpec also provides
  # aliases for `it`, `describe`, and `context` that include `:focus`
  # metadata: `fit`, `fdescribe` and `fcontext`, respectively.
  config.filter_run_when_matching :focus

  # Allows RSpec to persist some state between runs in order to support
  # the `--only-failures` and `--next-failure` CLI options. We recommend
  # you configure your source control system to ignore this file.
  config.example_status_persistence_file_path = "spec/examples.txt"

  # Limits the available syntax to the non-monkey patched syntax that is
  # recommended. For more details, see:
  #   - http://rspec.info/blog/2012/06/rspecs-new-expectation-syntax/
  #   - http://www.teaisaweso.me/blog/2013/05/27/rspecs-new-message-expectation-syntax/
  #   - http://rspec.info/blog/2014/05/notable-changes-in-rspec-3/#zero-monkey-patching-mode
  config.disable_monkey_patching!

  # This setting enables warnings. It's recommended, but in some cases may
  # be too noisy due to issues in dependencies.
  config.warnings = true

  # Many RSpec users commonly either run the entire suite or an individual
  # file, and it's useful to allow more verbose output when running an
  # individual spec file.
  if config.files_to_run.one?
    # Use the documentation formatter for detailed output,
    # unless a formatter has already been configured
    # (e.g. via a command-line flag).
    config.default_formatter = "doc"
  end

  # Print the 10 slowest examples and example groups at the
  # end of the spec run, to help surface which specs are running
  # particularly slow.
  config.profile_examples = 10

  # Run specs in random order to surface order dependencies. If you find an
  # order dependency and want to debug it, you can fix the order by providing
  # the seed, which is printed after each run.
  #     --seed 1234
  config.order = :random

  # Seed global randomization in this process using the `--seed` CLI option.
  # Setting this allows you to use `--seed` to deterministically reproduce
  # test failures related to randomization by passing the same `--seed` value
  # as the one that triggered the failure.
  Kernel.srand config.seed
=end
end

gitlab-sidekiq-fetcher-0.8.0/spec/base_reliable_fetch_spec.rb

require 'spec_helper'
require 'fetch_shared_examples'
require 'sidekiq/base_reliable_fetch'
require 'sidekiq/reliable_fetch'
require 'sidekiq/semi_reliable_fetch'

describe Sidekiq::BaseReliableFetch do
  let(:job) { Sidekiq.dump_json(class: 'Bob', args: [1, 2, 'foo']) }

  before { Sidekiq.redis(&:flushdb) }

  describe 'UnitOfWork' do
    let(:fetcher) { Sidekiq::ReliableFetch.new(queues: ['foo']) }

    describe '#requeue' do
      it 'requeues job' do
        Sidekiq.redis { |conn| conn.rpush('queue:foo', job) }

        uow = fetcher.retrieve_work

        uow.requeue

        expect(Sidekiq::Queue.new('foo').size).to eq 1
        expect(working_queue_size('foo')).to eq 0
      end
    end

    describe '#acknowledge' do
      it 'acknowledges job' do
        Sidekiq.redis { |conn| conn.rpush('queue:foo', job) }

        uow = fetcher.retrieve_work

        expect { uow.acknowledge }
          .to change { working_queue_size('foo') }.by(-1)

        expect(Sidekiq::Queue.new('foo').size).to eq 0
      end
    end
  end

  describe '#bulk_requeue' do
    let(:options) { { queues: %w[foo bar] } }
    let!(:queue1) { Sidekiq::Queue.new('foo') }
    let!(:queue2) { Sidekiq::Queue.new('bar') }

    it 'requeues the bulk' do
      uow = described_class::UnitOfWork
      jobs = [
        uow.new('queue:foo', job),
        uow.new('queue:foo', job),
        uow.new('queue:bar', job)
      ]
      described_class.new(options).bulk_requeue(jobs, nil)

      expect(queue1.size).to eq 2
      expect(queue2.size).to eq 1
    end

    it 'puts jobs into interrupted queue' do
      uow = described_class::UnitOfWork
      interrupted_job = Sidekiq.dump_json(class: 'Bob', args: [1, 2, 'foo'], interrupted_count: 3)
      jobs = [
        uow.new('queue:foo', interrupted_job),
        uow.new('queue:foo', job),
        uow.new('queue:bar', job)
      ]
      described_class.new(options).bulk_requeue(jobs, nil)

      expect(queue1.size).to eq 1
      expect(queue2.size).to eq 1
      expect(Sidekiq::InterruptedSet.new.size).to eq 1
    end

    it 'does not put jobs into interrupted queue if it is disabled' do
      Sidekiq.options[:max_retries_after_interruption] = -1

      uow = described_class::UnitOfWork
      interrupted_job = Sidekiq.dump_json(class: 'Bob', args: [1, 2, 'foo'], interrupted_count: 3)
      jobs = [
        uow.new('queue:foo', interrupted_job),
        uow.new('queue:foo', job),
        uow.new('queue:bar', job)
      ]
      described_class.new(options).bulk_requeue(jobs, nil)

      expect(queue1.size).to eq 2
      expect(queue2.size).to eq 1
      expect(Sidekiq::InterruptedSet.new.size).to eq 0

      Sidekiq.options[:max_retries_after_interruption] = 3
    end
  end

  it 'sets heartbeat' do
    config = double(:sidekiq_config, options: { queues: %w[foo bar] })

    heartbeat_thread = described_class.setup_reliable_fetch!(config)

    Sidekiq.redis do |conn|
      sleep 0.2 # Give the heartbeat thread time to make a loop

      heartbeat_key = described_class.heartbeat_key(described_class.identity)
      heartbeat = conn.get(heartbeat_key)

      expect(heartbeat).not_to be_nil
    end

    heartbeat_thread.kill
  end
end

gitlab-sidekiq-fetcher-0.8.0/.gitignore

*.gem
coverage
.DS_Store

gitlab-sidekiq-fetcher-0.8.0/LICENSE

                   GNU LESSER GENERAL PUBLIC LICENSE
                       Version 3, 29 June 2007

 Copyright (C) 2007 Free Software Foundation, Inc.
 Everyone is permitted to copy and distribute verbatim copies
 of this license document, but changing it is not allowed.
This version of the GNU Lesser General Public License incorporates the terms and conditions of version 3 of the GNU General Public License, supplemented by the additional permissions listed below. 0. Additional Definitions. As used herein, "this License" refers to version 3 of the GNU Lesser General Public License, and the "GNU GPL" refers to version 3 of the GNU General Public License. "The Library" refers to a covered work governed by this License, other than an Application or a Combined Work as defined below. An "Application" is any work that makes use of an interface provided by the Library, but which is not otherwise based on the Library. Defining a subclass of a class defined by the Library is deemed a mode of using an interface provided by the Library. A "Combined Work" is a work produced by combining or linking an Application with the Library. The particular version of the Library with which the Combined Work was made is also called the "Linked Version". The "Minimal Corresponding Source" for a Combined Work means the Corresponding Source for the Combined Work, excluding any source code for portions of the Combined Work that, considered in isolation, are based on the Application, and not on the Linked Version. The "Corresponding Application Code" for a Combined Work means the object code and/or source code for the Application, including any data and utility programs needed for reproducing the Combined Work from the Application, but excluding the System Libraries of the Combined Work. 1. Exception to Section 3 of the GNU GPL. You may convey a covered work under sections 3 and 4 of this License without being bound by section 3 of the GNU GPL. 2. Conveying Modified Versions. If you modify a copy of the Library, and, in your modifications, a facility refers to a function or data to be supplied by an Application that uses the facility (other than as an argument passed when the facility is invoked), then you may convey a copy of the modified version: a) under this License, provided that you make a good faith effort to ensure that, in the event an Application does not supply the function or data, the facility still operates, and performs whatever part of its purpose remains meaningful, or b) under the GNU GPL, with none of the additional permissions of this License applicable to that copy. 3. Object Code Incorporating Material from Library Header Files. The object code form of an Application may incorporate material from a header file that is part of the Library. You may convey such object code under terms of your choice, provided that, if the incorporated material is not limited to numerical parameters, data structure layouts and accessors, or small macros, inline functions and templates (ten or fewer lines in length), you do both of the following: a) Give prominent notice with each copy of the object code that the Library is used in it and that the Library and its use are covered by this License. b) Accompany the object code with a copy of the GNU GPL and this license document. 4. Combined Works. You may convey a Combined Work under terms of your choice that, taken together, effectively do not restrict modification of the portions of the Library contained in the Combined Work and reverse engineering for debugging such modifications, if you also do each of the following: a) Give prominent notice with each copy of the Combined Work that the Library is used in it and that the Library and its use are covered by this License. 
b) Accompany the Combined Work with a copy of the GNU GPL and this license document. c) For a Combined Work that displays copyright notices during execution, include the copyright notice for the Library among these notices, as well as a reference directing the user to the copies of the GNU GPL and this license document. d) Do one of the following: 0) Convey the Minimal Corresponding Source under the terms of this License, and the Corresponding Application Code in a form suitable for, and under terms that permit, the user to recombine or relink the Application with a modified version of the Linked Version to produce a modified Combined Work, in the manner specified by section 6 of the GNU GPL for conveying Corresponding Source. 1) Use a suitable shared library mechanism for linking with the Library. A suitable mechanism is one that (a) uses at run time a copy of the Library already present on the user's computer system, and (b) will operate properly with a modified version of the Library that is interface-compatible with the Linked Version. e) Provide Installation Information, but only if you would otherwise be required to provide such information under section 6 of the GNU GPL, and only to the extent that such information is necessary to install and execute a modified version of the Combined Work produced by recombining or relinking the Application with a modified version of the Linked Version. (If you use option 4d0, the Installation Information must accompany the Minimal Corresponding Source and Corresponding Application Code. If you use option 4d1, you must provide the Installation Information in the manner specified by section 6 of the GNU GPL for conveying Corresponding Source.) 5. Combined Libraries. You may place library facilities that are a work based on the Library side by side in a single library together with other library facilities that are not Applications and are not covered by this License, and convey such a combined library under terms of your choice, if you do both of the following: a) Accompany the combined library with a copy of the same work based on the Library, uncombined with any other library facilities, conveyed under the terms of this License. b) Give prominent notice with the combined library that part of it is a work based on the Library, and explaining where to find the accompanying uncombined form of the same work. 6. Revised Versions of the GNU Lesser General Public License. The Free Software Foundation may publish revised and/or new versions of the GNU Lesser General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Library as you received it specifies that a certain numbered version of the GNU Lesser General Public License "or any later version" applies to it, you have the option of following the terms and conditions either of that published version or of any later version published by the Free Software Foundation. If the Library as you received it does not specify a version number of the GNU Lesser General Public License, you may choose any version of the GNU Lesser General Public License ever published by the Free Software Foundation. 
If the Library as you received it specifies that a proxy can decide whether future versions of the GNU Lesser General Public License shall apply, that proxy's public statement of acceptance of any version is permanent authorization for you to choose that version for the Library.

gitlab-sidekiq-fetcher-0.8.0/lib/sidekiq/base_reliable_fetch.rb

# frozen_string_literal: true

require_relative 'interrupted_set'

module Sidekiq
  class BaseReliableFetch
    DEFAULT_CLEANUP_INTERVAL = 60 * 60 # 1 hour
    HEARTBEAT_INTERVAL = 20 # seconds
    HEARTBEAT_LIFESPAN = 60 # seconds
    HEARTBEAT_RETRY_DELAY = 1 # seconds
    WORKING_QUEUE_PREFIX = 'working'

    # Defines how often we try to take a lease so we don't flood our
    # Redis server with SET requests
    DEFAULT_LEASE_INTERVAL = 2 * 60 # seconds
    LEASE_KEY = 'reliable-fetcher-cleanup-lock'

    # Defines the COUNT parameter that will be passed to the Redis SCAN command
    SCAN_COUNT = 1000

    # How many times a job can be interrupted before it stops being retried
    DEFAULT_MAX_RETRIES_AFTER_INTERRUPTION = 3

    # Regexes for matching working queue keys
    WORKING_QUEUE_REGEX = /#{WORKING_QUEUE_PREFIX}:(queue:.*):([^:]*:[0-9]*:[0-9a-f]*)\z/.freeze
    LEGACY_WORKING_QUEUE_REGEX = /#{WORKING_QUEUE_PREFIX}:(queue:.*):([^:]*:[0-9]*)\z/.freeze
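    # A working queue key therefore looks like
    #   working:queue:{queue_name}:{hostname}:{pid}:{nonce}
    # (for example "working:queue:default:worker-01:4121:1b9a6d3c7f02" --
    # hostname, pid and nonce here are made-up values). The legacy format,
    # produced before the process nonce was introduced, lacks the trailing
    # ":{nonce}" segment. Queue names themselves may contain colons, which is
    # why both regexes match the identity segments from the end of the key.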
    UnitOfWork = Struct.new(:queue, :job) do
      def acknowledge
        Sidekiq.redis { |conn| conn.lrem(Sidekiq::BaseReliableFetch.working_queue_name(queue), 1, job) }
      end

      def queue_name
        queue.sub(/.*queue:/, '')
      end

      def requeue
        Sidekiq.redis do |conn|
          conn.multi do |multi|
            multi.lpush(queue, job)
            multi.lrem(Sidekiq::BaseReliableFetch.working_queue_name(queue), 1, job)
          end
        end
      end
    end

    def self.setup_reliable_fetch!(config)
      fetch_strategy = if config.options[:semi_reliable_fetch]
                         Sidekiq::SemiReliableFetch
                       else
                         Sidekiq::ReliableFetch
                       end

      config.options[:fetch] = fetch_strategy.new(config.options)

      Sidekiq.logger.info('GitLab reliable fetch activated!')

      start_heartbeat_thread
    end

    def self.start_heartbeat_thread
      Thread.new do
        loop do
          begin
            heartbeat

            sleep HEARTBEAT_INTERVAL
          rescue => e
            Sidekiq.logger.error("Heartbeat thread error: #{e.message}")

            sleep HEARTBEAT_RETRY_DELAY
          end
        end
      end
    end

    def self.hostname
      Socket.gethostname
    end

    def self.process_nonce
      @@process_nonce ||= SecureRandom.hex(6)
    end

    def self.identity
      @@identity ||= "#{hostname}:#{$$}:#{process_nonce}"
    end

    def self.heartbeat
      Sidekiq.redis do |conn|
        conn.set(heartbeat_key(identity), 1, ex: HEARTBEAT_LIFESPAN)
      end

      Sidekiq.logger.debug("Heartbeat for #{identity}")
    end

    def self.worker_dead?(identity, conn)
      !conn.get(heartbeat_key(identity))
    end

    def self.heartbeat_key(identity)
      "reliable-fetcher-heartbeat-#{identity.gsub(':', '-')}"
    end

    def self.working_queue_name(queue)
      "#{WORKING_QUEUE_PREFIX}:#{queue}:#{identity}"
    end

    attr_reader :cleanup_interval, :last_try_to_take_lease_at, :lease_interval,
                :queues, :use_semi_reliable_fetch, :strictly_ordered_queues

    def initialize(options)
      raise ArgumentError, 'missing queue list' unless options[:queues]

      @cleanup_interval = options.fetch(:cleanup_interval, DEFAULT_CLEANUP_INTERVAL)
      @lease_interval = options.fetch(:lease_interval, DEFAULT_LEASE_INTERVAL)
      @last_try_to_take_lease_at = 0
      @strictly_ordered_queues = !!options[:strict]
      @queues = options[:queues].map { |q| "queue:#{q}" }
    end

    def retrieve_work
      clean_working_queues! if take_lease

      retrieve_unit_of_work
    end

    def retrieve_unit_of_work
      raise NotImplementedError,
            "#{self.class} does not implement #{__method__}"
    end

    def bulk_requeue(inprogress, _options)
      return if inprogress.empty?

      Sidekiq.redis do |conn|
        inprogress.each do |unit_of_work|
          conn.multi do |multi|
            preprocess_interrupted_job(unit_of_work.job, unit_of_work.queue, multi)

            multi.lrem(self.class.working_queue_name(unit_of_work.queue), 1, unit_of_work.job)
          end
        end
      end
    rescue => e
      Sidekiq.logger.warn("Failed to requeue #{inprogress.size} jobs: #{e.message}")
    end

    private

    def preprocess_interrupted_job(job, queue, conn = nil)
      msg = Sidekiq.load_json(job)
      msg['interrupted_count'] = msg['interrupted_count'].to_i + 1

      if interruption_exhausted?(msg)
        send_to_quarantine(msg, conn)
      else
        requeue_job(queue, msg, conn)
      end
    end

    # If you want this method to run within the scope of a multi connection,
    # you need to pass that connection in
    def requeue_job(queue, msg, conn)
      with_connection(conn) do |conn|
        conn.lpush(queue, Sidekiq.dump_json(msg))
      end

      Sidekiq.logger.info(
        message: "Pushed job #{msg['jid']} back to queue #{queue}",
        jid: msg['jid'],
        queue: queue
      )
    end

    def extract_queue_and_identity(key)
      # The new identity format is "{hostname}:{pid}:{randomhex}".
      # The old identity format is "{hostname}:{pid}".
      # Queue names may also have colons (namespaced).
      # Expressing this in a single regex is unreadable.

      # Test the newer expected format first, only checking the older if necessary
      original_queue, identity = key.scan(WORKING_QUEUE_REGEX).flatten
      return original_queue, identity unless original_queue.nil? || identity.nil?

      key.scan(LEGACY_WORKING_QUEUE_REGEX).flatten
    end

    # Detect "old" jobs and requeue them because the worker they were assigned
    # to probably failed miserably.
    def clean_working_queues!
      Sidekiq.logger.info('Cleaning working queues')

      Sidekiq.redis do |conn|
        conn.scan_each(match: "#{WORKING_QUEUE_PREFIX}:queue:*", count: SCAN_COUNT) do |key|
          original_queue, identity = extract_queue_and_identity(key)

          next if original_queue.nil? || identity.nil?

          clean_working_queue!(original_queue, key) if self.class.worker_dead?(identity, conn)
        end
      end
    end

    def clean_working_queue!(original_queue, working_queue)
      Sidekiq.redis do |conn|
        while job = conn.rpop(working_queue)
          preprocess_interrupted_job(job, original_queue)
        end
      end
    end

    def interruption_exhausted?(msg)
      return false if max_retries_after_interruption(msg['class']) < 0

      msg['interrupted_count'].to_i >= max_retries_after_interruption(msg['class'])
    end

    def max_retries_after_interruption(worker_class)
      max_retries_after_interruption = nil

      max_retries_after_interruption ||= begin
        Object.const_get(worker_class).sidekiq_options[:max_retries_after_interruption]
      rescue NameError
      end

      max_retries_after_interruption ||= Sidekiq.options[:max_retries_after_interruption]
      max_retries_after_interruption ||= DEFAULT_MAX_RETRIES_AFTER_INTERRUPTION
      max_retries_after_interruption
    end

    def send_to_quarantine(msg, multi_connection = nil)
      Sidekiq.logger.warn(
        class: msg['class'],
        jid: msg['jid'],
        message: %(Reliable Fetcher: adding dead #{msg['class']} job #{msg['jid']} to interrupted queue)
      )

      job = Sidekiq.dump_json(msg)
      Sidekiq::InterruptedSet.new.put(job, connection: multi_connection)
    end

    # Yield the block with an existing connection or create a new one
    def with_connection(conn)
      return yield(conn) if conn

      Sidekiq.redis { |redis_conn| yield(redis_conn) }
    end
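    # Only one process at a time should clean up the working queues: the lease
    # is a plain SET with NX, expiring after `cleanup_interval`, so whichever
    # process wins the SET performs the cleanup while the others skip it until
    # the key expires. `allowed_to_take_a_lease?` additionally rate-limits how
    # often each process even attempts the SET (at most once per
    # `lease_interval` seconds).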
    def take_lease
      return unless allowed_to_take_a_lease?

      @last_try_to_take_lease_at = Time.now.to_f

      Sidekiq.redis do |conn|
        conn.set(LEASE_KEY, 1, nx: true, ex: cleanup_interval)
      end
    end

    def allowed_to_take_a_lease?
      Time.now.to_f - last_try_to_take_lease_at > lease_interval
    end
  end
end

gitlab-sidekiq-fetcher-0.8.0/lib/sidekiq/interrupted_set.rb

require 'sidekiq/api'

module Sidekiq
  class InterruptedSet < ::Sidekiq::JobSet
    DEFAULT_MAX_CAPACITY = 10_000
    DEFAULT_MAX_TIMEOUT = 90 * 24 * 60 * 60 # 3 months

    def initialize
      super "interrupted"
    end

    def put(message, opts = {})
      now = Time.now.to_f

      with_multi_connection(opts[:connection]) do |conn|
        conn.zadd(name, now.to_s, message)
        conn.zremrangebyscore(name, '-inf', now - self.class.timeout)
        conn.zremrangebyrank(name, 0, -self.class.max_jobs)
      end

      true
    end

    # Yield the block inside an existing multi connection or create a new one
    def with_multi_connection(conn, &block)
      return yield(conn) if conn

      Sidekiq.redis do |c|
        c.multi do |multi|
          yield(multi)
        end
      end
    end

    def retry_all
      each(&:retry) while size > 0
    end

    def self.max_jobs
      Sidekiq.options[:interrupted_max_jobs] || DEFAULT_MAX_CAPACITY
    end

    def self.timeout
      Sidekiq.options[:interrupted_timeout_in_seconds] || DEFAULT_MAX_TIMEOUT
    end
  end
end

gitlab-sidekiq-fetcher-0.8.0/lib/sidekiq/semi_reliable_fetch.rb

# frozen_string_literal: true

module Sidekiq
  class SemiReliableFetch < BaseReliableFetch
    # We want the fetch operation to timeout every few seconds so the thread
    # can check if the process is shutting down. This constant is only used
    # for semi-reliable fetch.
    SEMI_RELIABLE_FETCH_TIMEOUT = 2 # seconds

    def initialize(options)
      super

      if strictly_ordered_queues
        @queues = @queues.uniq
        @queues << SEMI_RELIABLE_FETCH_TIMEOUT
      end
    end

    private

    def retrieve_unit_of_work
      work = Sidekiq.redis { |conn| conn.brpop(*queues_cmd) }
      return unless work

      unit_of_work = UnitOfWork.new(*work)

      Sidekiq.redis do |conn|
        conn.lpush(self.class.working_queue_name(unit_of_work.queue), unit_of_work.job)
      end

      unit_of_work
    end

    def queues_cmd
      if strictly_ordered_queues
        @queues
      else
        queues = @queues.shuffle.uniq
        queues << SEMI_RELIABLE_FETCH_TIMEOUT
        queues
      end
    end
  end
end

gitlab-sidekiq-fetcher-0.8.0/lib/sidekiq/reliable_fetch.rb

# frozen_string_literal: true

module Sidekiq
  class ReliableFetch < BaseReliableFetch
    # For reliable fetch we don't use Redis' blocking operations, so
    # we inject a regular sleep into the loop.
    RELIABLE_FETCH_IDLE_TIMEOUT = 5 # seconds

    attr_reader :queues_size

    def initialize(options)
      super

      @queues = queues.uniq if strictly_ordered_queues
      @queues_size = queues.size
    end

    private

    def retrieve_unit_of_work
      queues_list = strictly_ordered_queues ? queues : queues.shuffle

      queues_list.each do |queue|
        work = Sidekiq.redis do |conn|
          conn.rpoplpush(queue, self.class.working_queue_name(queue))
        end

        return UnitOfWork.new(queue, work) if work
      end

      # We didn't find a job in any of the configured queues. Let's sleep a bit
      # to avoid uselessly burning too much CPU
      sleep(RELIABLE_FETCH_IDLE_TIMEOUT)

      nil
    end
  end
end

gitlab-sidekiq-fetcher-0.8.0/lib/sidekiq-reliable-fetch.rb

require 'sidekiq'
require 'sidekiq/api'

require_relative 'sidekiq/base_reliable_fetch'
require_relative 'sidekiq/reliable_fetch'
require_relative 'sidekiq/semi_reliable_fetch'

gitlab-sidekiq-fetcher-0.8.0/gitlab-sidekiq-fetcher.gemspec

Gem::Specification.new do |s|
  s.name          = 'gitlab-sidekiq-fetcher'
  s.version       = '0.8.0'
  s.authors       = ['TEA', 'GitLab']
  s.email         = 'valery@gitlab.com'
  s.license       = 'LGPL-3.0'
  s.homepage      = 'https://gitlab.com/gitlab-org/sidekiq-reliable-fetch/'
  s.summary       = 'Reliable fetch extension for Sidekiq'
  s.description   = 'Redis reliable queue pattern implemented in Sidekiq'
  s.require_paths = ['lib']
  s.files         = `git ls-files`.split($\)
  s.test_files    = []

  s.add_dependency 'sidekiq', '~> 6.1'
end

gitlab-sidekiq-fetcher-0.8.0/Gemfile

# frozen_string_literal: true

source "https://rubygems.org"

git_source(:github) { |repo_name| "https://github.com/#{repo_name}" }

group :test do
  gem "rspec", '~> 3'
  gem "pry"
  gem "sidekiq", '~> 6.1'
  gem 'simplecov', require: false
end

gitlab-sidekiq-fetcher-0.8.0/.gitlab-ci.yml

image: "ruby:2.5"

before_script:
  - ruby -v
  - which ruby
  - gem install bundler
  - bundle install --jobs $(nproc) "${FLAGS[@]}"

variables:
  REDIS_URL: "redis://redis"

rspec:
  stage: test
  coverage: '/LOC \((\d+\.\d+%)\) covered.$/'
  script:
    - bundle exec rspec
  services:
    - redis:alpine
  artifacts:
    expire_in: 31d
    when: always
    paths:
      - coverage/

.integration:
  stage: test
  script:
    - cd tests/reliability
    - bundle exec ruby reliability_test.rb
  services:
    - redis:alpine

integration_semi:
  extends: .integration
  variables:
    JOB_FETCHER: semi

integration_reliable:
  extends: .integration
  variables:
    JOB_FETCHER: reliable

integration_basic:
  extends: .integration
  allow_failure: yes
  variables:
    JOB_FETCHER: basic

kill_interruption:
  stage: test
  script:
    - cd tests/interruption
    - bundle exec ruby test_kill_signal.rb
  services:
    - redis:alpine

term_interruption:
  stage: test
  script:
    - cd tests/interruption
    - bundle exec ruby test_term_signal.rb
  services:
    - redis:alpine

# rubocop:
#   script:
#     - bundle exec rubocop